[R] Selection of cities sample
Matej Cepl
cepl at surfbest.net
Fri Apr 23 10:37:26 CEST 2004
On Thursday 22 of April 2004 00:56, Matej Cepl wrote:
> I have a question, how to most properly select set of cities
> which would be as similar as possible in some particular
> variables with the City of Boston (which I use as my base
> line).
> I thought about ordering cities by sum of ((differences between
> value of that particular variable for that particular city and
> the value of same variable for Boston) divided by the standard
> deviation of the variable and multiplied by the weight
> (expressing how much that particular variable is important for
> me, or how much I want to avoid cities with this
> characteristic)). Is it sound method?
>
> Or I am creating something which is already available in R as a
> standard function (which I suspect)?
So, OK, I did it again -- asking without knowing what I am asking
for. I'm sorry for that. After whole day spending with Gordon's
"Classification" and MASS on cluster analysis, I made this
function:
# $Id: compute.R,v 1.1 2004/04/23 08:19:06 matej Exp matej $
# Make selection of rows from data frame which are most similar
# to the row identified by (at least part of) row.name.
# variables:
# dframe -- data frame to select from
# variables -- vector of names of variables to be used for
# computation of similarity
# weights -- vector of weights measuring how much each variable
# is important for the selection
# basename -- row.name of the row which the similarity should
# be measured with
# howmany -- how many elements should be selected
makeSelection <- function (dframe, variables = names(dframe),
weights = 1, basename, howmany = length(row.names(dframe)))
{
normal <- dframe[variables]/sd(dframe,na.rm=TRUE)
normal$coef <- apply(normal,1, function (x)
{ sum(x*weights) })
base.coef <- normal[grep(basename,row.names(normal)),]$coef
normal$distance <- abs(normal$coef - base.coef)
return(row.names(normal[order(normal$distance),])[2:howmany
+1])
}
Can anybody comment on this please, whether it does roughly what
I described above?
Thanks a lot,
Matej
--
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488
The law, in its majestic equality, forbids the rich as well as
the poor to sleep under bridges, to beg in the streets, and to
steal bread.
-- Anatole France
More information about the R-help
mailing list