# [R] Selection of cities sample

Matej Cepl cepl at surfbest.net
Fri Apr 23 10:37:26 CEST 2004

```
On Thursday 22 of April 2004 00:56, Matej Cepl wrote:
> I have a question: how should I properly select a set of cities
> that are as similar as possible to the City of Boston (which I
> use as my baseline) in some particular variables?
> I thought about ordering cities by the sum of ((differences
> between the value of a particular variable for a particular city
> and the value of the same variable for Boston) divided by the
> standard deviation of the variable and multiplied by the weight
> (expressing how important that particular variable is to me, or
> how much I want to avoid cities with this characteristic)). Is
> this a sound method?
>
> Or am I re-creating something that is already available in R as
> a standard function (which I suspect)?

So, OK, I did it again -- asking without knowing what I am asking
for. I'm sorry for that. After spending a whole day with Gordon's
"Classification" and MASS on cluster analysis, I wrote this
function:

# $Id: compute.R,v 1.1 2004/04/23 08:19:06 matej Exp matej $
# Make selection of rows from data frame which are most similar
# to the row identified by (at least part of) row.name.
# variables:
#   dframe -- data frame to select from
#   variables -- vector of names of variables to be used for
#      computation of similarity
#   weights -- vector of weights measuring how much each variable
#      is important for the selection
#   basename -- row.name of the row which the similarity should
#      be measured with
#   howmany -- how many elements should be selected

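makeSelection <- function (dframe, variables = names(dframe),
                           weights = 1, basename,
                           howmany = nrow(dframe) - 1)
{
    # scale each selected variable by its own standard deviation
    sds <- sapply(dframe[variables], sd, na.rm = TRUE)
    normal <- sweep(dframe[variables], 2, sds, "/")
    # weighted sum of the scaled variables for every row
    normal$coef <- apply(normal, 1, function (x) sum(x * weights))
    base.coef <- normal[grep(basename, row.names(normal)), ]$coef
    normal$distance <- abs(normal$coef - base.coef)
    # the base row sorts first (distance 0), so skip it
    ranked <- row.names(normal[order(normal$distance), ])
    return(ranked[2:(howmany + 1)])
}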

Could anybody please comment on whether it does roughly what
I described above?
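
# For comparison, a minimal sketch of the weighted sum described in
# the quoted question, taking the absolute difference per variable
# so that deviations in opposite directions do not cancel out. The
# function name, the toy data and the weights are all made up for
# illustration; this is one possible reading, not a tested method.
# It assumes an exact row name and one weight per variable.

rankBySimilarity <- function (dframe, variables, weights, basename)
{
    x <- as.matrix(dframe[variables])
    sds <- apply(x, 2, sd, na.rm = TRUE)
    base <- x[basename, ]
    # |city - base| / sd for every variable, one row per city
    z <- abs(sweep(x, 2, base, "-"))
    z <- sweep(z, 2, sds, "/")
    # weighted sum across variables gives one distance per city
    d <- drop(z %*% weights)
    # drop the base row itself, then order from nearest to farthest
    d <- d[names(d) != basename]
    names(sort(d))
}

# hypothetical numbers, only to show the call
cities <- data.frame(pop = c(590, 8000, 650, 560),
                     income = c(40, 38, 41, 30),
                     row.names = c("Boston", "New York",
                                   "Baltimore", "Milwaukee"))
rankBySimilarity(cities, c("pop", "income"), c(1, 2), "Boston")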

Thanks a lot,

Matej

--
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488

The law, in its majestic equality, forbids the rich as well as
the poor to sleep under bridges, to beg in the streets, and to