[R] Selection of cities sample

Matej Cepl cepl at surfbest.net
Fri Apr 23 10:37:26 CEST 2004


On Thursday 22 of April 2004 00:56, Matej Cepl wrote:
> I have a question: how can I best select a set of cities
> that are as similar as possible to the City of Boston (which I
> use as my baseline) in some particular variables?
> I thought about ordering the cities by the sum, over the
> variables, of (the difference between the city's value and
> Boston's value for that variable, divided by the standard
> deviation of the variable and multiplied by a weight
> expressing how important that variable is to me, or how much
> I want to avoid cities with this characteristic). Is this a
> sound method?
> 
> Or am I re-creating something that is already available in R
> as a standard function (which I suspect)?

So, OK, I did it again -- I asked without knowing what I was
asking for. I'm sorry for that. After spending a whole day with
Gordon's "Classification" and MASS on cluster analysis, I wrote
this function:

# $Id: compute.R,v 1.1 2004/04/23 08:19:06 matej Exp matej $
# Select the rows of a data frame which are most similar
# to the row identified by (at least part of) its row.name.
# arguments:
#   dframe -- data frame to select from
#   variables -- vector of names of variables to be used for
#      computation of similarity
#   weights -- vector of weights measuring how much each variable
#      is important for the selection
#   basename -- row.name of the row which the similarity should
#      be measured with
#   howmany -- how many elements should be selected

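makeSelection <- function (dframe, variables = names(dframe),
      weights = 1, basename, howmany = nrow(dframe) - 1)
{
   # scale each selected column by its own standard deviation
   sds <- sapply(dframe[variables], sd, na.rm = TRUE)
   normal <- sweep(as.matrix(dframe[variables]), 2, sds, "/")
   # weighted sum of the scaled variables for every row
   w <- rep(weights, length.out = length(variables))
   coef <- as.vector(normal %*% w)
   names(coef) <- row.names(dframe)
   # first row whose name matches (part of) basename
   base <- grep(basename, row.names(dframe))[1]
   distance <- abs(coef - coef[base])
   # rank all other rows by distance, closest first
   others <- distance[-base]
   ranked <- names(others)[order(others)]
   return(head(ranked, howmany))
}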
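
For comparison, here is a minimal sketch of the measure as it is
worded in the quoted message: a per-variable difference from the
base row, scaled by the standard deviation, weighted, and then
summed. It assumes the differences are meant in absolute value, so
that a positive difference in one variable cannot cancel a negative
one in another. The name weightedDistance and the toy data are
hypothetical, not taken from the original post.

```r
# Hypothetical helper, not part of the original post: rank rows by the
# weighted sum of absolute, sd-scaled differences from a base row.
weightedDistance <- function(dframe, variables = names(dframe),
                             weights = 1, basename) {
  x <- as.matrix(dframe[variables])
  sds <- apply(x, 2, sd, na.rm = TRUE)
  w <- rep(weights, length.out = length(variables))
  # first row whose name matches (part of) basename
  base <- grep(basename, row.names(dframe))[1]
  # |x_ij - x_base,j| / sd_j for every row i and variable j
  diffs <- abs(sweep(x, 2, x[base, ], "-"))
  scaled <- sweep(diffs, 2, sds, "/")
  d <- as.vector(scaled %*% w)
  names(d) <- row.names(dframe)
  # distances of all other rows, closest first
  sort(d[-base])
}

# Made-up numbers, for illustration only
cities <- data.frame(pop = c(590, 480, 8000, 650),
                     income = c(40, 52, 38, 45),
                     row.names = c("Boston", "A", "B", "C"))
weightedDistance(cities, basename = "Boston")
```

With all weights equal to 1 this should coincide with the Manhattan
distance between sd-scaled rows, which the standard function dist()
computes with method = "manhattan". That may be the "already
available" function the quoted message suspects.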

Could anybody please comment on whether this does roughly what I
described above?

	Thanks a lot,

		Matej

-- 
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488
 
The law, in its majestic equality, forbids the rich as well as
the poor to sleep under bridges, to beg in the streets, and to
steal bread.
    -- Anatole France



