dd [Was: Re: [R] Selection of cities sample]

Sun Apr 25 04:43:11 CEST 2004

On Friday 23 of April 2004 04:37, Matej Cepl wrote:
> I have a question: how should I properly select a set of cities
> that are as similar as possible to the City of Boston (which I
> use as my baseline) in some particular variables?

Hi,

How can I weight the variables used in the daisy function? After
a week spent with MASS, Crawley (2002), and Gordon (1999), I ended
up with this function (which is not really a function, just
convenient packaging of one complex expression):

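function(x) {
   require(cluster)   # provides daisy()
   # Standardized Euclidean dissimilarities between the rows of x
   d <- daisy(as.matrix(x), metric="euclidean", stand=TRUE)
   # Average-linkage hierarchical clustering on those dissimilarities
   return(hclust(d, method="average"))
}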

Plotting the result gave me a huge tree (available as a PDF at
http://www.ceplovi.cz/matej/tmp/mctree.pdf), which seems very
helpful: by selecting a particular cluster I get the group of
cities to use as my sample. Would anybody be so kind as to
comment on this code, please?
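
To make "selecting a particular cluster" concrete, here is a
minimal sketch of how the group containing Boston might be pulled
out of the tree with cutree(). The data frame, its column names,
the city labels other than Boston, and the choice of k = 3 are all
made up for illustration; they are not my real data.

```r
library(cluster)   # provides daisy()

# Hypothetical toy data: rows are cities, columns are numeric variables.
cities <- data.frame(
   pop     = c(590, 610, 95, 100, 105, 2900, 3000),
   pct_own = c(32, 35, 60, 62, 58, 30, 28),
   row.names = c("Boston", "City1", "City2", "City3",
                 "City4", "City5", "City6")
)

# Same construction as in the function above.
tree <- hclust(
   daisy(as.matrix(cities), metric = "euclidean", stand = TRUE),
   method = "average"
)

# Cut the tree into k groups (k = 3 is an arbitrary choice here)
# and keep the cities that share Boston's group.
groups <- cutree(tree, k = 3)
sample_cities <- names(groups)[groups == groups["Boston"]]
print(sample_cities)
```

Cutting by height (the h argument of cutree) instead of by the
number of groups would be another option; which one suits the
real tree better is a judgment call.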

Now, I would love to weight some of the variables in the data
frame used for the calculation (because I care about some
variables more than others, and the less important ones should
enter with a lower weight). In help("daisy") I found this:

	If 'nok' is the number of nonzero weights, the
	dissimilarity is multiplied by the factor '1/nok' and thus
	ranges between 0 and 1.

Do I understand correctly that this allows weighting only of
non-interval (non-continuous) variables? If so, how can I weight
variables that are interval-scaled (my whole table consists of
counts and two percentage variables)?
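
One approach that might work for purely interval variables, as a
sketch only: a weighted Euclidean distance
sqrt(sum_k w_k * (z_ik - z_jk)^2) is the same as the ordinary
Euclidean distance after multiplying each standardized column k by
sqrt(w_k). The matrix x and the weights w below are made up, and
the standardization (mean and mean absolute deviation) is written
out by hand on the assumption that it mirrors what stand=TRUE does
in daisy.

```r
# Hypothetical toy data and weights (not real data).
x <- cbind(a = c(1, 2, 4, 8),
           b = c(10, 30, 20, 60),
           c = c(5, 5, 6, 9))
w <- c(a = 2, b = 1, c = 0.5)

# Standardize each column: subtract the mean, divide by the
# mean absolute deviation.
centered <- sweep(x, 2, colMeans(x))
z <- sweep(centered, 2, colMeans(abs(centered)), "/")

# Multiply column k by sqrt(w_k); plain Euclidean distance on the
# result is then the weighted Euclidean distance on z.
zw <- sweep(z, 2, sqrt(w), "*")
d <- dist(zw)   # daisy(zw, metric="euclidean", stand=FALSE) should agree

tree <- hclust(d, method = "average")

# Check for the first pair of rows against the formula directly.
direct <- sqrt(sum(w * (z[1, ] - z[2, ])^2))
print(all.equal(as.matrix(d)[1, 2], direct))
```

A weight of 0 drops a variable entirely, and only the ratios
between weights matter for the shape of the tree.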

	Thanks for any reply,

		Matej Cepl

-- 
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488

Just remember, brothers and sisters--their skins may be white,
but their souls are just as black as ours!
   -- a black preacher