[R] calculate Euclidean distances between populations in R with this data structure

Sarah Goslee sarah.goslee at gmail.com
Fri Sep 5 02:16:47 CEST 2014


Hi,

Please keep your replies on the R-help list so others may participate
in the conversation.

On Thu, Sep 4, 2014 at 8:12 PM, Ding, Yuan Chun <ycding at coh.org> wrote:
> Hi Sarah,
>
> Thank you very much for your quick response.
>
> I checked the dist() function. It calculates the distance between two samples measured on a number of variables.
>
>    Variable1  Variable2  Variable3  Variable4 ...
> X  3          5          6          7
> Y  4          8          9          10
>
> So it is easy to calculate distance between x and y.
>
> But in my study, X is a group with 20 samples and Y is another group with 30 samples, so I need to calculate the distance between group X and group Y.


That doesn't make any sense to me. If the variables are different, how
can you calculate a distance between them? You also potentially run
into scaling issues. Also, your original question (below) stated that
your populations have 20 samples.

> I think I need to get the mean for each group, then use the dist() function. I tried to find an R package to do it.

I think you'd be better off reconsidering what you're trying to accomplish.
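
If the goal is only the formula from the original post (quoted below), a
minimal sketch in base R might look like the following. It assumes the
data are in a numeric matrix with genes in rows and samples in columns,
and that the population label is the part of each column name before the
dot (as in pop1.1, pop2.3). The object names expr, pop, pop_means and
pop_dist are made up for illustration, and the sketch does nothing about
the scaling concerns mentioned above.

```r
# Example values copied from the original post (genes in rows, samples in columns).
expr <- rbind(
  "7A5"   = c(5.38194, 4.06191, 4.88044, 5.60383, 6.23101, 6.53738, 4.80336, 5.86136),
  "A1BG"  = c(5.15155, 4.29441, 4.59131, 4.90026, 4.62908, 4.48712, 4.73039, 4.46208),
  "A1CF"  = c(4.22396, 4.14451, 4.41465, 3.93179, 4.89638, 4.66109, 4.20918, 4.48107),
  "A26C3" = c(12.1969, 12.4179, 10.9786, 11.7659, 11.405, 11.7594, 11.1757, 11.8128)
)
colnames(expr) <- c(paste0("pop1.", 1:4), paste0("pop2.", 1:4))

# Assumption: the population label is everything before the first dot.
pop <- sub("\\..*$", "", colnames(expr))

# Mean expression of each gene within each population (genes x populations).
pop_means <- sapply(split(seq_len(ncol(expr)), pop),
                    function(idx) rowMeans(expr[, idx, drop = FALSE]))

# dist() compares rows, so transpose to populations x genes. Dividing by
# sqrt(n genes) turns sqrt(sum(d^2)) into sqrt(sum(d^2) / n).
pop_dist <- dist(t(pop_means), method = "euclidean") / sqrt(nrow(expr))
pop_dist
```

With all 12 populations in the matrix, the same code would return the full
set of pairwise distances as a "dist" object; as.matrix(pop_dist) converts
it to a square matrix.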

Sarah

> Thanks,
>
> Ding
>
> -----Original Message-----
> From: Sarah Goslee [mailto:sarah.goslee at gmail.com]
> Sent: Thursday, September 04, 2014 4:49 PM
> To: Ding, Yuan Chun
> Cc: r-help at R-project.org
> Subject: Re: [R] calculate Euclidean distances between populations in R with this data structure
>
> I'd probably start with ?dist
>
> Sarah
>
> On Thu, Sep 4, 2014 at 4:10 PM, Ding, Yuan Chun <ycding at coh.org> wrote:
>>
>>
>>
>> I want to calculate the Euclidean distance between 12 populations. Each population has 20 samples, and each sample is measured for 100 genes (these are microarray data; the numbers here are just examples).
>> The equation I found is:
>> distance = sqrt{ [ sum over i = 1 to n of (average of xi - average of yi)^2 ] / n }
>> where xi and yi are the expression values of gene i in two populations with p and q samples (x1, x2,...,xp), (y1, y2,...,yq), and n is the number of genes.
>> Part of the data is pasted below:
>> row.names  pop1.1   pop1.2   pop1.3   pop1.4   pop2.1   pop2.2   pop2.3   pop2.4
>> 7A5        5.38194  4.06191  4.88044  5.60383  6.23101  6.53738  4.80336  5.86136
>> A1BG       5.15155  4.29441  4.59131  4.90026  4.62908  4.48712  4.73039  4.46208
>> A1CF       4.22396  4.14451  4.41465  3.93179  4.89638  4.66109  4.20918  4.48107
>> A26C3      12.1969  12.4179  10.9786  11.7659  11.405   11.7594  11.1757  11.8128
>> How might one calculate these distances in R with this data structure?
>>
>>
>> Thanks,
>>
>> Ding
>>
>


-- 
Sarah Goslee
http://www.functionaldiversity.org


