[R] Different PCA results under Windows and Linux

Mark Difford mark_difford at yahoo.co.uk
Wed Sep 17 20:43:29 CEST 2008


Hi Jathine,

>> I hope this can explain the problem a bit more clearly. 
>> Why does PCA give different results on the two different platforms?

What is remarkable, Jathine, is how nearly identical the two sets of
results are, not that they begin to differ around the 16th decimal place. To
assuage your concerns, do the following on the results from each of your two
trials and compare the output:

round(p1$var$coord, 15)
?round

## And read the famous FAQ on floating-point arithmetic (R FAQ 7.31,
## "Why doesn't R think these numbers are equal?")
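
A minimal sketch of that comparison, using only the Dim.1 and Dim.2
coordinates from the two outputs quoted below, typed in by hand. The object
names linux and windows are illustrative only. all.equal() compares within a
numerical tolerance (about 1.5e-8 by default), whereas identical() compares
exactly.

```r
# Dim.1 and Dim.2 of p1$var$coord as posted for each platform
# (object names are illustrative; values copied from the quoted output)
linux <- cbind(
  Dim.1 = c(1, 1, 1, 1, 0, 0, 1, 1),
  Dim.2 = c(-3.925231e-16, 7.850462e-17, 7.850462e-17, 7.850462e-17,
            0, 0, 7.850462e-17, 7.850462e-17)
)
windows <- cbind(
  Dim.1 = c(1, 1, 1, 1, 0, 0, 1, 1),
  Dim.2 = c(2.458061e-16, -4.916122e-17, -4.916122e-17, -4.916122e-17,
            0, 0, -4.916122e-17, -4.916122e-17)
)
rownames(linux) <- rownames(windows) <- paste0("M", 1:8)

identical(linux, windows)          # exact comparison: picks up the noise
isTRUE(all.equal(linux, windows))  # comparison within numerical tolerance
max(abs(linux - windows))          # largest absolute difference
.Machine$double.eps                # machine epsilon, for scale
all(round(linux, 15) == round(windows, 15))  # equal after rounding
zapsmall(linux)                    # negligible entries printed as 0
```

The largest difference is a small multiple of .Machine$double.eps, which is
the scale at which results from different platforms and builds can
legitimately disagree.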

It also isn't a very good idea to be doing PCAs on 0s and 1s.
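
To see why this particular data set is degenerate, here is a rough sketch in
base R. It uses prcomp() as a stand-in for FactoMineR's PCA(), so it is not
the same code path, and the object names x, keep and pc are made up for the
example. M5 and M6 are constant, which is presumably where the NaN rows in
$cor and $cos2 come from, and the other six columns are exact linear
functions of one another, so only one dimension carries any variance.

```r
# The data from freqtest.txt, typed in directly
x <- read.table(header = TRUE, text = "
M1 M2 M3 M4 M5 M6 M7 M8
-1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 -1 -1 1 1
-1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 -1 -1 1 1
")

sapply(x, sd)              # M5 and M6 have standard deviation 0
keep <- sapply(x, sd) > 0  # drop the constant columns
cor(x[, keep])             # the six remaining columns are perfectly correlated

# prcomp() is used here as a stand-in for FactoMineR::PCA()
pc <- prcomp(x[, keep], scale. = TRUE)
zapsmall(pc$sdev^2)        # one non-zero eigenvalue; the rest is noise
```

With a single non-zero eigenvalue, everything reported for Dim.2 and Dim.3 is
rounding noise, and each of the six non-constant variables contributes
100/6 = 16.67 percent to Dim.1, as in both of your outputs.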

Regards, Mark.


jathine wrote:
> 
> Thank you for your reply.
> Here is some more info, I hope this can explain the problem a bit more
> clearly. 
> Why does PCA give different results on the two different platforms?
> 
> Contents of the freqtest.txt file:
> M1 M2 M3 M4 M5 M6 M7 M8
> -1 -1 -1 -1 -1 -1 -1 -1
> 0 0 0 0 -1 -1 1 1
> -1 -1 -1 -1 -1 -1 -1 -1
> 0 0 0 0 -1 -1 1 1
> 
> ******Linux R script result and sessionInfo()
>> library(FactoMineR)
>> x1=read.table("freqtest.txt", header=TRUE)
>> xrcc2=x1[,1:8]
>> p1=PCA(xrcc2, graph=FALSE)
>> p1$var
> 
> $coord
>    Dim.1         Dim.2         Dim.3
> M1     1 -3.925231e-16 -2.287663e-48
> M2     1  7.850462e-17 -3.600641e-32
> M3     1  7.850462e-17  9.001602e-33
> M4     1  7.850462e-17  9.001602e-33
> M5     0  0.000000e+00  0.000000e+00
> M6     0  0.000000e+00  0.000000e+00
> M7     1  7.850462e-17  9.001602e-33
> M8     1  7.850462e-17  9.001602e-33
> 
> $cor
>    Dim.1         Dim.2         Dim.3
> M1     1 -3.925231e-16 -2.287663e-48
> M2     1  7.850462e-17 -3.600641e-32
> M3     1  7.850462e-17  9.001602e-33
> M4     1  7.850462e-17  9.001602e-33
> M5   NaN           NaN           NaN
> M6   NaN           NaN           NaN
> M7     1  7.850462e-17  9.001602e-33
> M8     1  7.850462e-17  9.001602e-33
> 
> $cos2
>    Dim.1        Dim.2        Dim.3
> M1     1 1.540744e-31 5.233404e-96
> M2     1 6.162976e-33 1.296462e-63
> M3     1 6.162976e-33 8.102884e-65
> M4     1 6.162976e-33 8.102884e-65
> M5   NaN          NaN          NaN
> M6   NaN          NaN          NaN
> M7     1 6.162976e-33 8.102884e-65
> M8     1 6.162976e-33 8.102884e-65
> 
> $contrib
>       Dim.1     Dim.2        Dim.3
> M1 16.66667 83.333333 3.229346e-31
> M2 16.66667  3.333333 8.000000e+01
> M3 16.66667  3.333333 5.000000e+00
> M4 16.66667  3.333333 5.000000e+00
> M5  0.00000  0.000000 0.000000e+00
> M6  0.00000  0.000000 0.000000e+00
> M7 16.66667  3.333333 5.000000e+00
> M8 16.66667  3.333333 5.000000e+00
> 
>> sessionInfo()
> R version 2.7.1 (2008-06-23)
> x86_64-redhat-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] FactoMineR_1.09
>>
> 
> ******Windows R script result and sessionInfo()
>> library(FactoMineR)
>> x1=read.table("freqtest.txt", header=TRUE)
>> xrcc2=x1[,1:8]
>> p1=PCA(xrcc2, graph=FALSE)
>> p1$var
> 
> $coord
>    Dim.1         Dim.2         Dim.3
> M1     1  2.458061e-16 -4.590163e-49
> M2     1 -4.916122e-17 -4.750455e-32
> M3     1 -4.916122e-17  1.187614e-32
> M4     1 -4.916122e-17  1.187614e-32
> M5     0  0.000000e+00  0.000000e+00
> M6     0  0.000000e+00  0.000000e+00
> M7     1 -4.916122e-17  1.187614e-32
> M8     1 -4.916122e-17  1.187614e-32
> 
> $cor
>    Dim.1         Dim.2         Dim.3
> M1     1  2.458061e-16 -4.590163e-49
> M2     1 -4.916122e-17 -4.750455e-32
> M3     1 -4.916122e-17  1.187614e-32
> M4     1 -4.916122e-17  1.187614e-32
> M5   NaN           NaN           NaN
> M6   NaN           NaN           NaN
> M7     1 -4.916122e-17  1.187614e-32
> M8     1 -4.916122e-17  1.187614e-32
> 
> $cos2
>    Dim.1        Dim.2        Dim.3
> M1     1 6.042064e-32 2.106959e-97
> M2     1 2.416826e-33 2.256682e-63
> M3     1 2.416826e-33 1.410426e-64
> M4     1 2.416826e-33 1.410426e-64
> M5   NaN          NaN          NaN
> M6   NaN          NaN          NaN
> M7     1 2.416826e-33 1.410426e-64
> M8     1 2.416826e-33 1.410426e-64
> 
> $contrib
>       Dim.1     Dim.2        Dim.3
> M1 16.66667 83.333333 7.469228e-33
> M2 16.66667  3.333333 8.000000e+01
> M3 16.66667  3.333333 5.000000e+00
> M4 16.66667  3.333333 5.000000e+00
> M5  0.00000  0.000000 0.000000e+00
> M6  0.00000  0.000000 0.000000e+00
> M7 16.66667  3.333333 5.000000e+00
> M8 16.66667  3.333333 5.000000e+00
> 
>> sessionInfo()
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] FactoMineR_1.09
>>
> 
> 
> 
> Steven McKinney wrote:
>> 
>> 
>> Not likely that anyone can explain, as
>> there is not enough information in your
>> email.
>> 
>> Including the contents of the freqtest.txt file
>> was a good idea, as the posting guide suggests
>> (the posting guide is that clearly labeled bit
>> at the bottom that looks like this:
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> Check it out! It is cool.)
>> 
>> Additionally, include the command 
>>   sessionInfo() 
>> and its output from all machines you refer to
>> so maintainers know which versions of software
>> you are running.  Also, include the output you obtained
>> from your code (with your code being a self-contained 
>> and reproducible set of  R commands).
>> 
>> Finally, describe what the difference is and why
>> the difference is problematic (i.e. don't report
>> machine precision differences, or sign differences
>> for PCA results - PCA vector directions are arbitrary
>> modulo 180 degrees).
>> 
>>> I also tried mean(xrcc2) and sd(xrcc2) on both machines, the results
>>> are the same.
>>> Please explain.
>> 
>> The R maintainers do an amazing job of creating
>> numerically stable platform-independent software,
>> so you get the same results almost everywhere.
>> (Thank you R core!)
>> 
>> 
>> HTH
>> 
>> Steve McKinney
>> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org on behalf of jathine
>> Sent: Tue 9/16/2008 2:19 PM
>> To: r-help at r-project.org
>> Subject: [R]  Different PCA results under Windows and Linux
>>  
>> 
>> I ran the following R script under both Linux and Windows, and got 2
>> different results.
>> Linux R version 2.7.1 and Windows R version 2.7.2.
>> 
>>> library(FactoMineR)
>>> x1=read.table("freqtest.txt", header=TRUE)
>>> xrcc2=x1[,1:8]
>>> p1=PCA(xrcc2, graph=FALSE)
>>> p1$var
>> 
>> Contents of the freqtest.txt file:
>> M1 M2 M3 M4 M5 M6 M7 M8
>> -1 -1 -1 -1 -1 -1 -1 -1
>> 0 0 0 0 -1 -1 1 1
>> -1 -1 -1 -1 -1 -1 -1 -1
>> 0 0 0 0 -1 -1 1 1 
>> 
>> I also tried mean(xrcc2) and sd(xrcc2) on both machines, the results
>> are the same.
>> Please explain.
>> 
>> 
>> -- 
>> View this message in context:
>> http://www.nabble.com/Different-PCA-results-under-Windows-and-Linux-tp19520449p19520449.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Different-PCA-results-under-Windows-and-Linux-tp19520449p19538612.html
Sent from the R help mailing list archive at Nabble.com.


