[R] Correlation coefficient of large data sets

Joshua Wiley jwiley.psych at gmail.com
Tue Mar 16 06:32:22 CET 2010


Hello Vincent,

The same command handles two variables or a whole set of them (see
?cor).  How did you read the data in?  If it is a matrix or data
frame, you should be able to just use cor(name_of_your_matrix), and it
will return the correlation matrix for all variables (columns) in your
matrix or data frame.
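
A minimal sketch of what that looks like, using made-up toy data (the
object names x and r, the 12 x 5 size, and the random values are
assumptions for illustration only, not your data).  The last two lines
just do the arithmetic for the storage a dense 230,000 x 230,000 matrix
of doubles would need, which is worth checking against the 48GB you
mention:

```r
# Toy data: 12 observations (rows) of 5 variables (columns).
set.seed(1)
x <- matrix(rnorm(12 * 5), nrow = 12, ncol = 5,
            dimnames = list(NULL, paste0("v", 1:5)))

# cor() treats each column as a variable and returns a 5 x 5 matrix.
r <- cor(x)
dim(r)

# Rough size of a dense p x p matrix of doubles (8 bytes each), in GiB.
p <- 230000
p^2 * 8 / 2^30
```

If your variables are stored as rows rather than columns, cor(t(x))
would be the call to use instead.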

If you read each of your 230,000 variables in separately, you can
combine them into a matrix using cbind(variablename1, variablename2,
etc.), or into a data frame using data.frame() in the same way.
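
For example, a small sketch with three hypothetical variables (the
names variablename1 to variablename3 and the random values are
placeholders, assumed here only to show the pattern):

```r
# Three hypothetical variables read in separately, 10 observations each.
set.seed(2)
variablename1 <- rnorm(10)
variablename2 <- rnorm(10)
variablename3 <- rnorm(10)

# cbind() binds the vectors as columns; column names come from the object names.
dat <- cbind(variablename1, variablename2, variablename3)
dim(dat)

# Correlation matrix of the combined columns.
cor(dat)
```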

HTH,


Josh



On Mon, Mar 15, 2010 at 10:12 PM, Vincent Davis
<vincent at vincentdavis.net> wrote:
> So I am very new to R. I have been using Python for a project and need to
> calculate the correlation coefficient matrix for my data set. The data is in
> the range of 10-15 observations of 230,000 variables, i.e. the correlation
> matrix would be 230,000 x 230,000. Using Python and numpy.corrcoef(), I run
> out of memory if I try to do this with more than ~30,000 variables.
>
> I was able to load the data into R; remember I am a newbie, so this is big :)
>
> I could find commands that would calculate the correlation between 2
> variables but not for a set of variables. How do I do this?
>
> Am I going to be able to do this with R? I have the 64-bit version installed
> and have access to an 8-core machine with 48 GB of memory.
>
>
>  *Vincent Davis
> 720-301-3003 *
> vincent at vincentdavis.net
>  my blog <http://vincentdavis.net> |
> LinkedIn<http://www.linkedin.com/in/vincentdavis>



-- 
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/


