[R] Re: In need of help with correlations

Petr PIKAL petr.pikal at precheza.cz
Mon Apr 11 09:04:40 CEST 2011


Hi


r-help-bounces at r-project.org wrote on 09.04.2011 19:24:38:

> I am in need of someone's help in correlating gene expression. I'm somewhat
> new to R, and can't seem to find anyone local to help me with what I think
> is a simple problem.
> 
> I need to obtain Pearson and Spearman correlation coefficients, and
> corresponding p-values, for all of the genes in my dataset that correlate to
> one specific gene of interest. I'm working with Affymetrix Mouse 430
> 2.0 arrays, so I've got about 45,000 probesets (rows; with the 1st column
> containing identifiers) and 30 biological replicates (columns; with the top
> row containing the header information).
> 
> I've looked through several Intro manuals and the R help files.
> 
> I know that "cor(x,y, use ="everything", method = c("pearson"))" can help
> obtain the coefficients.
> 
> I also know that "cor.test()" is supposed to test the significance of a
> single correlation coefficient.
> 
> I've also found the bioconductor package "genefilter" / "genefinder" that
> looks for correlations to a given gene (although I can't get it to work).
> 
> So far I've been able to:
> 
> #Read in the csv file
> data<-read.csv("my data.csv")
> 
> #Check the dimensions, names, and class, and use fix(data), to ensure the
> #file was loaded properly
> dim(data)
> names(data)
> class(data)
> fix(data)
> 
> #So far I've been able to successfully correlate the entire 'column' matrix
> #through:
> x <- data[,2:30]
> y <- data[,2:30]
> 
> corr.data<-cor(x,y, use = "everything", method = c("pearson"))
> 
> write.csv(corr.data, file = "correlation of my data by columns.csv")
> 
> -----------------------------------
> 
> Now if I try to run the 'cor.test()' function on the same matrix, I get an
> error message saying 'x' must be a numeric vector. This I don't understand.

The cor.test help page says:

x, y: numeric vectors of data values.  ‘x’ and ‘y’ must have the
          same length.

However, your data[,2:30] is most probably a data frame, not a numeric
vector; see

str(data[,2:30])
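
The difference can be checked on a small made-up data frame. This is only a sketch: `toy` and its column names are invented for illustration and are not the poster's data.

```r
# Toy stand-in for the poster's data: an ID column plus three numeric columns
set.seed(1)
toy <- data.frame(id = paste0("probe", 1:5),
                  s1 = rnorm(5), s2 = rnorm(5), s3 = rnorm(5))

multi  <- toy[, 2:4]  # several columns: still a data frame
single <- toy[, 2]    # one column: dropped to a plain numeric vector

is.data.frame(multi)  # TRUE, so cor.test() would reject it
is.numeric(single)    # TRUE, so cor.test() would accept it
```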

To use cor.test you need to pass it two numeric vectors, for example

cor.test(data[,2], data[,3])
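
cor.test returns a list-like object of class "htest", so the coefficient and the p-value have to be pulled out by name. A minimal sketch on simulated vectors (the names `x` and `y` and the data are assumptions, not the poster's arrays):

```r
# Two simulated, positively related numeric vectors of equal length
set.seed(42)
x <- rnorm(30)
y <- x + rnorm(30)

pearson  <- cor.test(x, y, method = "pearson")
spearman <- cor.test(x, y, method = "spearman")

# Coefficient and p-value for each method
c(r   = unname(pearson$estimate),  p = pearson$p.value)
c(rho = unname(spearman$estimate), p = spearman$p.value)
```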

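or to do it in a loop over all pairs of columns (untested), keeping only the
p-values because cor.test returns a list rather than a single number
cols <- 2:30  # the sample columns used in the original post
result <- matrix(NA, length(cols), length(cols))

for (a in 1:(length(cols) - 1)) {
  for (b in (a + 1):length(cols)) {
    # note the parentheses: a + 1:n would be read as a + (1:n)
    result[a, b] <- cor.test(data[, cols[a]], data[, cols[b]])$p.value
  }
}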

But most probably there are other ways.
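
For the stated goal (one gene of interest against every other probeset), one possible approach is to run cor.test row by row with apply. This is only a sketch on simulated data: `expr`, `target`, `others`, `cor_one`, and the choice of "probe1" as the gene of interest are all made up for illustration. With the real file, something like as.matrix(data[, -1]) with the identifiers as row names would presumably take the place of `expr`.

```r
# Simulated stand-in: 100 probesets (rows) x 30 samples (columns)
set.seed(123)
expr <- matrix(rnorm(100 * 30), nrow = 100,
               dimnames = list(paste0("probe", 1:100), paste0("s", 1:30)))

target <- expr["probe1", ]                     # hypothetical gene of interest
others <- expr[rownames(expr) != "probe1", ]   # everything else

# Correlate one row against the target; return coefficient and p-value
cor_one <- function(v, method) {
  ct <- cor.test(target, v, method = method)
  c(estimate = unname(ct$estimate), p.value = ct$p.value)
}

# apply() gives a 2 x n matrix; transpose so each row is one probeset
pearson  <- t(apply(others, 1, cor_one, method = "pearson"))
spearman <- t(apply(others, 1, cor_one, method = "spearman"))

# Probesets most strongly associated with the target, by Pearson p-value
head(pearson[order(pearson[, "p.value"]), ])
```

The results could then be written out with write.csv, as in the original post.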

Regards
Petr


> And this is not my goal, but rather me trying to learn how to go about doing
> correlation analysis in R.
> 
> I've also tried transposing the data.frame using "as.data.frame(t(data))"
> and doing so gives the same error message as above.
> 
> Can anyone help me with figuring out how to conduct a correlation analysis
> for a specific gene/probeset, and help me understand why I get the above error
> message? I know it probably is a simple analysis that is just over
> my head right now since I'm still new to R. But I can't figure it out and
> have been trying a bunch of different variations for the past week.
> 
> Thank you in advance for your help.