[R] rcorr p-values for pearson's correlation coefficients

Amit amitkumartiwary at gmail.com
Wed May 5 12:20:41 CEST 2010


Hi! All,

To find co-expressed genes from a expression matrix of dimension (9275
X 569), I used rcorr function from library(Hmisc) to calculate pearson
correlation coefficient (PCC) and their corresponding p-values. From
the correlation matrix (9275 X 9275) and pvalue matrix (9275 X 9275)
obtained using rcorr function, I wanted to select those pairs whose
PCC's are above 0.8 cut-off and then for those gene pairs their
corresponding p-values. I did following:

library(Hmisc)
#loading expression matrix
load("ratio_exp123.RData")
#calculation correlation and p-values of genes using rcorr function
y_cor_p123=rcorr(t(ratio_exp123),type="pearson")
save(y_cor_p123,file="y_cor_p123.RData")
diag(y_cor_p123$r)=0

#Selecting gene pairs with pcc value greater than 0.8
gene1=matrix(0,1,1)
gene2=matrix(0,1,1)
pcc=matrix(0,1,1)
for(i in 1:nrow(y_cor_p123$r))
{
for(j in 1:nrow(y_cor_p123$r))
{
if(y_cor_p123$r[i,j]>0.8)
{
gene1=rbind(gene1,rownames(y_cor_p123$r[i,j,drop=F]))
gene2=rbind(gene2,colnames(y_cor_p123$r[i,j,drop=F]))
pcc=rbind(pcc,y_cor_p123$r[i,j])
}
}
}
y_gp123=cbind(gene1,gene2,pcc)
colnames(y_gp123)=c("Gene1","Gene2","PCC")
y_gp123=y_gp123[-1,]

#Selecting p-values for gene pairs with pcc above 0.8
z=matrix(0,nrow(y_gp123),1)
for(i in 1:nrow(y_gp123))
{rowno=grep(y_gp123[i,1],rownames(y_cor_p123$P))
colno=grep(y_gp123[i,2],colnames(y_cor_p123$P))
z[i,]=y_cor_p123$P[rowno,colno]}

y_gp123=cbind(y_gp123,z)
> head(y_gp123)
     Gene1        Gene2        PCC                 P-Value
[1,] "10003_f_at" "10166_at"   "0.816270470619202" "0"
[2,] "10003_f_at" "10270_at"   "0.85069590806961"  "0"
[3,] "10003_f_at" "10302_g_at" "0.840051412582397" "0"
[4,] "10003_f_at" "10386_s_at" "0.833585679531097" "0"
[5,] "10003_f_at" "10473_i_at" "0.863786697387695" "0"
[6,] "10003_f_at" "10474_s_at" "0.84228765964508"  "0"
> tail(y_gp123)
         Gene1     Gene2     PCC                 P-Value
[63739,] "9996_at" "8265_at" "0.827129244804382" "0"
[63740,] "9996_at" "8375_at" "0.812156021595001" "0"
[63741,] "9996_at" "8615_at" "0.802257716655731" "0"
[63742,] "9996_at" "8788_at" "0.810672044754028" "0"
[63743,] "9996_at" "9567_at" "0.846206307411194" "0"
[63744,] "9996_at" "9969_at" "0.838157832622528" "0"

Here, you can see all the P-values are zero. I think its not possible
to have all p-values for the PCCs above 0.8 equal to zero. Hence, can
somebody point out what going wrong in here. Any help would be
appreciated.

regards
Amit



More information about the R-help mailing list