[BioC] positively correlated genes

Gordon K Smyth smyth at wehi.EDU.AU
Sat Sep 13 02:40:31 CEST 2014

What you are doing is basically correct, except that you're making it 
slightly more complicated than it needs to be, and you're rounding 
counts-per-million to integers which is inappropriate and unnecessary.

Slightly simpler and more correct is:

   logCPM <- cpm(RLE, log=TRUE, prior.count=2)
   MSTN <- logCPM["MSTN",]
   design <- model.matrix(~ID.SM+MSTN)

You ask why MSTN correlates with itself with logFC=1.  You said you were 
familiar with Pearson correlation.  You must be familiar with the fact 
that correlating a variable with itself will give a correlation of 1 and a 
p-value of zero, and the situation is similar here.

The edgeR model is fitting something analogous to:

   log(expression of Y) = logFC * log(expression of MSTN)

Obviously if Y is MSTN itself then you should get logFC=1.


  On Fri, 12 Sep 2014, Sindre Lee 

> Thank you!
> Which expression values should I input?  I did this:
> # Used FeatureCount
> x2 <- DGEList(counts=fc$counts[ , -c(1:48) ], genes=fc$annotation)
> # RLE normalization 
> RLE <- calcNormFactors(x2, method="RLE")
> RLEScaleFactors <- x2$samples$lib.size * RLE$samples$norm.factors 
> RLEExp <- round(t(t(RLE$counts)/RLEScaleFactors) * 
> mean(RLEScaleFactors))
> # Add MSNT expression as continous variable
> MSTN <- RLEExp[rownames(RLEExp)=="MSTN", ]
> # Paired samples
> ID.SM <- ID[-c(1:48)]
> # Making the design
> design <- model.matrix(~ID.SM+log2(MSTN+0.125))
> # Estimating dispersion
> RLE <- estimateGLMCommonDisp(RLE, design, verbose=TRUE)
> RLE <- estimateGLMTagwiseDisp(RLE, design)
> # Fitting the linear model
> fit <- glmFit(RLE,design, prior.count=0.125, 
> dispersion=RLE$tagwise.dispersion)
> The result was:
>	logFC	logCPM	LR	PValue
> MSTN	1E+00	3E+00	5E+03	0E+00
> SLC44A2	1E-01	7E+00	2E+02	6E-48
> OSBPL1A	1E-01	7E+00	2E+02	1E-35
> KIAA0195	2E-01	6E+00	1E+02	2E-33
> BOLL	3E-01	2E+00	1E+02	1E-32
> GPD2	4E-01	6E+00	1E+02	2E-31
> Now, why does MSTN correlate with itself  with log2FC = 1? 
> One time, which I can't reproduce, I got log2FC=0.

From: Gordon K Smyth <smyth at wehi.EDU.AU>
Sent: 13 September 2014 01:10
To: Sindre Lee
Cc: deepaksrna at gmail.com; Bioconductor mailing list
Subject: RE: [BioC] positively correlated genes

If the logFC is positive, the gene is positively correlated with gene X.

If the logFC is negative, the gene is negatively correlated with gene X.

You should be using the FDR column rather than the PValue column to judge
significance, as is usual with edgeR analyses.

Pearson correlation between y and x is equivalent to regressing y on x.
Pearson correlation is significant and positive if and only if the
regression coefficient of y on x is significantly positive.  Situation is
similar here.


On Fri, 12 Sep 2014, Sindre Lee wrote:

> Can I please ask how to interpret this results? Im used to
> Spearman/Pearson correlations and don't quite know how to present or
> explain the results obtained this way.

I wanted to find genes correlating with gene X. Then I got about 6000
significant genes at p < 0.05. Some with negative some with positive

Now, what do I do? What does this tell me?

Thank you!

From: bioconductor-bounces at r-project.org <bioconductor-bounces at r-project.org> on behalf of Gordon K Smyth <smyth at wehi.edu.au>
Sent: 10 September 2014 03:08
To: deepaksrna at gmail.com
Cc: Bioconductor mailing list
Subject: [BioC] positively correlated genes

If you are using edgeR's glmFit function or limma's voom and lmFit
functions, you can simply add the log-expression values of the gene of
interest as a column of the design matrix.  Then a standard DE analysis
will detect any other genes that are significantly correlated with the
gene of interest.


> Date: Tue,  9 Sep 2014 01:41:14 -0700 (PDT)
> From: "karthik [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, deepaksrna at gmail.com
> Subject: [BioC] positively correlated genes
> hi
>   I am interested to find out the genes that are positively and
> negatively correlated genes with my genes of interest. (using rnaseq
> normalized expression data). Can some one suggest me a better option.
> Thank you
> -- output of sessionInfo():
> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)

The information in this email is confidential and intend...{{dropped:4}}

More information about the Bioconductor mailing list