[BioC] positively correlated genes

Sindre Lee sindre.lee at medisin.uio.no
Sat Sep 13 01:53:27 CEST 2014

Thank you!

Which expression values should I input?  I did this:

# Used FeatureCount
x2 <- DGEList(counts=fc$counts[ , -c(1:48) ], genes=fc$annotation)

# RLE normalization 
RLE <- calcNormFactors(x2, method="RLE")
RLEScaleFactors <- x2$samples$lib.size * RLE$samples$norm.factors 
RLEExp <- round(t(t(RLE$counts)/RLEScaleFactors) * mean(RLEScaleFactors)) 

# Add MSNT expression as continous variable
MSTN <- RLEExp[rownames(RLEExp)=="MSTN", ]

# Paired samples
ID.SM <- ID[-c(1:48)]

# Making the design
design <- model.matrix(~ID.SM+log2(MSTN+0.125))

# Estimating dispersion
RLE <- estimateGLMCommonDisp(RLE, design, verbose=TRUE)
RLE <- estimateGLMTagwiseDisp(RLE, design)

# Fitting the linear model
fit <- glmFit(RLE,design, prior.count=0.125, dispersion=RLE$tagwise.dispersion)

The result was:

	logFC	logCPM	LR	PValue
MSTN	1E+00	3E+00	5E+03	0E+00
SLC44A2	1E-01	7E+00	2E+02	6E-48
OSBPL1A	1E-01	7E+00	2E+02	1E-35
KIAA0195	2E-01	6E+00	1E+02	2E-33
BOLL	3E-01	2E+00	1E+02	1E-32
GPD2	4E-01	6E+00	1E+02	2E-31

Now, why does MSTN correlate with itself  with log2FC = 1? 
One time, which I can't reproduce, I got log2FC=0.

From: Gordon K Smyth <smyth at wehi.EDU.AU>
Sent: 13 September 2014 01:10
To: Sindre Lee
Cc: deepaksrna at gmail.com; Bioconductor mailing list
Subject: RE: [BioC] positively correlated genes

If the logFC is positive, the gene is positively correlated with gene X.

If the logFC is negative, the gene is negatively correlated with gene X.

You should be using the FDR column rather than the PValue column to judge
significance, as is usual with edgeR analyses.

Pearson correlation between y and x is equivalent to regressing y on x.
Pearson correlation is significant and positive if and only if the
regression coefficient of y on x is significantly positive.  Situation is
similar here.


On Fri, 12 Sep 2014, Sindre Lee wrote:

> Can I please ask how to interpret this results? Im used to
> Spearman/Pearson correlations and don't quite know how to present or
> explain the results obtained this way.

I wanted to find genes correlating with gene X. Then I got about 6000
significant genes at p < 0.05. Some with negative some with positive

Now, what do I do? What does this tell me?

Thank you!

From: bioconductor-bounces at r-project.org <bioconductor-bounces at r-project.org> on behalf of Gordon K Smyth <smyth at wehi.edu.au>
Sent: 10 September 2014 03:08
To: deepaksrna at gmail.com
Cc: Bioconductor mailing list
Subject: [BioC] positively correlated genes

If you are using edgeR's glmFit function or limma's voom and lmFit
functions, you can simply add the log-expression values of the gene of
interest as a column of the design matrix.  Then a standard DE analysis
will detect any other genes that are significantly correlated with the
gene of interest.


> Date: Tue,  9 Sep 2014 01:41:14 -0700 (PDT)
> From: "karthik [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, deepaksrna at gmail.com
> Subject: [BioC] positively correlated genes
> hi
>   I am interested to find out the genes that are positively and
> negatively correlated genes with my genes of interest. (using rnaseq
> normalized expression data). Can some one suggest me a better option.
> Thank you
> -- output of sessionInfo():
> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)

The information in this email is confidential and intend...{{dropped:6}}

More information about the Bioconductor mailing list