[BioC] Heatmaps for EdgeR

Fri Mar 21 19:19:53 CET 2014

Hi Eleanor,

Please CC the Bioconductor mailing list (use "reply-all") on all
correspondence so that everyone can help with (and benefit from) this
discussion.

Comments inline:

On 21 Mar 2014, at 11:02, Eleanor Su wrote:

>> Can you explain what you mean with that a bit more? You shouldn't be
>> doing any normalization of your actual counts prior to feeding them to
>> edgeR, are you?
>
> I'm only working with small non-coding RNAs of a non-model organism.
> Since this is a fairly new kind of analysis, I'm following someone
> else's pipeline. Thus I've normalized my samples prior to doing analysis
> in R. I've normalized all my counts based on the reads generated.

What I mean is that you shouldn't do that :-)

Have you read through the edgeR User's Guide? The `calcNormFactors`
function does the step that it sounds like you are doing before
analysis -- but it stores the result as per-sample normalization factors
and keeps the count data intact, which is what you want. I guess you
are dividing your counts by some normalization constant prior to edgeR
analysis, which is a big no-no.

The (expression) input to edgeR should be the raw count matrix of 
features x samples -- many people choose to use only uniquely mapping 
reads for this purpose, so probably a good idea for you to ensure that 
is the case (at least for your first analysis).
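
For what it's worth, a minimal sketch of that workflow might look like
the following. The count matrix and group labels here are made up purely
for illustration (your own raw counts would go in their place), and the
`prior.count = 2` value is just an assumption, not a recommendation:

```r
library(edgeR)

# Toy raw count matrix: 100 features x 4 samples (simulated, not real data).
set.seed(1)
counts <- matrix(rnbinom(400, mu = 50, size = 5), nrow = 100,
                 dimnames = list(paste0("seq", 1:100), paste0("s", 1:4)))
group <- factor(c("ctrl", "ctrl", "trt", "trt"))  # hypothetical design

# Hand edgeR the *raw* counts; do not rescale them yourself.
y <- DGEList(counts = counts, group = group)

# Compute per-sample normalization factors (TMM by default).
# They are stored in y$samples$norm.factors; y$counts is left untouched.
y <- calcNormFactors(y)
y$samples

# For plotting only (e.g. heatmaps), moderated log-counts-per-million
# can be derived from the same object.
logcpm <- cpm(y, log = TRUE, prior.count = 2)
```

The point is that the normalization lives next to the counts rather than
being baked into them.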

>> Look at section 2.10 of the edgeR User's Guide (Clustering, heatmaps,
>> etc.) where the authors identify this to still be a matter of
>> research, but they suggest to use "moderated log-counts-per-million"
>
> I've generated a heatmap already using this script, but I only want a
> heatmap of the significant differentially expressed sequences.

What script?

> When I generate the heatmap according to section 2.10, I end up with a
> heatmap that I can't even read because it's plotting all the sequences.
> Would you suggest just generating a new file with only significant
> sequences and then generating a heatmap according to section 2.10?

When you call the `heatmap` function (or whatever function you are using
to generate these things; the `aheatmap` function from the NMF package is
quite nice, btw), you should only pass it a matrix that consists of the
rows you want to plot.

You do not have to generate an intermediary new file to do this.
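
As a rough sketch of what I mean, using only base R: the matrix and the
FDR values below are invented for illustration, and the 0.05 cutoff is
just an assumed threshold. In a real analysis the FDR values would come
from your edgeR results (e.g. the table returned by `topTags`), matched
to the rows of your expression matrix.

```r
# Toy log-expression matrix: 50 sequences x 4 samples (simulated values).
set.seed(1)
logcpm <- matrix(rnorm(200), nrow = 50,
                 dimnames = list(paste0("seq", 1:50), paste0("s", 1:4)))

# Hypothetical adjusted p-values, one per row of logcpm, in the same order.
fdr <- rep(c(0.01, 0.5), times = c(8, 42))
names(fdr) <- rownames(logcpm)

# Logical indexing keeps only the rows that pass the cutoff;
# drop = FALSE makes sure the result stays a matrix.
sig_mat <- logcpm[fdr < 0.05, , drop = FALSE]

# Plot only the selected rows (heatmap needs at least 2 rows and 2 columns).
heatmap(sig_mat)
```

Everything happens in memory; nothing gets written to disk along the way.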

Don't take this the wrong way, but it sounds like you are quite new not
just to this analysis, but to R as a whole, since indexing things
(vectors, lists, matrices) is something very basic that you need to
master before you can be conversant with the language.

If this is the case, I'd strongly recommend you spend some time reading
up on introductory R material (R comes with "An Introduction to R")
before trying to do anything more advanced.

Doing so will not only reduce the chances of you shooting yourself in
the foot by doing something silly, but it will also allow you to get
better (and more considered) help here, since you will be able to ask
the kind of questions that make use of the expertise of the people
subscribed to this list.

For instance, if you have questions regarding fundamental "R 
programming" type of things (indexing a matrix, for example), you should 
direct those to R-help, which you can subscribe to here:

https://stat.ethz.ch/mailman/listinfo/r-help

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech