[BioC] Tissue heterogeneity and TMM normalization

Mon Sep 8 23:23:30 CEST 2014

Hello,

I think it is clear from your charts that normalization is a concern. 
Obviously you want to see an MA plot centered at zero. With your data, 
as you have noticed, there appears to be a dependency between M and A. 
There is nothing that TMM or any other scaling normalization method can 
do to eliminate this dependency, since a scaling normalization applies a 
single normalization factor to all genes in a sample. That shifts every 
M value by the same constant and leaves the trend against A unchanged. 
You might want to investigate the use of the more complex normalization 
procedures offered in the cqn or EDASeq packages. These normalizations 
are variations on quantile normalization, which can remove the trend 
between M and A. However, it is up to you to decide whether this trend 
reflects a technical artifact that should be removed or a real 
biological phenomenon that should be preserved. You can check the 
outcome by verifying that the CEGs end up on the zero line of the MA 
plot after normalization.
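
To illustrate why a single per-sample factor cannot remove an M-versus-A 
trend, here is a minimal Python sketch on simulated data. Everything in 
it is an assumption made for illustration: the two "samples", the 
power-law relationship between them, the helper names ma_values and 
ols_slope, and the arbitrary scale factor. It is not edgeR code and does 
not use your data.

```python
import math
import random

def ma_values(x, y):
    """Return (M, A) lists for two vectors of positive abundances."""
    m = [math.log2(a / b) for a, b in zip(x, y)]
    a_vals = [0.5 * math.log2(a * b) for a, b in zip(x, y)]
    return m, a_vals

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

random.seed(1)

# Hypothetical data: sample2 depends on sample1 through a power law plus
# noise, which is one way to produce a trend between M and A.
sample1 = [2 ** random.uniform(2, 14) for _ in range(2000)]
sample2 = [v ** 0.9 * 2 ** random.gauss(0, 0.3) for v in sample1]

m_raw, a_raw = ma_values(sample1, sample2)
slope_raw = ols_slope(a_raw, m_raw)

# Apply a scaling normalization: one factor for every gene in sample2.
scale = 3.7  # arbitrary factor, chosen only for illustration
sample2_scaled = [scale * v for v in sample2]
m_scaled, a_scaled = ma_values(sample1, sample2_scaled)
slope_scaled = ols_slope(a_scaled, m_scaled)

# The M values move by a constant (-log2(scale)); the slope does not move.
shift = sum(m_scaled) / len(m_scaled) - sum(m_raw) / len(m_raw)
print("slope before scaling:", slope_raw)
print("slope after scaling: ", slope_scaled)
print("shift in mean M:     ", shift)
```

Whatever value is chosen for scale, the cloud of points only slides 
along the M axis (and by a constant along A), so the slope is the same 
before and after.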

Lastly, note that having 30% of genes differentially expressed does not 
violate the assumptions of TMM. With the default options, TMM trims the 
top and bottom 30% of ratios, so these differentially expressed genes 
will not disrupt the computation of the normalization factor. The 
assumption that appears to be violated is that of a direct linear 
relationship between RNA abundance and read count for all genes within a 
sample. This is the assumption behind all scaling normalizations.
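
As a rough illustration of the trimming argument, the Python sketch 
below compares a plain mean of simulated M values with a mean taken 
after trimming 30% from each tail. It is a simplification and not the 
edgeR implementation: calcNormFactors also trims on A values and uses 
precision weights, both of which are left out here. The helper 
trimmed_mean, the offset of 0.5, and the simulated fold changes are all 
assumptions made up for the example.

```python
import random

def trimmed_mean(values, trim=0.3):
    """Mean of the values left after dropping a fraction from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

random.seed(0)

n_genes = 1000
true_offset = 0.5  # hypothetical log2 offset the factor should recover

# Non-DE genes scatter tightly around the true offset.
m_values = [true_offset + random.gauss(0, 0.1) for _ in range(n_genes)]

# Make 30% of genes differentially expressed: 200 up and 100 down by
# 3 log2 units (an arbitrary, deliberately unbalanced choice).
for i in range(200):
    m_values[i] += 3.0
for i in range(200, 300):
    m_values[i] -= 3.0

plain = sum(m_values) / n_genes
robust = trimmed_mean(m_values, trim=0.3)

print("true offset: ", true_offset)
print("plain mean:  ", plain)
print("trimmed mean:", robust)
```

The plain mean is pulled well away from the offset by the DE genes, 
while the trimmed mean stays close to it. Because the DE genes in this 
simulation are unbalanced, a small residual bias remains in the trimmed 
estimate.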

-Ryan

On Mon 08 Sep 2014 09:15:38 AM PDT, Ni Feng wrote:
>
> Dear all,
> I have a general question about whether TMM normalization is appropriate
> for my data. I apologize for this long-winded email. I am not a trained
> bioinformatician and therefore have been struggling with some data
> analysis.
>
> A colleague and I did an RNA seq experiment with 6 samples (each had RNA
> pooled from 6 individuals) and no biological replicates. The 6 samples
> included 2 tissue types collected at 3 different time points. I know that
> this is not an ideal experimental set-up; we did this experiment 3 years
> ago.
>
> We used the Trinity package to do most of the transcriptome assembly and
> downstream analyses, such as leveraging edgeR for differential expression.
> Naively I went on with all downstream analyses without verifying whether
> my data violated underlying assumptions of TMM normalization.
>
> For example, we found ~30% of our transcripts showed differential
> expression between any 2 pairwise comparisons. Does this violate the TMM
> assumption that most genes are NOT differentially expressed?
>
> Furthermore, we noticed that there is still a tissue bias after
> normalization. Attached is a scatterplot of TMM normalized values for each
> tissue (summed across 3 sample groups for each tissue). Plotted in black
> on top of all transcripts are CEG (Core Eukaryotic Genes) expression, which
> we believe should be good candidates for "house keeping" genes. Both CEGs and
> all genes show that at higher expression levels, there is a skew towards
> one tissue ("VMN"), whereas in the middle values, there is a skew towards
> the other tissue ("H").
>
> I have also attached a density plot of the M values, and a MA plot to
> visualize the skew. These plots were generated from 1 pair of tissue
> comparisons ("SMH" vs "SMV").
>
> These plots reflect the fact that one tissue is more heterogeneous than
> the other. Although TMM normalization is designed to deal with this problem,
> our data seems to need further normalization. Our within tissue comparisons
> are great and do not show this kind of skew. My questions are:
>
> 1) does our data violate TMM normalization assumptions
> 2) do you have another normalization method to suggest for our data
> 3) should we just forget about tissue-comparisons
>
> I have also played around with the suggestions about estimating a
> dispersion value based on the edgeR user guide. I can discuss this further.
>
> Thank you for your time and patience, and any advice is much appreciated.