[BioC] Combining differential gene expression on 2 reference transcriptomes: EdgeR analysis

Wed May 15 03:52:02 CEST 2013

Dear Eshita,

> Date: Mon, 13 May 2013 15:25:28 +0200
> From: Eshita <eshita.sharma at tuebingen.mpg.de>
> To: bioconductor at r-project.org
> Subject: [BioC] Combining differential gene expression on 2 reference
> 	transcriptomes: EdgeR analysis
>
> Hi,
>
> I have assembled 2 reference transcriptomes of the same species using
> a) a genome-guided assembler on an incomplete draft genome
> and
> b) a genome-independent assembler.
>
> Q.1) Since each assembly has its own limitations, which makes it 
> difficult to combine the datasets, I would like your suggestions on 
> two strategies:
>
> a) Do each differential expression analysis independently (on 
> full-length transcripts and predicted ORFs) and combine the results 
> only after the genes have been identified.
>
> or
>
> b) Take the genome-guided assembly, add the missing data from the 
> genome-independent assembly, and do mapping, read counting, and 
> differential expression analysis on the predicted ORFs from this 
> single assembly.
>
> Q.2) I used eXpress for counting reads. It reports the raw counts as 
> well as effective counts (corrected for distribution biases). Since 
> edgeR recommends raw counts, I used these and obtained the expected 
> results for genes that pass a minimum CPM (counts-per-million) cutoff. 
> However, the eXpress developers recommend using rounded effective 
> counts rather than raw counts, even for edgeR.

As you already know, the edgeR developers recommend raw counts, because 
the methodology presupposes counts. I haven't specifically evaluated 
eXpress's recommendation, but the onus is on the eXpress developers to 
justify it. They need to provide good evidence that it is a good idea to 
enter non-counts into a statistical method intended for counts.

If you really must use effective counts, my suggestion would be to use 
voom (in the limma package) instead of edgeR, because voom works fine 
with fractional counts.
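Fractional counts are not a problem for voom because it does not model the counts directly. It first converts them to log2 counts-per-million, log2((r + 0.5) / (R + 1) x 10^6), where r is a gene's count in a sample and R is that sample's library size. It then estimates precision weights from the mean-variance trend for limma's linear modelling. Below is a minimal Python sketch of just that first transform. It illustrates the formula only and is not limma's implementation. The function name voom_log_cpm and the example values are hypothetical. Library sizes default to column totals here; given a DGEList, voom would use the normalized library sizes instead.

```python
import math
from typing import List, Optional, Sequence


def voom_log_cpm(counts: Sequence[Sequence[float]],
                 lib_sizes: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Hypothetical helper: voom-style log2-CPM, log2((r + 0.5) / (R + 1) * 1e6).

    counts[g][s] is the count for gene g in sample s; fractional values
    (e.g. effective counts) are accepted as-is.
    """
    n_samples = len(counts[0])
    if lib_sizes is None:
        # Assumption: library size = column (sample) total, no normalization factors.
        lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    # The 0.5 offset keeps zero counts finite; the +1 keeps the ratio below 1.
    return [
        [math.log2((row[s] + 0.5) / (lib_sizes[s] + 1.0) * 1e6) for s in range(n_samples)]
        for row in counts
    ]


# Hypothetical effective counts: 3 genes x 2 samples.
effective = [[12.37, 8.91], [0.0, 1.42], [1500.6, 1320.2]]
print(voom_log_cpm(effective))
```

Nothing in the formula requires r to be an integer, which is the sense in which voom accepts effective counts unchanged.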

> From what I see, the biggest differences would be in the removal of 
> lowly expressed genes from the dataset and in large biases for genes 
> with a very high number of mapped reads (which is a problem in my 
> dataset). It would be informative to have some input from the developers 
> on these biases and on using the normalised count values.

It is unclear to me what "biases" you are referring to, and we (the edgeR 
developers) have already told you unambiguously that we do not recommend 
normalized counts.
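On the filtering point raised in the question: a minimum-CPM cutoff keeps a gene only if its counts-per-million exceed a threshold in enough samples. Switching from raw to rounded effective counts can therefore change which low-count genes survive. The Python sketch below illustrates this with a commonly used rule: CPM above a cutoff in at least k samples. The cutoff of 1, k = 2, the helper names, and the toy counts are all hypothetical. Library sizes are plain column totals with no normalization factors. This differs from edgeR's cpm() on a DGEList, which by default uses normalized library sizes.

```python
from typing import List, Sequence


def cpm(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    # Counts per million, using each sample's column total as its library size
    # (no normalization factors: a simplifying assumption for this sketch).
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [[row[s] / lib_sizes[s] * 1e6 for s in range(n_samples)] for row in counts]


def keep_by_cpm(counts: Sequence[Sequence[float]], cutoff: float, min_samples: int) -> List[bool]:
    # Keep a gene if its CPM exceeds `cutoff` in at least `min_samples` samples.
    return [sum(v > cutoff for v in row) >= min_samples for row in cpm(counts)]


# Hypothetical toy data: 3 genes x 2 samples, each library about one million reads.
raw = [[3, 2], [500000, 400000], [499997, 599998]]
effective = [[0.4, 1.6], [520000.3, 410000.2], [479999.3, 589998.2]]
rounded = [[round(v) for v in row] for row in effective]

print(keep_by_cpm(raw, cutoff=1.0, min_samples=2))      # [True, True, True]
print(keep_by_cpm(rounded, cutoff=1.0, min_samples=2))  # [False, True, True]
```

In this made-up example, the lowly expressed first gene passes the filter with raw counts but not with rounded effective counts. This shows the kind of difference at the low end that the question anticipates. It is not evidence about real data.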

Have you read Section 2.5 of the edgeR User's Guide?

Best wishes
Gordon

> Thanks
> Eshita Sharma
>
> ---------------------------
> Graduate Student
> Max Planck Institute for Developmental Biology
>
