[BioC] Combining differential gene expression on 2 reference transcriptomes: EdgeR analysis

Mon May 13 15:25:28 CEST 2013

Hi,

I have assembled 2 reference transcriptomes of the same species using
a) genome-guided assembler on an incomplete draft-genome
and 
b) genome-independent assembler.

Q.1) Each assembly has its own limitations, which makes it difficult to combine the datasets, so I would like to hear your suggestions on the following two strategies:

a) Do each differential expression analysis independently (on full-length transcripts and predicted ORFs) and combine the results only after the genes have been identified.

or 

b) Take the genome-guided assembly, add the missing data from the genome-independent assembly, and do the mapping, read counting and differential expression analysis on predicted ORFs from this single assembly.
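
To make strategy (a) concrete, here is a minimal sketch in Python of how the two result tables could be combined once genes have been identified in both assemblies. Everything in it is an assumption for illustration: the gene IDs, logFC and FDR values are made-up placeholders, the 0.05 FDR cutoff is arbitrary, and the function names are hypothetical rather than part of edgeR or any other package.

```python
# Sketch of strategy (a): merge two independent DE result tables by gene ID.
# Gene IDs, logFC and FDR values are hypothetical placeholders, not real results.

def significant(results, fdr_cutoff=0.05):
    """Keep genes with FDR below the cutoff; results maps gene ID -> (logFC, FDR)."""
    return {gene: lfc for gene, (lfc, fdr) in results.items() if fdr < fdr_cutoff}

def combine(guided, denovo, fdr_cutoff=0.05):
    """Label each significant gene by which assembly supports the call."""
    sig_g = significant(guided, fdr_cutoff)
    sig_d = significant(denovo, fdr_cutoff)
    combined = {}
    for gene in sorted(set(sig_g) | set(sig_d)):
        if gene in sig_g and gene in sig_d:
            # Significant in both: check that the direction of change agrees.
            same_direction = (sig_g[gene] > 0) == (sig_d[gene] > 0)
            combined[gene] = "both" if same_direction else "conflict"
        elif gene in sig_g:
            combined[gene] = "guided_only"
        else:
            combined[gene] = "denovo_only"
    return combined

guided_results = {"geneA": (2.1, 0.001), "geneB": (-1.5, 0.02), "geneC": (0.3, 0.60)}
denovo_results = {"geneA": (1.8, 0.004), "geneB": (1.2, 0.03), "geneD": (-2.4, 0.01)}

combined = combine(guided_results, denovo_results)
```

Genes labelled "conflict" or supported by only one assembly are the ones I am unsure how to treat under this strategy.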

Q.2) I used the eXpress package for counting reads; it reports raw counts as well as effective counts (after correction for distribution biases). Since edgeR recommends using raw counts, I used these and obtained the expected results for genes that pass a minimum CPM cutoff.
However, the eXpress developers recommend using rounded effective counts rather than raw counts, even for edgeR. From what I can see, the largest differences would be in which lowly expressed genes are removed from the dataset and in large biases for genes with a very high number of mapped reads (which is a problem in my dataset).
It would be informative to have some input from the developers on the issue of these biases and on using the normalised count value.
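
To illustrate the filtering concern, here is a small Python sketch comparing a CPM filter applied to raw counts and to rounded effective counts. It is only a toy under stated assumptions: all numbers are invented and not taken from my data, the helper names are hypothetical, and the rule "CPM above 1 in at least 2 libraries" is just one commonly used style of cutoff, not necessarily the right one here.

```python
# Toy comparison of a CPM filter on raw vs. rounded effective counts.
# All counts are made up for illustration; they are not from the real dataset.

def cpm(counts):
    """Counts per million for one library (a list of per-gene counts)."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def keep_genes(count_matrix, min_cpm=1.0, min_samples=2):
    """Flag genes whose CPM exceeds min_cpm in at least min_samples libraries."""
    n_genes = len(count_matrix)
    n_libs = len(count_matrix[0])
    # CPM is computed per library (column), so walk the matrix column by column.
    cpm_by_lib = [cpm([count_matrix[g][s] for g in range(n_genes)])
                  for s in range(n_libs)]
    return [
        sum(cpm_by_lib[s][g] > min_cpm for s in range(n_libs)) >= min_samples
        for g in range(n_genes)
    ]

# Rows are genes, columns are libraries.
raw_counts = [
    [500000, 480000, 510000],  # gene with a very high number of mapped reads
    [300, 280, 310],
    [1, 0, 1],                 # lowly expressed gene
]
effective_counts = [
    [612345.7, 590210.2, 625001.9],
    [295.4, 270.8, 305.1],
    [0.4, 0.0, 2.6],
]
rounded_effective = [[int(round(x)) for x in row] for row in effective_counts]

keep_raw = keep_genes(raw_counts)
keep_eff = keep_genes(rounded_effective)
# Indices of genes whose keep/drop decision depends on the count type.
changed = [g for g in range(len(raw_counts)) if keep_raw[g] != keep_eff[g]]
```

In this toy case the lowly expressed gene passes the filter with raw counts but is dropped with rounded effective counts, while the highly expressed gene is kept either way but with a noticeably different count, which is the kind of discrepancy I am asking about.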

Thanks
Eshita Sharma

---------------------------
Graduate Student
Max Planck Institute for Developmental Biology