[BioC] how to calculate gene length to be used in rpkm() in edgeR

Ryan rct at thompsonclan.org
Sat May 3 00:15:07 CEST 2014


Hi Shirley,

The appropriate gene length to use is whatever gene length was used to 
compute RPKM values for data set B. If you don't have that information, 
then I don't see how you can compute comparable RPKM values for your 
data.
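
To make the dependence on the length definition concrete, here is a
minimal sketch in Python. It is not edgeR code and not necessarily how
data set B was processed. It assumes one common convention (gene length
= total length of the union of all annotated exons of the gene, introns
excluded) and the plain RPKM definition, count * 1e9 / (length *
library size). The function names and the toy GTF records are
hypothetical and only for illustration.

```python
from collections import defaultdict


def union_exon_lengths(gtf_lines):
    """Return {gene_id: length in bp of the union of its annotated exons}.

    Assumption: 'gene length' means merged exons across all isoforms.
    Other conventions (longest isoform, full genomic span) give
    different numbers.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])  # GTF is 1-based, inclusive
        gene_id = None
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene_id = attr.split(" ", 1)[1].strip('"')
        if gene_id is not None:
            exons[gene_id].append((start, end))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent exon: extend the current block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths


def rpkm(count, gene_length_bp, library_size):
    """Plain RPKM: reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * library_size)


# Hypothetical toy annotation: G1 has two isoforms with overlapping exons.
TOY_GTF = [
    "\t".join(["1", "toy", "exon", "100", "199", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["1", "toy", "exon", "150", "299", ".", "+", ".",
               'gene_id "G1"; transcript_id "T2";']),
    "\t".join(["1", "toy", "exon", "1000", "1099", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["1", "toy", "exon", "1", "500", ".", "+", ".",
               'gene_id "G2"; transcript_id "T3";']),
]

LENGTHS = union_exon_lengths(TOY_GTF)
```

With these toy records, G1 gets a union-exon length of 300 bp, whereas
its genomic span (100 to 1099) is 1000 bp. The same 300 reads in a
library of one million then give rpkm(300, 300, 1000000) = 1000 under
one definition and rpkm(300, 1000, 1000000) = 300 under the other,
which is why the two data sets are only comparable if the same lengths
are used.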

-Ryan

On Fri May  2 15:01:32 2014, shirley zhang wrote:
> Dear List,
>
> I've been using edgeR for differential expression analysis of data
> generated from the same tissue, but under different conditions.
>
> Now I have an RNA-seq data set A (n=20), and would like to compare it with
> another RNA-seq data set B (n=1,000 across different tissues). Since data B
> consists of normalized, batch-effect-adjusted RPKM values, I need to
> generate RPKM values for my own data A.
>
> I already have a count table, and would like to use rpkm() in edgeR, but
> first I have to get a gene length vector. My question is how to compute gene
> length from an "Ensembl.gtf" file, taking into account the following:
>
> 1. Gene 1 is much longer than Gene 2 if both exons and introns are
>      included. But Gene 1 has only 3 exons and Gene 2 has 10 exons, so at
>      the transcript level Gene 2 > Gene 1.
>
> 2. For the same gene, there is more than one transcript isoform. In
>      different tissues, different transcript isoforms will be expressed.
>
> Many thanks,
> Shirley


