[BioC] Applying DESeq on RSEM output

dvir.tau at gmail.com dvir.tau at gmail.com
Thu Mar 21 12:48:57 CET 2013


Thanks for your reply Mike.

Since the dataset I'm analyzing is extremely large, I really hope to be able
to use the Level 3 data from TCGA, which is RSEM output.

Do you think that rounding the values would have a significant effect on
DESeq's results?

I still wonder whether the raw_count column is good enough as raw counts,
and whether there is another way to get the input required for DESeq from
the RSEM output without downloading the huge sequence files.


Many thanks,
Dvir

-----Original Message-----
From: Michael Love [mailto:michaelisaiahlove at gmail.com] 
Sent: Wednesday, March 20, 2013 3:47 PM
To: dnetanely at tau.ac.il
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Applying DESeq on RSEM output

Hi Dvir,

I am not familiar with the RSEM software, but you have non-integer values in
the raw_count column, for example 31.95 and 258.35.  Non-integer values are
not appropriate for DESeq or edgeR analysis, since both methods model integer
read counts.

From a quick Google search, it seems that these raw counts involve
assigning fractions of ambiguously mapped reads (but you should check with
the RSEM developers).  If you don't have access to any lower-level data, the
next best option is to round the raw_count values and proceed.
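
A minimal sketch of that rounding step, written in Python purely for
illustration (the resulting integers would still have to be loaded into R for
DESeq or edgeR). It assumes the file is tab-separated and uses the column
names shown in the quoted example; RSEM_TEXT and rounded_counts are
hypothetical names, not part of RSEM, TCGA, or DESeq.

```python
import csv
import io

# Rows copied from the example quoted in this thread. The tab-separated
# layout is an assumption about how the TCGA file is stored on disk.
RSEM_TEXT = """barcode\tgene_id\traw_count\tscaled_estimate\ttranscript_id
TCGA-A1-A0SB-01A-11R-A144-07\t?|100130426\t0\t0\tuc011lsn.1
TCGA-A1-A0SB-01A-11R-A144-07\t?|100133144\t34.05\t1.23812E-06\tuc010unu.1,uc010uoa.1
TCGA-A1-A0SB-01A-11R-A144-07\t?|100134869\t31.95\t8.40876E-07\tuc002bgz.2,uc002bic.2
TCGA-A1-A0SB-01A-11R-A144-07\t?|10357\t258.35\t2.16969E-05\tuc010zzl.1
TCGA-A1-A0SB-01A-11R-A144-07\t?|10431\t1459\t5.53441E-05\tuc001jiu.2,uc010qhg.1
"""


def rounded_counts(text):
    """Return {gene_id: {barcode: integer count}} from long-format rows."""
    counts = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        # raw_count may be fractional, so parse as float before rounding.
        value = float(row["raw_count"])
        counts.setdefault(row["gene_id"], {})[row["barcode"]] = int(round(value))
    return counts


counts = rounded_counts(RSEM_TEXT)
for gene_id, per_sample in counts.items():
    print(gene_id, per_sample)
```

Each rounded value differs from the original by at most 0.5, which is small
relative to counts in the tens or hundreds but proportionally larger for
genes whose counts are close to zero.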

best,

Mike

On Wed, Mar 20, 2013 at 2:15 PM, <dvir.tau at gmail.com> wrote:

> Hello,
>
>
>
> I'm running DESeq and edgeR on RNA-Seq data that was already processed
> with RSEM (downloaded from the TCGA web site).
>
> Since these methods require raw read counts, I'm using the
> raw_count column of the RSEM output, but I'm not sure this is the right
> thing to do (is it the actual raw count that is required?).
>
>
>
> Here's an example of the RSEM output downloaded from TCGA:
>
> barcode                       gene_id      raw_count  scaled_estimate  transcript_id
> TCGA-A1-A0SB-01A-11R-A144-07  ?|100130426  0          0                uc011lsn.1
> TCGA-A1-A0SB-01A-11R-A144-07  ?|100133144  34.05      1.23812E-06      uc010unu.1,uc010uoa.1
> TCGA-A1-A0SB-01A-11R-A144-07  ?|100134869  31.95      8.40876E-07      uc002bgz.2,uc002bic.2
> TCGA-A1-A0SB-01A-11R-A144-07  ?|10357      258.35     2.16969E-05      uc010zzl.1
> TCGA-A1-A0SB-01A-11R-A144-07  ?|10431      1459       5.53441E-05      uc001jiu.2,uc010qhg.1
>
> Many thanks,
>
> Dvir
