[BioC] DESeq - read counts

Wolfgang Huber whuber at embl.de
Thu Nov 10 23:38:56 CET 2011


Dear Avinash

if your RPKM values were obtained from

    N / (L x Ntot x 10^-6)

then your formula below is almost correct (use 10^(-6) instead of 10^(-9)). 
However, why can you not use the N values from the alignments directly, 
rather than going back and forth via RPKMs?
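
As an illustration only, here is a minimal Python sketch of the round trip
between counts and RPKM. It assumes RPKM means reads per kilobase of gene
length per million mapped reads, so the scale factor is 10^-9 when L is in bp
and 10^-6 when L is in kb. The function names and the example numbers are
made up for this sketch and do not come from any package.

```python
def _kb(length, length_in_kb):
    # Express the gene length in kilobases, whatever unit it was given in.
    return length if length_in_kb else length / 1e3


def rpkm_from_counts(n, length, n_total, length_in_kb=False):
    """Reads per kilobase of gene length per million mapped reads."""
    return n / (_kb(length, length_in_kb) * (n_total / 1e6))


def counts_from_rpkm(rpkm, length, n_total, length_in_kb=False):
    """Invert the RPKM formula; rounded because count tables hold integers."""
    return int(round(rpkm * _kb(length, length_in_kb) * (n_total / 1e6)))


# Hypothetical gene: 500 reads, 2000 bp long, 20 million mapped reads in total.
rpkm = rpkm_from_counts(500, 2000, 20_000_000)
n_back = counts_from_rpkm(rpkm, 2000, 20_000_000)
print(rpkm, n_back)
```

Rounding to integers is a choice made for this sketch, since DESeq works on
integer counts. Counts recovered this way only approximate the original N if
the RPKM values were rounded or normalised in some other way, which is one
more reason to take N from the alignments directly.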

I am not sure what a "statistically valid RPKM value" is or how that is 
expressed by "P-value < 0.05", and would ignore that part.

For good ways of creating a count table from an alignment (BAM file), 
have a look at the vignette "Counting with summarizeOverlaps" of the 
package GenomicRanges; or at the vignette of the pasilla package.

	Best wishes
	Wolfgang

Nov/10/11 4:39 PM, Avinash S scripsit::
> Dear Bioconductor Members,
>
> I must start by saying that I'm just starting out with analyzing RNA-Seq data. I have RPKM values and their corresponding P-values for each gene.
>
> I wanted to know whether I am correct in using
> "N = RPKM x L x Ntot x 10^-9
> where N = number of mapping reads at a given gene locus, L = estimated length (bp) of the gene locus, Ntot = number of total mapping reads, and RPKM = gene locus RPKM value"
> to convert RPKM values into read counts. Can the read counts calculated with the above formula be used as input for DESeq? Do you suggest considering only statistically valid RPKM values (P-value < 0.05) for differential expression analysis using DESeq?
>
>
> Thank you,
> Avinash


-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


