[BioC] Analysing RNA-Seq data using DESeq package

Wolfgang Huber whuber at embl.de
Wed Sep 21 09:04:59 CEST 2011


Dear Suryavadhan

for normalisation between samples, please use the method described in 
the DESeq vignette, rather than the (information-losing) method 
described below.
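
As an illustration only, here is a minimal Python sketch of median-of-ratios size factors, which is the idea behind DESeq's estimateSizeFactors. It is not DESeq's own code: the function name size_factors and the toy count table are assumptions made for this example. The raw integer counts are left untouched, and each sample gets a single scaling factor that is kept alongside them.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene), each a list of raw integer
    counts (one per sample). Hypothetical helper for illustration.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # A gene with a zero count has a zero geometric mean; skip it.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]

# Toy table: the second sample was sequenced twice as deeply as the first.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],   # skipped: contains a zero
]
factors = size_factors(counts)
print(factors)
```

In this toy table the second factor comes out twice the first, reflecting the doubled sequencing depth, while the counts themselves stay integers.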

For the non-unique reads, DESeq has no provision for fuzzy or fractional 
alignments. You'll have to make a choice, and provide actual counts.
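
To see why the apportioning scheme quoted below produces non-integer values, here is a simplified one-round Python sketch. It is an assumption-laden illustration, not the poster's actual pipeline: the function apportion, the gene names, and the numbers are all hypothetical, and only 2-mapped reads are handled.

```python
def apportion(unique_counts, multi_reads):
    """Split each multi-mapped read among its candidate genes in
    proportion to their unique read counts (first round only).

    unique_counts: dict mapping gene -> integer unique read count
    multi_reads:   list of tuples, each naming the candidate genes
    """
    adjusted = {g: float(c) for g, c in unique_counts.items()}
    for candidates in multi_reads:
        total = sum(unique_counts[g] for g in candidates)
        if total == 0:
            # No unique evidence to split on; this sketch leaves
            # the read unassigned.
            continue
        for g in candidates:
            adjusted[g] += unique_counts[g] / total
    return adjusted

unique = {"geneA": 30, "geneB": 10}
multi = [("geneA", "geneB")] * 3   # three reads mapping to both genes
adjusted = apportion(unique, multi)
print(adjusted)

# The unique counts are still whole numbers; the apportioned ones are not.
unique_is_integer = all(isinstance(c, int) for c in unique.values())
adjusted_is_integer = all(v.is_integer() for v in adjusted.values())
print(unique_is_integer, adjusted_is_integer)
```

Each read is split 3:1 between the two genes, so the adjusted values are fractional. The unique counts remain integers, which is the kind of input a count-based model expects. Which counting rule to adopt is the choice referred to above.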

	Hope this helps
	Wolfgang


Sep/15/11 6:58 PM, Kayilai, Suryavadhan (MU-Student) scripsit::
> For the 6 sequenced samples, we ran alignments to get expression
> estimates. The protocol is to align the reads, then count the number
> of reads falling within the boundaries of the annotated genes, then
> normalize with respect to the number of reads aligning in each
> sample (not the sample length). The analysis also attempts to capture
> the non-uniquely aligning reads by estimating the unique read counts
> for each gene, then apportioning the ambiguously aligning reads among
> the potential sources based on the ratios of read counts among those
> sources established by the less ambiguous reads (i.e. the first round
> of apportioning assigns 2-mapped reads based on the unique alignments,
> then 3-mapped reads are apportioned based on the adjusted read
> counts, and so on). So, in the attached you'll see three sets of
> columns for the samples, with those headed "unique" giving the
> per-million-reads-aligned normalized values for each sample's uniquely
> aligned reads, "apportioned" using the adjusted values as described
> above, and "total" giving the number of reads aligned to the gene
> models without regard to their uniqueness. Note that in all cases, we
> consider only reads mapping to no more than 5 locations. Hence the
> values are in non-integer form. Kindly help me through this.
>
> Suryavadhan
> ________________________________________
> From: Steve Lianoglou [mailinglist.honeypot at gmail.com]
> Sent: Thursday, September 15, 2011 9:42 AM
> To: Kayilai, Suryavadhan (MU-Student)
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Analysing RNA-Seq data using DESeq package
>
> Hi Suryavadhan,
>
> On Tue, Sep 13, 2011 at 12:41 PM, Kayilai, Suryavadhan (MU-Student)
> <skhx5 at mail.missouri.edu>  wrote:
>> I downloaded the DESeq package for the RNA-seq analysis of
>> soybean genes. The package is really helpful and easy to use.
>> Thanks! I have a small doubt, and it would be kind of you if you
>> could help me figure it out. The package works fine for gene
>> data with whole-number (integer) values. How can I run the
>> analysis for decimal data, as newCountDataSet does not
>> allow me to input decimal data? It would be great if you could help
>> me through this.
>
> It doesn't let you put in non-integer data, because the models DESeq
> uses to test for significance assume count data -- as in, the
> number of reads that align to a given region, which can only ever be
> integers.
>
> What type of data are you trying to put in that has decimal values,
> anyway? What does it represent?
>
> -steve
>
> -- Steve Lianoglou Graduate Student: Computational Systems Biology |
> Memorial Sloan-Kettering Cancer Center | Weill Medical College of
> Cornell University Contact Info:
> http://cbio.mskcc.org/~lianos/contact
>


-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


