[BioC] easyRNAseq question

Wolfgang Huber whuber at embl.de
Sun Jul 8 23:39:54 CEST 2012


Dear Nirmala

thank you. What you call expected is expected only if all null 
hypotheses are true. (If this sentence does not make sense to you, 
please consult a local statistician or a book on hypothesis testing.)
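To make the point about "expected" concrete, here is a minimal simulation sketch (an illustration, not code from this thread). When every null hypothesis is true, p-values are uniformly distributed on (0, 1), and the i-th smallest of n p-values has expected value i/(n+1). The gene count of about 23,000 echoes the thread. The 10% share of non-null genes and the Beta(0.1, 1) model for their p-values are arbitrary, hypothetical choices, used only to show how real effects pull the sorted p-values away from the all-null expectation.

```python
import numpy as np

def expected_uniform_quantiles(n):
    # Expected value of the i-th smallest of n independent Uniform(0, 1)
    # draws is i / (n + 1), for i = 1..n.
    return np.arange(1, n + 1) / (n + 1)

rng = np.random.default_rng(seed=1)
n = 23000  # roughly the number of genes tested in the thread

# Scenario 1: every null hypothesis true -> p-values are Uniform(0, 1)
p_null = rng.uniform(size=n)

# Scenario 2 (hypothetical): 10% of genes carry a real effect, modelled
# here as p-values from a Beta(0.1, 1) distribution that piles up near 0
n_alt = n // 10
p_mix = np.concatenate([rng.uniform(size=n - n_alt),
                        rng.beta(0.1, 1.0, size=n_alt)])

expected = expected_uniform_quantiles(n)
max_gap = {}
for label, p in [("all null", p_null), ("10% non-null", p_mix)]:
    observed = np.sort(p)  # ascending, to pair with i / (n + 1)
    # Largest gap between observed and all-null expected quantiles
    max_gap[label] = float(np.max(np.abs(observed - expected)))
    print(f"{label}: max |sorted p - i/(n+1)| = {max_gap[label]:.3f}")
```

In the all-null scenario the gap stays small; with the non-null component, the smallest p-values fall well below their null expectation, which on a -log10 QQ plot appears as points rising above the diagonal.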

In your case, you have many small p values. One needs to know more about 
the data to tell whether this could make biological sense. If not, then 
you need to explore your data for batch effects or problems with the 
experimental design or data quality.

I posted your plot here: 
http://www-huber.embl.de/users/whuber/bioc-list/120708/EnsemblGTF_qqplot.png
Its axes, however, do not match what you claim below ("expected is 
calculated using the formula - rank/(n+1)").
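For a p-value QQ plot, both axes are normally put on the same scale: if the observed values are -log10(p), the expected values should be -log10(rank/(n+1)), not rank/(n+1) itself. Below is a minimal Python sketch (not the poster's code), assuming -log10 on both axes and assuming that untested genes show up as NaN p-values, which are dropped before ranking.

```python
import numpy as np

def qq_coordinates(pvalues):
    """Coordinates for a p-value QQ plot, with -log10 applied to both axes."""
    p = np.asarray(pvalues, dtype=float)
    p = np.sort(p[~np.isnan(p)])  # drop NaN (assumed: untested genes), sort ascending
    n = p.size
    ranks = np.arange(1, n + 1)
    expected = -np.log10(ranks / (n + 1))  # null expectation, same transform as observed
    observed = -np.log10(p)                # a p-value of exactly 0 would give inf here
    return expected, observed

# Usage: under the null, the points scatter around the diagonal y = x
rng = np.random.default_rng(0)
exp_q, obs_q = qq_coordinates(rng.uniform(size=1000))
```

Plotting `obs_q` against `exp_q` together with the line y = x gives the usual picture; points far above the line at the upper end correspond to p-values much smaller than the all-null expectation.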

PS There are many ways to post an image on the internet, e.g. Facebook, 
Flickr, Imagevenue, Google+, Tumblr and many others; take your pick. 
Alternatively, I am sure that your IT department can show you how best 
to do this.

	Best wishes
	Wolfgang



On 7/5/12 10:16 PM, Akula, Nirmala (NIH/NIMH) [C] wrote:
> Hi,
>
> My analysis pipeline in detail:
>
> 1. Used Tophat 2.0.4 for mapping the reads
> 2. Used Ensembl GTF file for counting using HTSeq
> 3. Then DESeq to find the differentially expressed genes
> 4. The genes are then ranked in the ascending order of p-values. The expected is calculated using the formula - rank/(n+1), where n is the total number of genes. Observed is -log(pvalue). The QQ plot is expected vs observed.
>
> Please let me know if you need additional details.
>
> Sorry, I am not sure what public server you are talking about. I have attached the plot to this e-mail so please post it to the server.
>
> Thank you very much.
>
> Best Regards,
> Nirmala
>
> -----Original Message-----
> From: Wolfgang Huber [mailto:whuber at embl.de]
> Sent: Thursday, July 05, 2012 3:50 AM
> To: bioconductor at r-project.org
> Subject: Re: [BioC] easyRNAseq question
>
> Dear Nirmala
>
> It seems that the attachment did not come through the mailing list server. Can you use a public (picture) server for posting the plot? And provide a reproducible code example.
>
> Also, could you be more clear about what you mean by "the QQ-plot is way above the expected"?
>
> 	Thanks and best wishes
> 	Wolfgang
>
>
> Jul/2/12 11:10 PM, Akula, Nirmala (NIH/NIMH) [C] scripsit:
>> Thank you Simon. I tried Ensembl GTF file with HTSeq and got ~50,000 genes for testing by DESeq. We filtered the genes with low counts and the resulting file had ~23,000 genes. The problem now is the QQ-plot is way above the expected. Please see the attachment.
>>
>> Analysis pipeline: Tophat-HTSeq-DESeq
>>
>> Any suggestions will be greatly helpful.
>>
>> Thank you very much.
>>
>> Regards,
>> Nirmala
>>
>> -----Original Message-----
>> From: Simon Anders [mailto:anders at embl.de]
>> Sent: Thursday, May 31, 2012 2:31 AM
>> To: bioconductor at r-project.org
>> Subject: Re: [BioC] easyRNAseq question
>>
>> Dear Nirmala
>>
>> On 2012-05-27 02:25, Akula, Nirmala (NIH/NIMH) [C] wrote:
>>> I used HTSeq (similar to your geneModel method) which takes the
>>> counts of disjoint exons for the genes. The problem with this method
>>> is that too many reads are assigned to ambiguous category and
>>> sometimes total number of reads that fall on disjoint exons are too
>>> few to get a valid DESeq result. Using RefSeq genes the total number
>>> of genes counted by HTSeq on my data is ~14000 whereas using the
>>> bestExon method we get ~22000. Do you observe similar counts with your data?
>>
>> It does not quite make sense that counting only for the best exons gives you more counts than counting for all exons.
>>
>> Could it be that the issue with UCSC GTF files described here is the source of your problems:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2012-April/044717.html
>>
>>      Simon
>
>
>
> --
> Best wishes
> 	Wolfgang
>
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor




