[BioC] Gene filtering for RNA-seq data

FeiYian Yoong [guest] guest at bioconductor.org
Sun Nov 17 01:42:39 CET 2013


I am writing to ask about independent filtering for my large RNA-seq dataset. I have raw gene counts for around 55,000 genes from 91 libraries/samples, consisting of 3 biological replicates for 4 different genotypes of germinating seeds. I am currently working on differential expression analysis and, subsequently, transcriptomic network analysis for these samples. Before performing either analysis, I'd like to apply independent filtering to my data to increase the detection power for differentially expressed genes. I will be using your DESeq2 package (version 1.2.5) for the filtering and differential expression analysis.

Based on a statistician's recommendation, I have decided to perform the following steps:

1) Fit a negative binomial GLM with genotype & time effects across all samples for all genes that have nonzero counts in at least one sample
2) Filter weakly expressed genes (for example using a filter like the one implemented in HTSFilter)
3) Adjust p-values for genes passing the filter to correct for multiple testing
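
Steps 2 and 3 above can be sketched in a few lines. This is only an illustration of the idea, not DESeq2's or HTSFilter's implementation: the filter statistic (mean count per gene, computed without looking at the genotype labels), the cutoff `min_mean`, and the toy numbers are all assumptions made for the example. The adjustment shown is the Benjamini-Hochberg procedure, applied only to the genes that pass the filter.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_adjust(mean_counts, pvalues, min_mean):
    """Adjust only genes whose mean count reaches min_mean; others get None."""
    keep = [i for i, mu in enumerate(mean_counts) if mu >= min_mean]
    adj = bh_adjust([pvalues[i] for i in keep])
    result = [None] * len(pvalues)
    for i, a in zip(keep, adj):
        result[i] = a
    return result


# Hypothetical toy data: per-gene mean counts and unadjusted p-values.
mean_counts = [0.5, 120.0, 3.0, 45.0, 800.0]
pvalues = [0.90, 0.001, 0.60, 0.02, 0.03]

padj_all = bh_adjust(pvalues)  # no filtering: 5 tests
padj_filtered = filter_then_adjust(mean_counts, pvalues, min_mean=10.0)  # 3 tests
print(padj_all)
print(padj_filtered)
```

Because the weakly expressed genes are removed before the correction, fewer tests enter the adjustment, and the adjusted p-values of the remaining genes can only get smaller or stay the same in this toy case.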


While the DESeq2 package is nicely written, I am not a statistician and am still unclear on a few things, mainly the workflow for my analysis. Based on my understanding of what is written in the DESeq2 documentation, I should do the following, in order:


1. First, run the differential expression analysis (the DESeq function) on my raw gene counts, which includes library size normalization. This step fits a negative binomial generalized linear model with genotype & time effects across all samples for all genes that have nonzero counts in at least one sample.

2. Second, take the result from step 1 through the independent filtering step, using the filtered_p function from the genefilter package.

3. Third, use the result from step 2 to further filter weakly expressed genes using the HTSFilter package.

4. Finally, adjust the p-values of the genes passing the filter to correct for multiple testing. I am not entirely sure how to do this. Can I perform this step with the DESeq2 package?
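
For reference, the library size normalization mentioned in step 1 can be illustrated with a median-of-ratios calculation, the approach commonly associated with DESeq-style size factors. The sketch below is a simplified stand-alone version written for this example, not the package's code; the function name `size_factors` and the toy count matrix are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of genes, each a list of raw counts (one per sample)."""
    n_samples = len(counts[0])
    # Use only genes with nonzero counts in every sample,
    # so that the log geometric mean is finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to its geometric mean across samples.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy matrix: 4 genes x 3 samples.
# Sample 2 is sequenced twice as deeply as sample 1, sample 3 1.5 times.
counts = [
    [10, 20, 15],
    [100, 200, 150],
    [0, 3, 1],
    [50, 100, 75],
]
sf = size_factors(counts)
# Dividing by the size factors puts all samples on a common scale.
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
print(sf)
```

The raw counts stay untouched as model input; the size factors only describe how deeply each library was sequenced relative to the others.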


Furthermore, does DESeq2 take care of PCR duplicate artifacts?

 -- output of sessionInfo(): 

none

--
Sent via the guest posting facility at bioconductor.org.