[BioC] EdgeR: sRNA analysis check

Kenlee Nakasugi kenlee.nakasugi at sydney.edu.au
Tue Aug 20 09:49:02 CEST 2013


Hi, 

I am hoping someone can comment on whether what I have done is correct
as I do not have anyone here to confide in and am relatively new to
this. 

Experiment: 
This is a sRNA deep sequencing experiment. 
I have two samples and two controls. The samples are a cellular sub
compartment of the controls, and the controls are whole cell extract
which also contain sequences from the sub compartment. 
The goal is to see whether there have been some transfer of sRNAs from
the total extract into the sub-compartment. 
Due to the nature of the sample prep, and there will inevitably be some
contamination of total extract sequences in the sub-compartment
sequences. There are also some sRNA sequences that are natively present
in the samples and are highly concentrated relative to the total extract
(which also contain them). The focus is *not* on these sequences. The
intention is to normalise first against the abundance of these sequences
(create a 'base-level abundance'), and then input the new abundances
into edgeR to identify sRNA sequences that have been enriched in the
samples (i.e. have been transferred rather than contamination). 

Analysis design:
The workflow is basically following: http://cgrlucb.wikispaces.com/edgeR
+spring2013

1. Get the total counts of sample specific sRNA sequences in all samples
and controls. 
2. Normalise raw counts against the counts from step 1 for each
sample/control 
3. Run edgeR as described in http://cgrlucb.wikispaces.com/edgeR
+spring2013  (The R code is at the end of this email). 

I am attaching some plots I generated. The BCV plot looks weird to me,
but I'm not sure how to interpret. 

I am worried that I have done something wrong by first normalising
against those sample specific sRNA sequences (getting the 'normCounts'
variable below)

Any advice greatly appreciated!! 

Cheers,
Ken



Here is the R code:

#################

> sampleInfo=read.delim("sampleInfo", header =T, sep="\t")
> group = sampleInfo[,2]
> group
[1] sample sample total total   
Levels: sample total

> rawcounts <- read.delim("abun.tab", row.names="Sequence")
> head(rawcounts)
                             S1 S2 C1 C2
AAAAAAAAAATAGCCGTTGGACCC       1   2   1     3
AAAAAAAAAGAACCTAATGGATCGAACT  15  40   8     9
AAAAAAAAATGGCCGTTGGGAACC       1   1   2     2
AAAAAAAACCTGAATAACCCGAAA       5   8 129    34
AAAAAAAAGAACCTAATGGATCGAACT   44  86  15    23
AAAAAAAATCTGAATAACCCGAAA       1   1  38     4

> housekeep <- c(7966975,7644095,715816,724099)    ## calculated these
abundances outside of R
> normCounts <- round(t(t(rawcounts)/housekeep)*mean(housekeep))  ## Not
sure if this is right thing to do

> filter <- means >=10
> normCountsfilt <- normCounts[filter,]

> cds2 <- DGEList(normCountsfilt, group=group)
> cds2 = calcNormFactors(cds2)
> cds2 = estimateCommonDisp(cds2, verbose=T)
Disp = 0.52689 , BCV = 0.7259 
> cds2 = estimateTagwiseDisp(cds2)

#BCVplot
> plotBCV(cds2)
#MeanVariance Plot
> meanVarPlot <- plotMeanVar(cds2, show.raw.vars=TRUE,
show.tagwise.vars=TRUE, show.binned.common.disp.vars=FALSE,
show.ave.raw.vars=FALSE, NBline = TRUE , nbins = 100, pch = 16 , xlab
="Mean Expression (Log10 Scale)" , ylab = "VaPlot" )

> et <- exactTest(cds2, pair=levels(group))
> top <- topTags(et, n=nrow(cds2$counts))$table
> de <- rownames(top[top$FDR<0.05,])

#SmearPlot
> plotSmear(cds2 , de.tags=de)
#VolcanoPlot
> plot(top$logFC, -log10(top$PValue), pch=20, cex=.5,
ylab="-log10(p-value)", xlab="logFC", col=as.numeric(rownames(top) %in%
de)+1)

#############################

-------------- next part --------------
A non-text attachment was scrubbed...
Name: MeanVariancePlotfilterlowreads
Type: image/png
Size: 33391 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plotBCVfilterlowreads
Type: image/png
Size: 18635 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SmearPlotfilterlowreads
Type: image/png
Size: 56590 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: VolcanoPlotfilterlowreads
Type: image/png
Size: 22130 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment-0003.png>


More information about the Bioconductor mailing list