[BioC] zero rna-seq values AFTER normalisation in edgeR

James W. MacDonald jmacdon at uw.edu
Fri Aug 15 19:46:09 CEST 2014


Hi Nick,



On Fri, Aug 15, 2014 at 9:23 AM, Nick N <feralmedic at gmail.com> wrote:

> I am using edgeR to analyze RNA-Seq data. This is my script:
>
>
> library("edgeR")
> #############################
> #read in metadata & DGE
> #############################
> composite_samples <- read.csv(file="samples.csv",header=TRUE,sep=",")
> counts <- readDGE(composite_samples$CountFiles)$counts
> #############################
> #Filter & Library Size Re-set
> #############################
> noint <- rownames(counts) %in% (c("no_feature", "ambiguous",
> "too_low_aQual", "not_aligned", "alignment_not_unique"))
> cpms <- cpm(counts)
> keep <- rowSums(cpms>1)>=3 & !noint
> counts <- counts[keep,]
> colnames(counts) <- composite_samples$SampleName
> d <- DGEList(counts=counts, group=composite_samples$Condition)
> d$samples$lib.size <- colSums(d$counts)
> #############################
> #Normalisation
> #############################
> d <- calcNormFactors(d)
> #############################
> #Recording the normalized counts
> #############################
> all_cpm <- cpm(d, normalized.lib.sizes=TRUE)
> all_counts <- cbind(rownames(all_cpm), all_cpm)
> colnames(all_counts)[1] <- "Ensembl.Gene.ID"
> rownames(all_counts) <- NULL
> #############################
> #Estimate Dispersion
> #############################
> d <- estimateCommonDisp(d)
> d <- estimateTagwiseDisp(d)
> #############################
> #Perform a test
> #############################
> de_ctl_mo_composite <- exactTest(d, pair=c("NY", "N"))
>
>
> I believe that the variable "all_counts" should contain the normalized
> counts for each sample in each condition.


This is a misunderstanding. The counts are not affected by the
normalization. Instead, the only thing that is affected is the norm.factors
column in the 'samples' list item of your DGEList. This is clearly explained
in the edgeR User's guide, on p. 12, under section 2.6.6.
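
To make the point concrete, here is a minimal base-R sketch of the
arithmetic, assuming that cpm() with normalized library sizes divides each
count by lib.size * norm.factors and scales by one million. The count
matrix and the norm.factors values are made up for illustration and do not
come from calcNormFactors or from the data in this thread.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers).
counts <- matrix(c(10, 0, 25,  965,
                    0, 0, 40, 1960,
                   15, 5,  0,  480),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Library sizes, as in d$samples$lib.size <- colSums(d$counts).
lib.size <- colSums(counts)

# Hypothetical stand-ins for d$samples$norm.factors.
norm.factors <- c(1.25, 0.80, 1.00)

# Normalization only changes the effective library size per sample.
eff.lib.size <- lib.size * norm.factors

# Counts per million on the effective library sizes (no pseudocount added).
cpm_manual <- t(t(counts) / eff.lib.size) * 1e6

# The raw counts are untouched, and every zero count is still a zero CPM.
zeros_in_counts <- sum(counts == 0)
zeros_in_cpm    <- sum(cpm_manual == 0)
print(cpm_manual)
```

Under these assumptions a zero count always gives a zero CPM, whatever the
normalization factor is. As far as I know, cpm() only adds a prior count
when log=TRUE is requested, so zeros in a non-log CPM table are expected.
Note also that the filter rowSums(cpms>1)>=3 only requires three samples
above 1 CPM, so a retained gene can still have zero counts in the others.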

Best,

Jim



> My understanding is also that
> edgeR adds pseudocounts BEFORE performing the library normalisation. Thus
> it is possible that some values revert to being zero after normalisation.
> But I thought that this would happen rarely. Yet in a recent dataset I find
> an improbably large number of zero values in "all_counts", which made me
> think that my understanding of how pseudocounts and normalisation work in
> edgeR might be incorrect. Could somebody please comment on this?
>



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
