[BioC] EdgeR norm.factors input

Gordon K Smyth smyth at wehi.EDU.AU
Wed Feb 12 01:33:09 CET 2014


> Yanzhu [guest] guest at bioconductor.org
> Tue Feb 11 15:38:03 CET 2014
>
> Dear Gordon,
>
> Thank you so much for your comments. This is exactly what I did for
> total read count normalization, I used norm.factors = 1 for total
> count (TC) normalization.
>
> Then here comes the question. As I mentioned in my previous post, I
> would like to compare the performance of different normalization
> methods. Besides that, I also would like to compare the results of
> normalized data with the results of raw count (RC) data (without
> taking care of any normalization). According to our previous
> discussion, I skipped the normalization step for RC, but the results
> were the same for TC and RC.

Well of course.  As I told you, edgeR always takes the total count into 
account, and the norm.factors are equal to 1 by default.
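
A small base R sketch of why the two analyses coincide. It only mimics the
arithmetic and does not call edgeR; the toy counts and the variable names
are made up for illustration. The effective library size is
lib.size * norm.factors, so with every norm.factor equal to 1 the scaling
is simply total count scaling.

```r
# Hypothetical toy counts: 4 genes (rows) x 3 samples (columns)
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                    5, 10, 15, 20),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

lib.size     <- colSums(counts)           # total count per sample
norm.factors <- rep(1, ncol(counts))      # the default value
eff.lib.size <- lib.size * norm.factors   # effective library size

# With norm.factors = 1 the effective library size is just the total count
identical(unname(eff.lib.size), unname(lib.size))

# Counts per million on this scale: the toy samples differ only in depth,
# so their columns agree once scaled by the total count
cpm <- t(t(counts) / eff.lib.size) * 1e6
cpm
```

Skipping calcNormFactors therefore leaves exactly this scaling in place,
which is why the TC and RC results were identical.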

> Should I use
>
> norm.factors = 1/lib.size
>
> for RC?

Ignoring the library sizes is obviously crazy, and edgeR does not provide 
you with options to do crazy analyses.

I will not provide advice as to how to do an analysis that can never be the 
right thing to do.

> One more question: I have also considered the normalization method
> provided in the DESeq package. For this normalization method, what
> should my input for the correction factor (norm.factors) be? I have
> worked out the relation between the scaling factor (sizeFactors) of the
> DESeq package and the correction factor (norm.factors) of edgeR, which
> is given below:

Have you read the help page for calcNormFactors?  It explains that the 
DESeq normalization is provided as an option:

   y <- calcNormFactors(y,method="RLE")
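
For anyone who wants to see how such factors relate to DESeq-style size
factors, here is a minimal base R sketch. It is not the edgeR source, and
the library sizes N and size factors S are made-up numbers. It relies on two
things: the effective library sizes lib.size * norm.factors are proportional
to the size factors, and calcNormFactors rescales the factors so that they
multiply to 1. The second condition fixes the otherwise free overall scale.

```r
# Hypothetical inputs: library sizes and DESeq-style size factors
N <- c(1.0e6, 2.0e6, 1.5e6)   # lib.size (made up)
S <- c(0.8, 1.6, 1.1)         # sizeFactors (made up)

# Proportionality determines the factors only up to a common constant
raw <- S / N

# Extra condition: rescale so the factors have geometric mean 1,
# i.e. their product is 1
X <- raw / exp(mean(log(raw)))

prod(X)        # 1, up to rounding
(N * X) / S    # the same constant for every sample
```

In practice calcNormFactors(y, method="RLE") does all of this from the
counts, so there is no need to pass size factors in by hand.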

Gordon

> lib.size*norm.factors/mean(lib.size*norm.factors)=sizeFactors
>
> Now I know the lib.size and sizeFactors, I try to figure out what the
> norm.factors is for DESeq normalization method. This equation system
> involves n unknown variables with n-1 independent equations. Let X=
> norm.factors=(X1,X2,...,Xn)^T, lib.size=N=(N1,N2,...,Nn) and
> sizeFactors = S=(S1,S2,...,Sn), then
>
> X2=X1*(S2/S1)*(N1/N2)
> .
> .
> .
> Xn=X1*(Sn/S1)*(N1/Nn)
>
> Here * means the regular product. I need one more condition to find
> these unknown variables (X1,X2,...,Xn). Do you happen to know whether
> there is an extra requirement that norm.factors needs to satisfy?
>
> Thank you!
>
>
> Yanzhu
>
> ----------------------------------------------------------
>
> edgeR always takes the total read count into account, so
>
>    norm.factors = 1
>
> is equivalent to total read count normalization.
>
> Please read the section on normalization in the edgeR User's Guide.
>
> Best wishes
> Gordon
>
>
> > Date: Mon, 10 Feb 2014 11:06:31 -0800 (PST)
> > From: "Yanzhu [guest]" <guest at bioconductor.org>
> > To: bioconductor at r-project.org, mlinyzh at gmail.com
> > Subject: [BioC] EdgeR norm.factor input
> >
> >
> > Dear Gordon,
> >
> > Thank you so much for your comments.
> >
> > One more question about the first question asked in my previous post
> > where I asked about how to supply the correct factor in the
> > normalization step.
> >
> > I would like to use the total read count normalization method to
> > normalize the data and then use edgeR to test my multi-factor models
> > as in my previous post. The total read count normalization is given as
> >
> > X_ij/(N_j/mean(N))=X_ij*mean(N)/N_j,
> >
> > where X_ij is the read count of gene i in sample j, N_j is the library
> > size of sample j, and mean(N) is the mean of the library sizes over all
> > samples. My question is: what is the input for y$samples$norm.factors?
> > Can I do the following: y$samples$norm.factors = N/mean(N), where N is
> > the vector of library sizes of all samples and mean(N) is the mean of
> > the library sizes over all samples? Or could you please give me some
> > suggestions? Thank you!
> >
> >
> >
> > Yanzhu
> >
> > ---------------------------------------------------
> >
> >> Date: Fri,  7 Feb 2014 07:25:17 -0800 (PST)
> >> From: "Yanzhu [guest]" <guest at bioconductor.org>
> >> To: bioconductor at r-project.org, mlinyzh at gmail.com
> >> Subject: [BioC] EdgeR multi-factor testing questions
> >>
> >>
> >> Dear Gordon,
> >>
> >> Thank you so much for your comments. I have updated my code and now
> >> get different results for the TMM and upper-quartile normalization
> >> methods.
> >>
> >> I have two more questions regarding normalization. I have tried
> >> different normalization methods and would like to compare their
> >> performance. My questions are:
> >>
> >> 1. Section 2.5.6 of the User's Guide mentions that normalization takes
> >> the form of correction factors that enter into the statistical model.
> >> Such correction factors are usually computed internally by edgeR
> >> functions, but it is also possible for a user to supply them. I would
> >> like to supply the correction factors to edgeR. How could I do this?
> >
> > Just enter in your own values:
> >
> >  y$samples$norm.factors <- yourvalues
> >
> >> 2. I would also like to compare the testing results of normalized data
> >> with the results of raw data (without normalizing the data). Could I
> >> just skip the normalization step as below?
> >
> > Yes.
> >
> > Gordon
> >
> >> group<-paste(L,S,R,sep=".")
> >> design<-model.matrix(~L+R+S+L:R+L:S+R:S+L:R:S)
> >> y<-DGEList(counts=counts,group=group)
> >> #y<-calcNormFactors(y,method="upperquartile",p=0.75) ##skip this step
> >>
> >> y<-estimateGLMCommonDisp(y,design)
> >> y<-estimateGLMTagwiseDisp(y,design)
> >>
> >> fiteUQ_LRS<-glmFit(y,design,offset=offset  )
> >>
> >> Thanks.
> >>
> >>
> >> Yanzhu
> >>
> >>
>
>
>  -- output of sessionInfo():
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
