[BioC] EdgeR norm.factor input

Mon Feb 10 20:06:31 CET 2014

Dear Gordon,

Thank you so much for your comments.

I have one more question, following up on the first question in my previous post, where I asked how to supply the correction factors in the normalization step.

I would like to use the total read count normalization method to normalize the data and then use edgeR to test my multi-factor models, as in my previous post. The total read count normalization is given as

X_ij/(N_j/mean(N))=X_ij*mean(N)/N_j, 

where X_ij is the read count of gene i in sample j, N_j is the library size of sample j, and mean(N) is the mean of the library sizes over all samples. My question is: what should the input for y$samples$norm.factors be? Can I do the following:
y$samples$norm.factors = N/mean(N), where N is the vector of library sizes of all samples? Or could you please give me some suggestions?
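
For concreteness, the scaling in the formula above can be sketched in base R on toy numbers. The counts and the names X, N, scale_factor and X_norm are made up for illustration and are not data from this thread:

```r
# Toy count matrix: 3 genes (rows) x 3 samples (columns); made-up values
X <- matrix(c(10, 30,  60,
              20, 60, 120,
              50, 50, 100),
            nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:3)))

# Library size N_j of each sample (column totals)
N <- colSums(X)

# Per-sample scaling factor N_j / mean(N)
scale_factor <- N / mean(N)

# Total read count normalization: X_ij / (N_j / mean(N))
X_norm <- sweep(X, 2, scale_factor, "/")

# After scaling, every sample has the same total, equal to mean(N)
colSums(X_norm)
```

Dividing each column by N_j/mean(N) is the same as multiplying by mean(N)/N_j, so both sides of the formula give the same matrix.
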
Thank you!

Yanzhu

---------------------------------------------------

> Date: Fri,  7 Feb 2014 07:25:17 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR multi-factor testing questions
>
>
> Dear Gordon,
>
> Thank you so much for your comments. I have updated my code and now get 
> different results for the TMM and upper-quartile normalization methods.
>
> I have two more questions regarding normalization. I have tried 
> different normalization methods and would like to compare their 
> performance. My questions are:
>
> 1. Section 2.5.6 of the User's Guide mentions that normalization takes the 
> form of correction factors that enter into the statistical model. Such 
> correction factors are usually computed internally by edgeR functions, 
> but it is also possible for a user to supply them. I would like to supply 
> the correction factors to edgeR myself. How can I do this?

Just enter in your own values:

  y$samples$norm.factors <- yourvalues
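
To make the assignment above concrete, here is a minimal sketch with made-up counts. The object names (counts, my_factors, default_eff, custom_eff) and all numeric values are assumptions for illustration and do not come from this thread. It relies on my understanding that edgeR treats lib.size * norm.factors as the effective library size, so the factors act as relative adjustments on top of the library sizes, and leaving them at the default of 1 corresponds to scaling by total library size alone.

```r
library(edgeR)

# Toy counts: 4 genes x 3 samples (made-up numbers, illustration only)
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                   15, 25, 35, 45),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
y <- DGEList(counts = counts, group = c("a", "a", "b"))

# Default: every norm.factor is 1, so the effective library size
# is just the total read count of each sample
default_eff <- y$samples$lib.size * y$samples$norm.factors

# User-supplied factors (hypothetical values)
my_factors <- c(0.9, 1.1, 1.0)
y$samples$norm.factors <- my_factors

# Effective library sizes after supplying the factors
custom_eff <- y$samples$lib.size * y$samples$norm.factors

# Inspect the sample table: group, lib.size and norm.factors columns
y$samples
```
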

> 2. I would also like to compare the test results for the normalized data 
> with the results for the raw data (without normalization). Could I 
> just skip the normalization step, as below?

Yes.

Gordon

> group<-paste(L,S,R,sep=".")
> design<-model.matrix(~L+R+S+L:R+L:S+R:S+L:R:S)
> y<-DGEList(counts=counts,group=group)
> #y<-calcNormFactors(y,method="upperquartile",p=0.75) ##skip this step
>
> y<-estimateGLMCommonDisp(y,design)
> y<-estimateGLMTagwiseDisp(y,design)
>
> fiteUQ_LRS<-glmFit(y,design,offset=offset  )
>
> Thanks.
>
>
> Yanzhu
>
>

 -- output of sessionInfo(): 

>  sessionInfo() 
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq_1.12.1       lattice_0.20-15    locfit_1.5-9.1     Biobase_2.20.1     BiocGenerics_0.6.0 edgeR_3.2.4        limma_3.16.8      

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.6 DBI_0.2-7            genefilter_1.42.0    geneplotter_1.38.0   grid_3.0.1           IRanges_1.18.4      
 [8] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.1        stats4_3.0.1         survival_2.37-4      XML_3.98-1.1         xtable_1.7-1        
>

--
Sent via the guest posting facility at bioconductor.org.