[BioC] calcNormFactors in edgeR - quick question from a very inexperienced user

Sun May 12 05:43:42 CEST 2013

Dear Jan,

The first step is to read the documentation!  Page 9 of the edgeR User's 
Guide says:

"If the counts for different samples are stored in separate files, then 
the files have to be read separately and collated together.  The edgeR 
function readDGE is provided to do this.  Files need to contain two 
columns, one for the counts and one for a gene identifier.  See the SAGE 
and deepSAGE case studies for examples of this."

The readDGE() function does exactly what you want to do. Type ?readDGE at 
the R prompt.
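
For illustration, here is a minimal sketch of that workflow. It assumes one tab-delimited file per sample, with a header line, the gene identifier in the first column and the count in the second. The file names, gene names, and counts are made up for the example. If a file holds several sample columns, it would first have to be split or merged by gene identifier, which this sketch does not cover.

```r
library(edgeR)

# Hypothetical example data: two samples whose gene lists only partly overlap.
dir <- tempfile("counts_")
dir.create(dir)
write.table(data.frame(gene  = c("geneA", "geneB", "geneC", "geneD"),
                       count = c(10, 200, 35, 8)),
            file.path(dir, "sample1.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(gene  = c("geneA", "geneB", "geneC", "geneE"),
                       count = c(20, 180, 50, 3)),
            file.path(dir, "sample2.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# readDGE() reads each file and collates them into a single DGEList.
# columns = c(1, 2) means: identifiers in column 1, counts in column 2.
# The count matrix covers the union of all genes; a gene that is absent
# from a file gets a count of 0 for that sample.
dge <- readDGE(c("sample1.txt", "sample2.txt"), path = dir, columns = c(1, 2))

# Compute RLE normalization factors; they are stored in dge$samples$norm.factors.
dge <- calcNormFactors(dge, method = "RLE")

dge$counts    # genes in rows, samples in columns
dge$samples   # library sizes and normalization factors
```

So there is no need to pad the files with zeros by hand. My understanding is that the RLE method bases its factors only on genes with non-zero counts in every sample, so the zero-filled genes should not enter the calculation.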

Best wishes
Gordon

> Date: Fri, 10 May 2013 16:32:38 +0100
> From: Jan Zaucha <Jan.Zaucha at bristol.ac.uk>
> To: bioconductor at r-project.org
> Subject: [BioC] calcNormFactors in edgeR - quick question from a very
> 	inexperienced user
>
> Hi,
>
> I'm totally new to the field and have never used R before, but I need to 
> normalize some expression data.
>
> In every file I have many columns corresponding to different samples 
> (different source cells) and rows corresponding to different genes. 
> However I have many different files corresponding to different 
> experiments and they have different total numbers of rows (genes).
>
> I want to normalize all of the data with RLE normalization, which is 
> implemented in the function calcNormFactors from the package edgeR, but 
> I don't understand how I can put the read counts into a matrix, since my 
> files contain different numbers of genes (rows).
>
> I thought I should have a giant matrix containing data from all of my 
> files where the columns are the samples and rows are the genes.
>
> Should I perhaps take the file that has the highest number of genes and 
> enter "0" for any of these genes that are not present in the other files?
>
> Thanks for your time.
> Jan
>
