[BioC] edgeR reading data

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Apr 13 19:19:27 CEST 2012


Hi,

On Fri, Apr 13, 2012 at 12:35 PM, Wang, Li <li.wang at ttu.edu> wrote:
> Dear Gordon
>
> Thanks very much for your reply.
> My data are now in txt format. They are separate files, each representing a sample. In each file, I specify two columns, one for gene name, the other for expression value (total exon reads, no transformation).
> I am thinking of using the readDGE function as suggested in the manual. I assume that the function can read only one file at a time, so I would need to call readDGE a couple of times.
> And then I do not know how to combine these reads into one table.

If all of the rows are in the same order, I can imagine doing
something simple like:

R> dat <- lapply(file.paths, read.table, ...[[more stuff]])

## Each element of `dat` has two columns (gene id and count); you might
## pick off the second column of each and cbind them

R> cnts <- do.call(cbind, lapply(dat, '[[', 2))
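
Spelled out a bit more, the above might look something like the
following. This is only a sketch: it assumes tab-delimited files with
no header line and the same gene order in every file. The file names,
gene ids, counts and the extra `read.table` arguments are all made up
for illustration.

```r
## Write two tiny example files (made-up data) so the sketch is self-contained.
tmp.dir <- tempfile("counts")
dir.create(tmp.dir)
genes <- c("geneA", "geneB", "geneC")
write.table(data.frame(genes, c(10, 0, 5)), file.path(tmp.dir, "sample1.txt"),
            sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)
write.table(data.frame(genes, c(7, 3, 12)), file.path(tmp.dir, "sample2.txt"),
            sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)

file.paths <- file.path(tmp.dir, c("sample1.txt", "sample2.txt"))

## Read each file into a 2-column data.frame (gene id, count).
## The sep/header settings are assumptions about the file layout.
dat <- lapply(file.paths, read.table, sep="\t", header=FALSE,
              stringsAsFactors=FALSE)

## The cbind shortcut is only valid if every file lists genes in the same order.
same.order <- all(sapply(dat, function(d) identical(d[[1]], dat[[1]][[1]])))
stopifnot(same.order)

## Pick off the count column from each file and bind them side by side.
cnts <- do.call(cbind, lapply(dat, '[[', 2))
rownames(cnts) <- dat[[1]][[1]]
colnames(cnts) <- sub("\\.txt$", "", basename(file.paths))
```

Checking the gene order before the cbind is cheap and guards against
silently misaligned rows.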

If the rows aren't in the same order, you'll want to keep the gene ids
and counts together (in 2 column data.frames), then use `merge` or
something similar to recursively build an 'uber' count table by keying
on the gene/bin/whatever ids.
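
One way to do the merge step is to fold `merge` over the list with
`Reduce`. The sketch below uses made-up per-sample tables; the column
names `gene` and `sample1`..`sample3` are illustrative, and filling
missing genes with zero is an assumption you may or may not want.

```r
## Made-up per-sample tables: genes in different order, not all shared.
s1 <- data.frame(gene=c("geneA", "geneB", "geneC"), count=c(10, 0, 5),
                 stringsAsFactors=FALSE)
s2 <- data.frame(gene=c("geneC", "geneA", "geneD"), count=c(12, 7, 4),
                 stringsAsFactors=FALSE)
s3 <- data.frame(gene=c("geneB", "geneD", "geneA"), count=c(3, 1, 8),
                 stringsAsFactors=FALSE)
dat <- list(sample1=s1, sample2=s2, sample3=s3)

## Rename each count column after its sample so merged columns stay distinct.
dat <- lapply(names(dat), function(n) setNames(dat[[n]], c("gene", n)))

## Fold merge() over the list, keeping genes seen in any sample (all=TRUE).
merged <- Reduce(function(x, y) merge(x, y, by="gene", all=TRUE), dat)

## Genes absent from a file come back as NA; treating them as zero counts
## is an assumption, not something the data guarantees.
cnts <- as.matrix(merged[, -1])
rownames(cnts) <- as.character(merged$gene)
cnts[is.na(cnts)] <- 0
```

With `all=TRUE` no gene is dropped; leave it at the default if you
only want genes present in every file.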

> Also I did not give any information about library size. How could it be computed from the counts?

Once you have a matrix (or data.frame) of counts, isn't this simply a
call to `colSums`?
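
For instance, with a small made-up count matrix (genes in rows,
samples in columns; the names and numbers are purely illustrative):

```r
## Made-up counts: 3 genes x 2 samples (matrix() fills column by column).
cnts <- matrix(c(10, 0, 5,
                 7, 3, 12),
               ncol=2,
               dimnames=list(c("geneA", "geneB", "geneC"),
                             c("sample1", "sample2")))

## Library size per sample = total counts in that sample's column.
lib.size <- colSums(cnts)
```

As far as I recall, edgeR's `DGEList` falls back to the column sums
itself when no `lib.size` is supplied, so computing it by hand is
mostly useful as a check.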

Alternatively, if you want to use all aligned reads as the library
size, you can get that easily by summing the third column (mapped
reads per reference sequence) of the output of `samtools idxstats
YOURBAMFILE`.
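
From R, that could be sketched roughly as below. `samtools idxstats`
prints one tab-delimited line per reference sequence (name, length,
mapped reads, unmapped reads). The helper name `total.mapped` is
hypothetical, and the example lines are made up to stand in for real
output, which assumes samtools is on your PATH and the BAM file is
indexed.

```r
## Sum the mapped-read column (3rd) of `samtools idxstats` output lines.
## `total.mapped` is a hypothetical helper name, not part of any package.
total.mapped <- function(idxstats.lines) {
  tab <- read.table(text=idxstats.lines, sep="\t", header=FALSE,
                    stringsAsFactors=FALSE)
  sum(as.numeric(tab[[3]]))
}

## Made-up output standing in for a real call such as
##   system2("samtools", c("idxstats", "YOURBAMFILE"), stdout=TRUE)
example.lines <- c("chr1\t1000\t120\t3",
                   "chr2\t800\t80\t1",
                   "*\t0\t0\t15")
n.mapped <- total.mapped(example.lines)
```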

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


