[BioC] differential expression- edgeR

Mark Robinson mark.robinson at imls.uzh.ch
Mon Dec 23 17:35:45 CET 2013


Dear Ashutosh,

I wasn't able to (nicely) get exactly the same files from GEO, but here is an alternative:

See:
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-71/samples/

f <- "ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/MTAB/E-MTAB-71/E-MTAB-71.raw.1.zip"
bf <- basename(f)
download.file(f, bf)  # download ZIP from ArrayExpress

unzip(bf)  # should put the 8 TXT files in current dir

tg <- dir(".", "^(DLCK|WT)")  # parentheses anchor both names; "^DLCK|WT" also matches any file merely containing "WT"

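library("edgeR")
counts <- readDGE(tg)$counts  # read the tag-count files into one matrix (tags x samples)
counts[is.na(counts)] <- 0    # safeguard: treat any missing count as zero

# group label = sample name up to the first underscore ("DLCK.TG" or "WT")
grp <- sapply(colnames(counts), function(u) strsplit(u, "_")[[1]][1])
d <- DGEList(counts=counts, group=grp)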

This should return:

> d
An object of class "DGEList"
$counts
                  DLCK.TG_1 DLCK.TG_2 DLCK.TG_3 DLCK.TG_4 WT_1  WT_3 WT_4 WT_6
AAAAAAAAAAAAAAAAA     22653      3059      1366      6574 7782 35096 6623 9633
AAAAAAAAAAAAAAAAC        82        51        55        93  412   134  335  519
AAAAAAAAAAAAAAAAG         2         3         7         9   59     5   45   84
AAAAAAAAAAAAAAAAT       118       471       359       717 1842    94 2465 3311
AAAAAAAAAAAAAAACA        67         4         4        12   17   108   12   21
844311 more rows ...

$samples
            group lib.size norm.factors
DLCK.TG_1 DLCK.TG   651172            1
DLCK.TG_2 DLCK.TG  2685418            1
DLCK.TG_3 DLCK.TG  3202246            1
DLCK.TG_4 DLCK.TG  2460753            1
WT_1           WT  3142262            1
WT_3           WT   294909            1
WT_4           WT  3517977            1
WT_6           WT  3558260            1
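
The group column above is just each sample name with its replicate suffix removed. As a minimal sketch (it uses only the sample names printed above, and the variable names are illustrative), the same labels can be derived with one vectorised sub() call and checked before building the DGEList:

```r
# Sample names copied from the printed DGEList above
samples <- c("DLCK.TG_1", "DLCK.TG_2", "DLCK.TG_3", "DLCK.TG_4",
             "WT_1", "WT_3", "WT_4", "WT_6")

# Drop everything from the first underscore onwards: "DLCK.TG_1" -> "DLCK.TG"
grp <- sub("_.*$", "", samples)

# Four replicates per group are expected for this data set
print(table(grp))
stopifnot(all(table(grp) == 4))
```

Unlike sapply(), sub() returns an unnamed character vector, which should be all that the 'group' argument needs here.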

[… proceed from here …]

Alternatively, starting from your original code, I think you want 'skip=1'. Is it also possible that you have an extra, unwanted file in your 'targets' list?
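
To see how 'skip' and 'comment.char' interact, here is a small base-R sketch. The header lines are the ones quoted in your message; the second and third data rows are made up for illustration. A page saved from a browser may carry extra leading lines, so the right 'skip' value depends on the actual file. readDGE passes these extra arguments on to read.delim, so read.delim is called directly here:

```r
# Header as in the GEO excerpt; data rows 2 and 3 are hypothetical
txt <- paste(c("#SEQUENCE = ",
               "#COUNT = ",
               "#TPM = tags per million",
               "SEQUENCE\tCOUNT\tTPM",
               "CATCGCCAGCGGGCACC\t1\t0.37",
               "AAAAAAAAAAAAAAAAC\t82\t30.50",
               "AAAAAAAAAAAAAAAAG\t2\t0.74"),
             collapse = "\n")

# comment.char drops the "#" lines, so the column header is found without any skip
ok <- read.delim(text = txt, comment.char = "#")
print(names(ok))
print(nrow(ok))

# skip counts physical lines, "#" lines included: skip = 5 discards the
# column header and the first data row, so the next row is taken as the header
bad <- read.delim(text = txt, skip = 5, comment.char = "#")
print(names(bad))
```

If the column names printed for a real file are not SEQUENCE, COUNT and TPM, the 'skip' value is probably cutting into the header or the data.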

Anyways, hope that helps.

Best, Mark


----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://ow.ly/riRea






On 23.12.2013, at 16:50, Ashutosh [guest] <guest at bioconductor.org> wrote:

> 
> I am new to R and edgeR. I am trying to follow the example in
> "9. Case Study: deep-sequenced short tags" from the edgeR manual.
> 
> I downloaded the data from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM272105
> by clicking "full table" and then saving the web page as a .txt file.
> 
> The first four rows are shown below, followed by a fifth, which contains the tag sequence and count.
> 
> #SEQUENCE = 
> #COUNT = 
> #TPM = tags per million
> SEQUENCE	COUNT	TPM
> CATCGCCAGCGGGCACC	1	0.37
> 
> Now I follow the steps of 9.3 (reading the data and creating the DGEList). After running
>   d <- readDGE(targets, skip = 5, comment.char = "#")
> I get
> Error in taglist[[i]] : subscript out of bounds
> 
> Can anyone please help me solve this issue?
> 
> Best regards,
> Ashutosh
> 
> 
> -- output of sessionInfo(): 
> 
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
> [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] edgeR_3.4.2  limma_3.18.7
> 
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


