[BioC] Reading MAGE-ML cdf into bioconductor for limma in R v. 3.0.2

James W. MacDonald jmacdon at uw.edu
Mon Mar 31 16:13:44 CEST 2014


Hi Ben,

 > pro.fe.set<-ArrayExpress("E-GEOD-26533")
<snip>
 > ls()
[1] "mapCdfName" "pro.fe.set"
 > pro.fe.set
AffyBatch object
size of arrays=448x448 features (51 kb)
cdf=MD4-9313a520062 (??? affyids)
number of samples=39
Error in getCdfInfo(object) :
   Could not obtain CDF environment, problems encountered:
Specified environment does not contain MD4-9313a520062
Library - package md49313a520062cdf not installed
Bioconductor - md49313a520062cdf not available
In addition: Warning message:
missing cdf environment! in show(AffyBatch)
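
As an aside, the package name in that error (md49313a520062cdf) appears to be derived mechanically from the CDF name stored in the CEL files. The sketch below is a base-R approximation of that mapping, not the affy code itself; cdf_pkg_name is a made-up helper name. The real function, as far as I know, is affy::cleancdfname(), which also consults a lookup table of special cases (the mapCdfName object that turns up in ls() above).

```r
# Approximate how a CDF name is turned into a CDF package name:
# lower-case it, drop hyphens, underscores and spaces, then append "cdf".
# NOTE: cdf_pkg_name is a hypothetical helper for illustration only.
cdf_pkg_name <- function(cdfname) {
  stopifnot(is.character(cdfname), nchar(cdfname) > 0)
  tmp <- tolower(cdfname)
  tmp <- gsub("[-_ ]", "", tmp)
  paste0(tmp, "cdf")
}

cdf_pkg_name("MD4-9313a520062")
# [1] "md49313a520062cdf"
```

So whatever CDF package gets built has to carry exactly that name for the AffyBatch to find it.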

<starts browser>
Googles MD4-9313a520062

Fourth hit is

http://lifesciencedb.jp/geo-e/?division=Unassigned&technology=GeneChip&order=manufacturer&action=ListPlatform

First line in table has link

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL5471

At bottom of said link is

Supplementary file:  GPL5471.cdf.gz
Size:                1.7 Mb
Download (ftp):      <ftp://ftp.ncbi.nlm.nih.gov/geo/platforms/GPL5nnn/GPL5471/suppl/GPL5471%2Ecdf%2Egz>
Download (http):     <http://www.ncbi.nlm.nih.gov/geo/download/?acc=GPL5471&format=file&file=GPL5471%2Ecdf%2Egz>
File type/resource:  CDF


Copies http link

</closes browser>

 > download.file("http://www.ncbi.nlm.nih.gov/geo/download/?acc=GPL5471&format=file&file=GPL5471%2Ecdf%2Egz",
 +               "GPL5471.cdf.gz", mode = "wb")
trying URL 'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GPL5471&format=file&file=GPL5471%2Ecdf%2Egz'
Content type 'application/octet-stream' length 1764017 bytes (1.7 Mb)
opened URL
downloaded 1.7 Mb
 > library(makecdfenv)
 > make.cdf.package("GPL5471.cdf.gz", "md49313a520062cdf", compress = TRUE,
 +                  species = "Some_bacterium")
## This may fail on the gzipped file, in which case decompress it at the shell:
$ gzip -d GPL5471.cdf.gz
## and then, back in R:
 > make.cdf.package("GPL5471.cdf", "md49313a520062cdf",
 +                  species = "Some_bacterium")
 > install.packages("md49313a520062cdf/", repos = NULL, type = "source")
 > pro.fe.set
AffyBatch object
size of arrays=448x448 features (51 kb)
cdf=MD4-9313a520062 (9947 affyids)
number of samples=39
number of genes=9947
annotation=md49313a520062
notes=E-GEOD-26533
         E-GEOD-26533
         c("Organism", "treatment", "strain", "time", "", "", "", "", "", "", "")
         c("", "", "", "", "", "", "", "", "", "", "")
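
If you would rather do the whole thing from within R, the steps above can be collected roughly as follows. This is only a sketch under a few assumptions: gunzip_base and build_cdf_package are names I made up, the decompression uses base R connections in place of the shell gzip call, and the block only defines the functions (nothing is downloaded until build_cdf_package() is called). The make.cdf.package() and install.packages() calls are the same ones used above.

```r
# Decompress a .gz file using base R connections only (stand-in for `gzip -d`).
# NOTE: gunzip_base is a hypothetical helper, not part of makecdfenv.
gunzip_base <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  con_in <- gzfile(gz_path, open = "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(out_path, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = 1e6)
    if (length(chunk) == 0L) break   # end of file
    writeBin(chunk, con_out)
  }
  invisible(out_path)
}

# Download the CDF from GEO, decompress it, build the CDF package from the
# plain .cdf file, and install it from source.
# NOTE: build_cdf_package is a hypothetical wrapper; it needs network access
# and the makecdfenv package to be installed.
build_cdf_package <- function(cdf_gz  = "GPL5471.cdf.gz",
                              pkg     = "md49313a520062cdf",
                              species = "Some_bacterium") {
  url <- paste0("http://www.ncbi.nlm.nih.gov/geo/download/",
                "?acc=GPL5471&format=file&file=GPL5471%2Ecdf%2Egz")
  download.file(url, cdf_gz, mode = "wb")   # binary mode matters on Windows
  cdf <- gunzip_base(cdf_gz)                # "GPL5471.cdf" in the working dir
  makecdfenv::make.cdf.package(cdf, pkg, species = species)
  install.packages(pkg, repos = NULL, type = "source")
}
```

Calling build_cdf_package() in the directory where you want the package source created, and then printing pro.fe.set again, should give the same result as the transcript above.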


Best,

Jim





On 3/31/2014 3:43 AM, Ben Temperton [guest] wrote:
> Hi there,
>
> I am trying to load some microarray data from ArrayExpress into R for analysis with limma:
>
> pro.fe.set<-ArrayExpress("E-GEOD-26533")
>
> However, the probe set needs to be installed first for this to work, and the probe set is in MAGE-ML format. Previously, I've only ever dealt with the makecdfenv package, which uses .cdf files. I found a package called RMAGEML in Bioconductor that looked like it would do the job, but it is not available for R v. 3.0.2.
>
> I was hoping you might have some insight into how best to approach this problem.
>
> Many thanks,
> Ben
>
>
>   -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ArrayExpress_1.22.0 Biobase_2.22.0      BiocGenerics_0.8.0  R.utils_1.29.8      R.oo_1.18.0         R.methodsS3_1.6.1
>
> loaded via a namespace (and not attached):
> [1] affy_1.40.0           affyio_1.30.0         BiocInstaller_1.12.0  limma_3.18.13         preprocessCore_1.24.0
> [6] tools_3.0.2           XML_3.98-1.1          zlibbioc_1.8.0
>
> --
> Sent via the guest posting facility at bioconductor.org.

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


