[BioC] getting normalized expression values from GEO GSE files

Maria Kesa maria.kesa at gmail.com
Wed Aug 27 22:18:34 CEST 2014


Hello:-),

My name is Maria and my goal is to get normalized gene expression values
from this study http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3398

I installed GEOquery and its dependencies, the RCurl and XML packages.

I have three questions:
1. How do I resolve the error posted below, which I get when I run
gse3398<-getGEO('GSE3398',GSEMatrix=TRUE) ? (I tried installing and
reinstalling RCurl and GEOquery.)
2. How should I normalize the data, considering that there are multiple
platforms in the experiment?
3. If point 1 cannot be made to work, I found that it is possible to
download the files manually using links like the one below (replacing
GPL2648 with each of the other platforms in the series):
ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE3398/GSE3398-GPL2648_series_matrix.txt.gz
How do I process these files and put them into an eset in R? And, as in
question 2, how do I get the normalized gene expression values and the
gene names out of the data?

Your help would be much appreciated! The error message that I get and the
sessionInfo is below.

> gse3398<-getGEO('GSE3398',GSEMatrix=TRUE)
Found 7 file(s)
GSE3398-GPL2648_series_matrix.txt.gz
sh: 1: curl: not found
Error in file(con, "r") : cannot open the connection
In addition: Warning messages:
1: In download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  download had nonzero exit status
2: In file(con, "r") :
  cannot open file '/tmp/RtmppUAQIH/GSE3398-GPL2648_series_matrix.txt.gz': No such file or directory
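
The line "sh: 1: curl: not found" in the output above makes me suspect that
the curl command-line program (as opposed to the RCurl package) is missing
from my system, although I am not sure that this is the cause. A minimal
sketch of how this could be checked from within R, using base R's Sys.which
(the variable name curl_path is only for illustration):

```r
# Ask R where the system 'curl' executable lives.
# Sys.which() returns an empty string when the program is not on the PATH.
curl_path <- Sys.which("curl")

if (nzchar(curl_path)) {
  message("curl found at: ", curl_path)
} else {
  message("curl was not found on the PATH visible to R")
}
```

If the second message is printed, the shell that R spawns cannot see a curl
executable, which would be consistent with the "curl: not found" line.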


> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8
 [2] LC_NUMERIC=C
 [3] LC_TIME=et_EE.UTF-8
 [4] LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=et_EE.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=et_EE.UTF-8
 [8] LC_NAME=C
 [9] LC_ADDRESS=C
[10] LC_TELEPHONE=C
[11] LC_MEASUREMENT=et_EE.UTF-8
[12] LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils
[6] datasets  methods   base

other attached packages:
[1] GEOquery_2.28.0    Biobase_2.22.0
[3] BiocGenerics_0.8.0 RCurl_1.95-4.3
[5] bitops_1.0-6

loaded via a namespace (and not attached):
[1] tools_3.1.1  XML_3.98-1.1


Thank you,

Maria
