[BioC] getting normalized expression values from GEO GSE files

James W. MacDonald jmacdon at uw.edu
Wed Aug 27 23:21:57 CEST 2014


Hi Maria,

Sometimes with online resources, there are momentary hiccups. I can
currently download that dataset:

> gse3398<-getGEO('GSE3398')
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE3nnn/GSE3398/matrix/
Found 7 file(s)
GSE3398-GPL2648_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  153k  100  153k    0     0   100k      0  0:00:01  0:00:01 --:--:--  100k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2648.soft
GSE3398-GPL2778_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  206k  100  206k    0     0   133k      0  0:00:01  0:00:01 --:--:--  133k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2778.soft
GSE3398-GPL2832_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1060k  100 1060k    0     0   593k      0  0:00:01  0:00:01 --:--:--  593k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2832.soft
GSE3398-GPL2868_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  253k  100  253k    0     0   167k      0  0:00:01  0:00:01 --:--:--  167k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2868.soft
GSE3398-GPL2904_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  196k  100  196k    0     0   129k      0  0:00:01  0:00:01 --:--:--  129k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2904.soft
GSE3398-GPL2905_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1995k  100 1995k    0     0  1034k      0  0:00:01  0:00:01 --:--:-- 1034k
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2905.soft
GSE3398-GPL2906_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  104k  100  104k    0     0  66850      0  0:00:01  0:00:01 --:--:-- 66867
File stored at:
/data3/tmp/RtmpOwnhbS/GPL2906.soft
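
If it keeps failing, one thing worth checking in the error you posted is the
line "sh: 1: curl: not found". That suggests R tried to run the command-line
curl program and could not find it on your machine (the RCurl package is
separate from the curl executable). Below is a minimal sketch for checking
this from R with base functions only. The GEOquery-specific option name is an
assumption on my part, so please check the documentation for your installed
version; setting an option that is not used is harmless.

```r
# Sys.which() returns "" for a program that is not on the PATH.
has_curl <- unname(nzchar(Sys.which("curl")))
has_wget <- unname(nzchar(Sys.which("wget")))
cat("curl available:", has_curl, "\n")
cat("wget available:", has_wget, "\n")

if (!has_curl) {
  # Fall back to another download method until curl is installed
  # at the system level (for example via the OS package manager).
  fallback <- if (has_wget) "wget" else "auto"
  options(download.file.method = fallback)
  # Assumption: GEOquery may read its own option for the download method.
  options(download.file.method.GEOquery = fallback)
}
```

After that, calling getGEO('GSE3398') again in the same session should show
whether the download method was the problem.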

As for point 2, I can't really help you with that one, as I know nothing
about this experiment other than the cursory glance I just made at the GEO
site. You might consider the GeneMeta package (
http://www.bioconductor.org/packages/release/bioc/html/GeneMeta.html),
which is intended for the analysis of data from various sources.
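
As for getting the values out: with GSEMatrix=TRUE, getGEO() returns a list
of ExpressionSet objects, one per platform, so the expression matrix and the
probe annotation can be pulled out of each element. Here is a rough sketch,
assuming the download works. Note that the series matrix holds whatever
values the submitters uploaded, so whether they are already normalized
depends on the submission, and which annotation column contains gene names
depends on the platform. The local file name in the last comment is only an
example of a manually downloaded file.

```r
library(Biobase)   # exprs(), fData(), pData()
library(GEOquery)  # getGEO()

# One ExpressionSet per platform in the series.
gse3398 <- getGEO("GSE3398", GSEMatrix = TRUE)
names(gse3398)

# First platform: expression matrix (probes x samples) and annotation.
eset <- gse3398[[1]]
expr_values <- exprs(eset)
probe_annot <- fData(eset)
sample_info <- pData(eset)
dim(expr_values)
colnames(probe_annot)  # inspect to find the gene name/symbol column

# The same extraction for every platform at once.
expr_by_platform <- lapply(gse3398, exprs)

# If the files were downloaded by hand, a local series matrix file can be
# parsed instead of fetching it again:
# eset_local <- getGEO(filename = "GSE3398-GPL2648_series_matrix.txt.gz")
```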

Best,

Jim




On Wed, Aug 27, 2014 at 4:18 PM, Maria Kesa <maria.kesa at gmail.com> wrote:

> Hello:-),
>
> My name is Maria and my goal is to get normalized gene expression values
> from this study http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3398
>
> I installed GEOquery and its dependencies, the RCurl and XML packages.
>
> I have three questions:
> 1. How do I resolve the error that is posted below, when I try to
> use gse3398<-getGEO('GSE3398',GSEMatrix=TRUE) ? (I tried installing and
> reinstalling RCurl and GEOQuery)
> 2. How should I normalize the data, considering that there are multiple
> platforms in the experiment?
> 3. If point 1 cannot be made to work, I found that it is possible to
> download the files manually using links like this one (replacing GPL2648
> with each of the other platforms in the series):
>
> ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE3398/GSE3398-GPL2648_series_matrix.txt.gz
>
> My question is how do I process these files and put them into an eset in R?
> As I ask in question 2, how do I get the normalized gene expression values
> out of the data and get the gene names?
>
> Your help would be much appreciated! The error message that I get and the
> sessionInfo is below.
>
> > gse3398<-getGEO('GSE3398',GSEMatrix=TRUE)
> Found 7 file(s)
> GSE3398-GPL2648_series_matrix.txt.gz
> sh: 1: curl: not found
> Error in file(con, "r") : cannot open the connection
> In addition: Warning messages:
> 1: In download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
>   download had nonzero exit status
> 2: In file(con, "r") :
>   cannot open file '/tmp/RtmppUAQIH/GSE3398-GPL2648_series_matrix.txt.gz': No such file or directory
>
>
> > sessionInfo()
> R version 3.1.1 (2014-07-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8
>  [2] LC_NUMERIC=C
>  [3] LC_TIME=et_EE.UTF-8
>  [4] LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=et_EE.UTF-8
>  [6] LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=et_EE.UTF-8
>  [8] LC_NAME=C
>  [9] LC_ADDRESS=C
> [10] LC_TELEPHONE=C
> [11] LC_MEASUREMENT=et_EE.UTF-8
> [12] LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils
> [6] datasets  methods   base
>
> other attached packages:
> [1] GEOquery_2.28.0    Biobase_2.22.0
> [3] BiocGenerics_0.8.0 RCurl_1.95-4.3
> [5] bitops_1.0-6
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.1  XML_3.98-1.1
>
>
> Thank you,
>
> Maria
>



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
