[R] Efficient way of loading files in R

Deepa
Fri Sep 7 12:35:24 CEST 2018


I already posted a similar question on the Bioconductor support site:
https://support.bioconductor.org/p/112607/#112634
but I couldn't find a solution there.


On Fri, Sep 7, 2018 at 3:45 PM Martin Morgan <mtmorgan.bioc at gmail.com>
wrote:

> Ask on the Bioconductor support site https://support.bioconductor.org
>
> Provide (on the support site) the output of the R commands
>
>    library(GEOquery)
>    sessionInfo()
>
> Also include (copy and paste) the output of the command that fails. I have
>
>  > gseEset2 <- getGEO('GSE76896')[[1]]
> Found 1 file(s)
> GSE76896_series_matrix.txt.gz
> trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'
> Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
> ==================================================
> downloaded 38.7 MB
>
> Parsed with column specification:
> cols(
>    .default = col_double(),
>    ID_REF = col_character()
> )
> See spec(...) for full column specifications.
> |=================================================================| 100%
>    84 MB
> File stored at:
> /tmp/Rtmpe4NWji/GPL570.soft
> |=================================================================| 100%
>    75 MB
>  > sessionInfo()
> R version 3.5.1 Patched (2018-08-22 r75177)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.5 LTS
>
> Matrix products: default
> BLAS: /home/mtmorgan/bin/R-3-5-branch/lib/libRblas.so
> LAPACK: /home/mtmorgan/bin/R-3-5-branch/lib/libRlapack.so
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] bindrcpp_0.2.2      GEOquery_2.49.1     Biobase_2.41.2
> [4] BiocGenerics_0.27.1 BiocManager_1.30.2
>
> loaded via a namespace (and not attached):
>   [1] Rcpp_0.12.18     tidyr_0.8.1      crayon_1.3.4     dplyr_0.7.6
>   [5] assertthat_0.2.0 R6_2.2.2         magrittr_1.5     pillar_1.3.0
>   [9] stringi_1.2.4    rlang_0.2.2      curl_3.2         limma_3.37.4
> [13] xml2_1.2.0       tools_3.5.1      readr_1.1.1      glue_1.3.0
> [17] purrr_0.2.5      hms_0.4.2        compiler_3.5.1   pkgconfig_2.0.2
> [21] tidyselect_0.2.4 bindr_0.1.1      tibble_1.4.2
>
> On 09/07/2018 06:08 AM, Deepa wrote:
> > Hello,
> >
> > I am using a bioconductor package in R.
> > The command that I use reads the contents of a file downloaded from a
> > database and creates an expression object.
> >
> > The code works fine when the input file is about 10 MB, but when the
> > file is around 40 MB the object isn't created.
> >
> > Is there an efficient way of loading a large input file to create the
> > expression object?
> >
> > This is my code:
> >
> >
> > library(gcrma)
> > library(limma)
> > library(biomaRt)
> > library(GEOquery)
> > library(Biobase)
> > gseEset1 <- getGEO('GSE53454')[[1]]  # file size 10 MB
> > gseEset2 <- getGEO('GSE76896')[[1]]  # file size 40 MB
> >
> > ## gseEset2 doesn't load, so the object is never created
> >
> > Many thanks
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
