[BioC] Read big file!

Laurent Gautier laurent at cbs.dtu.dk
Wed Aug 12 18:50:16 CEST 2009


With a file of 1,200,000 columns, read.table() will likely take forever
(no matter what options you give it), and so will readLines().

The same file transposed (1,000 columns / 1,200,000 rows) would
not be such a problem to read in.

Using scan() (and then adding a dimension to the resulting vector to make
it a matrix again), or the system call using cut given earlier in this
thread (looping across columns), could be the simplest way.
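
A minimal sketch of the scan() route, assuming the file is purely numeric,
whitespace-separated, has no header or row names, and that the number of
rows is known in advance. The tiny temporary file and the names path,
n_rows, x and m are stand-ins for illustration only, not part of the
original problem:

```r
# Hypothetical small stand-in for the real file: 3 rows x 5 columns,
# whitespace-separated, numeric, no header.
path <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4 5",
             "6 7 8 9 10",
             "11 12 13 14 15"), path)

n_rows <- 3L  # assumed known; would be 1000 for the file discussed here

# scan() returns the whole file as one flat numeric vector, row after row.
x <- scan(path, what = numeric(), quiet = TRUE)

# R fills matrices column by column, so set the dimensions as
# (columns, rows) and transpose to recover the original layout.
dim(x) <- c(length(x) / n_rows, n_rows)
m <- t(x)

unlink(path)
```

At full size (1,000 x 1,200,000 values stored as doubles) the vector alone
needs roughly 9.6 GB, and t() makes a second copy, so it may be preferable
to skip the transpose and work with the columns-by-rows matrix directly.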


L.



> Dear All,
> I think it would be better to use SAS to process the large file.
> Sean: I tried the colClasses option, but the computer failed to compile.
> Thank you for your help
> M
> 
> 
> Vincent Carey wrote:
> > It is also possible to use a buffered reading/filtering approach.
> > Look carefully at scan(), readLines()
> > and friends.
> >
> > On Wed, Aug 12, 2009 at 11:14 AM, Sean Davis<seandavi at gmail.com> wrote:
> >   
> >> On Wed, Aug 12, 2009 at 10:28 AM, Mohamed
> >> Lajnef<Mohamed.lajnef at inserm.fr> wrote:
> >>     
> >>> Dear R-Users
> >>>
> >>> I would like to read a big file (1,000 lines and 1,200,000 columns) with R, but
> >>> this is impossible! Does someone have a magic solution to my problem?
> >>>
> >>> Otherwise, is there a function to read this large file a few columns at a time
> >>> (not by lines)?
> >>>       
> >> This is better posted to the R-help list, but have you tried playing
> >> with colClasses from read.table()?
> >>
> >> Sean
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>     
> >
> >
> >
> >   
> 


