[R] How to more efficiently read in a big matrix

Gabor Grothendieck ggrothendieck at gmail.com
Sat Nov 10 07:29:10 CET 2007


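You left out the second-to-last line, the second call to file().
read.table() closes the connection it is given, so the old myfile is no
longer a valid connection by the time sqldf() uses it.  Also, did you
replace multiple spaces in the input file with one space?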
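Here is a minimal base-R sketch of why that line matters. It writes a small hypothetical temporary file with the column names used in this thread. It then reads the header through a file() connection, builds the same select statement, and checks the connection state.

read.table() opens an unopened connection itself and closes it on exit, and close() destroys a connection. The old myfile object is therefore invalid afterwards. sqldf() itself is not called here, so the sqldf step is not exercised.

```r
# Hypothetical stand-in for the real data file, using the thread's column names.
tf <- tempfile(fileext = ".txt")
writeLines(c("probeset WM806SignalA WM806call WM1716SignalA WM1716call",
             "p1 1.5 P 2.5 A"), tf)

# Read only the header line. read.table() opens the unopened connection
# itself and closes it on exit; close() also destroys the connection.
myfile <- file(tf)
hdr <- read.table(myfile, nrows = 1, as.is = TRUE)

# Keep the columns whose names do not contain "call" and form the statement.
hdr <- unlist(hdr, use.names = FALSE)
keep <- hdr[!grepl("call", hdr)]
stmt <- paste("select", paste(keep, collapse = ","), "from myfile")
print(stmt)

# The old `myfile` now refers to a destroyed connection; this is the same
# "invalid connection" error that sqldf() reported.
stale <- tryCatch(summary(myfile), error = function(e) conditionMessage(e))
print(stale)

# Re-creating the connection (the left-out line) gives a valid, unopened
# connection that can be handed to sqldf().
myfile <- file(tf)
print(isOpen(myfile))
```

With the re-created myfile in place, the thread's call sqldf(stmt, file.format = list(sep = " ")) is the next step.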

On Nov 10, 2007 1:26 AM, affy snp <affysnp at gmail.com> wrote:
> Thanks Gabor.
>
> I changed the column names so that they look like this:
>
> probeset
> WM806SignalA
> WM806call
> WM1716SignalA
> WM1716call
> ....
>
> I then tried what you suggested and got:
>
>
> > library(sqldf)
> Loading required package: gsubfn
> Loading required package: proto
> > source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> > myfile <- file("243_47mel_withnormal_expression_log2.txt")
> > stmt <- read.table(myfile, nr = 1, as.is = TRUE)
> > stmt <- stmt[regexpr("call", stmt) < 0]
> > stmt <- paste("select", paste(stmt, collapse = ","), "from myfile")
> > DF <- sqldf(stmt, file.format = list(sep = " "))
> Error in summary.connection(get(fo, envir)) : invalid connection
> >
>
> How should I correct this?
>
> Thanks!
>
> Allen
>
>
> On Nov 10, 2007 1:11 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > On Nov 10, 2007 12:25 AM, affy snp <affysnp at gmail.com> wrote:
> > > Hi Gabor,
> > >
> > > Thanks a lot!
> > >
> > > The header of the big file looks like this:
> > >
> > > probe_set
> > > WM_806_Signal_A
> > > WM_806_call
> > > WM_1716_Signal_A
> > > WM_1716_call
> > > ....
> > >
> > > I only need the columns whose headers contain _Signal_A.
> > >
> > > Can you suggest how to use sqldf?
> > >
> >
> > SQLite requires that the fields be separated by a single character.  Use
> > sed or some other method to reduce runs of spaces to a single space in the
> > input file, and then try something like this:
> >
> > library(sqldf)
> > source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> >
> > # get headings
> > myfile <- file("myfile.dat")
> > stmt <- read.table(myfile, nrows = 1, as.is = TRUE)
> >
> > # assume any column with "call" in its name is to be dropped,
> > # and form the select statement
> > stmt <- stmt[regexpr("call", stmt) < 0]
> > stmt <- paste("select", paste(stmt, collapse = ","), "from myfile")
> >
> > # re-create the connection (read.table closed it above), then run it
> > myfile <- file("myfile.dat")
> > DF <- sqldf(stmt, file.format = list(sep = " "))
> >
>
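The space-squeezing step Gabor describes can also be done from R when sed is not at hand. The following is a sketch, not part of sqldf. Three things in it are assumptions made for illustration:

- the helper name squeeze_spaces();
- its chunk argument;
- the choice to also trim one leading or trailing space per line, which could otherwise show up as an empty field with sep = " ".

```r
# Hypothetical helper: copy `infile` to `outfile`, collapsing every run of
# spaces to one space and trimming a space at either end of each line.
# Reads `chunk` lines at a time so the whole file need not fit in memory.
squeeze_spaces <- function(infile, outfile, chunk = 10000L) {
  inp <- file(infile, "r")
  on.exit(close(inp))
  out <- file(outfile, "w")
  on.exit(close(out), add = TRUE)
  repeat {
    lines <- readLines(inp, n = chunk)
    if (length(lines) == 0L) break                # end of input
    lines <- gsub(" +", " ", lines)               # runs of spaces -> one space
    lines <- sub("^ ", "", sub(" $", "", lines))  # drop leading/trailing space
    writeLines(lines, out)
  }
  invisible(outfile)
}

# Usage on a small hypothetical file with irregular spacing.
infile <- tempfile()
outfile <- tempfile()
writeLines(c("probeset  WM806SignalA   WM806call", " p1 1.5  P "), infile)
squeeze_spaces(infile, outfile)
print(readLines(outfile))
```

Reading in chunks keeps memory use bounded for a large file. A smaller chunk only means more readLines() calls.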



More information about the R-help mailing list