[R] How to more efficiently read in a big matrix

affy snp affysnp at gmail.com
Sat Nov 10 07:26:40 CET 2007


Thanks Gabor.

I changed the column names so that they look like this:

probeset
WM806SignalA
WM806call
WM1716SignalA
WM1716call
....

And I then tried what you mentioned and got:


> library(sqldf)
Loading required package: gsubfn
Loading required package: proto
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> myfile <- file("243_47mel_withnormal_expression_log2.txt")
> stmt <- read.table(myfile, nr = 1, as.is = TRUE)
> stmt <- stmt[regexpr("call", stmt) < 0]
> stmt <- paste("select", paste(stmt, collapse = ","), "from myfile")
> DF <- sqldf(stmt, file.format = list(sep = " "))
Error in summary.connection(get(fo, envir)) : invalid connection
>

How should I correct this?

Thanks!

Allen

On Nov 10, 2007 1:11 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> On Nov 10, 2007 12:25 AM, affy snp <affysnp at gmail.com> wrote:
> > Hi Gabor,
> >
> > Thanks a lot!
> >
> > The header of the big file looks as follows:
> >
> > probe_set
> > WM_806_Signal_A
> > WM_806_call
> > WM_1716_Signal_A
> > WM_1716_call
> > ....
> >
> > I only need the columns whose header contains _Signal_A.
> >
> > Can you suggest how to use sqldf?
> >
>
> SQLite requires that a single character separate the fields.  Use sed or
> another method to reduce multiple spaces to one space in the input
> file and then try something like this:
>
> library(sqldf)
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>
> # get headings
> myfile <- file("myfile.dat")
> stmt <- read.table(myfile, nr = 1, as.is = TRUE)
>
> # assume any column with call in its name is to be eliminated
> # and form select statement
> stmt <- stmt[regexpr("call", stmt) < 0]
> stmt <- paste("select", paste(stmt, collapse = ","), "from myfile")
>
> # run it
> myfile <- file("myfile.dat")
> DF <- sqldf(stmt, file.format = list(sep = " "))
>
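
The quoted steps can be tried end to end on a small file. What follows is a minimal sketch in base R only, not the sqldf call itself: the sample rows and the names raw, infile, outfile, clean, hdr, keep and reusable are made up for illustration, and gsub() stands in for sed. It also shows why the quoted code calls file() a second time before sqldf(): read.table() closes and destroys a connection that it had to open itself, so the same object cannot be reused afterwards.

```r
# Hypothetical sample standing in for the real file:
# fields are separated by runs of spaces
raw <- c("probeset  WM806SignalA   WM806call WM1716SignalA  WM1716call",
         "p1  7.25   P 6.5  A",
         "p2  8.0    A 5.75 P")
infile  <- tempfile(fileext = ".txt")   # hypothetical input path
outfile <- tempfile(fileext = ".txt")   # hypothetical cleaned copy
writeLines(raw, infile)

# Collapse every run of spaces or tabs to one space (base R in place of sed)
clean <- gsub("[ \t]+", " ", trimws(readLines(infile)))
writeLines(clean, outfile)

# Read only the header row and drop each column whose name contains "call"
hdr  <- unlist(read.table(outfile, nrows = 1, as.is = TRUE))
keep <- hdr[!grepl("call", hdr)]
stmt <- paste("select", paste(keep, collapse = ","), "from myfile")
print(stmt)

# read.table() opens an unopened connection and destroys it on exit,
# so the connection object is invalid afterwards
myfile <- file(outfile)
invisible(read.table(myfile, nrows = 1, as.is = TRUE))
reusable <- tryCatch(isOpen(myfile), error = function(e) NA)
print(reusable)   # NA here means the old connection can no longer be used

# Before calling sqldf(stmt, ...), create the connection again:
# myfile <- file(outfile)
```

readLines() loads the whole file into memory, so for a really large matrix sed or a chunked read is likely the better way to squeeze the spaces; the sketch is only meant to make the individual steps visible.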
