[R] How to more efficiently read in a big matrix

Gabor Grothendieck ggrothendieck at gmail.com
Sat Nov 10 07:11:10 CET 2007


On Nov 10, 2007 12:25 AM, affy snp <affysnp at gmail.com> wrote:
> Hi Gabor,
>
> Thanks a lot!
>
> The header of the big file looks as follows:
>
> probe_set
> WM_806_Signal_A
> WM_806_call
> WM_1716_Signal_A
> WM_1716_call
> ....
>
> I only need the columns whose header looks like _Signal_A
>
> Can you suggest how to use sqldf?
>

SQLite requires that the fields be separated by a single character.  Use sed
or some other method to reduce each run of spaces to one space in the input
file and then try something like this:

library(sqldf)
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

# read only the first line to get the column headings
hdr <- unlist(read.table("myfile.dat", nrows = 1, as.is = TRUE))

# assume any column with "call" in its name is to be eliminated
# and form the select statement from the remaining headings
hdr <- hdr[!grepl("call", hdr, fixed = TRUE)]
stmt <- paste("select", paste(hdr, collapse = ","), "from myfile")

# run it: sqldf finds the connection named myfile in the workspace
# and loads the file behind it into the database
myfile <- file("myfile.dat")
DF <- sqldf(stmt, file.format = list(header = TRUE, sep = " "))
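
If sed is not at hand, the space clean-up can probably be done in R too.
Here is a rough sketch using base R only.  The three sample lines are made
up to stand in for the real file, the temporary file names are hypothetical,
and I am assuming you want to keep probe_set together with the _Signal_A
columns.  For a file too large to fit in memory the readLines step would
have to be done in chunks.

```r
# Made-up sample standing in for the real file: a header and two rows,
# with runs of spaces between the fields.
raw <- c(
  "probe_set   WM_806_Signal_A  WM_806_call   WM_1716_Signal_A WM_1716_call",
  "SNP_A-1     1234.5           AB            987.6            AA",
  "SNP_A-2     2345.6           BB            876.5            AB"
)
infile <- tempfile(fileext = ".dat")
writeLines(raw, infile)

# Collapse each run of spaces or tabs to a single space (the job sed would do).
clean <- gsub("[ \t]+", " ", trimws(readLines(infile)))
outfile <- tempfile(fileext = ".dat")
writeLines(clean, outfile)

# Read only the header line and keep probe_set plus the _Signal_A columns.
hdr <- scan(outfile, what = "", nlines = 1, quiet = TRUE)
keep <- hdr[hdr == "probe_set" | grepl("_Signal_A$", hdr)]

# Build the select statement; "myfile" is the connection name sqldf will use.
stmt <- paste("select", paste(keep, collapse = ", "), "from myfile")
print(stmt)
```

The resulting stmt can then be handed to sqldf in the same way as above,
with myfile pointing at the cleaned file.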


