[R] skip non-sequential lines using scan?

Gabor Grothendieck ggrothendieck at gmail.com
Thu Nov 8 14:41:36 CET 2007


I don't know whether SQLite can handle that many columns (its default
compile-time limit is 2000 columns per table), but if it can, and if the file
is in an acceptable format, then sqldf simplifies reading it into an SQLite
database, which it creates automatically on the fly, and then pulling the
subset you want out of it into R.  (If the database will fit into memory, you
can omit the dbname= argument.)

   library(sqldf)
   source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

   myfile <- file("myfile.dat")
   # keep lines 5, 7, ..., 99 (assumes no header line, so rowid = line number)
   sqldf("select * from myfile where rowid % 2 = 1 and rowid between 5 and 99",
         dbname = tempfile())

See example 6 on the home page:
http://sqldf.googlecode.com
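The row selection happens entirely in the WHERE clause, through SQLite's rowid. Assuming the file has no header line, so that rowid equals the line number, here is a quick check in plain R of which of 100 lines an even test and an odd test on rowid would keep. The variable names are illustrative only.

```r
# Line numbers of a 100-line file; assumes no header line, so rowid == line number
rowid <- 1:100
want  <- seq(5, 99, 2)           # the lines Matt asked for

# Even test on rowid, as in "rowid % 2 = 0 and rowid >= 5": keeps 6, 8, ..., 100
even_rows <- rowid[rowid %% 2 == 0 & rowid >= 5]

# Odd test bounded to 5..99, as in "rowid % 2 = 1 and rowid between 5 and 99"
odd_rows <- rowid[rowid %% 2 == 1 & rowid >= 5 & rowid <= 99]

all(even_rows == want)   # FALSE
all(odd_rows == want)    # TRUE
```

Only the odd test with an upper bound of 99 reproduces seq(5, 99, 2). If the file did have a header line, every data rowid would be one less than its line number and the parity in the query would flip.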


On Nov 8, 2007 4:19 AM, Matthew Keller <mckellercran at gmail.com> wrote:
> Hi all,
>
> Is there a way to skip non-sequential lines using the "skip" argument
> in the scan function?
>
> E.g., I have a matrix with 100 rows and 1e7 columns. I open a
> connection and want to read only lines 5, 7, 9, etc. [i.e.,
> seq(5,99,2)]
>
> It might seem that the syntax to do this would be something like this
> (if only the "skip" allowed vectors in the same way colClasses does in
> read.table):
>
> con <- file("bigfile",open="r")
> rows.I.want <- seq(5,99,2)
> new <- scan(con,what="character",skip=rows.I.want-1,nlines=rows.I.want)
>
> The above doesn't work - it would read lines 5, 6, 7, ...
> length(seq(5,99,2)) rather than 5, 7, 9, ... 99. Yes, I know I can
> accomplish this by looping, but with the huge datasets I'll be working
> with, I'd like to try to save time by doing it all at once. Any ideas?
>
> Matt
>
>
>
> --
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
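For comparison, here is a minimal sketch of the looping approach Matt mentions, in base R. It relies on reads from a connection opened with open = "r" continuing from where the previous read stopped, so the file is passed over once and only the gap before each wanted line is skipped. It uses readLines rather than scan so that each wanted line comes back as a single string. The helper name read_selected_lines and the small stand-in file are hypothetical, not part of the thread.

```r
# Hypothetical helper: return only the lines whose numbers are in `wanted`,
# reading the file front to back once through an open connection.
read_selected_lines <- function(path, wanted) {
  wanted <- sort(unique(wanted))
  con <- file(path, open = "r")
  on.exit(close(con))
  out <- character(length(wanted))
  pos <- 0                               # number of lines consumed so far
  for (i in seq_along(wanted)) {
    gap <- wanted[i] - pos - 1           # unwanted lines before the next wanted one
    if (gap > 0) readLines(con, n = gap) # read them and throw them away
    line <- readLines(con, n = 1)
    if (length(line) == 0) {             # past the end of the file: stop early
      length(out) <- i - 1
      break
    }
    out[i] <- line
    pos <- wanted[i]
  }
  out
}

# Demo on a small stand-in for the real file: 100 rows, 5 columns,
# where row r holds r, r + 100, r + 200, r + 300, r + 400.
m <- matrix(1:500, nrow = 100)
tmp <- tempfile(fileext = ".dat")
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

rows.I.want <- seq(5, 99, 2)
sel <- read_selected_lines(tmp, rows.I.want)
head(sel, 2)   # "5 105 205 305 405" "7 107 207 307 407"

# Parse the selected lines into a numeric matrix, one row per line
x <- matrix(scan(text = sel, quiet = TRUE), nrow = length(sel), byrow = TRUE)
all(x == m[rows.I.want, ])   # TRUE
```

A plain text file has no index of where each line starts, so any method, this one or the SQLite route, has to read through the unwanted lines at least once; the per-iteration overhead of the loop should be small next to reading lines with 1e7 columns. On an open connection, scan(con, skip = gap, nlines = 1) should behave the same way, since scan also reads from the connection's current position.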


