[R] ff package: reading selected columns from csv

Jan van der Laan rhelp at eoos.dds.nl
Thu Jul 26 09:58:14 CEST 2012


Having had a quick look at the source code for read.table.ffdf, I  
suspect that using 'NULL' in the colClasses argument is not supported.  
Could you try read.table.ffdf with colClasses specified for all  
columns (thereby reading in every column in the file)? If that works,  
you can be fairly sure that the number of columns is indeed constant  
in the file (sometimes a stray ' or an unquoted , can mess things  
up).
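
A minimal sketch of both checks, run on a small made-up file rather than the real one. The temporary file, its contents, and the assumption that all 7 columns are numeric are mine, not taken from the original data. The first part uses base R's count.fields to look for lines whose field count differs from 7; the second part, which only runs if ff is installed, reads every column with a full colClasses vector and then picks out V2. If I read the ff documentation correctly, read.table.ffdf reads the file in chunks and the first chunk defaults to 1000 rows, which would fit the trouble starting at row 1001, but I have not verified that against the real file.

```r
# Hypothetical stand-in for the real file: 20 rows, 7 numeric columns.
set.seed(1)
m <- matrix(round(rnorm(20 * 7), 4), ncol = 7)
rows <- apply(m, 1, paste, collapse = ",")
rows[15] <- paste0(rows[15], ",9.9999")   # simulate a stray unquoted comma

csvfile <- tempfile(fileext = ".csv")
writeLines(rows, csvfile)

# Check 1: count the fields on every line, with the same separator,
# quote and comment settings that read.csv uses.
n.fields <- count.fields(csvfile, sep = ",", quote = "\"", comment.char = "")
print(table(n.fields))
bad.lines <- which(n.fields != 7)   # line numbers with an unexpected count
print(bad.lines)

# Check 2: read ALL columns by giving a class for each one (no 'NULL'),
# then select the second column afterwards. Assumes every column is
# numeric; adjust all.class to match the real file.
if (requireNamespace("ff", quietly = TRUE)) {
  goodfile <- tempfile(fileext = ".csv")
  writeLines(rows[-15], goodfile)          # same data without the bad line
  all.class <- rep("numeric", 7)
  x <- ff::read.csv.ffdf(file = goodfile, header = FALSE,
                         colClasses = all.class)
  v2 <- x$V2[]                             # second column as a plain vector
  print(length(v2))
}
```

On the real file the same calls would take csvfile, skip = 100 and the nrows of interest. Note that count.fields returns one integer per line, so for 6e+7 rows it may be worth checking a part of the file first.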

Jan




threshold <r.kozarski at gmail.com> wrote:

> Dear R users, I've just started using the ff package.
>
> There is a csv file (~4 GB) with 7 columns and 6e+7 rows. I want to read only
> one column from the file, skipping the first 100 rows.
> Below I've provided different outcomes, which will clarify my problem.
>
>> sessionInfo()
> R version 2.14.2 (2012-02-29)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> ...
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] ff_2.2-7  bit_1.1-8
>
> ##---------------------------------------------------------------------------------------
> ## I want to read the second column only:
> x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')
>
> ## The following command works fine:
>
>>     read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
>> colClasses=x.class, nrows=1e3)
> ffdf (all open) dim=c(1000,1), dimorder=c(1,2) row.names=NULL
> ffdf virtual mapping
>    PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
> V2           V2       double        double FALSE           FALSE
>    PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
> V2            FALSE                 1                1               1
>    PhysicalIsOpen
> V2           TRUE
> ffdf data
>           V2
> 1    -0.5412
> 2    -0.5842
> 3    -0.5920
> 4    -0.5451
> 5    -0.5099
> 6    -0.5021
> 7    -0.4943
> 8    -0.5490
> :          :
> 993  -0.4865
> 994  -0.6584
> 995  -0.7482
> 996  -0.8732
> 997  -0.8303
> 998  -0.7248
> 999  -0.5490
> 1000 -0.4240
>
> Then, when I increase nrows by 1, I get a warning about the number of columns:
>
>>     read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
>> colClasses=x.class, nrows=1001)
> ffdf (all open) dim=c(1001,1), dimorder=c(1,2) row.names=NULL
> ffdf virtual mapping
>    PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
> V2           V2       double        double FALSE           FALSE
>    PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
> V2            FALSE                 1                1               1
>    PhysicalIsOpen
> V2           TRUE
> ffdf data
>           V2
> 1    -0.5412
> 2    -0.5842
> 3    -0.5920
> 4    -0.5451
> 5    -0.5099
> 6    -0.5021
> 7    -0.4943
> 8    -0.5490
> :          :
> 994  -0.6584
> 995  -0.7482
> 996  -0.8732
> 997  -0.8303
> 998  -0.7248
> 999  -0.5490
> 1000 -0.4240
> 1001 -0.3849
> Warning message:
> In read.table(file = file, header = header, sep = sep, quote = quote,  :
>   cols = 1 != length(data) = 7
>>
>
> Then, going well beyond 1000 rows produces an error:
>>     read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
>> colClasses=x.class, nrows=1e4)
> Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
>   more columns than column names
>
> The question is why? The number of columns does not change in the file...
>
> I would appreciate any help.
>
>
> Best, Robert
>
>
>
>
>
>