[R] ff package: reading selected columns from csv

threshold r.kozarski at gmail.com
Wed Jul 25 17:48:50 CEST 2012


*Dear R users, I've just started using the ff package.

There is a CSV file (~4 GB) with 7 columns and 6e+7 rows. I want to read only
one column (the second) from the file, skipping the first 100 rows.
Below are the outcomes of several attempts, which should clarify my problem.*
> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
...

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] ff_2.2-7  bit_1.1-8

##---------------------------------------------------------------------------------------
## *I want to read the second column only:*
x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')
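
The same column-selection pattern can be tried on a small throwaway file with base R's read.csv, where a 'NULL' entry in colClasses skips that column. This is only a minimal sketch: the temporary file tmp, its contents, and the names toy and toy.class are made up for illustration and are not the real data.

```r
# Hypothetical toy file: 5 rows, 7 numeric columns, no header line.
tmp <- tempfile(fileext = ".csv")
toy <- as.data.frame(matrix(seq_len(35), nrow = 5, ncol = 7))
write.table(toy, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# Same pattern as x.class: keep column 2, skip the other six.
toy.class <- rep("NULL", 7)
toy.class[2] <- "numeric"

# Base read.csv drops every column whose class is "NULL".
second <- read.csv(tmp, header = FALSE, colClasses = toy.class)
str(second)
unlink(tmp)
```

With plain read.csv the result is a one-column data frame whose column keeps its original name, V2.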

##* The following command works fine:*

> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
+     colClasses=x.class, nrows=1e3)
ffdf (all open) dim=c(1000,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
   PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
V2           V2       double        double FALSE           FALSE
   PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2            FALSE                 1                1               1
   PhysicalIsOpen
V2           TRUE
ffdf data
          V2
1    -0.5412
2    -0.5842
3    -0.5920
4    -0.5451
5    -0.5099
6    -0.5021
7    -0.4943
8    -0.5490
:          :
993  -0.4865
994  -0.6584
995  -0.7482
996  -0.8732
997  -0.8303
998  -0.7248
999  -0.5490
1000 -0.4240

*When I increase nrows by 1, I get a warning about the number of columns:*

> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
+     colClasses=x.class, nrows=1001)
ffdf (all open) dim=c(1001,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
   PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
V2           V2       double        double FALSE           FALSE
   PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2            FALSE                 1                1               1
   PhysicalIsOpen
V2           TRUE
ffdf data
          V2
1    -0.5412
2    -0.5842
3    -0.5920
4    -0.5451
5    -0.5099
6    -0.5021
7    -0.4943
8    -0.5490
:          :
994  -0.6584
995  -0.7482
996  -0.8732
997  -0.8303
998  -0.7248
999  -0.5490
1000 -0.4240
1001 -0.3849
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 1 != length(data) = 7
> 

*Going well beyond 1000 rows produces an error:*
> read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
+     colClasses=x.class, nrows=1e4)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  more columns than column names
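
One way to check whether every line of a file really has the same number of fields is count.fields from the utils package. The following is a minimal sketch on a hypothetical toy file (tmp, with one deliberately malformed line), not on the real 4 GB file:

```r
# Hypothetical toy file with one malformed line (8 fields instead of 7).
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,4,5,6,7",
             "1,2,3,4,5,6,7",
             "1,2,3,4,5,6,7,8",
             "1,2,3,4,5,6,7"), tmp)

# Count the comma-separated fields on each line.
n.fields <- count.fields(tmp, sep = ",")
table(n.fields)

# Line numbers whose field count differs from the expected 7.
bad <- which(n.fields != 7)
bad
unlink(tmp)
```

On the real file the same call could be given skip=100 to match the reads above. If every line reports 7 fields, the file itself is probably not what triggers the message.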

*The question is: why? The number of columns in the file does not change.

I would appreciate any help.

Best, Robert*



