[R] How do I paste double quotes around a character string?

Philip James Smith philipsmith at alumni.albany.edu
Thu Jul 3 15:33:12 CEST 2008


R Community:

At the risk of getting my hands slapped by posting "too much" on the 
forum, I've described the strategy for reading only certain portions of 
huge .csv files below.

I think that this could well be of interest to others... I'm sure 
that I'm not alone in needing to read only certain variables (i.e., 
columns) from VERY huge .csv files.

It has been suggested by Charles Berry, Ted Harding, and Brian Ripley to 
use the unix "cut" command along with the R pipe() function. Their 
advice has been invaluable.

As I've written the code so far, I'm finding that the "cut" command is 
not reading the file properly... or at least not in the manner that I'm 
expecting.

Here was my strategy:
*STEP 1. read the whole huge file --- (almost impossible! even with a 
very good computer!)
STEP 2. use the pipe and cut commands to read only the desired columns 
of the file
STEP 3. compare results by tabulating a variable from the whole file 
with the file obtained in (2)*

I found that the comparison gave different tabulations!  :-(

I've provided my code below. I'd be quite grateful for suggestions on 
how to fix this.

My sincere thanks to all who have or will provide guidance on this problem.

Phil Smith
Duluth, GA
 
*## STEP 1: read the whole huge file*
##
## read the whole file
##
    your.file    <-    c("/home/philipsmith/mydata.csv")
    dat        <-    read.csv( file = your.file )

##
## read the names from the 1st line of the whole file
## that line contains all of the variable names
##
    col.namz    <-    c( scan( your.file , what=character(0), nlines=1 ,
                               sep="," ) )

##
## check to see whether  all of the column names from the whole file
## are the same as in col.namz
##
     all( col.namz == names(dat))

##
## they are!! :-)
##

*## STEP 2: use the pipe and cut commands to read only the desired
## columns of the file*
##
## designate which variable names are to be read
## using the unix command "cut" and the function pipe()
##
    colz    <-    c("ESTIAP07" )

##
## find the column numbers in the whole file that correspond to
## the variables designated to be read by the unix command
## and specified in the colz vector
##

    col.pos     <-     match( colz , col.namz , nomatch=0 )
    ##
    ## the following line is commented out,
    ## since for this example the number of designated variables
    ## by colz is only 1 variable
    ##
    ## col.pos        <-    paste( col.pos , collapse=',' )
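
When more than one variable is wanted, the commented-out paste() line above is what builds the comma-separated field list for "cut -f". The following is only a minimal sketch of that step: the header names other than ESTIAP07 are hypothetical, and the explicit check is an addition, since nomatch=0 would otherwise pass field 0 to cut, which numbers fields from 1.

```r
## hypothetical header and selection, for illustration only
col.namz <- c("ID", "ESTIAP07", "STATE", "AGE")
colz     <- c("ESTIAP07", "AGE")

## positions of the requested names in the header (0 = not found)
col.pos <- match(colz, col.namz, nomatch = 0)

## stop early if a requested name is missing from the header
if (any(col.pos == 0)) {
    stop("not found in header: ", paste(colz[col.pos == 0], collapse = ", "))
}

## cut writes fields in file order whatever order is given to -f,
## so sorting here keeps the list and the output in step
field.list <- paste(sort(col.pos), collapse = ",")
field.list
```

The resulting string can be pasted after "cut -d, -f" in the same way as the single column position.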

##
## character string of file name for unix read with cut function
##
    fn        <-    c("/home/philipsmith/mydata.csv")

##
## create a character vector of the unix command
##
    unix.cmd    <-    paste( "cut -d, -f" , col.pos  , " " , fn  ,
                             sep = '' )

##
## read the designated columns, only, from the whole file
## using pipe() and the unix command cut
##
    gnu.dat        <-    read.csv( pipe ( description=unix.cmd ) )
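
For comparison, read.csv() can itself skip columns through its colClasses argument, which avoids the external cut step at the cost of still scanning every line. This is a sketch on a small temporary stand-in file, not the real data; the column names ID and NAME and the object names tmp, hdr, keep, cls and sub.dat are hypothetical.

```r
## small stand-in file; only ESTIAP07 is a real variable name
tmp <- tempfile(fileext = ".csv")
writeLines(c('ID,NAME,ESTIAP07',
             '1,"Smith, Phil",13',
             '2,"Jones",22'), tmp)

## read the header line, as in STEP 1
hdr  <- scan(tmp, what = character(0), nlines = 1, sep = ",", quiet = TRUE)
keep <- c("ESTIAP07")

## "NULL" tells read.csv to skip a column; NA lets it guess the type
cls <- rep("NULL", length(hdr))
cls[hdr %in% keep] <- NA

sub.dat <- read.csv(tmp, colClasses = cls)
sub.dat
```

Because read.csv() parses quoted fields, the comma inside "Smith, Phil" does not shift the ESTIAP07 column here.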



*## STEP 3. compare results by tabulating a variable from the whole file
## with the file obtained in (2)*
##
## tabulate the designated variable from the whole file
##
    table( dat$ESTIAP07 )

##
## tabulate the designated variable from the file
## that has the designated columns, only
##
    table( gnu.dat$ESTIAP07 )
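
Beyond comparing the two tabulations, the two columns can be compared row by row to locate the records that differ. This is a sketch with made-up vectors standing in for dat$ESTIAP07 and gnu.dat$ESTIAP07; it assumes both reads return the same number of rows in the same order, which has not been checked for the real file.

```r
## hypothetical stand-ins for dat$ESTIAP07 and gnu.dat$ESTIAP07
full.col <- c(13, 22, 7, 40)
cut.col  <- c(13, 3, 7, 157)

## the comparison only makes sense if the lengths agree
stopifnot(length(full.col) == length(cut.col))

## row numbers where the two reads disagree
bad.rows <- which(full.col != cut.col)
bad.rows

## the disagreeing values side by side
cbind(row = bad.rows, full = full.col[bad.rows], cut = cut.col[bad.rows])
```

The raw lines at those row numbers (offset by one for the header) would then be the ones to inspect in the file itself.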

 > table( dat$ESTIAP07 )

  1   2   4   5   6   7   8  10  11  12  13  14  16  17  18  19  20  22  24  25
340 278 304 319 334 295 405 342 519 474 413 476 511 322 517 393 364 377 447 425
 27  28  29  30  31  34  35  36  37  38  40  41  44  46  47  49  50  51  52  53
462 382 368 502 385 494 454 497 484 385 360 419 355 466 461 369 372 431 384 331
 54  55  56  57  58  59  60  61  62  63  64  65  66  68  69  72  73  74  75  76
478 468 348 323 363 287 322 364 317 363 423 337 409 312 370 360 348 309 244 300
 77  79  80 773
307 454 445 340
 >
 > ##
 > ## tabulate the designated variable from the file
 > ## that has the designated columns, only
 > ##
 > table( gnu.dat$ESTIAP07 )

  1   2   3   4   5   6   7   8  10  11  12  13  14  16  17  18  19  20  22  24
342 291   1 308 319 334 295 405 341 518 471 413 476 511 322 517 393 363 377 446
 25  27  28  29  30  31  34  35  36  37  38  40  41  44  46  47  49  50  51  52
425 461 382 368 502 385 494 454 496 483 385 360 419 354 466 461 369 371 431 384
 53  54  55  56  57  58  59  60  61  62  63  64  65  66  68  69  72  73  74  75
331 478 467 348 322 363 287 320 364 317 363 423 337 408 312 368 360 347 309 243
 76  77  79  80 157 773
300 307 454 445   1 340
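
One possible explanation for the small shifts between the two tabulations, which the output above does not confirm, is that some records contain a comma inside a quoted field. "cut -d," splits on every comma and knows nothing about quotes, whereas read.csv() treats a quoted field as one unit. The sketch below imitates the two behaviours in R on a single made-up record; the record and the names line, naive and aware are hypothetical.

```r
## hypothetical record: the second field contains a quoted comma
line <- '7,"Duluth, GA",13'

## splitting on every comma, as cut -d, does
naive <- strsplit(line, ",", fixed = TRUE)[[1]]

## a quote-aware parse, as read.csv does
aware <- scan(text = line, what = character(0), sep = ",", quiet = TRUE)

length(naive)   # number of pieces when quotes are ignored
length(aware)   # number of fields when quotes are honoured
naive[3]        # what a naive "field 3" would hold
aware[3]        # what the CSV field 3 holds
```

If this is the cause, every field after the embedded comma is shifted by one position in the affected records only, which would leave most counts unchanged and move a few values to other categories.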
 > ?pipe
 >


