[R] using sqldf's read.csv.sql to read a file with "NA" for missing

David L Carlson dcarlson at tamu.edu
Thu Sep 6 00:22:21 CEST 2012


You will probably have to contact the maintainer of package sqldf (G.
Grothendieck <ggrothendieck at gmail.com>) as it appears that the package does
not support missing values. I tried stripping out the NAs so that the fields
were empty, but the results are the same. Fields specified as numeric are
converted to character. Inserting a complete record at the top of the data
prevents the conversion to text, but NA's are still converted to zeros.

As a workaround, you could replace NA's with 9999 before reading the file
and then convert this value to NA within R.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of David Reiner
> Sent: Wednesday, September 05, 2012 4:40 PM
> To: r-help at r-project.org
> Subject: Re: [R] using sqldf's read.csv.sql to read a file with "NA"
> for missing
> 
> I see the attachment didn't go through.
> here is the text:
> 
> 2012-07-01,15:50:00,NA,0,NA,0,NA
> 2012-07-01,15:51:00,NA,0,NA,0,NA
> 2012-07-01,15:52:00,NA,0,NA,0,NA
> 2012-07-01,15:53:00,NA,0,NA,0,NA
> 2012-07-01,15:54:00,NA,0,NA,0,NA
> 2012-07-01,15:55:00,NA,0,NA,0,NA
> 2012-07-01,15:56:00,NA,0,NA,0,NA
> 2012-07-01,15:57:00,NA,0,NA,0,NA
> 2012-07-01,15:58:00,NA,0,NA,0,NA
> 2012-07-01,15:59:00,NA,0,NA,0,NA
> 2012-07-01,16:00:00,1358.5,1,1358.5,4,NA
> 2012-07-01,16:01:00,NA,0,1358.25,2,NA
> 2012-07-01,16:02:00,NA,0,1358.25,2,NA
> 2012-07-01,16:03:00,NA,0,1358.25,2,NA
> 2012-07-01,16:04:00,NA,0,1358.25,2,NA
> 2012-07-01,16:05:00,NA,0,1358.25,2,NA
> 2012-07-01,16:06:00,1358.25,2,1358.25,2,NA
> 2012-07-01,16:07:00,1358.25,2,1358.25,2,NA
> 2012-07-01,16:08:00,1358,3,1358,2,NA
> 2012-07-01,16:09:00,1358.25,2,1358.25,2,NA
> 2012-07-01,16:10:00,1358.25,2,1358.25,2,NA
> 2012-07-01,16:11:00,1358.25,2,1358.25,2,NA
> 2012-07-01,16:12:00,1357.5,3,1357.5,3,NA
> 2012-07-01,16:13:00,1357.25,10,1357.25,3,1357.25
> 2012-07-01,16:14:00,NA,0,1357.25,10,1357.125
> 2012-07-01,16:15:00,1357,20,1357,18,1357
> 2012-07-01,16:16:00,1357,20,1357,18,1357
> 2012-07-01,16:17:00,1357,20,1357,18,1357
> 2012-07-01,16:18:00,1357,20,1357,18,1357
> 2012-07-01,16:19:00,1357,20,1357,18,1357
> 2012-07-01,16:20:00,1357,20,1357,18,1357
> 2012-07-01,16:21:00,1357,20,1357,18,1357
> 
> -- David
> 
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of David Reiner
> Sent: Wednesday, September 05, 2012 4:32 PM
> To: r-help at r-project.org
> Subject: [R] using sqldf's read.csv.sql to read a file with "NA" for
> missing
> 
> I'm trying to use sqdf's function read.csv.sql to read CSV files in
> which the missing values are represented by NA's.
> Plain old read.csv works fine on these files, but they are rather large
> and I'd like to filter using sql-like statements.
> However, even if I specify field.types correctly and nrows=-1, it still
> turns the columns with NA's into chars or 0.
> I'm trying to make this OS independent, so I don't think I can use a
> filter to convert the NA's to NULL's or whatever SQLite would
> understand.
> I can accept it everything has to be read in as char and then convert
> to doubles with as.numeric, but I'm looking for speed.
> 
> Here is code I thought would read the file (I've attached a small
> sample.)
> It almost works if there are no NA's in the initial rows, but it still
> turns NA's into 0's instead of NA or something I can change into NA;
> and it returns characters if there are NA's in the initial rows.
> (0 is a possible value so I can't filter out the 0's.)
> 
> field.types <- list(V1='char', V2='char', V3='real', V4='int',
> V5='real', V6='int', V7='real')
> dtst <- read.csv.sql("./tmp.csv", header=FALSE,
> field.types=field.types, nrows=-1)
> str(dtst)
> 
> 'data.frame':   32 obs. of  7 variables:
>  $ V1: chr  "2012-07-01" "2012-07-01" "2012-07-01" "2012-07-01" ...
>  $ V2: chr  "15:50:00" "15:51:00" "15:52:00" "15:53:00" ...
>  $ V3: chr  "NA" "NA" "NA" "NA" ...
>  $ V4: int  0 0 0 0 0 0 0 0 0 0 ...
>  $ V5: chr  "NA" "NA" "NA" "NA" ...
>  $ V6: int  0 0 0 0 0 0 0 0 0 0 ...
>  $ V7: chr  "NA" "NA" "NA" "NA" ...
> 
> Any suggestions?
> Thanks,
> -- David L. Reiner
> 
> 
> 
> 
> This e-mail and any materials attached hereto, including, without
> limitation, all content hereof and thereof (collectively, "XR Content")
> are confidential and proprietary to XR Trading, LLC ("XR") and/or its
> affiliates, and are protected by intellectual property laws.  Without
> the prior written consent of XR, the XR Content may not (i) be
> disclosed to any third party or (ii) be reproduced or otherwise used by
> anyone other than current employees of XR or its affiliates, on behalf
> of XR or its affiliates.
> 
> THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES
> OF ANY KIND.  TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW,
> XR HEREBY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED,
> RELATING TO THE XR CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES
> SHALL IN ANY EVENT BE LIABLE FOR ANY DAMAGES OF ANY NATURE WHATSOEVER,
> INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL
> AND PUNITIVE DAMAGES, LOSS OF PROFITS AND TRADING LOSSES, RESULTING
> FROM ANY PERSON'S USE OR RELIANCE UPON, OR INABILITY TO USE, ANY XR
> CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY OF SUCH DAMAGES OR IF
> SUCH DAMAGES WERE FORESEEABLE.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list