[R] Help with read.csv.sql()

Wed Jul 29 17:11:44 CEST 2020

You might achieve this using readr:

```
library(readr)

lines <- "Id, Date, Time, Quality, Lat, Long
     STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
     STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
     STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
     STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
     STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
     STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
     STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
     STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"

read_csv(lines)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", 
"myLong"),
   col_types = cols(
     myDate = col_date(format = ""),
     myTime = col_time(format = ""),
     myLat = col_number(),
     myLong = col_number(),
     .default = col_character()
   )
   )

read_csv(
   lines,
   col_types = cols_only(
     Id = col_character(),
     Date = col_date(format = ""),
     Time = col_time(format = "")
   )
)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", 
"myLong"),
   col_types = cols_only(
     myId = col_character(),
     myDate = col_date(format = ""),
     myTime = col_time(format = "")
   )
)
```

HTH
Ulrik

On 2020-07-20 02:07, H wrote:
> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>> 
>> https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>> https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>> 
>> Otherwise you can try reading through the FAQ on Github:
>> 
>> https://github.com/ggrothendieck/sqldf
>> 
>> HTH, Bill.
>> 
>> W. Michels, Ph.D.
>> 
>> 
>> 
>> On Sat, Jul 18, 2020 at 9:59 AM H <agents using meddatainc.com> wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>> 
>>>> I don't believe that what you are asking for is possible but like 
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then change 
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>> 
>>>> Rui Barradas
>>>> 
>>>> Às 16:43 de 18/07/2020, H escreveu:
>>>> 
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the changes to the data 
>>>>>> frame (column names, as.date(), ...) *after* you have read all 
>>>>>> your data in?
>>>>>> 
>>>>>> Do all your csv files use the same names and date formats?
>>>>>> 
>>>>>> 
>>>>>> Bert Gunter
>>>>>> 
>>>>>> "The trouble with having an open mind is that people keep coming 
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents using meddatainc.com 
>>>>>> <mailto:agents using meddatainc.com>> wrote:
>>>>>> 
>>>>>>      I have created a dataframe with columns that are characters, 
>>>>>> integers and numeric and with column names assigned by me. I am 
>>>>>> using read.csv.sql() to read portions of a number of large csv 
>>>>>> files into this dataframe, each csv file having a header row with 
>>>>>> columb names.
>>>>>> 
>>>>>>      The problem I am having is that the csv files have header 
>>>>>> rows with column names that are slightly different from the column 
>>>>>> names I have assigned in the dataframe and it seems that when I 
>>>>>> read the csv data into the dataframe, the column names from the 
>>>>>> csv file replace the column names I chose when creating the 
>>>>>> dataframe.
>>>>>> 
>>>>>>      I have been unable to figure out if it is possible to assign 
>>>>>> column names of my choosing in the read.csv.sql() function? I have 
>>>>>> tried various variations but none seem to work. I tried colClasses 
>>>>>> = c(....) but that did not work, I tried field.types = c(...) but 
>>>>>> could not get that to work either.
>>>>>> 
>>>>>>      It seems that the above should be feasible but I am missing 
>>>>>> something? Does anyone know?
>>>>>> 
>>>>>>      A secondary issue is that the csv files have a column with a 
>>>>>> date in mm/dd/yyyy format that I would like to make into a Date 
>>>>>> type column in my dataframe. Again, I have been unable to find a 
>>>>>> way - if at all possible - to force a conversion into a Date 
>>>>>> format when importing into the dataframe. The best I have so far 
>>>>>> is to import is a character column and then use as.Date() to later 
>>>>>> force the conversion of the dataframe column.
>>>>>> 
>>>>>>      Is it possible to do this when importing using 
>>>>>> read.csv.sql()?
>>>>>> 
>>>>>>      ______________________________________________
>>>>>>      R-help using r-project.org <mailto:R-help using r-project.org> mailing 
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>      PLEASE do read the posting guide 
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>      and provide commented, minimal, self-contained, reproducible 
>>>>>> code.
>>>>>> 
>>>>> Yes, the files use the same column names and date format (at least 
>>>>> as far as I know now.) I agree I could do it as you suggest above 
>>>>> but from a purist perspective I would rather do it when importing 
>>>>> the data using read.csv.sql(), particularly if column names and/or 
>>>>> date format might change, or be different between different files. 
>>>>> I am indeed selecting rows from a large number of csv files so this 
>>>>> is entirely plausible.
>>>>> 
>>>>> Has anyone been able to name columns in the read.csv.sql() call 
>>>>> and/or force date format conversion in the call itself? The first 
>>>>> refers to naming columns differently from what a header in the csv 
>>>>> file may have.
>>>>> 
>>>>> 
>>>>>     [[alternative HTML version deleted]]
>>>>> 
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses() 
>>> and/or field.types() should work but I may well have misunderstood 
>>> the documentation, hence my question in this group.
>>> 
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.