[R] Help with read.csv.sql()

H @gent@ @end|ng |rom medd@t@|nc@com
Mon Jul 20 23:54:56 CEST 2020


On 07/18/2020 11:42 PM, Rasmus Liland wrote:
> On 2020-07-18 18:09 +0100, Rui Barradas wrote:
> | �s 17:59 de 18/07/2020, H escreveu:
> | | On Fri, Jul 17, 2020 at 6:28 PM H <agents using meddatainc.com> wrote:
> | | | 
> | | | The problem I am having is that 
> | | | the csv files have header rows 
> | | | with column names that are 
> | | | slightly different from the column 
> | | | names I have assigned in the 
> | | | dataframe and it seems that when I 
> | | | read the csv data into the 
> | | | dataframe, the column names from 
> | | | the csv file replace the column 
> | | | names I chose when creating the 
> | | | dataframe.
> | | | 
> | | | A secondary issue is that the csv 
> | | | files have a column with a date in 
> | | | mm/dd/yyyy format that I would 
> | | | like to make into a Date type 
> | | | column in my dataframe. Again, I 
> | | | have been unable to find a way - 
> | | | if at all possible - to force a 
> | | | conversion into a Date format when 
> | | | importing into the dataframe. The 
> | | | best I have so far is to import is 
> | | | a character column and then use 
> | | | as.Date() to later force the 
> | | | conversion of the dataframe 
> | | | column.
> | | 
> | | The documentation for read.csv.sql() 
> | | suggests that colClasses() and/or 
> | | field.types() should work but I may 
> | | well have misunderstood the 
> | | documentation, hence my question in 
> | | this group.
> | 
> | As for colClasses, those are R class 
> | names.
>
> Ok Mister H, I might have hit the nail 
> on the head this time with this badass 
> example for your usecase:
>
> 	# Make a csv with %d/%m/%Y dates in it ...
> 	Lines <- "STM05-1 2005/02/28 17:35 Good -35.562 177.158
> 	STM05-1 2005/02/28 19:44 Good -35.487 177.129
> 	STM05-1 2005/02/28 23:01 Unknown -35.399 177.064
> 	STM05-1 2005/03/01 07:28 Unknown -34.978 177.268
> 	STM05-1 2005/03/01 18:06 Poor -34.799 177.027
> 	STM05-1 2005/03/01 18:47 Poor -34.85 177.059
> 	STM05-2 2005/02/28 12:49 Good -35.928 177.328
> 	STM05-2 2005/02/28 21:23 Poor -35.926 177.314
> 	"
> 	DF <- read.table(textConnection(Lines), as.is = TRUE,
> 	  col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long"))
> 	DF$Date <- format(as.Date(DF$Date, "%Y/%m/%d"), "%d/%m/%Y")
> 	write.csv(DF, file="df.csv", row.names=FALSE)
> 	
> 	colClasses <-
> 	  c("character",
> 	    "Date",
> 	    "character",
> 	    "character",
> 	    "numeric",
> 	    "numeric")
> 	sql <- paste0(
> 	  "select ",
> 	    "date(",  # [2]
> 	      "substr(Date, 8, 4) || '-' || ",  # [1]
> 	      "substr(Date, 5, 2) || '-' || ",
> 	      "substr(Date, 2, 2)), Long, Lat, Quality ",
> 	  "from ff where Quality like '%oo%' and Long>177.129")
> 	ff <- file(description="df.csv", open="r")
> 	dat <- sqldf::read.csv.sql(
> 	  sql=sql, colClasses=colClasses)
> 	close(ff)
> 	
> 	str(dat)
> 	
> 	as.Date(dat[,1])
> 	dat[,3]
>
> Both sqlite and Postgres has a function 
> substr you can call on strings like 
> this.[5]  I have a hunch this has always 
> been possible in sql from way back ... 
>
> The warning from sqldf about unused 
> connections, might suggest file 
> descriptor handling to be a bit crusty 
> ... [3] 
>
> The thing is, defining the second column 
> as of type Date in colClasses happens to 
> work, but it's still character when you 
> check with str(dat) ... perhaps it has 
> something to do with this info from [4]: 
>
> 	as_tibble_row() converts a vector to 
> 	a tibble with one row. The input 
> 	must be a bare vector, e.g. vectors 
> 	of dates are not supported yet. If 
> 	the input is a list, all elements 
> 	must have length one.
>
> [1] https://stackoverflow.com/questions/15563656/convert-string-to-date-in-sqlite
> [2] https://www.sqlite.org/lang_datefunc.html
> [3] https://groups.google.com/forum/#!topic/sqldf/mcQ_K_E--q8
> [4] https://tibble.tidyverse.org/reference/as_tibble.html
> [5] https://www.sqlite.org/lang_corefunc.html#substr, 
> https://www.postgresql.org/docs/9.1/functions-string.html,
> http://www.h2database.com/html/functions.html#substring 
>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Thank you for your extensive example. However, I have decided to simply convert column types as necessary and rename columns as desired after importing the data since that seems the simplest solution.


	[[alternative HTML version deleted]]



More information about the R-help mailing list