[R] Read file

Gabor Grothendieck ggrothendieck at gmail.com
Tue Oct 5 07:11:41 CEST 2010


On Sat, Oct 2, 2010 at 11:31 PM, Nilza BARROS <nilzabarros at gmail.com> wrote:
> Dear R-users,
>
> I would like to know how I can read a file with lines of different lengths.
> I need to read this file and create output to feed my database.
> So after reading it I'll need to create output like this
>
> "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460, 39,390)"
>

Read the data with fill = TRUE so that the short lines (i.e. the date
and station lines) are padded with NAs.  Replace the *s with spaces and
count the non-NAs in each row (cnt).  Append group, which is 1 for
lines pertaining to the 1st station, 2 for the 2nd, etc.; it is a
running count of the rows that have exactly 4 non-NA fields.  Then
merge it all together into one big data frame, All, and generate a
vector of SQL strings:

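# Read all lines; fill = TRUE pads the short rows with NA.
DF <- read.table("d2010100100.txt", fill = TRUE)
# Replace each * with a space and convert every column to numeric.
DF[] <- lapply(DF, function(x) as.numeric(chartr("*", " ", x)))
# cnt = number of non-NA fields per row; a row with 4 starts a new group.
cnt <- rowSums(!is.na(DF))
DF$group <- cumsum(cnt == 4)
# Split the rows by field count and join the pieces on group.
Merge <- function(x, y) merge(x, y, by = "group")
All <- Reduce(Merge, split(DF, cnt))
# paste() keeps the format string free of an embedded line break.
with(All, sprintf(paste("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2)",
  "VALUES (%04d%02d%02d, %d, %d, %d)"), V1.x, V2.x, V3.x, V1.y, V1, V2))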

The result looks like this:

[1] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 82599, 1008, -9999)"
[2] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 1011, -9999)"
[3] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 1000, 96)"
[4] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 925, 782)"
[5] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 850, 1520)"
[6] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 700, 3171)"
[7] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 500, 5890)"
[8] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 400, 7600)"
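
The input file itself is not reproduced in this message, so here is a
minimal self-contained sketch of the same steps on made-up data.  The
layout is an assumption: date lines with 4 fields, station lines with
5 fields and a leading *, and data lines with 7 fields.  The station
numbers and values in Lines are invented purely for illustration, and
sql is just a name chosen here for the result.

```r
# Hypothetical sample input (assumed layout, invented values).
Lines <- "2010 10 01 00
*11111 -1.5 -2.5 10 1
1008.0 -9999 115 3 298 294 64
2010 10 01 00
*22222 -3.5 -4.5 20 2
1011.0 -9999 0 0 298 296 64
1000.0 96 40 5 297 296 64"

# fill = TRUE pads the short date and station rows with NA.
DF <- read.table(text = Lines, fill = TRUE)

# Turn "*11111" into " 11111" and make every column numeric.
DF[] <- lapply(DF, function(x) as.numeric(chartr("*", " ", x)))

# Non-NA fields per row: 4 = date line, 5 = station line, 7 = data line.
cnt <- rowSums(!is.na(DF))

# Each date line starts a new group.
DF$group <- cumsum(cnt == 4)
print(data.frame(cnt, group = DF$group))

# Split into date, station and data rows, then join them on group.
# After the joins the date columns carry the suffix .x, the station
# columns .y, and the data columns keep their plain names.
Merge <- function(x, y) merge(x, y, by = "group")
All <- Reduce(Merge, split(DF, cnt))

sql <- with(All, sprintf(paste("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2)",
  "VALUES (%04d%02d%02d, %d, %d, %d)"), V1.x, V2.x, V3.x, V1.y, V1, V2))
print(sql)
```

Note that %d only accepts whole numbers, so this relies on the first
two data columns being integer valued.  The resulting vector can be
written out with writeLines(sql, "inserts.sql"), where the file name is
just a placeholder.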

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


