[R] Data separated by spaces, getting data into R using field lengths

Lauri Nikkinen lauri.nikkinen at iki.fi
Tue Sep 8 14:52:05 CEST 2009


Thanks Petr, I tried something like this:

> con <- file("C:temppi.txt", "r", blocking = FALSE)
> g <- readLines(con)
> close(con)
>
> sta <- c(1, 3, 5, 19)
> sto <- c(2, 4, 18, 100)
> do.call("rbind", lapply(g, function(x) substring(x, sta, sto)))
     [,1] [,2] [,3]             [,4]
[1,] "DF" "12" " This is an ex" "ample 1 This"
[2,] "DF" "12" " This is an 12" "32 This is"
[3,] "DF" "14" " This is 12334" " This is an "
[4,] "DF" "15" " This 23 This " "is an example"
>

But this is not the solution I was looking for. Thanks.
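
For reference, the sta/sto vectors above can also be derived from a widths vector such as varlength with cumsum(). This is only a minimal sketch: split_by_widths is a hypothetical helper (not from the thread or from base R), and since the data are not truly fixed width it reproduces the same misaligned cuts that read.fwf gives further down.

```r
# Hypothetical helper, not from the thread: cut each line at cumulative widths.
split_by_widths <- function(lines, widths) {
  stops  <- cumsum(widths)       # position of the last character of each field
  starts <- stops - widths + 1   # position of the first character of each field
  do.call("rbind", lapply(lines, function(x) substring(x, starts, stops)))
}

# Two lines copied from the example data
g <- c("DF12 This is an example 1 This",
       "DF12 This is an 1232 This is")
res <- split_by_widths(g, c(2, 2, 18, 5, 18))
res
```

substring() simply returns shorter (or empty) strings when a line ends before the last stop position, so short lines do not cause an error.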

-L

2009/9/8 Petr PIKAL <petr.pikal at precheza.cz>:
> Hi
>
> what about reading each line with readLines and then splitting it into the
> desired portions?
>
> x<-paste(letters, collapse="")
> substring(x, c(1,3,5),c(2,4,15))
>
> Regards
> Petr
>
>
> r-help-bounces at r-project.org wrote on 08.09.2009 14:21:53:
>
>> This data is from a database, and the maximum length of each field is
>> defined. I mean that every column has a maximum length and I want to
>> use this maximum length as a separator. So if one "cell" in that
>> column is shorter than the maximum, the "cell" should be padded with white
>> space or something like that. This seems to be hard to explain.
>>
>> Regards,
>> L
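
To illustrate the padding idea described above: if every field really were padded with spaces to its maximum length, read.fwf would split the lines cleanly. The sketch below is only an illustration under that assumption; the rows are made up in the spirit of the example and padded with sprintf(), they are not the actual database export.

```r
varlength <- c(2, 2, 18, 5, 18)

# Made-up rows: every field is left-justified and padded with spaces
# to its full width, so each line has exactly sum(varlength) characters.
padded <- sprintf("%-2s%-2s%-18s%-5s%-18s",
                  c("DF", "DF"),
                  c("12", "14"),
                  c("This is an example", "This is"),
                  c("1", "12334"),
                  c("This", "This is an"))
stopifnot(all(nchar(padded) == sum(varlength)))

# Read the padded lines as fixed-width data; strip.white drops the padding.
con <- textConnection(padded)
dat <- read.fwf(con, widths = varlength, strip.white = TRUE,
                stringsAsFactors = FALSE)
close(con)
dat
```

The stopifnot() line checks the property that the real file lacks: all lines having the same total width.
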
>>
>> 2009/9/8 Duncan Murdoch <murdoch at stats.uwo.ca>:
>> > On 9/8/2009 8:07 AM, Lauri Nikkinen wrote:
>> >>
>> >> Thanks, I tried it but I got
>> >>
>> >>> varlength <- c(2, 2, 18, 5, 18)
>> >>> read.fwf("c:temppi.txt", widths=varlength)
>> >>
>> >>  V1 V2                 V3    V4   V5
>> >> 1 DF 12  This is an exampl e 1 T  his
>> >> 2 DF 12  This is an 1232 T his i    s
>> >> 3 DF 14  This is 12334 Thi s is   an
>> >> 4 DF 15  This 23 This is a n exa mple
>> >>
>> >> Which is not the way I want it.
>> >
>> > It looks as though that's because you don't have fixed width data.
>> > " This is an example" is 19 chars, including the leading space.  You told
>> > R it was 18.  " This is an " is only 12 characters.
>> >
>> > I would say you have two fixed width fields, and three varying fields,
>> > with no delimiters.  If the middle one of the three always contains digits
>> > and the others don't, you can probably extract them using sub(), but you
>> > can't use any of the read.* functions to do this:  your format is too
>> > strange.
>> >
>> > Duncan Murdoch
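
The sub() suggestion above could look roughly like the following. This is a sketch, not Duncan's code, and it leans on assumptions beyond what the thread confirms: the first two fields are exactly two characters each, the third and fifth fields never contain digits, the fourth field is all digits, and every line matches the pattern. The pattern itself is my own guess at the layout.

```r
x <- c("DF12 This is an example 1 This",
       "DF12 This is an 1232 This is",
       "DF14 This is 12334 This is an",
       "DF15 This 23 This is an example")

# Assumed layout: 2 chars, 2 chars, non-digit text, a run of digits, the rest
pat <- "^(.{2})(.{2})([^0-9]*)([0-9]+)(.*)$"

# For each of the five groups, replace the whole line by that group,
# then trim the surrounding spaces.
fields <- sapply(1:5, function(i) trimws(sub(pat, paste0("\\", i), x)))
dat <- as.data.frame(fields, stringsAsFactors = FALSE)
dat
```

A line that does not match the pattern is returned unchanged by sub(), so it would be worth checking grepl(pat, x) on the real file first.
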
>> >
>> >>
>> >> structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class
>> >> = "factor"),
>> >>    V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L,
>> >>    1L), .Label = c(" This 23 This is a", " This is 12334 Thi",
>> >>    " This is an 1232 T", " This is an exampl"), class = "factor"),
>> >>    V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i",
>> >>    "n exa", "s is "), class = "factor"), V5 = structure(c(2L,
>> >>    4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class =
>> >> "factor")), .Names = c("V1",
>> >> "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
>> >> -4L))
>> >>
>> >> Any ideas?
>> >> -L
>> >>
>> >> 2009/9/8 Duncan Murdoch <murdoch at stats.uwo.ca>:
>> >>>
>> >>> On 9/8/2009 7:53 AM, Lauri Nikkinen wrote:
>> >>>>
>> >>>> I have a text file similar to this (separated by spaces):
>> >>>>
>> >>>> x <- "DF12 This is an example 1 This
>> >>>> DF12 This is an 1232 This is
>> >>>> DF14 This is 12334 This is an
>> >>>> DF15 This 23 This is an example
>> >>>> "
>> >>>>
>> >>>> and I know the field lengths of each variable (there are 5 variables
>> >>>> in this data set), which are:
>> >>>>
>> >>>> varlength <- c(2, 2, 18, 5, 18)
>> >>>>
>> >>>> How can I import this kind of data into R, using the varlength
>> >>>> variable as a field separator indicator?
>> >>>
>> >>> See ?read.fwf.
>> >>>
>> >>> Duncan Murdoch
>> >>>
>> >
>> >