[R] Parsing variable-length delimited strings into a matrix

jim holtman jholtman at gmail.com
Tue Oct 4 14:43:34 CEST 2011


Will this do it for you:

> x <- readLines(textConnection("A,B,C
+ B,B
+ A,AA,C
+ A,B,BB,BBB,B,B"))
> closeAllConnections()
> x.s <- strsplit(x, ',')
> # determine max length
> x.max <- max(sapply(x.s, length))
> # create character matrix
> x.mat <- matrix(
+     sapply(x.s, function(a) c(a, rep(NA, x.max - length(a))))
+     , byrow = TRUE
+     , ncol = x.max
+     )
>
>
> x.mat
     [,1] [,2] [,3] [,4]  [,5] [,6]
[1,] "A"  "B"  "C"  NA    NA   NA
[2,] "B"  "B"  NA   NA    NA   NA
[3,] "A"  "AA" "C"  NA    NA   NA
[4,] "A"  "B"  "BB" "BBB" "B"  "B"
>
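
If you need this in more than one place, the same padding idea can be wrapped in a small function. This is only a rough sketch: the name split_to_matrix is made up for illustration, it assumes a non-empty character vector and a literal (non-regex) separator, and lengths() needs R 3.2.0 or later (sapply(parts, length) does the same job on older versions).

```r
# Hypothetical helper (name is illustrative, not from the original post).
# Splits each string on `sep` and pads the shorter pieces with NA so that
# every row has the same number of columns.
split_to_matrix <- function(x, sep = ",") {
    # fixed = TRUE treats `sep` as a literal string, not a regular expression
    parts <- strsplit(x, sep, fixed = TRUE)
    # widest row determines the number of columns
    n <- max(lengths(parts))
    # `length<-` extends a character vector with NA up to length n
    padded <- lapply(parts, function(p) {
        length(p) <- n
        p
    })
    # one padded vector per row
    matrix(unlist(padded), ncol = n, byrow = TRUE)
}

x <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")
m <- split_to_matrix(x)
```

Building the result with matrix(..., byrow = TRUE) rather than transposing the output of sapply() keeps the shape right even when every string has only one field.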


On Mon, Oct 3, 2011 at 11:40 AM, Benjamin Wright <bjw78 at well.ox.ac.uk> wrote:
>
> I'm struggling to find a way of parsing a vector of data in this sort of form:
>
> A,B,C
> B,B
> A,AA,C
> A,B,BB,BBB,B,B
>
> into a matrix (or data frame). The catch is that I don't know a priori how many entries there will be in each element, nor how many characters there will be. strsplit(vec,",") gets me a list, but I can't find a way of turning the list into a matrix. unlist(lst) destroys the length data and do.call("rbind", lst) fails because of the uneven lengths. It is possible to go through the vector element by element, but that has proved too slow for my purposes.
>
> Is there a reasonably quick method of achieving this in a vector-oriented way?
>
> Cheers,
>
> Ben
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


