[R] Convert COLON separated format

Rui Barradas ruipbarradas at sapo.pt
Tue Oct 9 07:28:07 CEST 2012


Hello,

Here's a function that doesn't do it all but might help.

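fun <- function(x){
     # Split on runs of whitespace; drop any empty tokens from leading blanks
     x1 <- unlist(strsplit(x, "[[:space:]]+"))
     x2 <- x1[nchar(x1) > 0]
     i <- as.integer(x2[1])                   # first token is the label
     # The remaining tokens are "column:value" pairs
     x3 <- unlist(strsplit(x2[-1], ":"))
     j <- as.integer(x3[c(TRUE, FALSE)])      # odd positions: column indices
     # A line with a label but no pairs gives an empty vector
     y <- numeric(if (length(j) > 0) max(j) else 0)
     y[j] <- as.numeric(x3[c(FALSE, TRUE)])   # even positions: values
     list(row = i, line = y)
}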

x <- "1  5:1  27:3  345:10"
fun(x)

If you know that your labels, i.e., the row numbers, are consecutive, have 
the function return just 'y', not a list.
Then use readLines to read the file in and lapply fun to it. Something like

ln <- readLines(filename)
lst <- lapply(ln, fun)

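Then you'll have another problem: the lengths of the lines. The vectors 
returned will generally not all have the same length (each one is only as 
long as the largest column index on its line), so in order to make a 
data.frame or matrix you'll need extra work. Try the code above and say 
whether it's on the right track.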
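
One possible form of that extra work, as a rough sketch only: pad every 
vector with zeros up to the longest one and bind the results as rows. The 
two sample lines and the helper name parse_values are made up for 
illustration, and the character vector ln stands in for readLines(filename). 
It assumes every line has at least one column:value pair.

```r
# Hypothetical sample input standing in for readLines(filename)
ln <- c("1  5:1  27:3  345:10",
        "2  3:4  10:2")

# Same parsing idea as fun(), but returning only the value vector.
# parse_values is an illustrative name, not part of any package.
parse_values <- function(x) {
  tok <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  kv <- strsplit(tok[-1], ":", fixed = TRUE)        # list of c(column, value)
  j <- as.integer(vapply(kv, `[`, character(1), 1))
  y <- numeric(max(j))
  y[j] <- as.numeric(vapply(kv, `[`, character(1), 2))
  y
}

lst <- lapply(ln, parse_values)

# Pad each vector with trailing zeros to the common width, then stack as rows
width <- max(lengths(lst))
padded <- lapply(lst, function(y) c(y, numeric(width - length(y))))
mat <- do.call(rbind, padded)

# The label is everything before the first whitespace on each line
labels <- as.integer(sub("[[:space:]].*$", "", trimws(ln)))
df <- data.frame(label = labels, mat)
dim(mat)
```

The result is dense, so nearly all of its cells are zeros; for large files 
that can be wasteful.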

Also, take a look at package Matrix. It's a recommended package and it 
implements sparse matrices.
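
A minimal sketch of that route, assuming the same made-up sample lines in 
place of readLines(filename): collect the row indices, column indices and 
values as triplets and pass them to sparseMatrix(). The variable names are 
illustrative only, and the code assumes every line has at least one 
column:value pair.

```r
library(Matrix)

# Hypothetical sample input standing in for readLines(filename)
ln <- c("1  5:1  27:3  345:10",
        "2  3:4  10:2")

# Tokenize each line; the first token is the label, the rest are pairs
tok <- strsplit(trimws(ln), "[[:space:]]+")
labels <- as.integer(vapply(tok, `[`, character(1), 1))
pairs <- lapply(tok, function(tk) strsplit(tk[-1], ":", fixed = TRUE))

# Flatten to one c(column, value) entry per nonzero cell
n_per_row <- lengths(pairs)
flat <- unlist(pairs, recursive = FALSE)

i <- rep(seq_along(ln), times = n_per_row)             # row of each entry
j <- as.integer(vapply(flat, `[`, character(1), 1))    # column of each entry
v <- as.numeric(vapply(flat, `[`, character(1), 2))    # value of each entry

# Only the nonzero cells are stored
sm <- sparseMatrix(i = i, j = j, x = v, dims = c(length(ln), max(j)))
dim(sm)
```

The labels are kept in a separate vector here rather than as a column of 
the matrix.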

Hope this helps,

Rui Barradas

On 09-10-2012 05:56, Noah Silverman wrote:
> I have a bunch of data sets that were created for the libsvm tool.  They are in "colon separated sparse format".
>
> i.e.
>
> 1  5:1  27:3  345:10
>
> Is a row with the label of "1" and only has values in columns 5, 27, and 345.
>
> I want to read these into a data.frame in R.
>
> Is there a simple way to do this?
>
> --
> Noah Silverman, M.S.
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



