[R] Convert COLON separated format

jim holtman jholtman at gmail.com
Tue Oct 9 14:10:11 CEST 2012


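If you want something that is fast, read the file in, strip off the
colon and value from each entry, write the result out to a temporary
file, and then read it back in with read.table.  Note that this keeps
only the label and the column indices; the values after the colons are
dropped.  Here are the timings on a 355K-line file: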

> temp <- tempfile()
> input <- readLines('/temp/colon.txt')
> length(input)
[1] 355212
> system.time(input <- gsub("(:[0-9]+)", "", input))
   user  system elapsed
   0.72    0.00    0.74
> head(input)
[1] "1  5  27  345" "1  5  27  345" "1  5  27  345" "1  5  27  345" "1  5  27  345" "1  5  27  345"
> writeLines(input, temp)
> system.time(newInput <- read.table(temp))
   user  system elapsed
   1.08    0.02    1.13
> dim(newInput)
[1] 355212      4
>
> head(newInput)
  V1 V2 V3  V4
1  1  5 27 345
2  1  5 27 345
3  1  5 27 345
4  1  5 27 345
5  1  5 27 345
6  1  5 27 345
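
If you also need the values after the colons, which the gsub() above
throws away, something along these lines might work.  This is only a
rough sketch and has not been timed on the 355K-line file:
parse_libsvm() is a made-up helper name, the two input lines are
invented for illustration, and it assumes numeric labels and well-formed
index:value pairs.  It returns one row per non-empty entry ("long"
format), so a line with no entries at all would not appear in the result.

```r
# Hypothetical helper (not from any package): parse libsvm-style lines
# into a long-format data frame with one row per index:value pair.
parse_libsvm <- function(lines) {
  # Split each line on runs of whitespace.
  tokens <- strsplit(trimws(lines), "[[:space:]]+")
  # The first token of each line is the label; the rest are index:value pairs.
  labels <- vapply(tokens, function(tok) tok[1], character(1))
  pairs <- lapply(tokens, function(tok) tok[-1])
  n_pairs <- lengths(pairs)
  # Split every "index:value" pair at the colon.
  kv <- strsplit(as.character(unlist(pairs)), ":", fixed = TRUE)
  data.frame(
    row   = rep(seq_along(lines), n_pairs),        # line number in the input
    label = rep(as.numeric(labels), n_pairs),      # assumes numeric labels
    col   = as.integer(vapply(kv, function(p) p[1], character(1))),
    value = as.numeric(vapply(kv, function(p) p[2], character(1)))
  )
}

# Made-up input: the row from the question plus one with non-integer values.
example <- c("1  5:1  27:3  345:10", "0  2:0.5  27:-1")
long <- parse_libsvm(example)
```

The row/col/value columns are the triplet form that sparse-matrix
constructors such as Matrix::sparseMatrix(i =, j =, x =) accept, which
may be preferable to a dense data.frame when the column indices run
into the hundreds or more.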


On Tue, Oct 9, 2012 at 12:56 AM, Noah Silverman <noahsilverman at ucla.edu> wrote:
> I have a bunch of data sets that were created for the libsvm tool.  They are in "colon separated sparse format".
>
> i.e.
>
> 1  5:1  27:3  345:10
>
> Is a row with the label of "1" and only has values in columns 5, 27, and 345.
>
> I want to read these into a data.frame in R.
>
> Is there a simple way to do this?
>
> --
> Noah Silverman, M.S.
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



