[R] splitting strings efficiently

MacQueen, Don macqueen1 at llnl.gov
Tue Jan 10 01:19:28 CET 2012


See the suggestion inserted below.
It requires that every input IP address have exactly four elements;
otherwise the matrix rows will not line up with the addresses.
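
One way to check that requirement before reshaping might look like the
sketch below. The vector `ips` and its values are made up for illustration;
data$ComputerName from the original post would take its place.

```r
# Hypothetical example input; the third entry is deliberately malformed.
ips <- c("10.0.0.1", "192.168.1.20", "172.16.5")

# Split every address at the literal dots in one vectorized call.
parts <- strsplit(ips, ".", fixed = TRUE)

# Count the pieces per address; a well-formed address yields exactly four.
n_parts <- vapply(parts, length, integer(1))

# Positions of the addresses that do not have four elements.
bad <- which(n_parts != 4L)
bad
```

Any positions reported in `bad` would need to be fixed or dropped before
the matrix step.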

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 1/8/12 5:11 AM, "Enrico Schumann" <enricoschumann at yahoo.de> wrote:

>
>Hi Andrew,
>
>you can use strsplit for a character vector; you do not have to call it
>for every element data$ComputerName[i].
>
>If I understand correctly, maybe something like this helps
>
> > ip <- "123.456.789.321"  ## example data
> > df <- data.frame(ip = rep(ip, 9), stringsAsFactors=FALSE)
> > df
>                ip
>1 123.456.789.321
>2 123.456.789.321
>3 123.456.789.321
>4 123.456.789.321
>5 123.456.789.321
>6 123.456.789.321
>7 123.456.789.321
>8 123.456.789.321
>9 123.456.789.321
>
> >
> > res <- unlist(strsplit(df[["ip"]], "\\."))


At this point, I would do

> res <- matrix(res,ncol=4, byrow=TRUE)
> res
      [,1]  [,2]  [,3]  [,4]
 [1,] "123" "456" "789" "321"
 [2,] "123" "456" "789" "321"
 [3,] "123" "456" "789" "321"
 [4,] "123" "456" "789" "321"
 [5,] "123" "456" "789" "321"
 [6,] "123" "456" "789" "321"
 [7,] "123" "456" "789" "321"
 [8,] "123" "456" "789" "321"
 [9,] "123" "456" "789" "321"

Each column of the matrix then holds one component (A, B, C or D) of the
IP addresses, one row per address.
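
As a rough sketch, the columns could then be attached to the data frame
along these lines. The column names IPA to IPD follow the original post;
the small example data frame and the conversion to integer are my own
assumptions, not something the original poster asked for.

```r
# Small made-up data frame in the same shape as the example in this thread.
df <- data.frame(ip = c("10.1.2.3", "192.168.0.7"), stringsAsFactors = FALSE)

# One vectorized split over the whole column.
parts <- strsplit(df[["ip"]], ".", fixed = TRUE)

# Stop early if any address does not have exactly four elements.
stopifnot(all(vapply(parts, length, integer(1)) == 4L))

# Fill a four-column matrix row by row: one row per address.
res <- matrix(unlist(parts), ncol = 4, byrow = TRUE)

# Attach each column; as.integer() is optional, but it makes later
# range comparisons numeric rather than alphabetical.
df$IPA <- as.integer(res[, 1])
df$IPB <- as.integer(res[, 2])
df$IPC <- as.integer(res[, 3])
df$IPD <- as.integer(res[, 4])
df
```

This replaces the row-by-row loop with a handful of whole-column
operations, which is where the speed-up should come from.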



> > ii <- seq(1, nrow(df)*4, by = 4)
> > res[ii]   ## A
>[1] "123" "123" "123" "123" "123" "123" "123"
>[8] "123" "123"
> > res[ii+1] ## B
>[1] "456" "456" "456" "456" "456" "456" "456"
>[8] "456" "456"
> > res[ii+2] ## C
>[1] "789" "789" "789" "789" "789" "789" "789"
>[8] "789" "789"
> > res[ii+3] ## D
>[1] "321" "321" "321" "321" "321" "321" "321"
>[8] "321" "321"
>
>
>Regards,
>Enrico
>
>
>Am 08.01.2012 11:06, schrieb Andrew Roberts:
>> Folks,
>>
>> I have a data frame with 4861469 rows that contains an ip address
>> xxx.xxx.xxx.xxx as one of the columns. I want to assign a site to each
>> row based on IP ranges. To do this I have a function to split the ip
>> address as character into class A,B,C and D components. It works but is
>> horribly inefficient in terms of speed. I can't quite see how one of the
>> l/s/m/t/apply functions could be brought to bear on the problem. Does
>> anyone have any thoughts?
>>
>> for(i in 1:4861469)
>>     {
>>     lst<-unlist(strsplit(data$ComputerName[i], "\\."))
>>     data$IPA[i]<-lst[[1]]
>>     data$IPB[i]<-lst[[2]]
>>     data$IPC[i]<-lst[[3]]
>>     data$IPD[i]<-lst[[4]]
>>     rm(lst)
>>     }
>>
>> Andrew
>>
>> Andrew Roberts
>> Children's Orthopaedic Surgeon
>> RJAH, Oswestry, UK
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>-- 
>Enrico Schumann
>Lucerne, Switzerland
>http://nmof.net/