[R] strsplit question

Joshua Wiley jwiley.psych at gmail.com
Wed Oct 12 08:07:34 CEST 2011


# drop the dash and everything after it, keeping only the left piece
unlist(strsplit(Block[1:5], "-.+$"))
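A minimal self-contained check of the one-liner, assuming Block holds the five strings printed in the question (the full vector is not shown, so it is rebuilt here from those values). Because "-.+$" matches the dash plus everything to the end of the string, each split leaves a single piece. The sub() line is a swapped-in alternative for comparison, not part of the original reply.

```r
# Sample data rebuilt from the values printed in the question (assumption:
# the real Block vector may be longer)
Block <- c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399")

# "-.+$" matches the dash and the rest of the string, so strsplit()
# returns only the piece to the left of the dash for each element
left <- unlist(strsplit(Block[1:5], "-.+$"))
print(left)   # "5600" "6100" "9700" "9400" "8300"

# Alternative (not from the reply): delete the same suffix with sub()
left_sub <- sub("-.*$", "", Block[1:5])
identical(left, left_sub)
```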

If you are going to want the other pieces later, the most efficient
way depends on the assumptions you can make about your data.  If every
string always splits into exactly two elements:

matrix(unlist(strsplit(Block[1:5], "-")), ncol = 2, byrow = TRUE)
## or
do.call("rbind", strsplit(Block[1:5], "-"))
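A self-contained sketch of the two matrix approaches, again assuming Block holds the five strings from the question (rebuilt here). Both produce a 5 x 2 character matrix with the left pieces in column 1 and the right pieces in column 2; both rely on every string splitting into exactly two pieces.

```r
# Sample data rebuilt from the question (assumption)
Block <- c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399")

# Assumes every string splits into exactly two pieces
pieces <- strsplit(Block[1:5], "-")

# Option 1: flatten, then refill row by row into a 2-column matrix
m1 <- matrix(unlist(pieces), ncol = 2, byrow = TRUE)

# Option 2: bind each two-element vector as one row
m2 <- do.call("rbind", pieces)

all(m1 == m2)   # same values in both matrices
m1[, 1]         # pieces left of the dash
m1[, 2]         # pieces right of the dash
```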

The first option, which drops everything after the "-", is marginally
more efficient, followed by the matrix technique.  Some clunkier
options (in my view) would be:

unlist(strsplit(Block[1:5], "-"))[seq(from = 1, to = 2 * length(Block[1:5]),
                                      by = 2)]

or, very flexible for extracting the first element regardless of how
many pieces each string splits into, but computationally less efficient:

sapply(strsplit(Block[1:5], "-"), `[[`, 1)

but this is only slightly less efficient; on a simple character vector
of length 10^8 it still completed in less than 1 second on a 1.66 GHz
dual core running R-devel r57214 on Windows x64.
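A small sketch of why the sapply() version is the flexible one. On the question's data (rebuilt here) the seq() indexing and sapply() agree, but on a hypothetical ragged vector, where strings split into 2, 1 and 3 pieces, the odd positions of the flattened vector no longer line up with each string's first piece. The vector x below is invented for illustration.

```r
# Regular input rebuilt from the question: every string has 2 pieces
Block <- c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399")
first_seq <- unlist(strsplit(Block[1:5], "-"))[seq(from = 1,
                                                   to = 2 * length(Block[1:5]),
                                                   by = 2)]
first_sapply <- sapply(strsplit(Block[1:5], "-"), `[[`, 1)
identical(first_seq, first_sapply)   # TRUE for this input

# Hypothetical ragged input (not from the question): 2, 1 and 3 pieces
x <- c("5600-5699", "6100", "9700-9799-9800")

# Flattened: "5600" "5699" "6100" "9700" "9799" "9800"; positions 1, 3, 5
# no longer correspond to the first piece of each string
odd_pos <- unlist(strsplit(x, "-"))[seq(from = 1, to = 2 * length(x), by = 2)]
print(odd_pos)   # "5600" "6100" "9799": the third entry is wrong

# `[[` called with index 1 takes the first piece whatever the count
first <- sapply(strsplit(x, "-"), `[[`, 1)
print(first)     # "5600" "6100" "9700"
```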
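For anyone wanting to repeat the timing comparison, here is a rough sketch using system.time(). The test vector is hypothetical (the question's five strings recycled to length n), and n = 1e5 is chosen only to keep the run short, far below the 10^8 used above; absolute times will depend on the machine and R version, so no numbers are claimed here.

```r
# Hypothetical benchmark input: the question's five strings recycled to n
n <- 1e5
x <- rep(c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399"),
         length.out = n)

# Time each approach; assignments inside system.time() keep the results
timings <- list(
  regex  = system.time(r1 <- unlist(strsplit(x, "-.+$"))),
  matrix = system.time(
    r2 <- matrix(unlist(strsplit(x, "-")), ncol = 2, byrow = TRUE)[, 1]),
  seq    = system.time(
    r3 <- unlist(strsplit(x, "-"))[seq(from = 1, to = 2 * length(x), by = 2)]),
  sapply = system.time(r4 <- sapply(strsplit(x, "-"), `[[`, 1))
)

# Elapsed seconds per approach (machine- and version-dependent)
print(sapply(timings, function(t) t[["elapsed"]]))

# All four approaches should agree on this regular input
stopifnot(identical(r1, r2), identical(r1, r3), identical(r1, r4))
```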

Cheers,

Josh




On Tue, Oct 11, 2011 at 10:20 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
> Dear R People:
>
> I have the following set of data
>> Block[1:5]
> [1] "5600-5699" "6100-6199" "9700-9799" "9400-9499" "8300-8399"
>
> and I want to split at the -
>
>> strsplit(Block[1:5],"-")
> [[1]]
> [1] "5600" "5699"
>
> [[2]]
> [1] "6100" "6199"
>
> [[3]]
> [1] "9700" "9799"
>
> [[4]]
> [1] "9400" "9499"
>
> [[5]]
> [1] "8300" "8399"
>
>>
>
> What is the best way to extract the pieces that are to the left of the
> dash, please?
>
> Thanks,
> Erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodgess at gmail.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


