[R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply

Bert Gunter bgunter.4567 at gmail.com
Thu Sep 10 20:35:31 CEST 2015


...
Alternatively, you can avoid the looping (i.e. sapply) altogether by:

do.call(rbind,strsplit(x[[1]],":"))[,-3]


     [,1] [,2]
[1,] "1"  "29439275"
[2,] "5"  "85928892"
[3,] "10" "128341232"
[4,] "1"  "106024283"
[5,] "3"  "62707519"
[6,] "2"  "80464120"

These can then be added to the existing frame, converted to numeric, etc.
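
As a rough sketch of that last step (one possible way, not the only one), the following rebuilds the example data from Jim's post quoted below, splits A on ":", and stores the first two pieces as numeric columns. The column names C and D follow the original question; the names parts and m are placeholders made up for this illustration.

```r
# Example data copied from the thread (Jim Holtman's read.table call)
x <- read.table(text = "A B
1:29439275 0.46773514
5:85928892 0.81283052
10:128341232 0.09332543
1:106024283:ID 0.36307805
3:62707519 0.42657952
2:80464120 0.89125094", header = TRUE, as.is = TRUE)

# Split every entry of A on a literal ":"; the result is a list of
# character vectors, one per row
parts <- strsplit(x$A, ":", fixed = TRUE)

# rbind() recycles the two-field rows to the width of the three-field row
# (R normally warns about the unequal lengths, hence suppressWarnings());
# only the first two columns are kept
m <- suppressWarnings(do.call(rbind, parts))[, 1:2, drop = FALSE]

# Add the pieces to the existing frame as numeric columns
x$C <- as.numeric(m[, 1])
x$D <- as.numeric(m[, 2])

str(x)
```

Selecting columns 1:2 rather than dropping column 3 is a small design choice here: it behaves the same whether or not any row has a third field. It does assume that every row has at least two fields; a row with a single field would have its value recycled into the second column.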

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Thu, Sep 10, 2015 at 11:05 AM, jim holtman <jholtman at gmail.com> wrote:
> try this:
>
>
>> x <- read.table(text = "A          B
> +  1:29439275 0.46773514
> +  5:85928892 0.81283052
> +  10:128341232 0.09332543
> +  1:106024283:ID 0.36307805
> +  3:62707519 0.42657952
> +  2:80464120 0.89125094", header = TRUE, as.is = TRUE)
>>
>> temp <- strsplit(x$A, ":")
>> x$C <- sapply(temp, '[[', 1)
>> x$D <- sapply(temp, '[[', 2)
>>
>> x
>                A          B  C         D
> 1     1:29439275 0.46773514  1  29439275
> 2     5:85928892 0.81283052  5  85928892
> 3   10:128341232 0.09332543 10 128341232
> 4 1:106024283:ID 0.36307805  1 106024283
> 5     3:62707519 0.42657952  3  62707519
> 6     2:80464120 0.89125094  2  80464120
>
>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Thu, Sep 10, 2015 at 1:46 PM, aldi <aldi at wustl.edu> wrote:
>
>> Hi,
>> I have a data.frame x1 in which a variable A needs to be split into its
>> first and second elements, where the separator is ":". Sometimes A has
>> three elements, but I do not need the third.
>>
>> Since R does not have a function like SCAN in SAS (C=scan(A,1,":");
>> D=scan(A,2,":");), I am using a combination of strsplit and sapply. If I
>> do not use the index [i], R captures the full vector. Instead I need to
>> capture the first and second elements row by row and create two new
>> variables, C and D, from them.
>> As it stands, inside the loop over i, C is captured correctly, but D is
>> missing because the variable AA does not contain it. Any suggestions?
>> Thank you in advance, Aldi
>>
>> A          B
>> 1:29439275 0.46773514
>> 5:85928892 0.81283052
>> 10:128341232 0.09332543
>> 1:106024283:ID 0.36307805
>> 3:62707519 0.42657952
>> 2:80464120 0.89125094
>>
>> x1<-read.table(file='./test.txt',head=T,sep='\t')
>> x1$A <- as.character(x1$A)
>>
>> for(i in 1:length(x1$A)){
>>
>> x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))
>>
>> x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
>> x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
>> }
>>
>> x1
>>
>>
>>
>>  > x1
>>                 A          B AA  C  D
>> 1     1:29439275 0.46773514  1  1 NA
>> 2     5:85928892 0.81283052  5  5 NA
>> 3   10:128341232 0.09332543 10 10 NA
>> 4 1:106024283:ID 0.36307805  1  1 NA
>> 5     3:62707519 0.42657952  3  3 NA
>> 6     2:80464120 0.89125094  2  2 NA
>>
>>
>> --
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


