[R] Split

Wed Sep 23 17:37:24 CEST 2020

What is the delimiter in the input data? Is it tab, space, etc.?

Is this going to be the same for the output data that you will use for R input?

LMH

Val wrote:
> Thank you all for the help!
> 
> LMH, yes, I would like to see the alternative. I am using this for a
> large data set, and if the alternative is more efficient than this,
> then I would be happy.
> 
> On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>
>> To be clear, I think Rui's solution is perfectly fine and probably better than what I offer below. But just for fun, I wanted to do it without the lapply().  Here is one way. I think my comments suffice to explain.
>>
>>> ## which are the  non "_" indices?
>>> wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
>>> ## paste "_." to these
>>> F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
>>> ## Now strsplit() and unlist() them to get a vector
>>> z <- unlist(strsplit(F1$text, "_"))
>>> ## now cbind() to the data frame
>>> F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
>>> F1
>>   ID1 ID2   text    1  2
>> 1  A1  B1 NONE_. NONE  .
>> 2  A1  B1  cf_12   cf 12
>> 3  A1  B1 NONE_. NONE  .
>> 4  A2  B2  X2_25   X2 25
>> 5  A2  B3  fd_15   fd 15
>>> ## You can change the names of the 2 columns yourself
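For reference, a minimal sketch of that renaming step, run end to end on the sample data from the original post. It assumes the two new columns are the last two in the data frame; the names X1 and X2 are taken from the desired output in the original post, and the variable n is introduced here only for illustration.

```r
# Sample data from the original post
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Steps from the message above: pad rows without "_" and split
wh <- grep("_", F1$text, fixed = TRUE, invert = TRUE)
F1[wh, "text"] <- paste(F1[wh, "text"], ".", sep = "_")
z <- unlist(strsplit(F1$text, "_", fixed = TRUE))
F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))

# Rename the two new columns (assumed to be the last two)
n <- ncol(F1)
names(F1)[(n - 1):n] <- c("X1", "X2")
F1
```

Indexing by position from the end avoids relying on the default names that cbind() assigns to the unnamed matrix columns.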
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>>
>>
>> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> A base R solution with strsplit, like in your code.
>>>
>>> F1$Y1 <- +grepl("_", F1$text)
>>>
>>> tmp <- strsplit(as.character(F1$text), "_")
>>> tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
>>> tmp <- do.call(rbind, tmp)
>>> colnames(tmp) <- c("X1", "X2")
>>> F1 <- cbind(F1[-3], tmp)    # remove the original column
>>> rm(tmp)
>>>
>>> F1
>>> #  ID1 ID2 Y1   X1 X2
>>> #1  A1  B1  0 NONE  .
>>> #2  A1  B1  1   cf 12
>>> #3  A1  B1  0 NONE  .
>>> #4  A2  B2  1   X2 25
>>> #5  A2  B3  1   fd 15
>>>
>>>
>>> Note that cbind dispatches on F1, an object of class "data.frame".
>>> Therefore it's the method cbind.data.frame that is called and the result
>>> is also a df, though tmp is a "matrix".
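A small illustration of that dispatch, using hypothetical toy objects (df and m are made up here and are not part of the thread's data). As far as I know, the data frame method is used whenever at least one argument to cbind() is a data frame, not only when the first one is.

```r
# Hypothetical toy objects, not the F1 data from the thread
df <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
m  <- matrix(c("x", "y", "1", "2"), ncol = 2,
             dimnames = list(NULL, c("X1", "X2")))

res <- cbind(df, m)            # data frame first: result is a data frame
is.data.frame(res)
names(res)                     # matrix column names are carried over

is.data.frame(cbind(m, df))    # data frame second: still a data frame
is.matrix(cbind(m, m))         # only matrices: result stays a matrix
```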
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>> At 20:07 on 22/09/20, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> Something like this?
>>>>
>>>>
>>>> F1$Y1 <- +grepl("_", F1$text)
>>>> F1 <- F1[c(1, 2, 4, 3)]
>>>> F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill = "right")
>>>> F1
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>> At 19:55 on 22/09/20, Val wrote:
>>>>> Hi All,
>>>>>
>>>>> I am trying to create new columns based on the string content of
>>>>> another column. First I want to identify rows that contain a particular
>>>>> string. If a row contains it, I want to split the string and create two
>>>>> variables.
>>>>>
>>>>> Here is a sample of my data.
>>>>> F1 <- read.table(text = "ID1  ID2  text
>>>>> A1 B1   NONE
>>>>> A1 B1   cf_12
>>>>> A1 B1   NONE
>>>>> A2 B2   X2_25
>>>>> A2 B3   fd_15  ", header = TRUE, stringsAsFactors = FALSE)
>>>>> If the variable "text" contains "_", I want to create an indicator
>>>>> variable, as shown below.
>>>>>
>>>>> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>>>>>
>>>>>
>>>>> Then I want to split that string into two parts, before and after the "_",
>>>>> and create two variables, as shown below:
>>>>> x1 <- strsplit(as.character(F1$text), "_", fixed = TRUE)
>>>>>
>>>>> My problem is how to combine this with the original data frame. The
>>>>> desired output is shown below.
>>>>>
>>>>>
>>>>> ID1 ID2 Y1   X1 X2
>>>>> A1  B1   0 NONE  .
>>>>> A1  B1   1   cf 12
>>>>> A1  B1   0 NONE  .
>>>>> A2  B2   1   X2 25
>>>>> A2  B3   1   fd 15
>>>>>
>>>>> Any help?
>>>>> Thank you.
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>