[R] Removing rows if certain elements are found in character string

jim holtman jholtman at gmail.com
Tue Jul 3 02:15:15 CEST 2012


You will have to change the 'i1' expression as follows:

> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
> i1  # TRUE for strings made up only of 0, d and D, in any mix
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> # the second string contains both 'd' and 'D', so it is TRUE above and FALSE below
> i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
> i1new
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

I put both a 'd' and a 'D' in the second string to show the difference.
The original regular expression is equivalent to

grepl("^[0dD]*$", dd$ch)

which matches any string made up solely of 0, d and D, in any mix.  If
you want strings that contain 0 with 'd', or 0 with 'D', but not both
letters together, then you will have to use the one in 'i1new'.
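
To see the difference without the full data set, here is a minimal
sketch on a handful of made-up strings. The vector 'ch' and the names
'mixed', 'simple' and 'strict' are hypothetical and only serve the
illustration; they are not part of the original data.

```r
# Hypothetical test strings (not the original data)
ch <- c("00D00",   # only 0 and D
        "00d00",   # only 0 and d
        "0dD00",   # 0 with both d and D
        "00T00",   # no d or D at all
        "0TD00",   # D present, but another letter as well
        "00000")   # zeros only

mixed  <- grepl("^([0D]|[0d])*$", ch)     # original 'i1' pattern
simple <- grepl("^[0dD]*$", ch)           # equivalent single character class
strict <- grepl("^([0D]*$|[0d]*$)", ch)   # 'i1new': 0 with D, or 0 with d

# The first two patterns should agree on every string
identical(mixed, simple)

# Side-by-side view of where the original and the stricter pattern differ
data.frame(ch, mixed, strict)

# Dropping the matching rows from a data frame
dd_small <- data.frame(ch = ch, stringsAsFactors = FALSE)
dd_small[!strict, , drop = FALSE]
```

Note that a string of zeros only is matched by all three patterns,
because '*' also allows zero occurrences of the letters. If such rows
should be kept, they need a separate test.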

On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Try regular expressions instead.
> In this data.frame, I've changed row 4 so that 'D' is its first
> non-zero character.
>
> dd <- read.table(text="
>                                                  ch    count
> 1  0000000000D0000000000000000000000000000000000000 0.007368
> 2  0000000000d0000000000000000000000000000000000000 0.002456
> 3  000000000T00000000000000000000000000000000000000 0.007368
> 4  000000000DT0000000000000000000000000000000000000 0.007368
> 5  000000000T00000000000000000000000000000000000000 0.002456
> 6  000000000Td0000000000000000000000000000000000000 0.002456
> 7  00000000T000000000000000000000000000000000000000 0.007368
> 8  00000000T0D0000000000000000000000000000000000000 0.007368
> 9  00000000T000000000000000000000000000000000000000 0.002456
> 10 00000000T0d0000000000000000000000000000000000000 0.002456
> ", header=TRUE)
> dd
>
> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
> i2 <- grepl("^0*[Dd]", dd$ch)
>
> dd[!i1, ]
> dd[!i2, ]
> dd[!(i1 | i2), ]
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 02-07-2012 23:48, Claudia Penaloza wrote:
>
>> I would like to remove rows from the following data frame (df) if only
>> two specific elements are found in the df$ch character string (I want to
>> remove rows with only "0" & "D" or "0" & "d"). Alternatively, I would like
>> to remove rows if the first non-zero element is "D" or "d".
>>
>>
>>                                                   ch     count
>> 1  0000000000D0000000000000000000000000000000000000 0.007368;
>> 2  0000000000d0000000000000000000000000000000000000 0.002456;
>> 3  000000000T00000000000000000000000000000000000000 0.007368;
>> 4  000000000TD0000000000000000000000000000000000000 0.007368;
>> 5  000000000T00000000000000000000000000000000000000 0.002456;
>> 6  000000000Td0000000000000000000000000000000000000 0.002456;
>> 7  00000000T000000000000000000000000000000000000000 0.007368;
>> 8  00000000T0D0000000000000000000000000000000000000 0.007368;
>> 9  00000000T000000000000000000000000000000000000000 0.002456;
>> 10 00000000T0d0000000000000000000000000000000000000 0.002456;
>>
>>
>> I tried the following but it doesn't work if there is more than one
>> character per string:
>>
>>> df <- df[!df$ch %in% c("0","D"),]
>>> df <- df[!df$ch %in% c("0","d"),]
>>
>>
>> Any help greatly appreciated,
>> Claudia
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


