[R] Removing rows if certain elements are found in character string

Tue Jul 3 02:15:15 CEST 2012

You will have to change the 'i1' expression as follows:

> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
> i1  # matches strings with d & D in them
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> # second string had 'd' & 'D' in it so it was TRUE above and FALSE below
> i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
> i1new
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

I put both a 'd' and a 'D' in the second string. The original regular
expression is equivalent to

grepl("^[0dD]*$", dd$ch)

which matches any string made up only of '0', 'd' and 'D', in any
combination.  If you want strings with only '0' and 'd', or only '0'
and 'D' (but not both letters), then you will have to use the one in
'i1new'.
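
As a rough illustration, here is a small sketch on made-up strings (not
the data from this thread) that compares the two patterns.  The vector
'x' and the names 'mixed' and 'single' are only illustrative:

```r
# Hypothetical test strings, chosen to cover each case
x <- c("00D00",   # only 0 and D
       "00d00",   # only 0 and d
       "0dD00",   # 0 with both d and D
       "0T0D0",   # contains another letter
       "00000")   # zeros only

# Behaves like the original "^([0D]|[0d])*$": any mix of 0, d and D
mixed  <- grepl("^[0dD]*$", x)

# The 'i1new' pattern: only 0 and D, or only 0 and d, never both letters
single <- grepl("^([0D]*$|[0d]*$)", x)

# Side-by-side view of which strings each pattern flags
print(data.frame(x, mixed, single))

# Keep only the strings that 'single' does not flag
x[!single]
```

Note that a string of zeros only matches both patterns, so such rows
would be dropped as well.  If they should be kept, one option is to
additionally require grepl("[dD]", x).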

On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Try regular expressions instead.
> In this data.frame, I've changed row 4 so that 'D' is its first
> non-zero character.
>
> dd <- read.table(text="
> ch     count
> 1  0000000000D0000000000000000000000000000000000000 0.007368
> 2  0000000000d0000000000000000000000000000000000000 0.002456
> 3  000000000T00000000000000000000000000000000000000 0.007368
> 4  000000000DT0000000000000000000000000000000000000 0.007368
> 5  000000000T00000000000000000000000000000000000000 0.002456
> 6  000000000Td0000000000000000000000000000000000000 0.002456
> 7  00000000T000000000000000000000000000000000000000 0.007368
> 8  00000000T0D0000000000000000000000000000000000000 0.007368
> 9  00000000T000000000000000000000000000000000000000 0.002456
> 10 00000000T0d0000000000000000000000000000000000000 0.002456
> ", header=TRUE)
> dd
>
> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
> i2 <- grepl("^0*[Dd]", dd$ch)
>
> dd[!i1, ]
> dd[!i2, ]
> dd[!(i1 | i2), ]
>
>
> Hope this helps,
>
> Rui Barradas
>
> Em 02-07-2012 23:48, Claudia Penaloza escreveu:
>
>> I would like to remove rows from the following data frame (df) if there
>> are only two specific elements found in the df$ch character string (I
>> want to remove rows with only "0" & "D" or "0" & "d"). Alternatively, I
>> would like to remove rows if the first non-zero element is "D" or "d".
>>
>>
>>                                                   ch     count
>> 1  0000000000D0000000000000000000000000000000000000 0.007368;
>> 2  0000000000d0000000000000000000000000000000000000 0.002456;
>> 3  000000000T00000000000000000000000000000000000000 0.007368;
>> 4  000000000TD0000000000000000000000000000000000000 0.007368;
>> 5  000000000T00000000000000000000000000000000000000 0.002456;
>> 6  000000000Td0000000000000000000000000000000000000 0.002456;
>> 7  00000000T000000000000000000000000000000000000000 0.007368;
>> 8  00000000T0D0000000000000000000000000000000000000 0.007368;
>> 9  00000000T000000000000000000000000000000000000000 0.002456;
>> 10 00000000T0d0000000000000000000000000000000000000 0.002456;
>>
>>
>> I tried the following but it doesn't work if there is more than one
>> character per string:
>>
>>> df <- df[!df$ch %in% c("0","D"),]
>>> df <- df[!df$ch %in% c("0","d"),]
>>
>>
>> Any help greatly appreciated,
>> Claudia
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.