[R] lines those not started with "rs"

Bert Gunter bgunter.4567 at gmail.com
Mon Jan 30 20:22:33 CET 2017


... heh, heh and even simpler (but maybe not much faster)

 x[substring(x,1,2) != "rs"]


(DUHHH!)
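
For completeness, here is a minimal sketch of how the same test might be applied to a data frame rather than a bare vector. The names dat, A and x are taken from Rui's example quoted below; ids, keep, res, res2 and res3 are names made up for this illustration. The as.character() call is only a precaution, assuming the column may have been read in as a factor. startsWith() is in base R from version 3.3.0 on and is shown purely as an alternative.

```r
# Toy data, copied from Rui's example quoted further down the thread
A <- c("rs10000056", "rs10000076", "ab1234567")
x <- 1:3
dat <- data.frame(A, x)

# as.character() guards against A having been stored as a factor
ids <- as.character(dat$A)

# Keep the rows whose first two characters are not "rs"
keep <- substring(ids, 1, 2) != "rs"
res <- dat[keep, ]

# The same selection via startsWith() (base R >= 3.3.0) and via grepl()
res2 <- dat[!startsWith(ids, "rs"), ]
res3 <- dat[!grepl("^rs", ids), ]
```

If the column can contain NA, keep will contain NA as well and the subset will pick up all-NA rows; indexing with which(keep) is one way around that.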

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jan 30, 2017 at 11:18 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Rui, et al.:
>
> **IF** the data set can be read into R (3e6 lines x ? bytes/line??),
> then I think that for a completely specified regular pattern such as that
> described by the OP, grep would be a bit inefficient. If x is a vector
> of strings, and you wish to keep only those that don't begin with
> "rs", then:
>
>  x[!substring(x,1,2) == "rs"]
>
> took about half the time of the grepl() version on my computer for a
> vector, x, of length 1e6.
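
A rough way to check this kind of timing claim is sketched below. The test vector is invented for the purpose (1e6 made-up IDs, about 90% of them starting with "rs", the rest with "ab"), so it only approximates the OP's data, and the timings will vary with machine and R version.

```r
# Hypothetical test vector: 1e6 IDs, roughly 90% of which start with "rs"
set.seed(1)
n <- 1e6
prefix <- sample(c("rs", "ab"), n, replace = TRUE, prob = c(0.9, 0.1))
x <- paste0(prefix, sample.int(1e8, n, replace = TRUE))

# Time each approach; absolute numbers depend on machine and R version
t_grepl  <- system.time(r1 <- x[!grepl("^rs", x)])
t_substr <- system.time(r2 <- x[substring(x, 1, 2) != "rs"])

print(t_grepl)
print(t_substr)

# Both approaches should select exactly the same elements
stopifnot(identical(r1, r2))
```

A single system.time() call is noisy, so repeating each timing a few times gives a more trustworthy comparison.
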
>
> To be fair, I suspect this may be a negligible difference, as most of
> the time would probably be taken in extracting and replacing rows from
> the data frame. Nevertheless, it seems worthwhile to highlight the use
> of simple, efficient, albeit limited, tools when they *can* be used.
>
> All, of course, assuming I have understood the query correctly.
>
> Cheers,
> Bert
>
>
> On Mon, Jan 30, 2017 at 8:59 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> Try to study the following example.
>>
>> A <- c("rs10000056", "rs10000076", "ab1234567")
>> x <- 1:3
>> dat <- data.frame(A, x)
>>
>> inx <- grepl("^rs", dat$A)
>> dat[!inx, ]
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 30-01-2017 14:23, greg holly escreveu:
>>>
>>> Hi all;
>>>
>>> I have a file with about 3,000,000 lines. Most of the entries in the first
>>> column start with "rs", for example, rs10000056, rs10000076 and so on. I
>>> would like to get the lines that do not start with "rs". Your help is
>>> highly appreciated.
>>>
>>> Regards,
>>>
>>> Greg
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
