[R] Better use of regex

Thu Sep 15 18:47:56 CEST 2016

Thanks for the reproducible example.

Using regular expressions:

sub(".*HS_(.).*", "\\1", dimInfo[grep("HS_",dimInfo)])

The grep() gets just the indices that contain "HS_" and the sub()
picks up the character you want from the subvector indexed by them and
replaces everything with it.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Sep 15, 2016 at 9:17 AM, Doran, Harold <HDoran at air.org> wrote:
> I have produced a terribly inefficient piece of codes. In the end, it gives exactly what I need, but clumsily steps through multiple steps which I'm sure could be more efficiently reduced.
>
> Below is a reproducible example. What I have to begin with is character vector, dimInfo. What I want to do is parse this vector 1) find the elements containing 'HS' and 2) grab *only* the first character after the "HS_". The final line of code in the example gives what I need.
>
> Any suggestions on a better approach?
>
> Harold
>
>
> dimInfo <- c("RecordID", "oppID", "position", "key", "operational", "IsSelected",
> "score", "item_1_HS_conv_ovrl_scr", "item_1_HS_elab_ovrl_scr",
> "item_1_HS_org_ovrl_scr")
>
> ff <- dimInfo[grep('HS', dimInfo)]
> gg <- strsplit(ff, 'HS_')
> hh <- sapply(1:3, function(i) gg[[i]][2])
> substr(hh, 1, 1)
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.