[R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Rui Barradas ruipbarradas at sapo.pt
Tue Apr 24 19:15:10 CEST 2012


Hello,


Greg Snow wrote
> 
> Here is a method that uses negative look behind:
> 
>> tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
> 
> It looks for "muta" that is not immediately preceded by "un" or "non" (but
> it would match "unusually mutated", since the "un" is not immediately
> before the "muta").
> 
> Hope this helps,
> 
> On Mon, Apr 23, 2012 at 10:10 AM, Paul Miller <pjmiller_57@> wrote:
>> Hello All,
>>
>> Started out awhile ago trying to select columns in a dataframe whose
>> names contain some variation of the word "mutant" using code like:
>>
>> names(KRASyn)[grep("muta", names(KRASyn))]
>>
>> The idea then would be to add together the various columns using code
>> like:
>>
>> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>>
>> What I discovered though, is that this selects columns like "nonmutated"
>> and "unmutated" as well as columns like "mutated", "mutation", and
>> "mutational".
>>
>> So I'd like to know how to select columns that have some variation of the
>> word "mutant" without the "non" or the "un". I've been looking around for
>> an example of how to do that but haven't found anything yet.
>>
>> Can anyone show me how to select the columns I need?
>>
>> Thanks,
>>
>> Paul
>>
>> ______________________________________________
>> R-help@ mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> 538280@
> 
> 


Has anyone noticed that both 'non' and 'un' end with the same letter? That
final 'n' is the only character we really need to check.

(tmp <- c('mutation','nonmutated','unmutated','verymutated','other')) 

i1 <- grepl("muta", tmp)
i2 <- grepl("nmuta", tmp)

tmp[i1 & !i2]
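
A minimal sketch of how this could be applied to the original column-selection
problem. The data frame below is made up for illustration (only the name
KRASyn and the rowSums() idea come from Paul's post). It also shows that, on
these names, the two grepl() calls give the same answer as a single negative
lookbehind on the letter 'n'.

```r
# Hypothetical data; the column names and values are invented for illustration.
KRASyn <- data.frame(mutation = c(1, 0), nonmutated = c(0, 1),
                     unmutated = c(1, 1), verymutated = c(0, 1),
                     other = c(5, 5))

nm <- names(KRASyn)

# Names that contain "muta" but not "nmuta".
keep <- grepl("muta", nm) & !grepl("nmuta", nm)
nm[keep]

# The same selection with one negative lookbehind (needs perl = TRUE).
keep2 <- grepl("(?<!n)muta", nm, perl = TRUE)
identical(keep, keep2)

# Sum the selected columns row by row, as in the original question.
KRASyn$Mutant_comb <- rowSums(KRASyn[keep])
```

Either form drops every name in which "muta" directly follows an 'n', so it
is only safe if no wanted column name has that shape.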


What follows is not a reply to Greg's post, just a more convoluted variant
that also drops 'permutation' and 'commutation'.


(tmp <- c(tmp, 'permutation', 'commutation'))

cols <- list()
cols[[1]] <- grep("muta", tmp)
cols[[2]] <- grep("nmuta", tmp)
cols[[3]] <- grep("(per|com)muta", tmp)

Reduce(setdiff, cols)
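
To spell out what Reduce(setdiff, cols) is doing, here is a self-contained
sketch with the same example vector. It assumes the two prefixes are grouped
as (per|com), and the "person" check at the end is an invented example that
shows why the grouping matters.

```r
tmp <- c('mutation', 'nonmutated', 'unmutated', 'verymutated', 'other',
         'permutation', 'commutation')

cols <- list(
  grep("muta", tmp),           # every name containing "muta"
  grep("nmuta", tmp),          # "nonmuta..." and "unmuta..."
  grep("(per|com)muta", tmp)   # "permuta..." and "commuta..."
)

# Reduce() folds from the left, so this is
# setdiff(setdiff(cols[[1]], cols[[2]]), cols[[3]]).
res <- Reduce(setdiff, cols)
tmp[res]
# [1] "mutation"    "verymutated"

# Alternation binds loosest: without the grouping, "per" alone is a match.
grepl("(per)|(com)muta", "person")   # TRUE
grepl("(per|com)muta", "person")     # FALSE
```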

Rui Barradas




