[R] Alternative to lops

Berry, Charles ccberry @end|ng |rom uc@d@edu
Thu Apr 4 20:01:16 CEST 2019


Comments inline, but first:

Please review the posting guide and follow the instructions there, especially:

1) "No HTML posting..."

2) "When providing examples, it is best to give an R command that constructs the data,..."
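
For instance, the sample data quoted below could have been posted as commands along these lines. This is only a sketch: the exact whitespace in B, and whether B is a factor, are assumptions inferred from the printed output.

```r
# Assumed reconstruction of the poster's sample data; the spacing in B is a
# guess based on the printed data frame.
MyDF <- data.frame(
  A = 1:5,
  B = c("aa ab ac", "bb bc bd", "   cc cf", "      dd", "      ee"),
  C = 0,
  stringsAsFactors = TRUE  # assumption: B was read in as a factor
)
MyList <- list(
  X = c("a", "ba", "cc"),
  Y = c("abs", "aa", "BA", "BB"),
  z = c("ab", "bb", "xy", "zy", "gh")
)

# dput() prints an object as R code that readers can paste to recreate it.
dput(MyList)
```

Pasting the output of dput(MyDF) and dput(MyList) into a post removes any doubt about column types and stray blanks.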

> On Apr 4, 2019, at 9:41 AM, Ek Esawi <esawiek using gmail.com> wrote:
> 
> Hi All--
> 
> Sorry, I sent the previous one inadvertently.
> 
> Here is a sample of my data: a data frame (MyDF) and a list (MyList).  My
> own data frame has over 10,000 rows. I want to find out which elements of
> MyDF$B contain any element(s) of MyList, then change MyDF$C to the name of
> the vector of the list that has the match.
> 
> I solved this via loops and if statements, using %in%, but I am hoping for
> a better solution using the apply family of functions. I tried something like
> this, but it did not work.
> 
> lapply(strsplit(MyDF$B," "),function(x) lapply(MyList,function(y)  if(sum(y
> %in% x)>0,x$Code==y[[1]]))
> 
> Thanks in advance--EK
> 
> My Sample data
> 
>> MyDF
> 
>    A     B        C
> 1 1 aa ab ac  0
> 2 2 bb bc bd  0
> 3 3    cc cf     0
> 4 4       dd     0
> 5 5       ee     0


Note: You did not tell us whether MyDF$B is a factor. If it is, strsplit() needs as.character() first, and the split pattern needs to accommodate multiple blanks:

 levels(MyDF$B)
[1] "      dd" "      ee" "   cc cf" "aa ab ac" "bb bc bd"
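
A small illustration of why this matters, assuming leading blanks like those in the levels shown above. The vector b is a hypothetical two-value stand-in for MyDF$B, not the poster's actual column.

```r
# Hypothetical stand-in for MyDF$B, stored as a factor.
b <- factor(c("aa ab ac", "   cc cf"))

# strsplit() rejects factors, so this call fails with an error.
res <- tryCatch(strsplit(b, " "), error = function(e) "error")

# Splitting on a single blank yields an empty string for every leading blank.
by_blank <- strsplit(as.character(b), " ")[[2]]

# Splitting on runs of blanks leaves a single leading empty string, which is
# harmless as long as the lookup list contains no empty strings.
by_runs <- strsplit(as.character(b), "[ ]+")[[2]]

# trimws() removes the leading blanks, and with them the empty string.
trimmed <- strsplit(trimws(as.character(b)), "[ ]+")[[2]]
```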
> 

> 
>> MyList
> 
> $X
> [1] "a"  "ba" "cc"
> 
> $Y
> [1] "abs" "aa"  "BA"  "BB"
> 
> $z
> [1] "ab" "bb" "xy" "zy" "gh"
> 
> 
> 
> Desired results.
> 
> 
> 
>> MyDF
> 
> A        B   C
> 1 1 aa ab ac Y

'aa' matches Y, 'ab' matches z, 'ac' does not match

> 2 2 bb bc bd Y

Huh? 'bb' matches z; 'bc' and 'bd' do not match.

> 3 3    cc cf    X

'cc' matches X, 'cf' does not match

> 4 4       dd     0
> 5 5       ee     0
> 

Neither matches.


You need to clarify what it is you seek. The example is hard to penetrate.

Maybe this helps you:

> queries <- strsplit(as.character(MyDF$B), "[ ]+")
> matches <- match( unlist(queries), unlist(MyList), 0)
> hits <- findInterval( matches, 1+cumsum(c(0,lengths(MyList))))
> hitList <- relist(hits, queries)
> hitList
[[1]]
[1] 2 3 0

[[2]]
[1] 3 0 0

[[3]]
[1] 0 1 0

[[4]]
[1] 0 0

[[5]]
[1] 0 0

You can now process hitList to get the desired vector.
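
One possible way to do that processing is sketched below. The rule it applies (take the first matching list element in each row, and leave C alone when nothing matches) is an assumption, since the desired output is ambiguous. The sample data are rebuilt here, with approximate whitespace, only so that the snippet runs on its own.

```r
# Sample data reconstructed from the post (whitespace in B is approximate).
MyDF <- data.frame(A = 1:5,
                   B = c("aa ab ac", "bb bc bd", "   cc cf",
                         "      dd", "      ee"),
                   C = "0", stringsAsFactors = FALSE)
MyList <- list(X = c("a", "ba", "cc"),
               Y = c("abs", "aa", "BA", "BB"),
               z = c("ab", "bb", "xy", "zy", "gh"))

# Same steps as above: split B into words, look each word up in the flattened
# list, then convert the position found into the index of the list element.
queries <- strsplit(as.character(MyDF$B), "[ ]+")
matches <- match(unlist(queries), unlist(MyList), 0)
hits    <- findInterval(matches, 1 + cumsum(c(0, lengths(MyList))))
hitList <- relist(hits, queries)

# Assumed rule: the first nonzero hit in each row, or 0 if there is none.
firstHit <- vapply(hitList,
                   function(h) if (any(h > 0)) h[h > 0][1] else 0,
                   numeric(1))

# Overwrite C with the list name wherever a hit exists.
found <- firstHit > 0
MyDF$C[found] <- names(MyList)[firstHit[found]]
MyDF$C
```

Under this rule row 2 becomes "z" rather than the "Y" shown in the desired results, which is the discrepancy queried above. A different rule, such as collecting every matching name per row, would only change the function passed to vapply().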

HTH,

Chuck

