[R] create list of names where two df contain == values

R. Michael Weylandt michael.weylandt at gmail.com
Thu Nov 17 07:21:48 CET 2011


Perhaps this would work, provided R FAQ 7.31 (exact comparison of floating-point numbers) isn't a problem for your rMF values:

(df.1$AffyIds)[match(df.2$rMF, df.1$rMF)]
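To illustrate, here is a minimal sketch on made-up data. The column names follow the line above, but the ids and rMF values are invented for the example. It shows what match() returns when df.2 is a shortened copy of df.1, and how the FAQ 7.31 issue could show up. Rounding to 6 decimals is only one possible workaround, and the precision is an assumption.

```r
# Toy stand-ins for the real data frames (all values are invented).
df.1 <- data.frame(
  AffyIds = c("A_at", "B_at", "C_at", "D_at"),
  rMF     = c(0.11, 0.25, 0.34, 0.48),
  stringsAsFactors = FALSE
)
# df.2 holds a subset of df.1's rows, so row positions no longer line up.
df.2 <- data.frame(rMF = c(0.11, 0.34, 0.48))

# For each df.2$rMF, match() gives the row of df.1 holding the same value
# (or NA if there is none).
idx <- match(df.2$rMF, df.1$rMF)
df.2$AffyIds <- df.1$AffyIds[idx]
print(df.2)

# FAQ 7.31 caveat: two numbers that print alike can differ in the last bits,
# and then match() finds nothing.
exact <- match(0.1 + 0.2, 0.3)
print(exact)

# Possible workaround (assumed precision): compare rounded values instead.
rounded <- match(round(0.1 + 0.2, 6), round(0.3, 6))
print(rounded)
```

If the rMF values in df.2 were copied unchanged from df.1, the plain match() call should be enough. Rounding only matters if one of the columns was recomputed along the way.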

Michael

On Wed, Nov 16, 2011 at 1:11 PM, Rob Griffin <robgriffin247 at hotmail.com> wrote:
> As another potential route, could I put something into the original code
> that makes df.2 (maindata2) so that it picks one of the AffyIds at random for
> the duplicated FlybaseCG values (shown below)?
>
> maindata2 <- aggregate(maindata[, c(161, 172, 168, 255, 254, 258, 264, 265, 263, 271)],
>                        by = maindata[, 167, drop = FALSE], FUN = mean)
>
> Rob
>
> -----Original Message-----
> From: Rob Griffin
> Sent: Wednesday, November 16, 2011 4:35 PM
> To: Dennis Murphy
> Cc: r-help at r-project.org
> Subject: Re: [R] create list of names where two df contain == values
>
> Ok, thanks for looking into this so far. I seem to have confused you all a
> little, though, so I think I need to make this a bit clearer:
>
> In the real situation:
> df.1 is 271*13891 and contains (amongst others) columns with Flybase.CG,
> rMF, and Affyid values.
> df.2 is 14*12572 and is made from a subset of df.1 with the rows that had
> duplicated Flybase.CG values removed; df.2 also includes the rMF column.
> Because df.2 is made from the non-duplicated values, it is shorter.
>
> I now need to put the Affyid column from df.1 into df.2.
>
> My idea is:
> to match on a value that is unique to each row (within its column) but
> shared by both datasets (rMF contains such numbers),
> then get R to copy the corresponding Affyid value (an alphanumeric id) from
> df.1 and place it in df.2$Affy (or at least into a list which I could then
> put into a column) for all "shared" rMF values, ignoring all others.
>
> for example df.1 and df.2 both contain the rMF value 0.3393211 which
> corresponds to the same data point which in df.1 has this Affyid: 1638273_at
>
> If you imagine the two rMF columns lined up next to each other, they start
> the same and run in the same order, but df.2's has had "random" points
> removed (as was the aim of making df.2), so as soon as you get to the first
> removed point the rest of the list doesn't line up.
> What R needs to do is go down the df.2 rMF list one by one and, for each
> df.2 rMF, check the entire df.1 rMF list for a match, then take the
> corresponding Affyid.
>
> For example, df.1 and df.2 both contain the rMF value 0.3393211,
> which corresponds to the same sample point and in df.1 has this
> Affyid: 1638273_at, but the two occur on different rows in the data frames.
>
> Is that a bit clearer? I know this is pretty complex.
>
> David, your idea with ifelse worked for the first few lines, but as soon as
> it got to a point where one of the Flybase.CG values had been removed during
> the process of making df.2, the data frames got out of line and it just
> gave NA from there on.
>
>
> Rob
>
> -----Original Message-----
> From: Dennis Murphy
> Sent: Wednesday, November 16, 2011 4:03 PM
> To: Rob Griffin
> Cc: r-help at r-project.org
> Subject: Re: [R] create list of names where two df contain == values
>
> Hi:
>
> I think you're overthinking this problem. As is usually the case in R,
> a vectorized solution is clearer and provides more easily understood
> code.
>
> It's not obvious to me exactly what you want, so we'll try a couple of
> variations on the same idea. Equality of floating point numbers is a
> difficult computational problem (see R FAQ 7.31), but if it makes
> sense to define a threshold difference between floating numbers that
> practically equates to zero, then you're in business. In your example,
> the difference in numb1 for letter h in the two data frames is far
> from zero, so define 'equal' to be a difference < 10^-6. Then:
>
> # Return the entire matching data frame
> df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, ]
>    Letters     numb1 extra.col    id
> 1        a 0.3735462         1 CG234
> 2        b 1.1836433         2 CG232
> 3        c 0.1643714         3 CG441
> 4        d 2.5952808         4 CG128
> 5        e 1.3295078         5 CG125
> 6        f 0.1795316         6 CG182
> 7        g 1.4874291         7 CG982
> 9        i 1.5757814         9 CG282
> 10       j 0.6946116        10 CG154
>
> # Return the matching letters only as a vector:
> df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, 'Letters' ]
>
> If you want the latter object to remain a data frame, use drop = FALSE
> as an extra argument after 'Letters'. If you want to create a list
> object such that each letter comprises a different list component,
> then the following will do - the as.character() part coerces the
> factor Letters into a character object:
>
> as.list(as.character(df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001,
>            'Letters' ]))
>
> HTH,
> Dennis
>
>
> On Wed, Nov 16, 2011 at 5:03 AM, Rob Griffin <robgriffin247 at hotmail.com>
> wrote:
>>
>> Hello again... sorry to be posting yet again, but I hadn't anticipated
>> this problem.
>>
>> I am now trying to put the names found in one column of data frame 1 (let's
>> call it df.1[,1]) into a list, taking the rows where the values in df.1[,2]
>> match values in a column of another data frame (df.2[3]).
>> I tried to write this function so that it builds the list of names (called
>> Iffy) where the 2 criteria (df.1[141] and df.2[21]) match, but I think
>> it's too complex for a beginner R-enthusiast:
>>
>> ify<-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else {NULL}
>> Iffy<-apply(  df.1,  1,  FUN=ify,  x=df.1,  y=df.2,  a=2,  b=3,  c=1  )
>>
>> But this didn't work... Error in FUN(newX[, i], ...) : unused argument(s)
>> (newX[, i])
>>
>>
>> Here is a dataset that replicates the problem. You'll notice the "h"
>> criteria values differ between the two data frames, so the result should
>> be a list of the 9 letters where the two criteria columns match
>> (a,b,c,d,e,f,g,i,j):
>>
>>
>>
>> df.1<-data.frame(rep(letters[1:10]))
>> colnames(df.1)[1]<-("Letters")
>> set.seed(1)
>> df.1$numb1<-rnorm(10,1,1)
>> df.1$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
>>
>> df.1$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
>> df.1
>>
>> df.2<-data.frame(rep(letters[1:10]))
>> colnames(df.2)[1]<-("Letters")
>> set.seed(1)
>> df.2$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
>> df.2$numb1<-rnorm(10,1,1)
>>
>> df.2$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
>> df.2[8,3]<-12
>>
>> df.1
>> df.2
>>
>>
>>
>>
>> Your patience is much appreciated,
>> Rob
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>


