[R] Problem merging data frames and duplicates

William Michels wjm1 at caa.columbia.edu
Wed May 27 15:27:08 CEST 2015


Hi Frank!

Ok, bind columns together in a state-wise fashion, allowing for state
duplicates. Below (maybe cheesy) uses the state abbreviations
"state.abb" in the datasets package. Also uses two functions
"rbind.na" and "cbind.na", available from Andrej-Nikolai Spiess'
website at:  http://www.dr-spiess.de . These functions are pretty
helpful in that they bind without recycling, inserting NAs instead.

by.state1 <- function(df1, df2) {
  df_out <- data.frame( state1=character(), locus1=numeric(),
state2=character(), locus2=numeric() )
  for(i in 1:50) {
    df_out_per <- cbind.na(df1[df1$state == state.abb[i], ],
df2[df2$state == state.abb[i], ])
    df_out <- rbind.na(df_out, df_out_per)
  }
  df_out
}

data.frame(state=c("IL", "IL", "LA","LA", "MS","MS", "AR", "AR"),
           locus=c(1,1,2,2,3,4,5,6)) -> x

##edit y to correct number of loci (6 not 7)
data.frame(state=c("IL", "IL", "AR", "AR", "TN","TN"),
           locus=c(1,1,2,3,3,4)) -> y

> by.state1(x,y)
   state locus state locus
7     AR     5    AR     2
8     AR     6    AR     3
3     IL     1    IL     1
4     IL     1    IL     1
31    LA     2  <NA>    NA
41    LA     2  <NA>    NA
5     MS     3  <NA>    NA
6     MS     4  <NA>    NA
9   <NA>    NA    TN     3
10  <NA>    NA    TN     4

The rows will be in state alphabetical order. If you need the row
numbers cleaned up as well (numeric order) you can pre-merge your data
to states.abb:

> a <- merge(state.abb, x, by= 1, all=F)
> b <- merge(state.abb, y, by= 1, all=F)
> colnames(a) <- c("state", "locus")
> colnames(b) <- c("state", "locus")
> by.state1(a,b)

   state locus state locus
1     AR     5    AR     2
2     AR     6    AR     3
3     IL     1    IL     1
4     IL     1    IL     1
5     LA     2  <NA>    NA
6     LA     2  <NA>    NA
7     MS     3  <NA>    NA
8     MS     4  <NA>    NA
9   <NA>    NA    TN     3
10  <NA>    NA    TN     4
>

Hope this helps,

Bill

William Michels, Ph.D.


On Wed, May 27, 2015 at 4:01 AM, Frank Burbrink <burbrink666 at gmail.com> wrote:
> I have figured out a cheesy work around since these problems have to do with
> not having unique identifiers for the States:
>
> 1) Append a unique identifier to each state such that both AR becomes ARa
> and ARb
> 2) run the normal merge(x,y,by=1, all=T)
> 3) Use subst to cut the appended identifiers.
>
> While this is clunky I can just write a function to do it all at once.
>
> On Wed, May 27, 2015 at 6:20 AM, Frank Burbrink <burbrink666 at gmail.com>
> wrote:
>>
>> Thanks Bill,
>>
>> However, unique(merge(x, y, by = 1, all=T)) is giving me:
>>
>>    state locus.x locus.y
>> 1     AR       5       2
>> 2     AR       5       3
>> 3     AR       6       2
>> 4     AR       6       3
>> 5     IL       1       1
>> 9     LA       2      NA
>> 11    MS       3      NA
>> 12    MS       4      NA
>> 13    TN      NA       3
>> 14    TN      NA       4
>>
>> This has AR repeated twice and the normal double IL and LA now only listed
>> singly.
>>
>> What I am hoping for is something like this:
>>
>>    state locus.x locus.y
>> 1     AR       5       2
>> 4     AR       6       3
>> 7     IL       1       1
>> 8     IL       1       1
>> 9     LA       2      NA
>> 10    LA       2      NA
>> 11    MS       3      NA
>> 12    MS       4      NA
>> 13    TN      NA       3
>> 14    TN      NA       4
>>
>> On Wed, May 27, 2015 at 3:53 AM, William Michels <wjm1 at caa.columbia.edu>
>> wrote:
>>>
>>> Hi Frank,
>>>
>>> It looks like you're very close. I think you want:
>>>
>>> unique(merge(x, y, by = 1, all=T))
>>>
>>> Gabor Grothendieck's sqldf package is very useful if you're more
>>> comfortable with SQL-type syntax, see:
>>>
>>> https://github.com/ggrothendieck/sqldf
>>>
>>> Best Regards,
>>>
>>> William (Bill) Michels, Ph.D.
>>>
>>>
>>>
>>> On Tue, May 26, 2015 at 5:12 PM, Frank Burbrink
>>> <burbrink666 at gmail.com> wrote: <SNIP>
>>
>>
>>
>>
>> --
>>
>> ***********************************
>> Frank T. Burbrink, Ph.D.
>> Professor
>> Biology Department
>> 6S-143
>> 2800 Victory Blvd.
>> College of Staten Island/CUNY
>> Staten Island, New York 10314
>> E-Mail:Frank.Burbrink at csi.cuny.edu
>> Phone:718-982-3961
>> Web Page: http://scholar.library.csi.cuny.edu/~fburbrink/
>> ***********************************
>> Chair
>> Ecology, Evolutionary Biology, and Behavior
>> Doctoral Subprogram
>> Biology Program
>> City University of New York
>> Graduate Center
>> 365 Fifth Avenue
>> New York, NY 10016-4309
>> **********************************
>
>
>
>
> --
>
> ***********************************
> Frank T. Burbrink, Ph.D.
> Professor
> Biology Department
> 6S-143
> 2800 Victory Blvd.
> College of Staten Island/CUNY
> Staten Island, New York 10314
> E-Mail:Frank.Burbrink at csi.cuny.edu
> Phone:718-982-3961
> Web Page: http://scholar.library.csi.cuny.edu/~fburbrink/
> ***********************************
> Chair
> Ecology, Evolutionary Biology, and Behavior
> Doctoral Subprogram
> Biology Program
> City University of New York
> Graduate Center
> 365 Fifth Avenue
> New York, NY 10016-4309
> **********************************



More information about the R-help mailing list