[R] Problem merging data frames and duplicates

Frank Burbrink burbrink666 at gmail.com
Wed May 27 15:42:59 CEST 2015


Interesting solutions. Thanks guys!

On Wed, May 27, 2015 at 9:27 AM, William Michels <wjm1 at caa.columbia.edu>
wrote:

> Hi Frank!
>
> Ok, bind columns together in a state-wise fashion, allowing for state
> duplicates. Below (maybe cheesy) uses the state abbreviations
> "state.abb" in the datasets package. Also uses two functions
> "rbind.na" and "cbind.na", available from Andrej-Nikolai Spiess'
> website at:  http://www.dr-spiess.de . These functions are pretty
> helpful in that they bind without recycling, inserting NAs instead.
>
> by.state1 <- function(df1, df2) {
>   df_out <- data.frame( state1=character(), locus1=numeric(),
> state2=character(), locus2=numeric() )
>   for(i in 1:50) {
>     df_out_per <- cbind.na(df1[df1$state == state.abb[i], ],
> df2[df2$state == state.abb[i], ])
>     df_out <- rbind.na(df_out, df_out_per)
>   }
>   df_out
> }
>
> data.frame(state=c("IL", "IL", "LA","LA", "MS","MS", "AR", "AR"),
>            locus=c(1,1,2,2,3,4,5,6)) -> x
>
> ##edit y to correct number of loci (6 not 7)
> data.frame(state=c("IL", "IL", "AR", "AR", "TN","TN"),
>            locus=c(1,1,2,3,3,4)) -> y
>
> > by.state1(x,y)
>    state locus state locus
> 7     AR     5    AR     2
> 8     AR     6    AR     3
> 3     IL     1    IL     1
> 4     IL     1    IL     1
> 31    LA     2  <NA>    NA
> 41    LA     2  <NA>    NA
> 5     MS     3  <NA>    NA
> 6     MS     4  <NA>    NA
> 9   <NA>    NA    TN     3
> 10  <NA>    NA    TN     4
>
> The rows will be in state alphabetical order. If you need the row
> numbers cleaned up as well (numeric order) you can pre-merge your data
> to states.abb:
>
> > a <- merge(state.abb, x, by= 1, all=F)
> > b <- merge(state.abb, y, by= 1, all=F)
> > colnames(a) <- c("state", "locus")
> > colnames(b) <- c("state", "locus")
> > by.state1(a,b)
>
>    state locus state locus
> 1     AR     5    AR     2
> 2     AR     6    AR     3
> 3     IL     1    IL     1
> 4     IL     1    IL     1
> 5     LA     2  <NA>    NA
> 6     LA     2  <NA>    NA
> 7     MS     3  <NA>    NA
> 8     MS     4  <NA>    NA
> 9   <NA>    NA    TN     3
> 10  <NA>    NA    TN     4
> >
>
> Hope this helps,
>
> Bill
>
> William Michels, Ph.D.
>
>
> On Wed, May 27, 2015 at 4:01 AM, Frank Burbrink <burbrink666 at gmail.com>
> wrote:
> > I have figured out a cheesy work around since these problems have to do
> with
> > not having unique identifiers for the States:
> >
> > 1) Append a unique identifier to each state such that both AR becomes ARa
> > and ARb
> > 2) run the normal merge(x,y,by=1, all=T)
> > 3) Use subst to cut the appended identifiers.
> >
> > While this is clunky I can just write a function to do it all at once.
> >
> > On Wed, May 27, 2015 at 6:20 AM, Frank Burbrink <burbrink666 at gmail.com>
> > wrote:
> >>
> >> Thanks Bill,
> >>
> >> However, unique(merge(x, y, by = 1, all=T)) is giving me:
> >>
> >>    state locus.x locus.y
> >> 1     AR       5       2
> >> 2     AR       5       3
> >> 3     AR       6       2
> >> 4     AR       6       3
> >> 5     IL       1       1
> >> 9     LA       2      NA
> >> 11    MS       3      NA
> >> 12    MS       4      NA
> >> 13    TN      NA       3
> >> 14    TN      NA       4
> >>
> >> This has AR repeated twice and the normal double IL and LA now only
> listed
> >> singly.
> >>
> >> What I am hoping for is something like this:
> >>
> >>    state locus.x locus.y
> >> 1     AR       5       2
> >> 4     AR       6       3
> >> 7     IL       1       1
> >> 8     IL       1       1
> >> 9     LA       2      NA
> >> 10    LA       2      NA
> >> 11    MS       3      NA
> >> 12    MS       4      NA
> >> 13    TN      NA       3
> >> 14    TN      NA       4
> >>
> >> On Wed, May 27, 2015 at 3:53 AM, William Michels <wjm1 at caa.columbia.edu
> >
> >> wrote:
> >>>
> >>> Hi Frank,
> >>>
> >>> It looks like you're very close. I think you want:
> >>>
> >>> unique(merge(x, y, by = 1, all=T))
> >>>
> >>> Gabor Grothendieck's sqldf package is very useful if you're more
> >>> comfortable with SQL-type syntax, see:
> >>>
> >>> https://github.com/ggrothendieck/sqldf
> >>>
> >>> Best Regards,
> >>>
> >>> William (Bill) Michels, Ph.D.
> >>>
> >>>
> >>>
> >>> On Tue, May 26, 2015 at 5:12 PM, Frank Burbrink
> >>> <burbrink666 at gmail.com> wrote: <SNIP>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> ***********************************
> >> Frank T. Burbrink, Ph.D.
> >> Professor
> >> Biology Department
> >> 6S-143
> >> 2800 Victory Blvd.
> >> College of Staten Island/CUNY
> >> Staten Island, New York 10314
> >> E-Mail:Frank.Burbrink at csi.cuny.edu
> >> Phone:718-982-3961
> >> Web Page: http://scholar.library.csi.cuny.edu/~fburbrink/
> >> ***********************************
> >> Chair
> >> Ecology, Evolutionary Biology, and Behavior
> >> Doctoral Subprogram
> >> Biology Program
> >> City University of New York
> >> Graduate Center
> >> 365 Fifth Avenue
> >> New York, NY 10016-4309
> >> **********************************
> >
> >
> >
> >
> > --
> >
> > ***********************************
> > Frank T. Burbrink, Ph.D.
> > Professor
> > Biology Department
> > 6S-143
> > 2800 Victory Blvd.
> > College of Staten Island/CUNY
> > Staten Island, New York 10314
> > E-Mail:Frank.Burbrink at csi.cuny.edu
> > Phone:718-982-3961
> > Web Page: http://scholar.library.csi.cuny.edu/~fburbrink/
> > ***********************************
> > Chair
> > Ecology, Evolutionary Biology, and Behavior
> > Doctoral Subprogram
> > Biology Program
> > City University of New York
> > Graduate Center
> > 365 Fifth Avenue
> > New York, NY 10016-4309
> > **********************************
>



-- 

*************************************
*Frank T. Burbrink, Ph.D.*
*Professor*
*Biology Department*
*6S-143*
*2800 Victory Blvd.*
*College of Staten Island/CUNY*
*Staten Island, New York 10314*
*E-Mail:Frank.Burbrink at csi.cuny.edu <E-Mail%3AFrank.Burbrink at csi.cuny.edu>*
*Phone:718-982-3961*
*Web Page: http://scholar.library.csi.cuny.edu/~fburbrink/
<http://scholar.library.csi.cuny.edu/%7Efburbrink/>*
*************************************
*Chair *
*Ecology, Evolutionary Biology, and Behavior*
*Doctoral Subprogram*
*Biology Program*
*City University of New York *
*Graduate Center*
*365 Fifth Avenue*
*New York, NY 10016-4309*
************************************

	[[alternative HTML version deleted]]



More information about the R-help mailing list