[R] merge: right set overwrite left set

Ista Zahn istazahn at gmail.com
Sun Jul 12 19:06:08 CEST 2015


I think this does what you want:

## keep the key columns plus the indv columns in x.HHu.map that don't exist in y.HHo.map
x.HHu.map <- x.HHu.map[
    c("HHid",
      "position",
      names(x.HHu.map)[
               !names(x.HHu.map)
               %in% names(y.HHo.map)]
      )]
## merge, adding extra column from x.HHu.map
zzz <- merge(y.HHo.map, x.HHu.map, by=c('HHid', 'position'), all=TRUE)
## order by HHid
zzz <- zzz[order(zzz$HHid),]
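
On the loop in your original code: `$` does not evaluate its argument, so zzz$commonNames[i] refers to element i of a single column literally named "commonNames", which is why the values look like row information. Indexing with [[ ]] uses the value of the variable as the column name. Here is a minimal sketch of that fix on your sample data, assuming you want the y.HHo values wherever both sets share a column (loop.res and nm are names I made up here, and stringsAsFactors = FALSE is my addition):

```r
# Sample data copied from the quoted message below.
x.HHu <- data.frame(
    HHid  = c("HH1", "HH2", "HH3", "HH4", "HH5", "HH10"),
    indv1 = c(2, 0, 2, 0, 2, 0),
    indv2 = c(0, NA, 2, 2, 2, 2),
    ind3  = c(0, 0, 0, 0, 0, 0),
    stringsAsFactors = FALSE)
y.HHo <- data.frame(
    HHid  = c("HH1", "HH2", "HH5", "HH3", "HH10"),
    indv1 = c(2, 0, 2, 0, NA),
    indv2 = c(0, 2, 2, 1, 2),
    stringsAsFactors = FALSE)
z.map <- data.frame(
    HHid     = c("HH1", "HH2", "HH3", "HH4", "HH5", "HH6",
                 "HH8", "HH7", "HH9", "HH10", "HH11"),
    position = c(10, 20, 30, 42, 55, 66, 81, 75, 92, 101, 111),
    stringsAsFactors = FALSE)

x.HHu.map <- merge(z.map, x.HHu, by = "HHid", all = TRUE)
y.HHo.map <- merge(z.map, y.HHo, by = "HHid", all = TRUE)

# Columns present in both frames, apart from the key.
commonNames <- setdiff(intersect(names(x.HHu.map), names(y.HHo.map)), "HHid")

# merge() renames the clashing columns to <name>.x and <name>.y.
loop.res <- merge(x.HHu.map, y.HHo.map, by = "HHid", all = TRUE)
for (nm in commonNames) {
    # [[ ]] evaluates nm; loop.res$nm would mean a column literally named "nm".
    loop.res[[nm]] <- loop.res[[paste0(nm, ".y")]]
    # Drop the suffixed copies once the y value has been copied over.
    loop.res[[paste0(nm, ".x")]] <- NULL
    loop.res[[paste0(nm, ".y")]] <- NULL
}
loop.res <- loop.res[order(loop.res$HHid), ]
loop.res
```

This should give the same values as the merge above, only with the columns in a different order.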

Best,
Ista

On Sun, Jul 12, 2015 at 10:45 AM, aldi <aldi at dsgmail.wustl.edu> wrote:
> Hi,
> I have two sets of data x.HHu and y.HHo, rows are IDs and columns are
> individuals. I do not know in advance indv or HHid, both of them will be
> captured from the data. As the y.HHo set updates, y.HHo set has better
> information than x.HHu set. Thus I want a merge where right set
> overwrites left set info based on HHid, i.e. to overwrite x.HHu set with
> y.HHo set but keep any extra info from the x.HHu set that is not present
> in y.HHo set.
> HHids will be complete based on z.map, with the corresponding positions.
> I am having trouble with the part after this line:
> ### ============================================+++++++++++++++++++++++++++
> I am thinking that I am creating new columns "position" "indv1" and
> "indv2", but R is interpreting them as row information.
> See the expected final table at the end. HHid is common, indv3 is from
> x.HHu, and the rest position and indv1 and indv2 are from y.HHo
> Any suggestions are appreciated.
> Thank you in advance,
> Aldi
>
> x.HHu<- data.frame(
>             HHid = c( 'HH1', 'HH2', 'HH3', 'HH4', 'HH5', 'HH10')
>           , indv1 = c( 2, 0, 2 , 0, 2, 0)
>           , indv2 = c( 0, NA, 2, 2, 2, 2)
>           , ind3 = c( 0, 0, 0, 0, 0, 0)
>           )
> ### the HHo data will be the top set to overwrite any HHu data, when
> ### they exist, thinking that HHo are better than HHu results
> ### when they are available
>
> y.HHo<-data.frame(HHid=c('HH1', 'HH2','HH5', 'HH3', 'HH10')
>           , indv1 = c(2, 0, 2, 0, NA)
>           , indv2 = c(0, 2, 2, 1, 2)
>           )
>
> z.map<-data.frame(HHid = c('HH1', 'HH2', 'HH3', 'HH4', 'HH5',
> 'HH6','HH8', 'HH7', 'HH9', 'HH10', 'HH11')
>                  , position= c(10,20,30,42,55,66,81,75,92,101,111)
>                  )
> ### see objects
> x.HHu
> y.HHo
> z.map
> ### now sort the map by position, this sorted map will be used to sort
> ### finally all data
> z.map<-z.map[with(z.map, order(position)), ]
> z.map
>
> ### First I introduce position to both sets so I can sort them in
> ### advance by position.
> x.HHu.map <-merge( z.map, x.HHu, by='HHid', all=T)
> x.HHu.map<-x.HHu.map[with(x.HHu.map, order(position)), ]
> x.HHu.map
>
> y.HHo.map <-merge( z.map, y.HHo, by='HHid', all= T)
> y.HHo.map<-y.HHo.map[with(y.HHo.map, order(position)), ]
> y.HHo.map
>
> ### now merge HHu and HHo with the hope to overwrite the HHu set with
> ### HHo wherever they overlap by column names.
> zzz <- merge(x.HHu.map, y.HHo.map, by='HHid', all=T)
> zzz
> ### find common variable names in two sets
>
> commonNames <- names(x.HHu.map)[which(colnames(x.HHu.map) %in%
> colnames(y.HHo.map))]
>
> ## remove HHid which is common for x and y, but work with the rest of columns
> commonNames<-commonNames[-c(1)]
>
> ### ============================================+++++++++++++++++++++++++++
> for(i in 1:length(commonNames)){
>
> print(commonNames[i])
> zzz$commonNames[i] <- NA
>
> print(paste("zzz","$",commonNames[i],".y",sep=""))
>
> zzz$commonNames[i] <- zzz[,paste(commonNames[i],".y",sep="")]
>
> ### paste(zzz$commonNames[i],".x",sep='') <- NULL;
> ### paste(zzz$commonNames[i],".y",sep='') <- NULL;
>
> }
> zzz
>
> The final expected set has to be: HHid is common, indv3 is from x.HHu,
> and the rest position and indv1 and indv2 are from y.HHo
>     HHid     position     ind3  indv1 indv2
> 1   HH1         10          0     2       0
> 2  HH10        101          0    NA       2
> 3  HH11        111         NA    NA      NA
> 4   HH2         20          0     0       2
> 5   HH3         30          0     0       1
> 6   HH4         42          0    NA      NA
> 7   HH5         55          0     2       2
> 8   HH6         66         NA    NA      NA
> 9   HH7         75         NA    NA      NA
> 10  HH8         81         NA    NA      NA
> 11  HH9         92         NA    NA      NA
>