[R] merge: right set overwrite left set

Mon Jul 13 13:23:00 CEST 2015

Thank you Ista,
Your solution is smart, by sub-setting from x.HHu.map data only "HHid", 
"position" as indices (because they are unique) for the merge, and any 
extra columns in x.HHu.map that are not present in y.HHo,map, thus when 
the merge is done with option all=T, will work among the two sets of 
data, without creating .x and .y when the variables are in common in two 
sets.

With best wishes,
Aldi

 > ## find indv columns in x.HHu.map that don't exist in y.HHo.map
 > x.HHu.map <- x.HHu.map[
+     c("HHid",
+       "position",
+       names(x.HHu.map)[
+                !names(x.HHu.map)
+                %in% names(y.HHo.map)]
+       )]
 > ## merge, adding extra column from x.HHu.map
 > zzz <- merge(y.HHo.map, x.HHu.map, by=c('HHid', 'position'), all=T)
 > ## order by HHid
 > zzz <- zzz[order(zzz$HHid),]
 > zzz
    HHid position indv1 indv2 ind3
1   HH1       10     2     0    0
2  HH10      101    NA     2    0
3  HH11      111    NA    NA   NA
4   HH2       20     0     2    0
5   HH3       30     0     1    0
6   HH4       42    NA    NA    0
7   HH5       55     2     2    0
8   HH6       66    NA    NA   NA
9   HH7       75    NA    NA   NA
10  HH8       81    NA    NA   NA
11  HH9       92    NA    NA   NA

On 7/12/2015 12:06 PM, Ista Zahn wrote:
> I think this does what you want:
>
> ## find idiv coloumns in x.HHu.map that don't exist in y.HHo.map
> x.HHu.map <- x.HHu.map[
>      c("HHid",
>        "position",
>        names(x.HHu.map)[
>                 !names(x.HHu.map)
>                 %in% names(y.HHo.map)]
>        )]
> ## merge, adding extra column from x.HHu.map
> zzz <- merge(y.HHo.map, x.HHu.map, by=c('HHid', 'position'), all=T)
> ## order by HHid
> zzz <- zzz[order(zzz$HHid),]
>
> Best,
> Ista
>
> On Sun, Jul 12, 2015 at 10:45 AM, aldi <aldi at dsgmail.wustl.edu> wrote:
>> Hi,
>> I have two sets of data x.HHu and y.HHo, rows are IDs and columns are
>> individuals. I do not know in advance indv or HHid, both of them will be
>> captured from the data. As the y.HHo set updates, y.HHo set has better
>> information then x.HHu set. Thus I want a merge where right set
>> overwrites left set info based on HHid, i.e. to overwrite x.HHu set with
>> y.HHo set but keep any extra info from the x.HHu set that is not present
>> in y.HHo set.
>> HHids will be complete based on z.map, with the corresponding positions.
>> I am having trouble with the part after this line: ###
>> ============================================+++++++++++++++++++++++++++
>> I am thinking that I am creating new columns "position" "indv1" and
>> "indv2", but R is interpreting them as row information.
>> See the expected final table at the end. HHid is common, indv3 is from
>> x.HHu, and the rest position and indv1 and indv2 are from y.HHo
>> Any suggestions are appreciated.
>> Thank you in advance,
>> Aldi
>>
>> x.HHu<- data.frame(
>>              HHid = c( 'HH1', 'HH2', 'HH3', 'HH4', 'HH5', 'HH10')
>>            , indv1 = c( 2, 0, 2 , 0, 2, 0)
>>            , indv2 = c( 0, NA, 2, 2, 2, 2)
>>            , ind3 = c( 0, 0, 0, 0, 0, 0)
>>            )
>> ### the HHo data will be the top set to overwrite any HHu data, when
>> they exist, thinking that HHo are better than HHu results
>> ### when they are available
>>
>> y.HHo<-data.frame(HHid=c('HH1', 'HH2','HH5', 'HH3', 'HH10')
>>            , indv1 = c(2, 0, 2, 0, NA)
>>            , indv2 = c(0, 2, 2, 1, 2)
>>            )
>>
>> z.map<-data.frame(HHid = c('HH1', 'HH2', 'HH3', 'HH4', 'HH5',
>> 'HH6','HH8', 'HH7', 'HH9', 'HH10', 'HH11')
>>                   , position= c(10,20,30,42,55,66,81,75,92,101,111)
>>                   )
>> ### see objects
>> x.HHu
>> y.HHo
>> z.map
>> ### now sort the map by position, this sorted map will be used to sort
>> finally all data
>> z.map<-z.map[with(z.map, order(position)), ]
>> z.map
>>
>> ### First I introduce position to both sets so I can sort them in
>> advance by position.
>> x.HHu.map <-merge( z.map, x.HHu, by='HHid', all=T)
>> x.HHu.map<-x.HHu.map[with(x.HHu.map, order(position)), ]
>> x.HHu.map
>>
>> y.HHo.map <-merge( z.map, y.HHo, by='HHid', all= T)
>> y.HHo.map<-y.HHo.map[with(y.HHo.map, order(position)), ]
>> y.HHo.map
>>
>> ### now merge HHu  and HHo  with the hope to overwrite the HHu set with
>> HHo wherever they overlap by column names.
>> zzz <- merge(x.HHu.map, y.HHo.map, by='HHid', all=T)
>> zzz
>> ### find common variable names in two sets
>>
>> commonNames <- names(x.impu.map)[which(colnames(x.impu.map) %in%
>> colnames(y.geno.map))]
>>
>> ## remove HHid wich is common for x and y, but work with the rest of columns
>> commonNames<-commonNames[-c(1)]
>>
>> ### ============================================+++++++++++++++++++++++++++
>> for(i in 1:length(commonNames)){
>>
>> print(commonNames[i])
>> zzz$commonNames[i] <- NA
>>
>> print(paste("zzz","$",commonNames[i],".y",sep=""))
>>
>> zzz$commonNames[i] <- zzz[,paste(commonNames[i],".y",sep="")]
>>
>> ### paste(zzz$commonNames[i],".x",sep='') <- NULL;
>> ### paste(zzz$commonNames[i],".y",sep='') <- NULL;
>>
>> }
>> zzz
>>
>> The final expected set has to be: HHid is common, indv3 is from x.HHu,
>> and the rest position and indv1 and indv2 are from y.HHo
>>      HHid     position     ind3  indv1 indv2
>> 1   HH1         10          0     2       0
>> 2  HH10        101          0    NA       2
>> 3  HH11        111         NA    NA      NA
>> 4   HH2         20          0     0       2
>> 5   HH3         30          0     0       1
>> 6   HH4         42          0    NA      NA
>> 7   HH5         55          0     2       2
>> 8   HH6         66         NA    NA      NA
>> 9   HH7         75         NA    NA      NA
>> 10  HH8         81         NA    NA      NA
>> 11  HH9         92         NA    NA      NA
>>
>> --
>>
>>

	[[alternative HTML version deleted]]