[R] Duplicate of columns when merging two data frames

Marc Schwartz marc_schwartz at me.com
Thu Mar 13 16:30:23 CET 2014


On Mar 13, 2014, at 10:19 AM, Stefano Sofia <stefano.sofia at regione.marche.it> wrote:

> Dear list users,
> I have two data frames df1 and df2, where the columns of df1 are
> 
> Sensor_RM Place_RM Station_RM Y_init_RM M_init_RM D_init_RM Y_fin_RM M_fin_RM D_fin_RM
> 
> and the columns of df2 are
> 
> Sensor_RM Station_RM Place_RM Province_RM Region_RM Net_init_RM GaussBoaga_EST_RM GaussBoaga_NORD_RM Gradi_Long_RM Primi_Long_RM Secondi_Long_RM Gradi_Lat_RM Primi_Lat_RM Secondi_Lat_RM Long_Cent_RM Lat_Cent_RM Height_RM
> 
> When I merge the two data frames through
> 
> df3 <- merge(df1, df2, by=c("Sensor_RM", "Station_RM"))
> 
> I get a new data frame with columns
> 
> Sensor_RM Station_RM Place_RM.x Y_init_RM M_init_RM D_init_RM Y_fin_RM M_fin_RM D_fin_RM Place_RM.y Province_RM Region_RM Net_init_RM GaussBoaga_EST_RM GaussBoaga_NORD_RM Gradi_Long_RM Primi_Long_RM Secondi_Long_RM Gradi_Lat_RM Primi_Lat_RM Secondi_Lat_RM Long_Cent_RM Lat_Cent_RM Height_RM
> 
> I am sure that df1$Place_RM and df2$Place_RM are equal. I checked it from the shell using awk and diff.
> Why then I have a duplicate of Place_RM, i.e. Place_RM.x and Place_RM.y, and only of them?
> 
> Thank you for your help
> Stefano
> 


From the Details section of ?merge:

"If the columns in the data frames not used in merging have any common names, these have suffixes (".x" and ".y" by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown."


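Place_RM is the only column, apart from the two 'by' columns, that appears in both df1 and df2, which is why it is the only one that gets duplicated with suffixes.

If you don't want both columns in the resultant data frame, either include Place_RM in the 'by' argument or remove it from one of the data frames prior to merge()ing. If you include it in the 'by' argument, be sure that the values will be compared as exactly equal, which can be problematic if they are floating point values. If so, you would be better off subsetting one of the source data frames to remove the column first: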

  df3 <- merge(df1, 
               subset(df2, select = -Place_RM),
               by=c("Sensor_RM", "Station_RM"))
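
For anyone who wants to see the behaviour in isolation, here is a minimal sketch using two small toy data frames. They are only stand-ins for the real df1 and df2: the column subset and all of the values are made up for illustration. It shows the suffixed columns appearing, and then both ways of avoiding them, assuming Place_RM really is identical in the two sources:

```r
# Toy stand-ins for df1 and df2 (hypothetical values, reduced column set)
df1 <- data.frame(Sensor_RM  = c(1, 2),
                  Place_RM   = c("Ancona", "Pesaro"),
                  Station_RM = c(10, 20),
                  Y_init_RM  = c(1990, 1995))
df2 <- data.frame(Sensor_RM  = c(1, 2),
                  Station_RM = c(10, 20),
                  Place_RM   = c("Ancona", "Pesaro"),
                  Height_RM  = c(16, 11))

# Place_RM is in both data frames but not in 'by', so merge() keeps
# both copies and appends the default suffixes .x and .y
dup <- merge(df1, df2, by = c("Sensor_RM", "Station_RM"))
names(dup)

# Option 1: include Place_RM in 'by' (reasonable here because it is
# character data, not floating point)
df3a <- merge(df1, df2, by = c("Sensor_RM", "Station_RM", "Place_RM"))
names(df3a)

# Option 2: drop Place_RM from df2 before merging
df3b <- merge(df1,
              subset(df2, select = -Place_RM),
              by = c("Sensor_RM", "Station_RM"))
names(df3b)
```

With option 1, any row where Place_RM differs between the two data frames would silently drop out of the result (merge() defaults to all = FALSE), so option 2 is the safer choice if there is any doubt about the values matching.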
  

Regards,

Marc Schwartz



