[R] problems with merge() - the output has many repeated lines

Michael Dewey info at aghmed.fsnet.co.uk
Mon Aug 23 14:27:26 CEST 2010


At 18:23 22/08/2010, Cecilia Carmo wrote:
>I have done
>intersect(names(df1), names(df2))
>[1] "firm" "year"
>
>This is the key I used to merge
>merge(df1,df2,by=c("firm","year"))
>
>And there is just one firm/year row in df1 that 
>matches a firm/year row in df2. Df1 
>has more firm/year rows than df2, and they don't match any row in df2.

That is what you believe but it seems that R disagrees.

I imagine the dataframes are too big to post so 
what I would try first is to create new 
dataframes containing just the variables firm and 
year (say newdf1 and newdf2), merge them and see 
whether I got the expected number of rows. If I 
did then I would add other variables back into 
the dataframe until the problem re-appeared.
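
A minimal sketch of that check, using small made-up dataframes in place 
of the real ones (the values, and the columns x and y, are assumptions 
for illustration; only firm, year, newdf1 and newdf2 come from this 
thread). It merges the key columns alone and then counts repeated 
firm/year combinations with duplicated(), which is one way to test the 
"more than one match" case from ?merge:

```r
# Toy stand-ins for the real df1 and df2 (hypothetical values)
df1 <- data.frame(firm = c("A", "A", "B", "C"),
                  year = c(2001, 2001, 2001, 2002),
                  x    = c(1, 1, 2, 3))
df2 <- data.frame(firm = c("A", "A", "B"),
                  year = c(2001, 2001, 2001),
                  y    = c(10, 10, 20))

keys <- c("firm", "year")

# Step 1: keep only the key columns and merge them
newdf1 <- df1[, keys]
newdf2 <- df2[, keys]
merged_keys <- merge(newdf1, newdf2, by = keys)

# Compare the merged row count with the input row counts
nrow(newdf1)
nrow(newdf2)
nrow(merged_keys)

# Step 2: count firm/year combinations that occur more than once
dup1 <- sum(duplicated(newdf1))
dup2 <- sum(duplicated(newdf2))

# Show the repeated keys themselves, if there are any
newdf1[duplicated(newdf1), ]
newdf2[duplicated(newdf2), ]
```

If nrow(merged_keys) is larger than you expected and either duplicate 
count is above zero, a key that appears m times in one dataframe and n 
times in the other contributes m * n rows to the result, which would 
account for identical repeated rows.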


>Cecília
>
>On Sun, 22 Aug 2010 12:09:57 -0500
>  Erik Iverson <eriki at ccbr.umn.edu> wrote:
>>Cecilia -
>>Find what columns you're matching on,
>>intersect(names(df1), names(df2)),
>>Maybe that will shed some light on the issue.
>>On 08/22/2010 12:02 PM, Cecilia Carmo wrote:
>>>Thanks, but I don't have multiple matches and the lines repeated in the
>>>final dataframe are exactly equal in all columns.
>>>
>>>Cecília
>>>
>>>On Sat, 21 Aug 2010 10:58:53 -0500
>>>Hadley Wickham <hadley at rice.edu> wrote:
>>>>You may find a close reading of ?merge helpful, particularly this
>>>>sentence: "If there is more than one match, all possible
>>>>matches contribute one row each" (so check that you don't have
>>>>multiple matches).
>>>>
>>>>Hadley
>>>>
>>>>On Sat, Aug 21, 2010 at 10:45 AM, Cecilia Carmo <cecilia.carmo at ua.pt> wrote:
>>>>>Hi everyone,
>>>>>
>>>>>I have been merging many big dataframes (about 80000 rows each) and I never
>>>>>had this problem, but now it happened to me and I want to know if someone
>>>>>knows what could be happening.
>>>>>The final dataframe has many rows, an impossible number! I have done
>>>>>edit(dataframe) and I saw that there are many repeated rows (all equal).
>>>>>
>>>>>Thanks for any help,
>>>>>
>>>>>Cecília Carmo
>>>>>Universidade de Aveiro
>>>>>Portugal
>>>>>
>>>>>______________________________________________
>>>>>R-help at r-project.org mailing list
>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>PLEASE do read the posting guide
>>>>>http://www.R-project.org/posting-guide.html
>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>>--
>>>>Assistant Professor / Dobelman Family Junior Chair
>>>>Department of Statistics / Rice University
>>>>http://had.co.nz/
>>>
>
>

Michael Dewey
http://www.aghmed.fsnet.co.uk


