[R] Merging dataframes

Sarmah, Chintanu
Wed May 2 06:16:46 CEST 2018


Thanks, Peter, Eivind and Rui.

Sorry, I could not explain it properly the first time. Let me simplify it with an example. Say I have two data frames of unequal size, as below:

Table_A:
Email              Name         Phone
abc@gmail.com      John Chan    0909
bcd@yahoo.com      Tim Ma       89089
......

Table_B:
Email              Name         Sex    Phone
abc@gmail.com      John Chan    M      0909
khn@hotmail.com    Rosy M       F      7779
.....

Now, I have used

merge(Table_A, Table_B, by = "Email", all = FALSE)

to find only the rows that match between these data frames.
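
For illustration, here is a minimal sketch of that inner join on the two example rows of each table. The data frames are rebuilt by hand from the listing above (addresses written with "@"), and the phone numbers are stored as character so that the leading zero in "0909" is kept; both choices are assumptions made for this example only. Note that because Name and Phone occur in both tables, merge() adds .x and .y suffixes to them.

```r
# Example data reconstructed from the rows shown in the message (assumed types).
Table_A <- data.frame(
  Email = c("abc@gmail.com", "bcd@yahoo.com"),
  Name  = c("John Chan", "Tim Ma"),
  Phone = c("0909", "89089"),
  stringsAsFactors = FALSE
)
Table_B <- data.frame(
  Email = c("abc@gmail.com", "khn@hotmail.com"),
  Name  = c("John Chan", "Rosy M"),
  Sex   = c("M", "F"),
  Phone = c("0909", "7779"),
  stringsAsFactors = FALSE
)

# Inner join: keep only Emails that occur in both tables.
matched <- merge(Table_A, Table_B, by = "Email", all = FALSE)

matched$Email    # "abc@gmail.com"
names(matched)   # "Email" "Name.x" "Phone.x" "Name.y" "Sex" "Phone.y"
```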

Further, I would also like to find (using "Email" as the common key) which rows of Table_A did not match any row of Table_B.
I am not sure how to do this.
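
One possible way to get these rows is the `%in%` idiom from Peter Dalgaard's forwarded reply below, with Email substituted for the key column. The sketch below rebuilds the two example tables by hand (character columns and "@" addresses are assumptions for the example) and extracts the unmatched rows in both directions.

```r
# Example data reconstructed from the rows shown in the message (assumed types).
Table_A <- data.frame(
  Email = c("abc@gmail.com", "bcd@yahoo.com"),
  Name  = c("John Chan", "Tim Ma"),
  Phone = c("0909", "89089"),
  stringsAsFactors = FALSE
)
Table_B <- data.frame(
  Email = c("abc@gmail.com", "khn@hotmail.com"),
  Name  = c("John Chan", "Rosy M"),
  Sex   = c("M", "F"),
  Phone = c("0909", "7779"),
  stringsAsFactors = FALSE
)

# Rows of Table_A whose Email does not occur anywhere in Table_B.
only_in_A <- Table_A[!(Table_A$Email %in% Table_B$Email), ]

# The reverse direction: rows of Table_B with no matching Email in Table_A.
only_in_B <- Table_B[!(Table_B$Email %in% Table_A$Email), ]

only_in_A$Email   # "bcd@yahoo.com"
only_in_B$Email   # "khn@hotmail.com"
```

The comparison is an exact string match, so addresses that differ only in case or surrounding whitespace would count as unmatched unless they are normalised first.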

 Thanks.


On 1 May 2018, at 9:35 pm, Chintanu <chintanu using gmail.com> wrote:


---------- Forwarded message ----------
From: peter dalgaard <pdalgd using gmail.com>
Date: Tue, May 1, 2018 at 9:05 PM
Subject: Re: [R] Merging dataframes
To: Rui Barradas <ruipbarradas using sapo.pt>
Cc: Chintanu <chintanu using gmail.com>, R help <r-help using r-project.org>


I'd expect more like

setdiff(A$key, B$key)

and vice versa. Or, if you want the actual rows

A[!(A$key %in% B$key),]

or for the row numbers

which(!(A$key %in% B$key))
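
To make the difference between the three forms concrete, here is a small sketch on made-up data; the data frames A and B, the column name key, and the values are all hypothetical. It shows that setdiff() returns the distinct unmatched key values, while the `%in%` forms return every unmatched row or row position, including duplicates.

```r
# Hypothetical toy data: key 1 appears twice in A and never in B.
A <- data.frame(key = c(1, 1, 2, 3, 5),
                value = c("a", "b", "c", "d", "e"),
                stringsAsFactors = FALSE)
B <- data.frame(key = c(2, 4, 5),
                value = c("x", "y", "z"),
                stringsAsFactors = FALSE)

# Distinct key values found in one table but not the other.
keys_only_in_A <- setdiff(A$key, B$key)   # 1 3
keys_only_in_B <- setdiff(B$key, A$key)   # 4

# The actual unmatched rows of A (duplicated keys are all kept).
rows_only_in_A <- A[!(A$key %in% B$key), ]

# Row numbers of the unmatched rows of A.
idx_only_in_A <- which(!(A$key %in% B$key))   # 1 2 4
```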


-pd




> On 1 May 2018, at 12:48 , Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> Hello,
>
> Is it something like this that you want?
>
> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
> y <- data.frame(a = 1:10, b = 1:10)
>
> which(x != y, arr.ind = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 5/1/2018 11:35 AM, Chintanu wrote:
>> Hi,
>> May I please ask how to do the following in R? Sorry, this may be trivial,
>> but I am struggling with it.
>> For two dataframes (A and B), I wish to identify (based on a primary
>> key-column present in both A & B) -
>> 1. Which records (rows) of A did not match with B, and
>> 2. Which records of B did not match with A ?
>> I came across a setdt function while browsing, but when I tried it, it says
>> - Could not find function "setdt".
>> Overall, if there is any way of doing it (preferably in some simplified
>> way), please advise.
>> Many thanks in advance.
>> regards,
>> Tito
>>      [[alternative HTML version deleted]]
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com









