[R] How to locate the difference from two data frames

David Winsemius dwinsemius at comcast.net
Thu Apr 8 22:20:00 CEST 2010


On Apr 8, 2010, at 4:03 PM, Jun Shen wrote:

> David,
>
> all.equal() only reports how many mismatches there are, including  
> missing values, but it doesn't tell me the location of each mismatch.

Yes, I noticed that after further testing. I agree Charles' solution  
is more informative and I wonder if it could be added to the  
functionality of all.equal (which purports to tell the user where  
objects differ)?

>
> For example, if I have one NA mismatch and three numerical mismatches,
>
> all.equal(a,b) gives
> [1] "Component 2: 'is.NA' value mismatch: 1 in current 0 in target"
> [2] "Component 3: 3 string mismatches"
> This only tells me that the missing-value mismatch is in the second  
> column (component) and that there are 3 numerical mismatches in the  
> third column. It gives no row information.
>
> which(mapply(identical,unlist(a),unlist(b))==FALSE) gives
> TIME5   DV1   DV2  DV17
>    85   161   162   177
> It tells me exactly which columns and rows contain the mismatches.  
> In this case they are column "TIME" row 5 and column "DV" rows 1, 2  
> and 17. You can ignore the serial numbers that follow.
>
> Jun
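
If row and column indices are wanted directly, rather than names such as TIME5, one possible variation is to build a logical matrix of mismatches with identical() and pass it to which(..., arr.ind = TRUE). This is only a sketch: the data frames below are made up for illustration, and it assumes both have the same dimensions and column order.

```r
# Hypothetical data; only the column names are borrowed from the thread.
a <- data.frame(TIME = c(1, 2, NA), DV = c(10, 20, 30))
b <- data.frame(TIME = c(1, 2, 3),  DV = c(10, 25, 30))

# For each pair of columns, compare element by element with identical(),
# which returns FALSE (not NA) when exactly one side is missing.
neq <- mapply(function(x, y) !mapply(identical, x, y), a, b)

# neq is a logical matrix (rows x columns); arr.ind = TRUE gives
# one (row, col) pair per mismatch.
idx <- which(neq, arr.ind = TRUE)
idx
```

Here idx should have one line per mismatch, with the column number referring to the position of the column in the data frame.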
>
> On Thu, Apr 8, 2010 at 1:58 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Apr 8, 2010, at 1:34 PM, Jun Shen wrote:
>
> David,
>
> Thanks for the suggestion. Now I have worked out a general solution.
>
> Assume "a" and "b" are two data frames with the same dimensions.
>
> 1. Call identical(a,b) to get an overall assessment.
> 2. If you get FALSE, call  
>    which(mapply(identical,unlist(a),unlist(b))==FALSE). You will get  
>    a result like
>    TIME5
>     85
> which means that row 5 of the column named "TIME" is different.  
> This also works for missing values. Thanks to everyone.
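
The two steps above can be sketched as follows. The data frames are invented for illustration; note also that unlist() coerces mixed column types to a common type, which may affect the comparison in data frames that are not all numeric.

```r
# Hypothetical data; only the column names are borrowed from the thread.
a <- data.frame(TIME = c(1, 2, NA), DV = c(10, 20, 30))
b <- data.frame(TIME = c(1, 2, 3),  DV = c(10, 25, 30))

# Step 1: overall assessment.
overall <- identical(a, b)   # FALSE here

# Step 2: element-wise identical() on the unlisted values.
# unlist() names each element <column name><row number>.
same <- mapply(identical, unlist(a), unlist(b))
mismatch <- which(!same)
mismatch   # named integer vector: TIME3 = 3, DV2 = 5
```

The names give the column and row; the values are positions in the unlisted vector.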
>
> It looks like all.equal is already set up to provide such a service:
>
> > all.equal(df1,df2)
> [1] "Component 1: 'is.NA' value mismatch: 1 in current 0 in target"
>
> I was under the misimpression that all.equal was for approximate  
> equality of numeric values but that only appears to be part of its  
> design.
>
> -- 
> David.
>
>
>
> Jun Shen from Millipore
>
> On Thu, Apr 8, 2010 at 9:08 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Apr 8, 2010, at 9:47 AM, Jun Shen wrote:
>
> Dear David, Erik and Charles,
>
> Thank you for your input. Both mapply() and which() can do the job,  
> with one exception. If there is a missing value (NA) in data frame  
> "a" and a data point (either numerical or character) in the  
> corresponding position of "b", then mapply() returns NA for that  
> position rather than FALSE, and which() cannot pick up that position  
> either. Thanks again.
>
>
> You seem to have changed the programming challenge from  
> identification to replicating identical(). If so, you can get  
> closer by wrapping isTRUE(all(.)) around both the  
> mapply("==", attributes( ...), ...) step and the "==" call on the  
> data:
>
> > isTRUE(all(mapply("==", df1, df2)))
> [1] FALSE
> # since all(c(NA, TRUE, TRUE)) is NA and isTRUE(NA) is FALSE
>
> -- 
> David.
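
A small sketch of the NA behaviour discussed above, using made-up one-column data frames named df1 and df2 as in the example (their actual contents in the thread are not shown, so these values are assumptions):

```r
# Hypothetical data: df1 has a missing value where df2 has a number.
df1 <- data.frame(x = c(NA, 2, 3))
df2 <- data.frame(x = c(1, 2, 3))

# Element-wise "==" propagates NA instead of returning FALSE.
eq <- mapply("==", df1, df2)

all(eq)            # NA, because one comparison is NA and the rest are TRUE
isTRUE(all(eq))    # FALSE, so the frames are not reported as equal
which(!eq)         # integer(0): which() skips NA, so the position is lost
```

So the isTRUE(all(.)) wrapper answers whether the frames are the same, but it does not locate the NA mismatch.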
>
>
>
>
> Jun
>
> On Wed, Apr 7, 2010 at 10:46 PM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
>
> On Wed, 7 Apr 2010, Jun Shen wrote:
>
> Dear all,
>
> I understand identical(a,b) will tell me whether a and b are exactly  
> the same or not. But if they are different, is there any way to tell  
> which element(s) are different? Thanks.
>
>
> which( a != b, arr.ind = TRUE)
>
> HTH,
>
> Chuck
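
As an illustration of this suggestion, here is a minimal example with invented data frames of equal dimensions and no missing values:

```r
# Hypothetical data; only the column names are borrowed from the thread.
a <- data.frame(TIME = c(1, 2, 3), DV = c(10, 20, 30))
b <- data.frame(TIME = c(1, 2, 3), DV = c(10, 25, 30))

# a != b is a logical matrix with the same shape as the data frames;
# arr.ind = TRUE returns row and column indices of the TRUE entries.
diffs <- which(a != b, arr.ind = TRUE)
diffs   # one mismatch: row 2, column 2 (DV)
```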
>
>
> Jun
>
>
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> Charles C. Berry                            (858) 534-2098
>                                         Dept of Family/Preventive
> Medicine
> E mailto:cberry at tajo.ucsd.edu               UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego  
> 92093-0901
>
>
>
>
>
> David Winsemius, MD
> West Hartford, CT

David Winsemius, MD
West Hartford, CT


