[R] Newbie wants to compare 2 huge RDSs row by row.

Ulrik Stervbo ulrik.stervbo at gmail.com
Sun Jan 28 09:17:40 CET 2018


The anti_join from the package dplyr might also be handy.

install.packages("dplyr")
library(dplyr)
anti_join(x1, x2)

You can get help on any function with ?function.name, so ?anti_join will
bring up the help page, with examples, for the anti_join function.
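
To make the anti_join idea concrete, here is a minimal sketch on toy data. It
assumes both objects are data frames with the same column names; the column
name "row" is made up for the illustration. Because anti_join matches on
values rather than on position, the row number is stored as a column first so
that rows are only compared with their counterpart at the same position.

```r
library(dplyr)

# Toy data standing in for the two real data frames
x1 <- data.frame(X = c(1, 2, 3, 4, 5),
                 Y = c("A", "A", "A", "B", "B"),
                 stringsAsFactors = FALSE)
x2 <- data.frame(X = c(1, 2, -3, -4, 5),
                 Y = c("A", "A", "B", "B", "B"),
                 stringsAsFactors = FALSE)

# Keep the row number as an ordinary column ("row" is a hypothetical name)
x1 <- mutate(x1, row = row_number())
x2 <- mutate(x2, row = row_number())

# Rows of x1 that have no exact match in x2 at the same row number
mismatches <- anti_join(x1, x2, by = names(x1))
mismatches$row
```

Note that this compares values exactly, so tiny floating-point differences in
numeric columns will be reported as mismatches.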

It might be worth testing your approach on a small subset of the data. That
makes it easier for you to follow what happens and evaluate the outcome.
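
A rough sketch of such a trial run in base R, combining the suggestions quoted
below: read the files, check that they really hold data frames, take the first
few rows, and compare numeric columns with a tolerance and everything else
exactly. The toy data, the temporary file paths and the tolerance of 1e-8 are
all assumptions for the illustration, not values from your files.

```r
# Toy data frames standing in for the contents of the two real files
a <- data.frame(id = 1:6,
                value = c(1, 2, 3, 4, 5, 6),
                label = c("p", "q", "r", "s", "t", "u"),
                stringsAsFactors = FALSE)
b <- a
b$value[2] <- 2 + 1e-12   # tiny numeric noise, should be tolerated
b$value[4] <- 4.5         # real numeric difference
b$label[5] <- "T"         # character difference

# Round-trip through RDS files (temporary paths, not the real file names)
f1 <- tempfile(fileext = ".rds")
f2 <- tempfile(fileext = ".rds")
saveRDS(a, f1)
saveRDS(b, f2)
x1 <- readRDS(f1)
x2 <- readRDS(f2)

# Confirm both objects are data frames of the same shape before comparing
stopifnot(is.data.frame(x1), is.data.frame(x2),
          identical(dim(x1), dim(x2)),
          identical(names(x1), names(x2)))

# Work on a small subset first
n <- 5
s1 <- head(x1, n)
s2 <- head(x2, n)

# Numeric columns: differ if further apart than the tolerance
# Other columns: differ if not exactly equal
tol <- 1e-8   # assumed tolerance
is_num <- vapply(s1, is.numeric, logical(1))
num_diff <- abs(as.matrix(s1[is_num]) - as.matrix(s2[is_num])) > tol
chr_diff <- as.matrix(s1[!is_num]) != as.matrix(s2[!is_num])

# Row numbers with at least one differing cell
bad_rows <- unname(which(rowSums(num_diff) + rowSums(chr_diff) > 0))
bad_rows
```

This sketch does not handle missing values; with NAs in the data the
comparisons return NA and would need na.rm = TRUE or an explicit is.na() check.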

HTH
Ulrik

Marsh Hardy ARA/RISK <mhardy at ara.com> wrote on Sun, Jan 28, 2018, 04:14:

> Cool, looks like that'd do it, almost as if converting an entire record to
> a character string and comparing strings.
>
>   --  M. B. Hardy, statistician
> work: Applied Research Associates, S. E. Div.
>       8537 Six Forks Rd., # 6000 / Raleigh, NC 27615-2963
>       (919) 582-3329, fax: 582-3301
> home: 1020 W. South St. / Raleigh, NC 27603-2162
>       (919) 834-1245
> ________________________________________
> From: William Dunlap [wdunlap at tibco.com]
> Sent: Saturday, January 27, 2018 4:57 PM
> To: Marsh Hardy ARA/RISK
> Cc: Ulrik Stervbo; Eric Berger; r-help at r-project.org
> Subject: Re: [R] Newbie wants to compare 2 huge RDSs row by row.
>
> If your two objects have class "data.frame" (look at class(objectName)),
> both have the same number of columns in the same order, and the column
> types match closely enough (use all.equal(x1, x2) for that), then you can
> try
>      which( rowSums( x1 != x2 ) > 0)
> E.g.,
> > x1 <- data.frame(X=1:5, Y=rep(c("A","B"),c(3,2)))
> > x2 <- data.frame(X=c(1,2,-3,-4,5), Y=rep(c("A","B"),c(2,3)))
> > x1
>   X Y
> 1 1 A
> 2 2 A
> 3 3 A
> 4 4 B
> 5 5 B
> > x2
>    X Y
> 1  1 A
> 2  2 A
> 3 -3 B
> 4 -4 B
> 5  5 B
> > which( rowSums( x1 != x2 ) > 0)
> [1] 3 4
>
> If you want to allow small numeric differences but require exact character
> matches, you will have to get a bit fancier.  Splitting the data.frames
> into character and numeric parts and comparing each works well.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Jan 27, 2018 at 1:18 PM, Marsh Hardy ARA/RISK <mhardy at ara.com>
> wrote:
> Hi Guys, I apologize for my rank & utter newness at R.
>
> I used summary() and found about 95 variables, both character and numeric,
> all with "Length:368842", which I assume is the # of records.
>
> I'd like to know the record number (row #?) of any record where the data
> doesn't match in the 2 files of what should be the same output.
>
> Thanks in advance, M.
>
> //
> ________________________________________
> From: Ulrik Stervbo [ulrik.stervbo at gmail.com]
> Sent: Saturday, January 27, 2018 10:00 AM
> To: Eric Berger
> Cc: Marsh Hardy ARA/RISK; r-help at r-project.org
> Subject: Re: [R] Newbie wants to compare 2 huge RDSs row by row.
>
> Also, it will be easier to provide helpful information if you'd describe
> what in your data you want to compare and what you hope to get out of the
> comparison.
>
> Best wishes,
> Ulrik
>
> Eric Berger <ericjberger at gmail.com> wrote on Sat, Jan 27, 2018, 08:18:
> Hi Marsh,
> An RDS is not a data structure such as a data.frame. It can be anything.
> For example if I want to save my objects a, b, c I could do:
> > saveRDS( list(a,b,c), file="tmp.RDS")
> Then read them back later with
> > myList <- readRDS( "tmp.RDS" )
>
> Do you have additional information about your "RDSs" ?
>
> Eric
>
>
> On Sat, Jan 27, 2018 at 6:54 AM, Marsh Hardy ARA/RISK <mhardy at ara.com>
> wrote:
>
> > Each RDS is 40 MBs. What's a slick code to compare them row by row, IDing
> > row numbers with mismatches?
> >
> > Thanks in advance.
> >
> > //
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]



