[R] counting duplicate items that occur in multiple groups

Deepayan Sarkar deep@y@n@@@rk@r @end|ng |rom gm@||@com
Wed Nov 18 10:40:23 CET 2020


On Wed, Nov 18, 2020 at 5:40 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> z <- with(Data2, tapply(Vendor,Account, I))
> n <- vapply(z,length,1)
> data.frame (Vendor = unlist(z),
>    Account = rep(names(z),n),
>    NumVen = rep(n,n)
> )
>
> ## which gives:
>
>    Vendor Account NumVen
> A1      V1      A1      1
> A21     V2      A2      3
> A22     V3      A2      3
> A23     V1      A2      3
> A3      V4      A3      1
> A4      V2      A4      1
>
> Of course this also works for Data1
>
> Bill may be able to come up with a slicker version, however.

Perhaps

transform(Data2, nshare = as.vector(table(Account)[Account]))

(or dplyr::mutate() instead of transform(), if you prefer.)

-Deepayan

>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman <twoolman using ontargettek.com>
> wrote:
>
> > Yes, good catch. Thanks
> >
> >
> > Quoting Bert Gunter <bgunter.4567 using gmail.com>:
> >
> > > Why 0's in the data frame? Shouldn't that be 1 (vendor with that
> > account)?
> > >
> > > Bert
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming along
> > and
> > > sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >
> > >
> > > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman using ontargettek.com>
> > > wrote:
> > >
> > >> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> > >> (too hard) not to share any actual values so I just created randomized
> > >> values for my example; of course I should have specified that the
> > >> random values would not provide the expected problem pattern. I should
> > >> have just used simple dummy codes as Bill Dunlap did.
> > >>
> > >> So per Bill's example data for Data1, the expected (hoped for) output
> > >> should be:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1     V1      A1      0
> > >> 2     V2      A2      3
> > >> 3     V3      A2      3
> > >> 4     V4      A2      3
> > >>
> > >>
> > >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> > >> The value is 3 for V2, V3 and V4 because they each share bank account
> > >> A2.
> > >>
> > >>
> > >> Likewise, in the Data2 frame, the same logic applies:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1     V1      A1     0
> > >> 2     V2      A2     3
> > >> 3     V3      A2     3
> > >> 4     V1      A2     3
> > >> 5     V4      A3     0
> > >> 6     V2      A4     0
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Thanks!
> > >>
> > >>
> > >> Quoting Bill Dunlap <williamwdunlap using gmail.com>:
> > >>
> > >> > What should the result be for
> > >> >   Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> > >> > Account=c("A1","A2","A2","A2"))
> > >> > ?
> > >> >
> > >> > Must each vendor have only one account?  If not, what should the
> > result
> > >> be
> > >> > for
> > >> >    Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> > >> > Account=c("A1","A2","A2","A2","A3","A4"))
> > >> > ?
> > >> >
> > >> > -Bill
> > >> >
> > >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman using ontargettek.com
> > >
> > >> > wrote:
> > >> >
> > >> >> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> > >> >> plus a bank account number for each vendor. I'm trying to find a way
> > >> >> to count the number of duplicate bank accounts that occur in more
> > than
> > >> >> one unique Vendor_ID, and then assign the count value for each row in
> > >> >> the dataframe in a new variable.
> > >> >>
> > >> >> I can do a count of bank accounts that occur within the same vendor
> > >> >> using dplyr and group_by and count, but I can't figure out a way to
> > >> >> count duplicates among multiple Vendor_IDs.
> > >> >>
> > >> >>
> > >> >> Dataframe example code:
> > >> >>
> > >> >>
> > >> >> #Create a sample data frame:
> > >> >>
> > >> >> set.seed(1)
> > >> >>
> > >> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID =
> > >> >> sample(1:10000))
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> Thanks in advance for any help.
> > >> >>
> > >> >> ______________________________________________
> > >> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> >> PLEASE do read the posting guide
> > >> >> http://www.R-project.org/posting-guide.html
> > >> >> and provide commented, minimal, self-contained, reproducible code.
> > >> >>
> > >>
> > >> ______________________________________________
> > >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> >
> >
> >
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list