[R] Is there a hash data structure for R

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Wed Nov 3 00:42:03 CET 2021


I have several things I considered about this topic.

In general, a feature of one language does not always carry over to another,
even if you find a bridge. Python lets you store almost anything in a
dictionary as a value, including other dictionaries, and accepts any
hashable object, such as a tuple, as a key. What is allowed for keys is
quite broad. If you use an R environment or list, the names must be
character strings, so the restrictions are not the same.
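
To make the key restriction concrete, here is a minimal sketch, assuming an
environment is used as the dictionary. The helper make_key and the "|"
separator are hypothetical choices of mine, not anything in base R, and the
separator must not occur inside any key part.

```r
# Environment names are character strings, so a composite key (what would
# be a tuple in Python) has to be encoded as a single string first.
# make_key() is a hypothetical helper; "|" is an assumed separator.
make_key <- function(...) paste(..., sep = "|")

d <- new.env(parent = emptyenv())
d[[make_key("x", 1L)]] <- "first"
d[[make_key("x", 2L)]] <- "second"

d[[make_key("x", 1L)]]   # "first"
ls(d)                    # "x|1" "x|2"
```

Anything that cannot be turned into a unique string this way needs a
different approach, such as a package that does real hashing of objects.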

Now on to key uniqueness. Yes, R allows multiple entries in a list to share
the same name. But when I made such a structure, the FIRST instance hid any
later ones when reading or setting something like list.name$A. If you
remove an existing entry by something like setting the above to NULL, though,
the second instance of that name now becomes the first. So anyone wanting to
fully remove all instances might need to loop until sure all are gone. Not
sure about environments but they may behave better.
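
A short sketch of what I mean, using made-up names (x, y, e) for
illustration only. The last part shows that assigning the same name twice
in an environment overwrites rather than duplicates.

```r
x <- list(A = 1, B = 2, A = 3)
x$A              # 1: only the first "A" is visible
x$A <- NULL      # removes only the first "A"
names(x)         # "B" "A": the second "A" is now the first
x$A              # 3

# Dropping every entry with a given name in one step, without a loop
y <- list(A = 1, B = 2, A = 3)
y <- y[names(y) != "A"]
names(y)         # "B"

# An environment keeps one binding per name
e <- new.env(parent = emptyenv())
e$A <- 1
e$A <- 3         # overwrites the earlier value
ls(e)            # "A"
```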

Third, multiple parties have built R packages to support hashing, including
some that are no longer available:

https://cran.r-project.org/web/packages/hash/hash.pdf
https://www.rdocumentation.org/packages/Dict/versions/0.10.0

So if one of those works, why reinvent it?

In any case, if you roll your own, as others have shown, you may have to
provide getter and setter functions and so on to guarantee the behavior you
want, and be careful, because any other code that can touch your data
directly may go around them.
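
As a rough sketch of the roll-your-own route, and not a replacement for the
packages above: the constructor new_dict and its method names are
hypothetical. It hides the environment inside a closure so other code
cannot reach the data except through the accessors, and it uses
parent = emptyenv() and inherits = FALSE so lookups never fall through to
another environment.

```r
# new_dict() is a hypothetical constructor; all names here are assumptions.
new_dict <- function() {
  store <- new.env(hash = TRUE, parent = emptyenv())
  list(
    set = function(key, value) {
      # Reject keys an environment cannot hold or that would be ambiguous
      stopifnot(is.character(key), length(key) == 1L,
                !is.na(key), nzchar(key))
      assign(key, value, envir = store)
      invisible(NULL)
    },
    get = function(key, default = NULL) {
      if (exists(key, envir = store, inherits = FALSE)) {
        base::get(key, envir = store, inherits = FALSE)
      } else {
        default
      }
    },
    has = function(key) exists(key, envir = store, inherits = FALSE),
    remove = function(key) {
      if (exists(key, envir = store, inherits = FALSE)) {
        rm(list = key, envir = store)
      }
      invisible(NULL)
    },
    keys = function() ls(store, all.names = TRUE)
  )
}

d <- new_dict()
d$set("apple", 1)
d$set("apple", 2)   # same key: the value is replaced, not duplicated
d$get("apple")      # 2
d$has("pear")       # FALSE
d$keys()            # "apple"
```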

Finally, someone mentioned that creating a data.frame with duplicate column
names is not a problem because it can automagically CHANGE them to be
unique. That is a HUGE problem for using it as a dictionary, as the caller
will not know the new name, so all kinds of lookups will fail.
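
For illustration, with made-up column names: by default data.frame()
renames the second "x", and with check.names = FALSE the duplicate is kept
but only the first one is reachable by name.

```r
df <- data.frame(x = 1:2, y = 3:4, x = 5:6)
names(df)    # "x" "y" "x.1": the second key was silently renamed

df2 <- data.frame(x = 1:2, y = 3:4, x = 5:6, check.names = FALSE)
names(df2)   # "x" "y" "x": duplicates kept
df2$x        # 1 2: lookup by name finds only the first "x"
```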

And there are also packages for many features like sets as well as functions
to manipulate these things.

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Bill Dunlap
Sent: Tuesday, November 2, 2021 1:26 PM
To: Andrew Simmons <akwsimmo using gmail.com>
Cc: R Help <r-help using r-project.org>
Subject: Re: [R] Is there a hash data structure for R

Note that an environment carries a hash table with it, while a named list
does not.  I think that looking up an entry in a list causes a hash table to
be created and thrown away.  Here are some timings involving setting and
getting various numbers of entries in environments and lists.  The times are
roughly linear in n for environments and quadratic for lists.

> vapply(1e3 * 2 ^ (0:6), f, L=new.env(parent=emptyenv()),
FUN.VALUE=NA_real_)
[1] 0.00 0.00 0.00 0.02 0.03 0.06 0.15
> vapply(1e3 * 2 ^ (0:6), f, L=list(), FUN.VALUE=NA_real_)
[1]  0.01  0.03  0.15  0.53  2.66 13.66 56.05
> f
function(n, L, V = sprintf("V%07d", sample(n, replace=TRUE))) {
    system.time(for(v in V)L[[v]]<-c(L[[v]],v))["elapsed"] }

Note that environments do not allow an element named "" (the empty string).

Elements named NA_character_ are treated differently in environments and
lists, neither of which is great.  You may want your hash table functions to
deal with oddball names explicitly.

-Bill

On Tue, Nov 2, 2021 at 8:52 AM Andrew Simmons <akwsimmo using gmail.com> wrote:

> If you're thinking about using environments, I would suggest you 
> initialize them like
>
>
> x <- new.env(parent = emptyenv())
>
>
> Since environments have parent environments, it means that requesting
> a value from that environment can actually return the value stored in
> a parent environment (this isn't an issue for [[ or $, this is
> exclusively an issue with assign, get, and exists). Or, if you've
> already got your values stored in a list that you want to turn into an
> environment:
>
>
> x <- list2env(listOfValues, parent = emptyenv())
>
>
> Hope this helps!
>
>
> On Tue, Nov 2, 2021, 06:49 Yonghua Peng <yong using pobox.com> wrote:
>
> > But for data.frame the colnames can be duplicated. Am I right?
> >
> > Regards.
> >
> > On Tue, Nov 2, 2021 at 6:29 PM Jan van der Laan <rhelp using eoos.dds.nl> wrote:
> >
> > >
> > > > True, but in a lot of cases where a python user might use a dict
> > > > an R user will probably use a list; or when we are talking about
> > > > arrays of dicts in python, the R solution will probably be a
> > > > data.frame (with each dict field in a separate column).
> > >
> > > Jan
> > >
> > >
> > >
> > >
> > > On 02-11-2021 11:18, Eric Berger wrote:
> > > > One choice is
> > > > new.env(hash=TRUE)
> > > > in the base package
> > > >
> > > >
> > > >
> > > > On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng <yong using pobox.com> wrote:
> > > >
> > > > >> I know this is a newbie question. But how do I implement the hash
> > > > >> structure which is available in other languages (in python it's dict)?
> > > >>
> > > >> I know there is the list, but list's names can be duplicated here.
> > > >>
> > > >>> x <- list(x=1:5,y=month.name,x=3:7)
> > > >>
> > > >>> x
> > > >>
> > > >> $x
> > > >>
> > > >> [1] 1 2 3 4 5
> > > >>
> > > >>
> > > >> $y
> > > >>
> > > >>   [1] "January"   "February"  "March"     "April"     "May"
> >  "June"
> > > >>
> > > >>   [7] "July"      "August"    "September" "October"   "November"
> > > "December"
> > > >>
> > > >>
> > > >> $x
> > > >>
> > > >> [1] 3 4 5 6 7
> > > >>
> > > >>
> > > >>
> > > >> Thanks a lot.
> > > >>
> > > >>          [[alternative HTML version deleted]]
> > > >>
> > > >> ______________________________________________
> > > >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, 
> > > >> see https://stat.ethz.ch/mailman/listinfo/r-help
> > > >> PLEASE do read the posting guide 
> > > >> http://www.R-project.org/posting-guide.html
> > > >> and provide commented, minimal, self-contained, reproducible code.
> > > >>
> > > >
> > > >       [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, 
> > > > see https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > >
> > >
> >
> >
>
>



