[R] Is there a hash data structure for R

Jan van der Laan rhelp using eoos.dds.nl
Wed Nov 3 10:47:06 CET 2021



On 03-11-2021 00:42, Avi Gross via R-help wrote:

> 
> Finally, someone mentioned how creating a data.frame with duplicate names
> for columns is not a problem as it can automagically CHANGE them to be
> unique. That is a HUGE problem for using that as a dictionary as the new
> name will not be known to the system so all kinds of things will fail.

I think you are referring to my remark which was:

 > However, the data.frame construction method will detect this and
 > generate unique names (which also might not be what you want):

I didn't say this means that duplicate names aren't a problem; I just 
mentioned that the behaviour is different. Personally, I would actually 
prefer the behaviour of list (keep the duplicated name) with a warning.
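
To make the difference concrete, here is a minimal sketch. The names and values are made up for illustration, and it assumes the default check.names = TRUE of data.frame():

```r
# list() keeps duplicated names exactly as given
l <- list(x = 1:5, y = 6:10, x = 11:15)
names(l)                      # "x" "y" "x"
anyDuplicated(names(l)) > 0   # TRUE: at least one name is repeated
l[["x"]]                      # 1 2 3 4 5 (only the first match is returned)

# data.frame() makes the names unique by default (check.names = TRUE)
d <- data.frame(x = 1:5, y = 6:10, x = 11:15)
names(d)                      # "x" "y" "x.1"

# with check.names = FALSE the duplicated name is kept, as with list()
d2 <- data.frame(x = 1:5, y = 6:10, x = 11:15, check.names = FALSE)
names(d2)                     # "x" "y" "x"
```

In both the list and the check.names = FALSE case, code that looks up "x" will silently get the first element only, which is why a warning would be helpful.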

Most of the responses seem to assume that the OP actually wants a hash 
table. Yes, he did ask for that, and for a hash table an environment 
(with some work) would be a good option. But in many cases where other 
languages would use a hash-table-like object (such as a dict), in R you 
would use other types of objects. Furthermore, for many operations that 
you might implement with a hash table, R already has built-in options, 
for example %in%, match and duplicated. These are also vectorised, so 
two vectors, one with keys and one with values, might actually be 
faster than an environment in some use cases.
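
A rough sketch of the two-vector approach, next to the same lookup with an environment. The keys, values and query below are invented for illustration, and I have not timed this, so take it as a sketch and not as a benchmark:

```r
keys   <- c("apple", "banana", "cherry")
values <- c(10, 20, 30)
query  <- c("cherry", "apple", "durian")

# vectorised lookup: match() gives the position of each query key in keys,
# and NA for keys that are not present
values[match(query, keys)]     # 30 10 NA

# vectorised membership test
query %in% keys                # TRUE TRUE FALSE

# check for duplicated keys before treating the vectors as a dictionary
duplicated(c(keys, "apple"))   # FALSE FALSE FALSE TRUE

# the same lookup with an environment used as a hash table
e <- new.env(hash = TRUE, parent = emptyenv())
for (i in seq_along(keys)) assign(keys[i], values[i], envir = e)
res <- unlist(mget(query, envir = e, ifnotfound = NA))
res                            # 30 10 NA, named by the query keys
```

The vector version needs no loop to fill or to query the structure, which is where it can win when you look up many keys at once.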

Best,
Jan


> 
> And there are also packages for many features like sets as well as functions
> to manipulate these things.
> 
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Bill Dunlap
> Sent: Tuesday, November 2, 2021 1:26 PM
> To: Andrew Simmons <akwsimmo using gmail.com>
> Cc: R Help <r-help using r-project.org>
> Subject: Re: [R] Is there a hash data structure for R
> 
> Note that an environment carries a hash table with it, while a named list
> does not.  I think that looking up an entry in a list causes a hash table to
> be created and thrown away.  Here are some timings involving setting and
> getting various numbers of entries in environments and lists.  The times are
> roughly linear in n for environments and quadratic for lists.
> 
> > vapply(1e3 * 2 ^ (0:6), f, L=new.env(parent=emptyenv()), FUN.VALUE=NA_real_)
> [1] 0.00 0.00 0.00 0.02 0.03 0.06 0.15
> > vapply(1e3 * 2 ^ (0:6), f, L=list(), FUN.VALUE=NA_real_)
> [1]  0.01  0.03  0.15  0.53  2.66 13.66 56.05
> > f
> function(n, L, V = sprintf("V%07d", sample(n, replace=TRUE))) {
>      system.time(for(v in V)L[[v]]<-c(L[[v]],v))["elapsed"] }
> 
> Note that environments do not allow an element named "" (the empty string).
> 
> Elements named NA_character_ are treated differently in environments and
> lists, neither of which is great.  You may want your hash table functions to
> deal with oddball names explicitly.
> 
> -Bill
> 
> On Tue, Nov 2, 2021 at 8:52 AM Andrew Simmons <akwsimmo using gmail.com> wrote:
> 
>> If you're thinking about using environments, I would suggest you
>> initialize them like
>>
>>
>> x <- new.env(parent = emptyenv())
>>
>>
>> Since environments have parent environments, it means that requesting
>> a value from that environment can actually return the value stored in
>> a parent environment (this isn't an issue for [[ or $, this is
>> exclusively an issue with assign, get, and exists). Or, if you've
>> already got your values stored in a list that you want to turn into an
>> environment:
>>
>>
>> x <- list2env(listOfValues, parent = emptyenv())
>>
>>
>> Hope this helps!
>>
>>
>> On Tue, Nov 2, 2021, 06:49 Yonghua Peng <yong using pobox.com> wrote:
>>
>>> But for data.frame the colnames can be duplicated. Am I right?
>>>
>>> Regards.
>>>
>>> On Tue, Nov 2, 2021 at 6:29 PM Jan van der Laan <rhelp using eoos.dds.nl> wrote:
>>>
>>>>
>>>> True, but in a lot of cases where a python user might use a dict
>>>> an R user will probably use a list; or when we are talking about
>>>> arrays of dicts in python, the R solution will probably be a
>>>> data.frame (with each dict field in a separate column).
>>>>
>>>> Jan
>>>>
>>>>
>>>>
>>>>
>>>> On 02-11-2021 11:18, Eric Berger wrote:
>>>>> One choice is
>>>>> new.env(hash=TRUE)
>>>>> in the base package
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng <yong using pobox.com> wrote:
>>>>>
>>>>>> I know this is a newbie question. But how do I implement the hash
>>>>>> structure which is available in other languages (in python it's dict)?
>>>>>>
>>>>>> I know there is the list, but list's names can be duplicated here.
>>>>>>
>>>>>> > x <- list(x=1:5,y=month.name,x=3:7)
>>>>>> > x
>>>>>> $x
>>>>>> [1] 1 2 3 4 5
>>>>>>
>>>>>> $y
>>>>>>  [1] "January"   "February"  "March"     "April"     "May"       "June"
>>>>>>  [7] "July"      "August"    "September" "October"   "November"  "December"
>>>>>>
>>>>>> $x
>>>>>> [1] 3 4 5 6 7
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>>           [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>>>>>> see https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>


