[R] Problem with filling dataframe's column

Thu Jun 15 07:51:47 CEST 2023

Richard, it is indeed possible for different languages to choose different approaches.

If your point is that an R named list can simulate a Python dictionary (or for that matter, a set) there is some validity to that. You can also use environments similarly.
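
As a rough illustration of the environment idea, here is a minimal sketch of an environment used as a mutable key-value store. The name d and the keys are made up for the example and are not from the thread; this is just one way it could be done.

```r
# Minimal sketch: an environment used like a dictionary.
# The name `d` and its keys are illustrative assumptions.
d <- new.env(hash = TRUE)

# Add entries by string key.
assign("foo", c(1, 2), envir = d)
d[["fred"]] <- 47

# Look up a value, test membership, and list the keys.
get("foo", envir = d)
exists("fred", envir = d, inherits = FALSE)
ls(d)

# Remove an entry.
rm("foo", envir = d)
exists("foo", envir = d, inherits = FALSE)
```

Unlike a named list, an environment is modified in place, so a function that receives d can add or remove keys without returning a new copy.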

Arguably there are differences, including in what notations are built into the language. Looking the other way, Python chose to make lists a major feature: they can hold any combination of things and can even emulate a matrix using sub-lists. It also had a tuple version that is similar but immutable, yet it initially neglected something as simple as a vector containing just one kind of content. Nowadays many people simply load numpy (and often pandas) to get functionality that is faster and that comes by default in R.

I think this discussion was about my (amended) offhand remark suggesting R factors stored plain text in a vector attached to the variable and the offset was the number stored in the main factor vector. If that changed to internally use something hashed like a dictionary, fine. I have often made data structures such as in your example to store named items but did not call it a dictionary but simply a named list. In one sense, the two map into each other but I could argue there remain differences. For example, you can use something immutable like a tuple as a key in python. 
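
For anyone who wants to see the user-visible layout I was describing, a small sketch follows. The colour values are invented for the example, and it shows only what the interface exposes, not how R manages the strings internally.

```r
# Minimal sketch with made-up data: what a factor exposes to the user.
f <- factor(c("red", "green", "red", "blue", "green"))

levels(f)        # the unique labels, kept once as an attribute
as.integer(f)    # the integer codes held in the main vector
typeof(f)        # "integer"

# Rebuild the original text by indexing the labels with the codes.
levels(f)[as.integer(f)]
```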

This is not an argument about which language is better. Each has developed to fill particular needs and has been extended, and quite a few things can now be done in either one. Still, it can be interesting to combine the two inside RStudio so each does some of what it may do better or faster or in a way you find more natural.

From: Richard O'Keefe <raoknz using gmail.com> 
Sent: Wednesday, June 14, 2023 10:34 PM
To: avi.e.gross using gmail.com
Cc: Bert Gunter <bgunter.4567 using gmail.com>; R-help using r-project.org
Subject: Re: [R] Problem with filling dataframe's column

Consider

  m <- list(foo=c(1,2),"B'ar"=matrix(1:4,2,2),"!*#"=c(FALSE,TRUE))

It is a collection of elements of different types/structures, accessible
via string keys (and also by position).  Entries can be added:

  m[["fred"]] <- 47

Entries can be removed:

  m[["!*#"]] <- NULL

How much more like a Python dictionary do you need it to be?

On Wed, 14 Jun 2023 at 11:25, <avi.e.gross using gmail.com> wrote:
Bert,

I stand corrected. What I said may have once been true but apparently the implementation seems to have changed at some level.

I did not factor that in.

Nevertheless, whether you use an index as a key or as an offset into an attached vector of labels, it seems to work the same, and I think my comment applies well enough: changing a few labels instead of scanning lots of entries can sometimes be a good thing. As far as I can tell, the external interface seems the same for now.
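
To make the relabelling point concrete, here is a minimal sketch with invented data. The names x, x_chr and x_fct are assumptions for the example only.

```r
# Minimal sketch with made-up data: recoding one value two ways.
x <- c("north", "south", "north", "east", "south", "north")

# Character vector: every element is compared and matching ones replaced.
x_chr <- x
x_chr[x_chr == "north"] <- "N"

# Factor: only the one matching label changes; the codes are untouched.
x_fct <- factor(x)
levels(x_fct)[levels(x_fct) == "north"] <- "N"

# Both routes give the same text.
identical(as.character(x_fct), x_chr)
```

With a quarter million rows and only a handful of distinct values, the factor route edits a handful of labels rather than testing every row.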

One issue with R for a long time was how they did not do something more like a Python dictionary and it looks like …

ABOVE

From: Bert Gunter <bgunter.4567 using gmail.com>
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gross using gmail.com
Cc: javad bayat <j.bayat194 using gmail.com>; R-help using r-project.org
Subject: Re: [R] Problem with filling dataframe's column

Below.

On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gross using gmail.com> wrote:
>
>  
> Javad,
>
> There may be nothing wrong with the methods people are showing you and if it satisfied you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of the text data is redundant, and you want to simplify some operations such as changing some of the values to others in multiple ways, have you done any learning about an R feature very useful for dealing with categorical data called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it can be replaced by a factor that often takes way less space as it stores a sort of dictionary of all the unique values and just records numbers like 1,2,3 to tell which one each item is.

-- This is false. It used to be true a **long time ago**, but R has for quite a while used hashing/global string tables to avoid this problem. See here <https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters>  for details/references.
As a result, I think many would argue that working with strings *as strings*, not factors, is often a better default, though of course there are still situations where factors are useful (e.g. in ordering results by factor levels where the desired level order is not alphabetical).

**I would appreciate correction/ clarification if my claims are wrong or misleading! **

In any case, please do check such claims before making them on this list.

Cheers,
Bert

        [[alternative HTML version deleted]]

______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
