[R] Problem with filling dataframe's column

avi.e.gross using gmail.com
Wed Jun 14 01:24:45 CEST 2023


Bert,
 
I stand corrected. What I said may once have been true, but the implementation has apparently changed at some level.
 
I did not factor that in.
 
Nevertheless, whether you use an index as a key or as an offset into an attached vector of labels, it seems to work the same, and I think my comment applies well enough: changing a few labels instead of scanning lots of entries can sometimes be a good thing. As far as I can tell, the external interface seems the same for now.
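
As a minimal sketch of that point, with made-up data (the vector `x` and its values are hypothetical, not Javad's data), relabeling one level of a factor touches only the short vector of labels, while the same change on a character vector compares every element:

```r
# Hypothetical data: many rows, few distinct values.
set.seed(1)
x <- sample(c("north", "south", "east"), 250000, replace = TRUE)
f <- factor(x)

# Relabel one level: this edits the 3 labels, not the 250,000 entries.
levels(f)[levels(f) == "north"] <- "N"

# The equivalent change on the character vector tests every element.
x[x == "north"] <- "N"

# Both routes end up with the same values.
same <- identical(as.character(f), x)
print(same)
```

Whether this is noticeably faster in practice depends on the data and the R version, so it is worth timing on the real data rather than assuming.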
 
One issue with R for a long time was how they did not do something more like a Python dictionary and it looks like …
 
ABOVE
 
From: Bert Gunter <bgunter.4567 using gmail.com> 
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gross using gmail.com
Cc: javad bayat <j.bayat194 using gmail.com>; R-help using r-project.org
Subject: Re: [R] Problem with filling dataframe's column
 
Below.


On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gross using gmail.com> wrote:
>
>  
> Javad,
>
> There may be nothing wrong with the methods people are showing you and if it satisfied you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of the text data is redundant, and you want to simplify some operations such as changing some of the values to others in multiple ways, have you done any learning about an R feature very useful for dealing with categorical data called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it can be replaced by a factor that often takes way less space as it stores a sort of dictionary of all the unique values and just records numbers like 1,2,3 to tell which one each item is.
 
-- This is false. It used to be true a **long time ago**, but R has for quite a while used hashing/global string tables to avoid this problem. See here <https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters>  for details/references.
As a result, I think many would argue that working with strings *as strings,* not factors, is often a better default, though of course there are still situations where factors are useful (e.g. in ordering results by factor levels where the desired level order is not alphabetical).
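
A small illustration of the level-ordering case, using invented values (`size` and its level order are assumptions made up for the example):

```r
# Hypothetical values whose natural order is not alphabetical.
size <- c("medium", "small", "large", "small")

# Sorting the strings gives alphabetical order.
alpha <- sort(size)

# A factor with explicit levels sorts in the order the levels were given.
f <- factor(size, levels = c("small", "medium", "large"))
by_level <- as.character(sort(f))

print(alpha)
print(by_level)
```

Here `by_level` comes out as small, small, medium, large, which plain string sorting cannot produce.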
 
**I would appreciate correction/ clarification if my claims are wrong or misleading! **
 
In any case, please do check such claims before making them on this list.
 
Cheers,
Bert
 
 
