[R] Problem with filling dataframe's column

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Wed Jun 14 00:14:40 CEST 2023


Below.


On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gross using gmail.com> wrote:
>
>
> Javad,
>
> There may be nothing wrong with the methods people are showing you, and if they satisfy you, great.
>
> But I note you have a lot of data: over a quarter million rows. If much of the text data is redundant, and you want to simplify operations such as changing some of the values to others in multiple ways, have you done any learning about an R feature that is very useful for categorical data, called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, it can be replaced by a factor, which often takes far less space: it stores a sort of dictionary of all the unique values and just records numbers like 1, 2, 3 to tell which one each item is.

-- This is false. It used to be true a **long time ago**, but R has for
quite a while used hashing/global string tables to avoid this problem. See
here
<https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters>
for details/references.
As a result, I think many would argue that working with strings *as
strings*, not factors, is often a better default, though of course there
are still situations where factors are useful (e.g. in ordering results by
factor levels where the desired level order is not alphabetical).

**I would appreciate correction/clarification if my claims are wrong or misleading!**

In any case, please do check such claims before making them on this list.

Cheers,
Bert


>
> You can access the values using levels(whatever) and also change them. There are packages that make this straightforward, such as forcats, which is one of the tidyverse packages; the tidyverse also includes many other tools some find useful, but they are beyond the usual scope of this mailing list.
>
> As an example, if you have a vector in mydata$col1, this converts it to a factor:
>
> mydata$col1 <- factor(mydata$col1)
>
> No matter which way you do it, you can now access the levels, make whatever changes you need, and save them. One example would be to apply some variant of grep to make the substitution: there is a built-in family of functions, such as sub(), that match a regular expression and replace it with what you want.
>
> This has the same result as changing all entries, without doing all the work. If level 5 used to be "OLD" and is now "NEW", then every one of your quarter million entries coded 5 will now be seen as having the value "NEW".
>
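Here is a minimal sketch of the levels-relabelling idea described above, using made-up data. The data frame `mydata` and the labels "OLD"/"NEW" are placeholders taken from the text, not the poster's data. Only the short vector of levels is edited, yet every row coded with that level then shows the new label.

```r
# Hypothetical toy data; 'mydata', 'col1', "OLD" and "NEW" are placeholders
mydata <- data.frame(col1 = c("OLD", "Park", "OLD", "Agri"))
mydata$col1 <- factor(mydata$col1)

levels(mydata$col1)      # "Agri" "OLD" "Park": one entry per distinct value
as.integer(mydata$col1)  # 2 3 2 1: the integer code stored for each row

# Edit the (short) vector of levels with sub(), not the rows themselves
levels(mydata$col1) <- sub("^OLD$", "NEW", levels(mydata$col1))

as.character(mydata$col1)  # "NEW" "Park" "NEW" "Agri"
```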
> I will stop here and suggest you read a book that explains R as a unified set of features, with some emphasis on the features it was designed around that can make life easier, rather than only the features it shares with most languages. Some of your questions indicate you have less grounding and are mainly following recipes you stumble across.
>
> Otherwise, you will end up with a collection of what you call "codes" and others like me call programs, pieces that don't necessarily fit well together.
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of javad bayat
> Sent: Tuesday, June 13, 2023 3:47 PM
> To: Eric Berger <ericjberger using gmail.com>
> Cc: R-help using r-project.org
> Subject: Re: [R] Problem with filling dataframe's column
>
> Dear all;
> I used this code and got what I wanted.
> Sincerely
>
> pat = c("Level 12","Level 22","0")
> data3 = data2[-which(data2$Layer == pat),]
> dim(data2)
> [1] 281549      9
> dim(data3)
> [1] 244075      9
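A note on the code just above. `==` compares element by element and recycles the shorter vector. So `data2$Layer == pat` checks row 1 against "Level 12", row 2 against "Level 22", row 3 against "0", and so on. It does not ask whether each row is one of the three values, so it is likely not the filter that was intended. `%in%` tests membership instead.

The minimal sketch below uses hypothetical toy data, not the poster's data2. It also shows why `!` is safer than `-which()`: with no matches, `-which()` selects zero rows.

```r
# Hypothetical toy data standing in for the poster's data2
data2 <- data.frame(Layer = c("Level 22", "Level 12", "0", "Level 5"),
                    LU    = c("a", "b", "c", "d"))
pat <- c("Level 12", "Level 22", "0")

# `==` is element-wise and recycles `pat`: "Level 22" vs "Level 12",
# "Level 12" vs "Level 22", "0" vs "0"
data2$Layer[1:3] == pat      # FALSE FALSE TRUE

# `%in%` asks "is this value anywhere in pat?" for every row
data2$Layer %in% pat         # TRUE TRUE TRUE FALSE

# Keep the rows whose Layer is NOT one of the unwanted values
data3 <- data2[!(data2$Layer %in% pat), ]
data3$Layer                  # "Level 5"

# Pitfall of -which(): with no matches, which() returns integer(0),
# and data2[-integer(0), ] selects zero rows instead of all of them
nrow(data2[-which(data2$Layer == "no such level"), ])   # 0
```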
>
> On Tue, Jun 13, 2023 at 11:36 AM Eric Berger <ericjberger using gmail.com> wrote:
>
> > Hi Javad,
> > grep returns the positions of the matches. See an example below.
> >
> > > v <- c("abc", "bcd", "def")
> > > v
> > [1] "abc" "bcd" "def"
> > > grep("cd",v)
> > [1] 2
> > > w <- v[-grep("cd",v)]
> > > w
> > [1] "abc" "def"
> > >
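To expand on Eric's example: grep() returns positions, while grepl() returns one TRUE/FALSE per element, which is often easier to negate. The sketch below reuses Eric's vector `v`. The pattern "zz" is a made-up, non-matching pattern, used only to show a pitfall of negative indexing.

```r
v <- c("abc", "bcd", "def")

grep("cd", v)        # 2: positions of the matching elements
grepl("cd", v)       # FALSE TRUE FALSE: one logical per element

v[!grepl("cd", v)]   # "abc" "def"

# With no match, grep() returns integer(0), so v[-grep(...)] drops everything
v[-grep("zz", v)]    # character(0)
v[!grepl("zz", v)]   # "abc" "bcd" "def"
```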
> >
> >
> > On Tue, Jun 13, 2023 at 8:50 AM javad bayat <j.bayat194 using gmail.com> wrote:
> > >
> > > Dear Rui;
> > > Hi. I used your code, but it seems it didn't work for me.
> > >
> > > > pat <- c("_esmdes|_Des Section|0")
> > > > dim(data2)
> > >     [1]  281549      9
> > > > grep(pat, data2$Layer)
> > > > dim(data2)
> > >     [1]  281549      9
> > >
> > > What does the grep function do? I expected it to remove 3 rows of the dataframe.
> > > I do not know the reason.
> > >
> > > On Mon, Jun 12, 2023 at 5:16 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> > >
> > > > At 23:13 on 12/06/2023, javad bayat wrote:
> > > > > Dear Rui;
> > > > > Many thanks for the email. I tried your code and found that the "Values" and "Names" vectors must have the same length, otherwise the results will not be useful.
> > > > > For the characters in the Layer column that I do not need to fill in the LU column, I used "NA".
> > > > > But I need to delete some of the rows from the table, as they are useless for me. I tried the code below to delete the entire rows of the dataframe that contain these three values in the Layer column, and it gave me the following warning:
> > > > >
> > > > >> data3 = data2[-grep(c("_esmdes","_Des Section","0"), data2$Layer),]
> > > > >       Warning message:
> > > > >        In grep(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> > > > >        argument 'pattern' has length > 1 and only the first element will be used
> > > > >
> > > > >> data3 = data2[!grepl(c("_esmdes","_Des Section","0"), data2$Layer),]
> > > > >      Warning message:
> > > > >      In grepl(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> > > > >      argument 'pattern' has length > 1 and only the first element will be used
> > > > >
> > > > > How can I do this?
> > > > > Sincerely
> > > > >
> > > > > On Sun, Jun 11, 2023 at 5:03 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> > > > >
> > > > >> At 13:18 on 11/06/2023, Rui Barradas wrote:
> > > > >>> At 22:54 on 11/06/2023, javad bayat wrote:
> > > > >>>> Dear Rui;
> > > > >>>> Many thanks for your email. I used one of your code lines, "data2$LU[which(data2$Layer == "Level 12")] <- "Park"", and it works correctly for me.
> > > > >>>> Actually, I need to expand the code to consider all "Levels" in the "Layer" column. There are more than a hundred levels in the Layer column.
> > > > >>>> If I use the code you provided, I have to write it a hundred times, as below:
> > > > >>>> data2$LU[which(data2$Layer == "Level 1")] <- "Park";
> > > > >>>> data2$LU[which(data2$Layer == "Level 2")] <- "Agri";
> > > > >>>> ...
> > > > >>>> ...
> > > > >>>> ...
> > > > >>>> .
> > > > >>>> Is there any other way to expand the code so that it considers all of the levels simultaneously? Like the code below:
> > > > >>>> data2$LU[which(data2$Layer == c("Level 1","Level 2", "Level 3", ...))] <- c("Park", "Agri", "GS", ...)
> > > > >>>>
> > > > >>>>
> > > > >>>> Sincerely
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Sun, Jun 11, 2023 at 1:43 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> > > > >>>>
> > > > >>>>> At 21:05 on 11/06/2023, javad bayat wrote:
> > > > >>>>>> Dear R users;
> > > > >>>>>> I am trying to fill a column based on a specific value in another column of a dataframe, but it seems there is a problem with the code!
> > > > >>>>>> The "Layer" and the "LU" are two different columns of the dataframe.
> > > > >>>>>> How can I fix this?
> > > > >>>>>> Sincerely
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> for (i in 1:nrow(data2$Layer)){
> > > > >>>>>>              if (data2$Layer == "Level 12") {
> > > > >>>>>>                  data2$LU == "Park"
> > > > >>>>>>                  }
> > > > >>>>>>              }
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>> Hello,
> > > > >>>>>
> > > > >>>>> There are two bugs in your code,
> > > > >>>>>
> > > > >>>>> 1) the index i is not used in the loop
> > > > >>>>> 2) the assignment operator is `<-`, not `==`
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Here is the loop corrected.
> > > > >>>>>
> > > > >>>>> for (i in seq_len(nrow(data2))) {  # nrow() of a vector is NULL, so count the rows of data2
> > > > >>>>>      if (data2$Layer[i] == "Level 12") {
> > > > >>>>>        data2$LU[i] <- "Park"
> > > > >>>>>      }
> > > > >>>>> }
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> But R is a vectorized language; the following two ways are the idiomatic ways of doing what you want to do.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> i <- data2$Layer == "Level 12"
> > > > >>>>> data2$LU[i] <- "Park"
> > > > >>>>>
> > > > >>>>> # equivalent one-liner
> > > > >>>>> data2$LU[data2$Layer == "Level 12"] <- "Park"
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> If there are NA's in data2$Layer, it's probably safer to use ?which() on the logical index, to get a numeric one.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> i <- which(data2$Layer == "Level 12")
> > > > >>>>> data2$LU[i] <- "Park"
> > > > >>>>>
> > > > >>>>> # equivalent one-liner
> > > > >>>>> data2$LU[which(data2$Layer == "Level 12")] <- "Park"
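Here is a small illustration of Rui's point about NA's, using a made-up vector `x`. For extraction, a logical index containing NA yields an NA element, whereas which() keeps only the TRUE positions. For assigning a single value, as in the lines above, R replaces nothing at NA positions, so both forms give the same result. The difference matters mostly when extracting, or when assigning a vector of several values, where an NA in a logical index is an error.

```r
x <- c("Level 12", NA, "Level 3")   # hypothetical Layer values with an NA

# Extraction: a logical index containing NA returns an NA element
x[x == "Level 12"]           # "Level 12" NA
# which() drops the NA and keeps only the TRUE positions
x[which(x == "Level 12")]    # "Level 12"

# Assigning one value: NA positions are simply not replaced,
# so the logical and which() forms agree here
lu1 <- rep(NA_character_, 3); lu1[x == "Level 12"] <- "Park"
lu2 <- rep(NA_character_, 3); lu2[which(x == "Level 12")] <- "Park"
identical(lu1, lu2)          # TRUE
```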
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Hope this helps,
> > > > >>>>>
> > > > >>>>> Rui Barradas
> > > > >>>>>
> > > > >>>>
> > > > >>>>
> > > > >>> Hello,
> > > > >>>
> > > > >>> You don't need to repeat the same instruction 100+ times; there is a way of assigning all the new LU values at the same time with match().
> > > > >>> This assumes that you have the new values in a vector.
> > > > >>
> > > > >> Sorry, this is not clear. I mean
> > > > >>
> > > > >>
> > > > >> This assumes that you have the new values in a vector, the vector Names below. The vector of values to be matched is created from the data.
> > > > >>
> > > > >>
> > > > >> Rui Barradas
> > > > >>
> > > > >>>
> > > > >>>
> > > > >>> Values <- sort(unique(data2$Layer))
> > > > >>> Names <- c("Park", "Agri", "GS")
> > > > >>>
> > > > >>> i <- match(data2$Layer, Values)
> > > > >>> data2$LU <- Names[i]
> > > > >>>
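An alternative sketch to the match() approach, assuming the mapping can be written out explicitly. A named vector pairs each Layer value with its LU label, so the result no longer depends on `Names` being in the same order as `sort(unique(...))`. Note that sort() orders strings alphabetically, so "Level 10" comes before "Level 2".

The data and the `lookup` vector below are hypothetical. Unmapped layers get NA, in line with the poster's use of NA for layers that need no LU value.

```r
# Hypothetical data; extend `lookup` with all 100+ Layer values
data2 <- data.frame(Layer = c("Level 2", "Level 1", "Level 3", "Level 9", "Level 1"))

lookup <- c("Level 1" = "Park",
            "Level 2" = "Agri",
            "Level 3" = "GS")

# Index the named vector by Layer; as.character() guards against Layer
# being a factor (a factor index would use its integer codes instead)
data2$LU <- unname(lookup[as.character(data2$Layer)])
data2$LU   # "Agri" "Park" "GS" NA "Park"

# Same result with match(), keeping the pairs in two parallel vectors
Values <- names(lookup)
Names  <- unname(lookup)
identical(Names[match(data2$Layer, Values)], data2$LU)   # TRUE
```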
> > > > >>>
> > > > >>> Hope this helps,
> > > > >>>
> > > > >>> Rui Barradas
> > > > >>>
> > > > >>> ______________________________________________
> > > > >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > >>> and provide commented, minimal, self-contained, reproducible code.
> > > > >>
> > > > >>
> > > > >
> > > > Hello,
> > > >
> > > > Please cc the r-help list. R-help is threaded, and this can be helpful to others in the future.
> > > >
> > > > You can combine several patterns like this:
> > > >
> > > >
> > > > pat <- c("_esmdes|_Des Section|0")
> > > > grep(pat, data2$Layer)
> > > >
> > > > or, programmatically,
> > > >
> > > >
> > > > pat <- paste(c("_esmdes","_Des Section","0"), collapse = "|")
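A sketch of using the combined pattern to drop rows, on hypothetical Layer values; only the three patterns come from the thread. One thing to watch: in a regular expression, the alternative "0" matches any value that merely contains a zero, so a layer such as "Level 10" would also be dropped. Anchoring it as ^0$ restricts it to values that are exactly "0", assuming that is the intent.

```r
# Hypothetical Layer values; only the three patterns come from the thread
data2 <- data.frame(Layer = c("a_esmdes", "Level 10", "0", "x_Des Section", "Level 3"),
                    LU    = c("p", "q", "r", "s", "t"))

pat <- paste(c("_esmdes", "_Des Section", "0"), collapse = "|")
pat                                    # "_esmdes|_Des Section|0"

# The bare "0" alternative matches a zero anywhere, so "Level 10" is caught too
data2$Layer[grepl(pat, data2$Layer)]   # "a_esmdes" "Level 10" "0" "x_Des Section"

# Anchoring that alternative (^0$) matches only values that are exactly "0";
# grepl() gives one TRUE/FALSE per row, and ! keeps the non-matching rows
pat2  <- paste(c("_esmdes", "_Des Section", "^0$"), collapse = "|")
data3 <- data2[!grepl(pat2, data2$Layer), ]
data3$Layer                            # "Level 10" "Level 3"
```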
> > > >
> > > >
> > > > Hope this helps,
> > > >
> > > > Rui Barradas
> > > >
> > > >
> > >
> > > --
> > > Best Regards
> > > Javad Bayat
> > > M.Sc. Environment Engineering
> > > Alternative Mail: bayat194 using yahoo.com
> > >
> >
>
>
