[R] Problem with filling dataframe's column

@vi@e@gross m@iii@g oii gm@ii@com
Tue Jun 13 23:18:41 CEST 2023


 
Javad,
 
There may be nothing wrong with the methods people are showing you, and if they satisfy you, great.
 
But I note you have a lot of data, over a quarter million rows. If much of the text data is redundant, and you want to simplify operations such as changing some of the values to others in multiple ways, have you learned about "factors", an R feature that is very useful for dealing with categorical data?
 
If you have a vector or a column in a data.frame that contains text, then it can be replaced by a factor that often takes way less space as it stores a sort of dictionary of all the unique values and just records numbers like 1,2,3 to tell which one each item is. 
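 
As a rough sketch of that idea, assuming a made-up vector called layer standing in for your real column (the name and the values are invented for illustration only):

```r
# Hypothetical stand-in for a large text column; not your actual data
set.seed(1)
layer <- sample(c("Level 1", "Level 2", "Level 12"), 250000, replace = TRUE)

f <- factor(layer)

levels(f)            # the "dictionary" of unique values
head(as.integer(f))  # the integer codes stored for each entry

# Compare the memory used by the two representations
object.size(layer)
object.size(f)
```

How much space you save will depend on your data and your R build, so it is worth checking the two sizes on your own column.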
 
You can access the values using levels(whatever) and also change them. There are packages that make this straightforward such as forcats which is one of the tidyverse packages that also includes many other tools some find useful but are beyond the usual scope of this mailing list.
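 
For instance, base R lets you assign a named list to levels() so that several old values map onto one new value. This is only a sketch with invented data; the mapping of "Level 1" and "Level 12" to "Park" and "Level 2" to "Agri" is borrowed from earlier in this thread, and the variable names are my own:

```r
# Made-up example data; the real Layer column has many more levels
layer <- factor(c("Level 1", "Level 2", "Level 12", "Level 2", "Level 1"))

lu <- layer
# Each list name is a new level; each element lists the old levels it replaces
levels(lu) <- list(Park = c("Level 1", "Level 12"), Agri = "Level 2")

as.character(lu)
```

Be aware that any old level you leave out of the list becomes NA, so check the result with table(lu, useNA = "ifany") before relying on it.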
 
As an example, if you have a vector in mydata$col1 then code like this converts it to a factor:
 
mydata$col1 <- factor(mydata$col1)
 
However you create the factor, you can now access the levels, make whatever changes you need, and save the changes. One example could be to apply some variant of grep to make the substitution. There is a family of built-in functions, such as sub(), that match a regular expression and replace it with what you want.
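 
A minimal sketch of that, assuming invented level names that carry the "_esmdes" suffix mentioned elsewhere in this thread:

```r
# Invented data; only the "_esmdes" suffix comes from the thread
layer <- factor(c("Level 1_esmdes", "Level 2", "Level 1_esmdes", "Level 3"))

# Apply the regular expression to the few levels, not to every entry
levels(layer) <- sub("_esmdes$", "", levels(layer))

levels(layer)
as.character(layer)
```

The substitution runs once per unique value, which is the point when there are only a hundred or so levels but a quarter million rows.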
 
This has a similar result to changing all entries without doing all the work. I mean if level 5 used to be "OLD" and is now "NEW" then any of your quarter million entries that have a 5 will now be seen as having a value of "NEW".
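 
To see this for yourself, here is a small sketch with a made-up factor x (my own name, not from your data) showing that the stored numbers stay the same while the displayed values change:

```r
# Made-up factor with a level named "OLD"
x <- factor(c("A", "OLD", "B", "OLD", "OLD"))

codes_before <- as.integer(x)

# Rename the one level; no per-entry work is needed
levels(x)[levels(x) == "OLD"] <- "NEW"

identical(codes_before, as.integer(x))  # the integer codes are untouched
as.character(x)                         # every former "OLD" now reads "NEW"
```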
 
I will stop here and suggest you may want to read some book that explains R as a unified set of features with some emphasis on using it for the features it is intended to have that can make life easier, rather than using just features it shares with most languages. Some of your questions indicate you have less grounding and are mainly following recipes you stumble across. 
 
Otherwise, you will have a collection of what you call "codes", and others like me call programming, that does not necessarily fit well together.
 
 
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of javad bayat
Sent: Tuesday, June 13, 2023 3:47 PM
To: Eric Berger <ericjberger using gmail.com>
Cc: R-help using r-project.org
Subject: Re: [R] Problem with filling dataframe's column
 
Dear all;
I used these codes and I get what I wanted.
Sincerely
 
pat = c("Level 12","Level 22","0")
data3 = data2[-which(data2$Layer == pat),]
dim(data2)
[1] 281549      9
dim(data3)
[1] 244075      9
 
On Tue, Jun 13, 2023 at 11:36 AM Eric Berger <ericjberger using gmail.com> wrote:
 
> Hi Javad,
> grep returns the positions of the matches. See an example below.
> 
> > v <- c("abc", "bcd", "def")
> > v
> [1] "abc" "bcd" "def"
> > grep("cd",v)
> [1] 2
> > w <- v[-grep("cd",v)]
> > w
> [1] "abc" "def"
> >
> 
> 
> On Tue, Jun 13, 2023 at 8:50 AM javad bayat <j.bayat194 using gmail.com> wrote:
> >
> > Dear Rui;
> > Hi. I used your codes, but it seems it didn't work for me.
> >
> > > pat <- c("_esmdes|_Des Section|0")
> > > dim(data2)
> >     [1]  281549      9
> > > grep(pat, data2$Layer)
> > > dim(data2)
> >     [1]  281549      9
> >
> > What does the grep function do? I expected the function to remove 3 rows of the
> > dataframe.
> > I do not know the reason.
> >
> > On Mon, Jun 12, 2023 at 5:16 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >
> > > On 12/06/2023 at 23:13, javad bayat wrote:
> > > > Dear Rui;
> > > > Many thanks for the email. I tried your codes and found that the length of
> > > > the "Values" and "Names" vectors must be equal, otherwise the results will
> > > > not be useful.
> > > > For some of the characters in the Layer column that I do not need to be
> > > > filled in the LU column, I used "NA".
> > > > But I need to delete some of the rows from the table as they are useless
> > > > for me. I tried this code to delete entire rows of the dataframe which
> > > > contained these three values in the Layer column. It gave me the following
> > > > warning.
> > > >
> > > >> data3 = data2[-grep(c("_esmdes","_Des Section","0"), data2$Layer),]
> > > >       Warning message:
> > > >        In grep(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> > > >        argument 'pattern' has length > 1 and only the first element will be used
> > > >
> > > >> data3 = data2[!grepl(c("_esmdes","_Des Section","0"), data2$Layer),]
> > > >      Warning message:
> > > >      In grepl(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> > > >      argument 'pattern' has length > 1 and only the first element will be used
> > > >
> > > > How can I do this?
> > > > Sincerely
> > > >
> > > > On Sun, Jun 11, 2023 at 5:03 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> > > >
> > > >> On 11/06/2023 at 13:18, Rui Barradas wrote:
> > > >>> On 11/06/2023 at 22:54, javad bayat wrote:
> > > >>>> Dear Rui;
> > > >>>> Many thanks for your email. I used one of your codes,
> > > >>>> "data2$LU[which(data2$Layer == "Level 12")] <- "Park"", and it works
> > > >>>> correctly for me.
> > > >>>> Actually I need to expand the codes so as to consider all "Levels" in the
> > > >>>> "Layer" column. There are more than a hundred levels in the Layer column.
> > > >>>> If I use your provided code, I have to write it hundreds of times as below:
> > > >>>> data2$LU[which(data2$Layer == "Level 1")] <- "Park";
> > > >>>> data2$LU[which(data2$Layer == "Level 2")] <- "Agri";
> > > >>>> ...
> > > >>>> ...
> > > >>>> ...
> > > >>>> .
> > > >>>> Is there any other way to expand the code in order to consider all of the
> > > >>>> levels simultaneously? Like the below code:
> > > >>>> data2$LU[which(data2$Layer == c("Level 1","Level 2", "Level 3", ...))] <-
> > > >>>> c("Park", "Agri", "GS", ...)
> > > >>>>
> > > >>>>
> > > >>>> Sincerely
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Sun, Jun 11, 2023 at 1:43 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> > > >>>>
> > > >>>>> On 11/06/2023 at 21:05, javad bayat wrote:
> > > >>>>>> Dear R users;
> > > >>>>>> I am trying to fill a column based on a specific value in another column of
> > > >>>>>> a dataframe, but it seems there is a problem with the codes!
> > > >>>>>> The "Layer" and the "LU" are two different columns of the dataframe.
> > > >>>>>> How can I fix this?
> > > >>>>>> Sincerely
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> for (i in 1:nrow(data2$Layer)){
> > > >>>>>>              if (data2$Layer == "Level 12") {
> > > >>>>>>                  data2$LU == "Park"
> > > >>>>>>                  }
> > > >>>>>>              }
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>> Hello,
> > > >>>>>
> > > >>>>> There are three bugs in your code,
> > > >>>>>
> > > >>>>> 1) the index i is not used in the loop
> > > >>>>> 2) the assignment operator is `<-`, not `==`
> > > >>>>> 3) nrow() of a vector is NULL; use nrow(data2) or length(data2$Layer)
> > > >>>>>
> > > >>>>>
> > > >>>>> Here is the loop corrected.
> > > >>>>>
> > > >>>>> for (i in seq_len(nrow(data2))){
> > > >>>>>      if (data2$Layer[i] == "Level 12") {
> > > >>>>>        data2$LU[i] <- "Park"
> > > >>>>>      }
> > > >>>>> }
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> But R is a vectorized language, the following two ways are the idiomatic
> > > >>>>> ways of doing what you want to do.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> i <- data2$Layer == "Level 12"
> > > >>>>> data2$LU[i] <- "Park"
> > > >>>>>
> > > >>>>> # equivalent one-liner
> > > >>>>> data2$LU[data2$Layer == "Level 12"] <- "Park"
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> If there are NA's in data2$Layer it's probably safer to wrap the logical
> > > >>>>> index in which(), to have a numeric one (see ?which).
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> i <- which(data2$Layer == "Level 12")
> > > >>>>> data2$LU[i] <- "Park"
> > > >>>>>
> > > >>>>> # equivalent one-liner
> > > >>>>> data2$LU[which(data2$Layer == "Level 12")] <- "Park"
> > > >>>>>
> > > >>>>>
> > > >>>>> Hope this helps,
> > > >>>>>
> > > >>>>> Rui Barradas
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>> Hello,
> > > >>>
> > > >>> You don't need to repeat the same instruction 100+ times, there is a way
> > > >>> of assigning all new LU values at the same time with match().
> > > >>> This assumes that you have the new values in a vector.
> > > >>
> > > >> Sorry, this is not clear. I mean
> > > >>
> > > >>
> > > >> This assumes that you have the new values in a vector, the vector Names
> > > >> below. The vector of values to be matched is created from the data.
> > > >>
> > > >>
> > > >> Rui Barradas
> > > >>
> > > >>>
> > > >>>
> > > >>> Values <- sort(unique(data2$Layer))
> > > >>> Names <- c("Park", "Agri", "GS")
> > > >>>
> > > >>> i <- match(data2$Layer, Values)
> > > >>> data2$LU <- Names[i]
> > > >>>
> > > >>>
> > > >>> Hope this helps,
> > > >>>
> > > >>> Rui Barradas
> > > >>>
> > > >>> ______________________________________________
> > > >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>> PLEASE do read the posting guide
> > > >>> http://www.R-project.org/posting-guide.html
> > > >>> and provide commented, minimal, self-contained, reproducible code.
> > > >>
> > > >>
> > > >
> > > Hello,
> > >
> > > Please cc the r-help list, R-Help is threaded and this can in the future
> > > be helpful to others.
> > >
> > > You can combine several patterns like this:
> > >
> > >
> > > pat <- c("_esmdes|_Des Section|0")
> > > grep(pat, data2$Layer)
> > >
> > > or, programmatically,
> > >
> > >
> > > pat <- paste(c("_esmdes","_Des Section","0"), collapse = "|")
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > >
> >
> > --
> > Best Regards
> > Javad Bayat
> > M.Sc. Environment Engineering
> > Alternative Mail: bayat194 using yahoo.com
> >
> 
 
 
 
