[R] add specific fields in for loop

Kai Yang <yangkai9999 using yahoo.com>
Wed Nov 16 00:05:50 CET 2022


Hi Avi,
Thank you for spending time on my question. Your explanation is very clear and thorough. I have used R for only a short time and am still learning, so my question may not have been very clear to you. Sorry about that.
Thank you again,

Kai

On Tuesday, November 15, 2022 at 02:54:38 PM PST, avi.e.gross using gmail.com <avi.e.gross using gmail.com> wrote:
 
 Kai,

 

I have read all the messages exchanged so far and what I have not yet seen is a clear explanation of what you want to do. I mean not as R code that may have mistakes, but as what your goal is.

 

Your code below was a gigantic set of nested if statements that is not trivial to parse. 

 

So help explain a bit or you may keep getting great solutions to problems you are not trying to solve.

 

You have a data.frame you called “df” that seems to currently have no relation to the rest of the code. You do seem to have a data.frame called “try2.un” instead so I assume you want an answer using that.

 

Your code seems to want to make a new column called “ab2” by using info currently held in columns “data1” through “data5” but you want a solution that is more general. First I want to see what your code does do and make sure that is what you want.

 

Your code starts like this (see below for the complete code):

 

  ifelse(grepl("ab2",try2.un$data1), try2.un$data1, # else clauses below

 

The above uses the logical version of grep, grepl, and it seems that you are asking for all of the items in the column vector data1 to be searched for the unanchored presence of the string “ab2” and the first result is a vector of TRUE/FALSE. For those that are TRUE, meaning “ab2” was found, you want the actual result copied into the new column named “ab2” and for those marked as FALSE, continue with the next code line. I note you do not show any initialization for the new column to something like NA and depend on the final nested ifelse to set that as a default.

 

If what I wrote above is correct, then for any rows where data1 did not contain the specified text, you now search in data2:

    

        ifelse(grepl("ab2",try2.un$data2), try2.un$data2,

 

In this design, anything found in multiple places will only match the first place found. Anything not found anywhere ends up with an NA.

 

So in English, IFF the above is what you want, you want a search across all columns for the designated search string of “ab2” but only keep the first.
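
To make the "first match wins" behavior concrete, here is a small sketch on invented data. The contents of try2.un below are made up for illustration, and only three data columns are used instead of five.

```r
# Toy data, invented for illustration only
try2.un <- data.frame(
  id    = 1:4,
  data1 = c("I am ab2", "xyz", "xyz", "xyz"),
  data2 = c("ab2 was my mother", "has ab2", "xyz", "xyz"),
  data3 = c("xyz", "ab2 again", "late ab2", "xyz"),
  stringsAsFactors = FALSE
)

# Nested ifelse(): data1 is tested first, then data2, then data3.
# Rows that match nowhere fall through to the final NA.
first.hit <-
  ifelse(grepl("ab2", try2.un$data1), try2.un$data1,
    ifelse(grepl("ab2", try2.un$data2), try2.un$data2,
      ifelse(grepl("ab2", try2.un$data3), try2.un$data3, NA)))

print(first.hit)
```

Row 1 matches in both data1 and data2 but keeps the data1 text; row 2 matches in data2 and data3 but keeps the data2 text; row 4 matches nowhere and stays NA.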

 

To make a loop I suggest something like this:

 

  try2.un$ab2 <- NA

 

Then choose what columns you want but do NOT choose “ab2”. If you want ALL other columns, then BEFORE the above line, save the current names as in:

 

  loop.cols <- names(try2.un)

 

If you only want a subset, use some code that narrows down what you want. You have not told us enough to make a suggestion. The point remains to have a variable (vector) that can be used in a loop that holds exactly the columns you want and in the right order. Unless I read you wrong, the order MATTERS as the first match wins and if the columns have different matches like “I am ab2” and “ab2 was my mother” you get the idea that you are keeping the exact text of the first match.
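
As one possible way to narrow the columns down, assuming (and this is only a guess from the sample code) that the wanted columns are all named "data" followed by a number, something like the following could build the vector. The toy data frame and the pattern are assumptions for illustration.

```r
# Toy data frame with the columns deliberately out of order
try2.un <- data.frame(
  id     = 1:2,
  data2  = c("c", "d"),
  note   = c("x", "y"),
  data10 = c("e", "f"),
  data1  = c("a", "b"),
  stringsAsFactors = FALSE
)

# Keep only names of the form data<number>
loop.cols <- grep("^data[0-9]+$", names(try2.un), value = TRUE)

# Order by the numeric suffix so that data10 comes after data2,
# since the order decides which match wins
loop.cols <- loop.cols[order(as.integer(sub("^data", "", loop.cols)))]

print(loop.cols)
```

Sorting on the numeric suffix rather than alphabetically matters once there are ten or more such columns, because a plain sort() would put data10 before data2.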

 

If my guess of your need was wrong, the rest is not going to make much sense.

 

So here is a loop:

 

  for (i in loop.cols) { print(i)}

 

I used “i” because you seem to like it. I prefer a more useful name. All the above does is print the names so you see if what you are doing makes sense.

 

Now rewrite that to do what you want and find a way to only update an NA value. You may want to think about what that means.

 

One idea is 

  try2.un$ab2 <-
    # & compares whole columns element-wise; [[i]] extracts column i as a vector
    ifelse(is.na(try2.un$ab2) & grepl("ab2",try2.un[[i]]),
          try2.un[[i]],
          try2.un$ab2)

 

The above, which I have not tried, would be run in a loop and checks both whether an entry is still NA, and whether the current ith column has what you want. If both are true, it selects the value for those entries/rows from the column being looped on. If not, it retains the current non-NA setting from an earlier iteration of the loop.

 

You need to flesh this out for yourself as I am not supplying complete and tested code.
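
For what it is worth, one way the pieces might fit together is sketched below on invented data. It is a sketch under assumptions, not a tested solution for the real try2.un: the column names and contents are made up, and the loop variable is called col rather than i. The element-wise & is used because the test runs over whole columns, and [[col]] is used to get the column as a plain vector.

```r
# Toy data, invented for illustration only
try2.un <- data.frame(
  id    = 1:4,
  data1 = c("I am ab2", "xyz", "xyz", "xyz"),
  data2 = c("ab2 was my mother", "has ab2", "xyz", "xyz"),
  data3 = c("xyz", "ab2 again", "late ab2", "xyz"),
  stringsAsFactors = FALSE
)

# Columns to search, in priority order (must not include "ab2" itself)
loop.cols <- c("data1", "data2", "data3")

# Start with everything missing
try2.un$ab2 <- NA_character_

for (col in loop.cols) {
  # Rows still unfilled AND containing the search string in this column
  hit <- is.na(try2.un$ab2) & grepl("ab2", try2.un[[col]], fixed = TRUE)
  # Only those rows are updated; earlier matches are left alone
  try2.un$ab2[hit] <- try2.un[[col]][hit]
}

print(try2.un$ab2)
```

Because only rows that are still NA get updated, the first column in loop.cols that matches wins, which mirrors the nested ifelse design.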

 

But note this is a very different meaning than some of us guessed and may still not be what you want. There are many such questions about doing the same thing to each of the selected columns in a data.frame, as in replacing all values of 999 with NA. In many such cases the order does not matter. Other such questions may want to check if any of the columns matches and simply return TRUE/FALSE in a new column or externally. Some such requests are potentially simpler and easier.
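
To illustrate those two order-independent cases, here is a hedged sketch on made-up data. The data frame df, its column names, and the choice of which columns to process are all assumptions for the example.

```r
# Toy data, invented for illustration only
df <- data.frame(
  x     = c(1, 999, 3),
  y     = c(999, 5, 6),
  label = c("ab2 here", "none", "none"),
  other = c("none", "none", "has ab2"),
  stringsAsFactors = FALSE
)

# Case 1: do the same thing to each selected column (999 becomes NA).
# %in% is used instead of == so existing NA values cause no trouble.
num.cols <- c("x", "y")
for (col in num.cols) {
  df[[col]][df[[col]] %in% 999] <- NA
}

# Case 2: a TRUE/FALSE flag saying whether ANY selected column matches.
txt.cols <- c("label", "other")
df$any.ab2 <- FALSE
for (col in txt.cols) {
  df$any.ab2 <- df$any.ab2 | grepl("ab2", df[[col]], fixed = TRUE)
}

print(df)
```

In both loops the result is the same whatever order the columns are visited in, which is what makes these cases simpler than "keep the first match".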

 

So you need to be very clear on what you want. I am going by what I think your sample code DOES and am not too sure it is exactly what you want.

 

 

From: Kai Yang <yangkai9999 using yahoo.com> 
Sent: Tuesday, November 15, 2022 1:53 PM
To: 'R-help Mailing List' <r-help using r-project.org>; avi.e.gross using gmail.com
Subject: Re: [R] add specific fields in for loop

 

Hello Bert and Avi,

Sorry, it was a typo. It should be:

 

for (i in colnames(df)){
  ......
}

 

Below is the code I'm currently using:

 

try2.un$ab2 <-
  ifelse(grepl("ab2",try2.un$data1), try2.un$data1,
        ifelse(grepl("ab2",try2.un$data2), try2.un$data2,
                ifelse(grepl("ab2",try2.un$data3), try2.un$data3,
                      ifelse(grepl("ab2",try2.un$data4), try2.un$data4,
                              ifelse(grepl("ab2",try2.un$data5), try2.un$data5,NA
                              ) ) ) ) )

 

 

As you can see, it uses 5 fields (data1 to data5) in the ifelse function. I want to turn it into a for loop because the number of data fields is dynamic. In this sample it is 5, but it may be more than 15 in some situations. So I want to use a loop to solve this and avoid writing that many ifelse statements. Also, the try2.un data frame has many other fields that I don't need to use in the loop.

 

I'm not sure whether a loop is the right solution, but I'm willing to learn from any other suggestions you have.

 

Thanks,

 

Kai

 

On Tuesday, November 15, 2022 at 09:23:03 AM PST, avi.e.gross using gmail.com <avi.e.gross using gmail.com> wrote:

 

 

Kai,

 

As Bert pointed out, it may not be clear what you want.

 

As a GUESS, you have some arbitrary data.frame object with multiple columns and you want to do something on selected columns. Consider changing your idea to be in several stages for simplicity and then optionally later rewriting it.

 

So step 1 is to get a vector of column names. The normal way to do this in base R is not with a function called columns(df) but colnames(df) ...

 

Step 2 is to use one of many techniques that take that vector of names and select the ones you want to keep. In base R there are many ways to do that including using regular expressions as in the "grep" family of functions. You may end up with a new vector of names perhaps shorter or in a different order.

 

Step 3 is to use those names in your loop. If you want say to convert a column from character to numeric, and your loop index is "current" you might write something like:

    df[[current]] <- as.numeric(df[[current]])
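
Putting the three steps together for the case in the original question (columns whose names contain 'date'), a minimal sketch might look like this. The data frame, its column names, and the date format are invented for illustration, and the conversion shown is as.Date() rather than as.numeric() only because the example columns hold dates.

```r
# Toy data, invented for illustration only
df <- data.frame(
  id         = c("a", "b"),
  start_date = c("2022-01-01", "2022-02-15"),
  end_date   = c("2022-03-01", "2022-04-01"),
  score      = c("1.5", "2.5"),
  stringsAsFactors = FALSE
)

# Step 1: get all column names
all.cols <- colnames(df)

# Step 2: keep only the names that contain "date"
date.cols <- grep("date", all.cols, value = TRUE, fixed = TRUE)

# Step 3: loop over just those columns
for (current in date.cols) {
  df[[current]] <- as.Date(df[[current]], format = "%Y-%m-%d")
}

str(df)
```

The columns not selected in step 2 (id and score here) are never touched by the loop.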

 

There are many ways and it depends on what exactly you want to do. There are packages designed to make some of these things fairly simple, such as dplyr where you can ask to match names that start or end a certain way or that are of certain types.

 

Avi

 

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Kai Yang via R-help
Sent: Tuesday, November 15, 2022 11:18 AM
To: R-help Mailing List <r-help using r-project.org>
Subject: [R] add specific fields in for loop

 

Hi Team,

I can write a for loop like this:

for (i in columns(df)){
  ......
}

But it will work on all columns in data frame df. If I want to work on only some specific fields (say, the fields whose names contain 'date'), how should I modify the for loop? I changed the code as below, but it doesn't work.

for (i in columns(df) %in% 'date' ){
  .....
}

 

 

Thank you,

Kai

 

    [[alternative HTML version deleted]]

 

______________________________________________

R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


