[R] Removing variables from data frame with a wile card

avi.e.gross using gmail.com
Sun Jan 15 20:39:58 CET 2023


John,

As you said, you are new to the discussion so let me catch you up.

The original question was about removing many columns whose names share a common pattern while leaving the other columns in place. Quite a few replies showed how to do that, including how to use a regular expression to gather the names of the columns you want to remove.

Only afterwards did the topic broaden a bit, with some people mentioning additional approaches, both in base R and in packages such as dplyr in the tidyverse.

As a general rule, most packages provide functionality that could be written in base R if you wished. Some are written purely in R, while others augment that with parts reimplemented in C or another compiled language. If a package is well built and frequently used, it may well make your life as a programmer easier, since the code need not be re-invented and debugged. Of course, some packages are of poorer quality.

So we fully agree that, unless asked otherwise, base R answers should be the focus HERE. Then again, languages are not static, and sometimes we see features such as pipes adopted, in modified form, into the main language.

Avi

-----Original Message-----
From: Sorkin, John <jsorkin using som.umaryland.edu> 
Sent: Sunday, January 15, 2023 11:55 AM
To: Valentin Petzel <valentin using petzel.at>; avi.e.gross using gmail.com
Cc: 'R-help Mailing List' <r-help using r-project.org>
Subject: Re: [R] Removing variables from data frame with a wile card

I am new to this thread. At the risk of presenting something that has been shown before, below I demonstrate how a column in a data frame can be dropped using a wild card, i.e. a column whose name starts with "th", using nothing more than base R functions and base R syntax. While additions to R such as the tidyverse can be very helpful, many things that they do can be accomplished simply using base R.

# Create data frame with three columns
one <- rep(1, 10)
one
two <- rep(2, 10)
two
three <- rep(3, 10)
three
mydata <- data.frame(one = one, two = two, three = three)
cat("Data frame with three columns\n")
mydata

# Find the location of the column whose name starts with "th", i.e. column three
# (the "^" anchors the match at the start of the name)
ColumnToDelete <- grep("^th", colnames(mydata))
cat("The column to be dropped is the column called three, which is column", ColumnToDelete, "\n")
ColumnToDelete

# Drop the column whose name starts with "th"
# (drop = FALSE keeps the result a data frame even if only one column remains)
newdata2 <- mydata[, -ColumnToDelete, drop = FALSE]
cat("Data frame after dropping column whose name is three\n")
newdata2
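
One caveat with negative indexing is worth a short sketch: if the pattern matches no column, grep() returns integer(0), and indexing with -integer(0) selects no columns at all instead of keeping everything. The logical mask from grepl() does not have this problem. The pattern "^zz" below is made up purely to force a non-match.

```r
mydata <- data.frame(one = rep(1, 3), two = rep(2, 3), three = rep(3, 3))

# No column name starts with "zz", so grep() returns integer(0)
idx <- grep("^zz", colnames(mydata))

# Negating an empty index is still an empty index: every column is lost
dropped_by_index <- mydata[, -idx, drop = FALSE]
ncol(dropped_by_index)   # 0

# A logical mask keeps all columns when nothing matches
kept_by_mask <- mydata[, !grepl("^zz", colnames(mydata)), drop = FALSE]
ncol(kept_by_mask)       # 3
```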

I hope this helps.
John


________________________________________
From: R-help <r-help-bounces using r-project.org> on behalf of Valentin Petzel <valentin using petzel.at>
Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gross using gmail.com
Cc: 'R-help Mailing List'
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

while something like d$something <- ... may seem like you're directly modifying the data, it does not actually do so. Most R objects behave as if they were immutable, that is, the object does not change after creation. This guarantees that if you have another binding for the same object, the object won't change sneakily.

One data structure that is in fact mutable is the environment. For example, compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same environment is modified, while in the first case a modified copy of the list is made inside local() and the original L is left unchanged (so L$a is NULL).

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function named `f<-` and call

a <- `f<-`(a, value = ...)

(this also happens for example when you do names(x) <- ...)
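
As a minimal sketch of that rewriting, the snippet below defines a made-up replacement function, `second<-` (not part of base R; the name is purely illustrative), and checks that the assignment form and the explicit call give the same result.

```r
# Hypothetical replacement function: sets the second element of a vector.
# The last argument of a replacement function must be named `value`.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 10                  # assignment form

w <- c(1, 2, 3)
w <- `second<-`(w, value = 10)   # explicit form of the same call

identical(v, w)                  # TRUE
```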

So in our case this is equivalent to creating a copy with the columns removed and rebinding the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C-based routines that allow data to be changed without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.
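
A small sketch of what that means in practice, assuming the data.table package is installed. The table and its column names are invented for illustration; d2 is a second binding to the same object, so it should see the deletion as well.

```r
# Assumes the data.table package is installed; the data are made up
library(data.table)

d <- data.table(year = 2001:2003, yr3 = 1:3, yr4 = 4:6)
d2 <- d   # a second binding to the same object, not a copy

# Collect the names of the columns that start with "yr"
cols_to_remove <- grep("^yr", names(d), value = TRUE)

# Delete them by reference; no modified copy is bound back to d
d[, (cols_to_remove) := NULL]

names(d)    # only "year" remains
names(d2)   # d2 changed too, because it refers to the same object
```

If an independent copy is wanted, data.table provides copy() for that purpose.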

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gross using gmail.com:

> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original DF that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a column within it. You can do that in several ways, but the simplest is to set the column to NULL, as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation, you can do that with a loop or a functional programming method that applies it to all the names returned by your grep.
>
> R does have optimizations that make this less useful as a partial copy of a data.frame retains common parts till things change.
>
> For those who like to use the tidyverse, it comes with lots of tools that let you select columns that start with or end with or contain some pattern and I find that way easier.
>
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons <akwsimmo using gmail.com>
> Cc: R-help Mailing List <r-help using r-project.org>
> Subject: Re: [R] Removing variables from data frame with a wile card
>
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo using gmail.com> wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses 
>> extended regular expressions to find matches, but you can also use 
>> perl regular expressions and globbing (after converting to a regular expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather use 
>> globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns starting with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen using ntu.edu.tw> wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>>>
>>> How do I remove them with a wild card----something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>>
>>>> colnames(mydata)
>>>   [1] "year"       "weight"     "confeduc"   "confothr" "college"
>>>   [6] ...
>>> [41] "yr3"        "yr4"        "yr5"        "yr6" "yr7"
>>> [46] "yr8"        "yr9"        "yr10"       "yr11" "yr12"
>>> [51] "yr13"       "yr14"       "yr15"       "yr16" "yr17"
>>> [56] "yr18"       "yr19"       "yr20"       "yr21" "yr22"
>>> [61] "yr23"       "yr24"       "yr25"       "yr26" "yr27"
>>> [66] "yr28"...
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.