[R] Removing variables from data frame with a wild card

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Sun Jan 15 21:59:04 CET 2023


At 16:54 on 15/01/2023, Sorkin, John wrote:
> I am new to this thread. At the risk of presenting something that has been shown before, below I demonstrate how a column in a data frame can be dropped using a wild card (here, a column whose name starts with "th") using nothing more than base R functions and base R syntax. While additions to R such as the tidyverse can be very helpful, many of the things they do can be accomplished simply in base R.
> 
> # Create data frame with three columns
> one <- rep(1,10)
> one
> two <- rep(2,10)
> two
> three <- rep(3,10)
> three
> mydata <- data.frame(one=one, two=two, three=three)
> cat("Data frame with three columns\n")
> mydata
> 
> # Drop the column whose name starts with th, i.e. column three
> # Find the location of the column
> ColumToDelete <- grep("^th", colnames(mydata))
> cat("The column to be dropped is the column called three, which is column", ColumToDelete, "\n")
> ColumToDelete
> 
> # Drop the column whose name starts with "th"
> newdata2 <- mydata[, -ColumToDelete]
> cat("Data frame after dropping the column whose name is three\n")
> newdata2
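The phrase "starts with" only holds if the pattern is anchored with `^`; an unanchored "th" matches any name that merely contains it. A small sketch, using hypothetical column names chosen only for illustration:

```r
# Hypothetical column names, chosen only to illustrate anchoring
cols <- c("one", "two", "three", "month")

grep("th", cols)    # 3 4: "three" and "month" both contain "th"
grep("^th", cols)   # 3:   only "three" starts with "th"
```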
> 
> I hope this helps.
> John
> 
> 
> ________________________________________
> From: R-help <r-help-bounces using r-project.org> on behalf of Valentin Petzel <valentin using petzel.at>
> Sent: Saturday, January 14, 2023 1:21 PM
> To: avi.e.gross using gmail.com
> Cc: 'R-help Mailing List'
> Subject: Re: [R] Removing variables from data frame with a wild card
> 
> Hello Avi,
> 
> While something like d$something <- ... may look as if it modifies the data directly, it does not. Most R objects behave as immutable: an object does not change after it is created. This guarantees that if you have another binding to the same object, that object won't change behind your back.
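A minimal sketch of that guarantee, with a hypothetical data frame `d` and a second binding `d2`: removing a column through `d2` leaves `d` untouched, because the assignment produces a modified copy and rebinds only `d2`.

```r
# Hypothetical data frame; d2 is a second binding to the same object
d <- data.frame(one = 1:3, three = 4:6)
d2 <- d

# Looks like an in-place change, but R builds a modified copy
# and rebinds d2 to it; d still has both columns
d2$three <- NULL

names(d)    # "one" "three"
names(d2)   # "one"
```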
> 
> There is one data structure that is in fact mutable: environments. For example, compare
> 
> L <- list()
> local({L$a <- 3})
> L$a
> 
> with
> 
> E <- new.env()
> local({E$a <- 3})
> E$a
> 
> The latter works, because the same environment is modified. In the former, the assignment inside local() creates a modified copy of the list bound in local()'s own scope, so the outer L is unchanged.
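The same contrast shows up when objects are passed to a function. A sketch with a hypothetical helper `add_a()` that assigns an element to its argument: the caller's list is unaffected, while the caller's environment gains the element.

```r
# Hypothetical helper: assigns element 'a' inside the function body
add_a <- function(x) {
  x$a <- 3
  invisible(NULL)
}

L <- list()
add_a(L)       # only the function's local copy of the list gains 'a'
is.null(L$a)   # TRUE

E <- new.env()
add_a(E)       # an environment is not copied, so E itself gains 'a'
E$a            # 3
```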
> 
> Under the hood we have a parser trick: If R sees something like
> 
> f(a) <- ...
> 
> it looks for a function `f<-` and evaluates
> 
> a <- `f<-`(a, value = ...)
> 
> (this also happens for example when you do names(x) <- ...)
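A sketch of that rewrite with a user-defined replacement function. The name `second<-` is hypothetical, defined here only for illustration; the sugared and the explicit forms give the same result.

```r
# Hypothetical replacement function, defined only for illustration
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(10, 20, 30)
second(v) <- 99                 # sugared form

w <- c(10, 20, 30)
w <- `second<-`(w, value = 99)  # explicit call plus rebinding

identical(v, w)                 # TRUE
v                               # 10 99 30
```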
> 
> So, in our case, d$something <- NULL is equivalent to creating a copy with the column removed and rebinding the symbol in the current environment to that result.
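A minimal sketch of that equivalence on a small version of the thread's `mydata`: calling the replacement function `$<-` explicitly and rebinding gives the same data frame as the `$` assignment, and the original binding keeps all its columns.

```r
mydata <- data.frame(one = rep(1, 3), two = rep(2, 3), three = rep(3, 3))

# Sugared form: remove a column with $<-
d1 <- mydata
d1$three <- NULL

# Desugared form: call the replacement function and rebind the result
d2 <- mydata
d2 <- `$<-`(d2, "three", NULL)

identical(d1, d2)   # TRUE
names(mydata)       # "one" "two" "three": the original is unchanged
```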
> 
> The data.table package breaks with this convention: it uses C routines that modify the data in place, without copying the object. Doing
> 
> d[, (cols_to_remove) := NULL]
> 
> will actually change the data.
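A sketch applying this to the original question, assuming the data.table package is installed; the data and column names are hypothetical. grep(..., value = TRUE) supplies the column names, and the parentheses around cols_to_remove tell data.table to use the variable's contents rather than a column literally named cols_to_remove.

```r
library(data.table)  # assumes the data.table package is installed

# Hypothetical data following Steven's "yr" naming pattern
d <- data.table(year = 2001:2003, yr3 = 1:3, yr4 = 4:6,
                weight = c(0.5, 0.7, 0.9))

cols_to_remove <- grep("^yr", names(d), value = TRUE)

# := NULL deletes the columns by reference; no assignment back to d is needed
d[, (cols_to_remove) := NULL]

names(d)   # "year" "weight"
```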
> 
> Regards,
> Valentin
> 
> 14.01.2023 18:28:33 avi.e.gross using gmail.com:
> 
>> Steven,
>>
>> Just want to add a few things to what people wrote.
>>
>> In base R, the methods mentioned make a copy of your original data.frame that lacks the columns matching your pattern.
>>
>> That is fine.
>>
>> For some purposes, you want to keep the original data.frame and remove a column within it. You can do that in several ways, but the simplest is to set the column to NULL, as in:
>>
>> mydata$NAME <- NULL
>>
>> Using the mydata["NAME"] notation, you can do that for all the columns matched by your grep, either with a loop or with a functional-programming method.
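A sketch of both variants on a hypothetical data frame with two "yr" columns. The loop sets each matched column to NULL; the functional version folds over the matched names with Reduce(). With value = TRUE and no matches, the loop body never runs and Reduce() returns the data unchanged.

```r
# Hypothetical data frame with two "yr" columns
mydata <- data.frame(year = 2001:2003, yr3 = 1:3, yr4 = 4:6,
                     weight = c(0.5, 0.7, 0.9))
yr_cols <- grep("^yr", names(mydata), value = TRUE)

# Loop version: set each matched column to NULL
by_loop <- mydata
for (nm in yr_cols) {
  by_loop[nm] <- NULL
}

# Functional version: fold over the matched names with Reduce()
by_reduce <- Reduce(function(d, nm) { d[[nm]] <- NULL; d },
                    yr_cols, init = mydata)

names(by_loop)     # "year" "weight"
names(by_reduce)   # "year" "weight"
```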
>>
>> R does have optimizations that make this matter less: a modified copy of a data.frame shares its unchanged columns with the original until they change.
>>
>> For those who like to use the tidyverse, it comes with lots of tools that let you select columns that start with or end with or contain some pattern and I find that way easier.
>>
>>
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Steven Yen
>> Sent: Saturday, January 14, 2023 7:49 AM
>> To: Andrew Simmons <akwsimmo using gmail.com>
>> Cc: R-help Mailing List <r-help using r-project.org>
>> Subject: Re: [R] Removing variables from data frame with a wild card
>>
>> Thanks to all. Very helpful.
>>
>> Steven from iPhone
>>
>>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo using gmail.com> wrote:
>>>
>>> You'll want to use grep() or grepl(). By default, grep() uses
>>> extended regular expressions to find matches, but you can also use
>>> Perl-compatible regular expressions (perl = TRUE) and globbing (after converting the glob to a regular expression with glob2rx()).
>>> For example:
>>>
>>> grepl("^yr", colnames(mydata))
>>>
>>> will tell you which column names start with "yr". If you'd rather
>>> use globbing:
>>>
>>> grepl(glob2rx("yr*"), colnames(mydata))
>>>
>>> Then you might write something like this to remove the columns starting with yr:
>>>
>>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
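A self-contained sketch of the above on a hypothetical data frame that mimics a few of Steven's column names. The regular expression and the glob select the same columns, and drop = FALSE keeps the result a data frame even if only one column were left.

```r
# Hypothetical data frame mimicking some of Steven's column names
mydata <- data.frame(year = 2001:2003, weight = c(0.5, 0.7, 0.9),
                     yr3 = 1:3, yr4 = 4:6, yr28 = 7:9)

# Regular expression and glob pattern select the same columns
by_regex <- grepl("^yr", colnames(mydata))
by_glob  <- grepl(glob2rx("yr*"), colnames(mydata))
identical(by_regex, by_glob)   # TRUE

# Keep the columns that do NOT match
mydata <- mydata[, !by_regex, drop = FALSE]
names(mydata)   # "year" "weight"
```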
>>>
>>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen using ntu.edu.tw> wrote:
>>>>
>>>> I have a data frame containing variables "yr3",...,"yr28".
>>>>
>>>> How do I remove them with a wild card, something similar to "del yr*"
>>>> in Windows/DOS? Thank you.
>>>>
>>>>> colnames(mydata)
>>>>    [1] "year"       "weight"     "confeduc"   "confothr" "college"
>>>>    [6] ...
>>>> [41] "yr3"        "yr4"        "yr5"        "yr6" "yr7"
>>>> [46] "yr8"        "yr9"        "yr10"       "yr11" "yr12"
>>>> [51] "yr13"       "yr14"       "yr15"       "yr16" "yr17"
>>>> [56] "yr18"       "yr19"       "yr20"       "yr21" "yr22"
>>>> [61] "yr23"       "yr24"       "yr25"       "yr26" "yr27"
>>>> [66] "yr28"...
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.r-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>

Hello,

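Actually, Bill had addressed this in his post yesterday [1]. When no column name matches, grep() returns integer(0), and indexing with -integer(0) selects no columns at all, so every column is dropped. Negating the logical result of grepl() does not have this problem.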
With your example,


one <- rep(1,10)
two <- rep(2,10)
three <- rep(3,10)
mydata <- data.frame(one=one, two=two, three=three)

ColumToDelete <- grep("^fo", colnames(mydata))
ColumToDelete
#> integer(0)
ColumToDeleteLogical <- grepl("^fo", colnames(mydata))
ColumToDeleteLogical
#> [1] FALSE FALSE FALSE

# Drop the column whose name starts with "fo"
# -integer(0) is still integer(0), so no column is kept: empty data.frame
mydata[, -ColumToDelete]
#> data frame with 0 columns and 10 rows

# nothing is deleted
mydata[, !ColumToDeleteLogical]
#>    one two three
#> 1    1   2     3
#> 2    1   2     3
#> 3    1   2     3
#> 4    1   2     3
#> 5    1   2     3
#> 6    1   2     3
#> 7    1   2     3
#> 8    1   2     3
#> 9    1   2     3
#> 10   1   2     3
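A sketch of a safer pattern, wrapped in a hypothetical helper `drop_matching()` (not part of base R): because grepl() returns a logical vector, a pattern that matches nothing yields all FALSE and no column is dropped, and drop = FALSE keeps the result a data frame.

```r
# Hypothetical helper: drop every column whose name matches 'pattern'
drop_matching <- function(df, pattern) {
  df[, !grepl(pattern, colnames(df)), drop = FALSE]
}

mydata <- data.frame(one = rep(1, 10), two = rep(2, 10), three = rep(3, 10))

ncol(drop_matching(mydata, "^fo"))    # 3: nothing matches, nothing dropped
names(drop_matching(mydata, "^th"))   # "one" "two": column "three" dropped
```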



[1] https://stat.ethz.ch/pipermail/r-help/2023-January/476682.html


Hope this helps,

Rui Barradas


