[R] Removing variables from data frame with a wild card

Steven T. Yen styen at ntu.edu.tw
Mon Feb 13 00:17:07 CET 2023


Thanks Jeff and Andrew. My initial file, mydata, is a data frame with 92 
columns (variables). After the operation (trimming), it remains a data 
frame with 72 variables. So yes indeed, I do not need the drop=FALSE.

    > is.data.frame(mydata)
    [1] TRUE
    > ncol(mydata)
    [1] 92
    > mydata<-mydata[,!grepl("^yr",colnames(mydata)),drop=FALSE]
    > is.data.frame(mydata)
    [1] TRUE
    > ncol(mydata)
    [1] 72
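The result above is expected: drop only changes anything when the selection leaves exactly one column, and here 72 columns remain. A minimal sketch of the case where it does matter, using a small made-up data frame (`dat` and its columns are hypothetical, not the data from this thread), together with the list-style indexing Jeff suggests:

```r
# Hypothetical toy data: only "weight" survives the "^yr" filter.
dat <- data.frame(weight = c(60, 72), yr3 = 1:2, yr4 = 3:4)
keep <- !grepl("^yr", colnames(dat))

a <- dat[, keep]                # default drop = TRUE: the bare numeric column
b <- dat[, keep, drop = FALSE]  # a one-column data frame
d <- dat[keep]                  # list-style indexing: also a data frame

is.data.frame(a)  # FALSE
is.data.frame(b)  # TRUE
identical(b, d)   # TRUE
```

So drop = FALSE (or dat[keep]) is a safeguard for the one-column case; with many columns left it makes no difference.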

On 2/13/2023 6:57 AM, Jeff Newmiller wrote:
> x["V2"]
>
> is more efficient than using drop=FALSE, and perfectly normal syntax (data frames are lists of columns).  I would ignore the naysayers, or put a comment in if you want to accelerate their uptake.
>
> As I understand it, one of the main reasons tibbles exist is the default drop=TRUE behavior of base data frames. List-slice (single-dimension) indexing works equally well with both standard and tibble data frames.
>
> On February 12, 2023 2:30:15 PM PST, Andrew Simmons <akwsimmo at gmail.com> wrote:
>> drop = FALSE means that if the indexing selects exactly one column, the
>> result is still a data frame with one column, rather than the object stored
>> in that column. It's usually not necessary, but I've messed up some data
>> before by assuming the indexing always returns a data frame when it doesn't,
>> so drop = FALSE lets me be sure that I will always get a data frame.
>>
>> ```
>> x <- data.frame(V1 = 1:5, V2 = letters[1:5])
>> x[, "V2"]                # character vector "a" ... "e"
>> x[, "V2", drop = FALSE]  # data frame with the single column V2
>> ```
>>
>> You'll notice that the first returns a character vector, a through e,
>> whereas the second returns a data frame with one column that holds the
>> same character vector.
>>
>> You could alternatively use
>>
>> x["V2"]
>>
>> which should be identical to x[, "V2", drop = FALSE], but some people don't
>> like that because it doesn't look like matrix indexing anymore.
>>
>>
>> On Sun, Feb 12, 2023, 17:18 Steven T. Yen <styen at ntu.edu.tw> wrote:
>>
>>> In the line suggested by Andrew Simmons,
>>>
>>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>>
>>> what does drop=FALSE do? Thanks.
>>>
>>> On 1/14/2023 8:48 PM, Steven Yen wrote:
>>>
>>> Thanks to all. Very helpful.
>>>
>>> Steven from iPhone
>>>
>>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo at gmail.com> wrote:
>>>
>>> You'll want to use grep() or grepl(). By default, grep() uses extended
>>> regular expressions to find matches, but you can also use Perl-compatible
>>> regular expressions (perl = TRUE) and globbing (after converting the glob
>>> to a regular expression with glob2rx()).
>>> For example:
>>>
>>> grepl("^yr", colnames(mydata))
>>>
>>> will tell you which 'colnames' start with "yr". If you'd rather use
>>> globbing:
>>>
>>> grepl(glob2rx("yr*"), colnames(mydata))
>>>
>>> Then you might write something like this to remove the columns starting
>>> with yr:
>>>
>>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>>
>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen at ntu.edu.tw> wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>>> How do I remove them with a wild card, something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>>
>>> colnames(mydata)
>>>  [1] "year"       "weight"     "confeduc"   "confothr"   "college"
>>>  [6] ...
>>> [41] "yr3"        "yr4"        "yr5"        "yr6"        "yr7"
>>> [46] "yr8"        "yr9"        "yr10"       "yr11"       "yr12"
>>> [51] "yr13"       "yr14"       "yr15"       "yr16"       "yr17"
>>> [56] "yr18"       "yr19"       "yr20"       "yr21"       "yr22"
>>> [61] "yr23"       "yr24"       "yr25"       "yr26"       "yr27"
>>> [66] "yr28"...
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
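For reference, a sketch comparing the ways of building the column mask that come up in this thread, using a made-up data frame `toy` (its columns, including the hypothetical `yrsmarried`, are invented for illustration). It shows that "^yr", glob2rx("yr*") and base R's startsWith() all select every name beginning with "yr", so a stricter pattern such as "^yr[0-9]+$" may be safer if some unrelated column happens to share that prefix.

```r
# Hypothetical toy data frame; "yrsmarried" is an invented column that starts
# with "yr" but is not one of the yr3..yr28 columns.
toy <- data.frame(year = 2001:2003, weight = c(1, 2, 3),
                  yr3 = 0, yr4 = 1, yrsmarried = c(5, 7, 9))
nm <- colnames(toy)

m_regex  <- grepl("^yr", nm)                  # extended regular expression
m_glob   <- grepl(utils::glob2rx("yr*"), nm)  # glob converted to "^yr"
m_prefix <- startsWith(nm, "yr")              # plain prefix test (R >= 3.3.0)
m_strict <- grepl("^yr[0-9]+$", nm)           # "yr" followed only by digits

# The first three masks agree, and all three also flag "yrsmarried";
# m_strict flags only yr3 and yr4.
trimmed <- toy[!m_strict]  # list-style indexing always returns a data frame
colnames(trimmed)          # "year", "weight", "yrsmarried"
```

List-style indexing (toy[mask]) is used here, following Jeff's suggestion, so the result stays a data frame even if only one column survives.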