[R] Dropping columns from data frame

David Winsemius dwinsemius at comcast.net
Fri Jan 6 18:19:18 CET 2012


On Jan 6, 2012, at 11:43 AM, Mike Harwood wrote:

> Thank you, David.  I was merely using "head" to limit the code/
> output.  My question remains, because a created data frame has the
> same columns as was output from "head":
>
>> head(orig.df,3)
>  num1.10 num11.20 lc1.10 lc11.20 uc1.10 uc11.20
> 1       1       11      a       k      A       K
> 2       2       12      b       l      B       L
> 3       3       13      c       m      C       M
>> # Illustration 1: contiguous columns at beginning of data frame
>> head(orig.df[,-c(1:3)],2)
>  lc11.20 uc1.10 uc11.20
> 1       k      A       K
> 2       l      B       L
>> new.df <- orig.df[,-c(1:3)]
>> head(new.df,2)
>  lc11.20 uc1.10 uc11.20
> 1       k      A       K
> 2       l      B       L
>>
>> # Illustration 2: non-contiguous columns
>> head(orig.df[,-c(1,3,5)],2)
>  num11.20 lc11.20 uc11.20
> 1       11       k       K
> 2       12       l       L
>> new.df <- orig.df[,-c(1,3,5)]
>> head(new.df,2)
>  num11.20 lc11.20 uc11.20
> 1       11       k       K
> 2       12       l       L

I guess my short attention span got the better of me. (But calling  
them "unary errors" was somewhat cryptic and not a particularly  
helpful description of what you were actually seeing.)  Here are more  
constructive responses:

Negative indexing is not accepted for character vectors, so you need  
to convert to either numeric or logical and then "negativize":

orig.df[ !names(orig.df) %in% c('num1.10', 'lc1.10', 'uc1.10')]
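
As a minimal sketch of why the character version fails and the logical
version works (the three-row orig.df below is an illustrative stand-in
built directly with data.frame(), not your actual object, and `drop`,
`res`, `failed` and `keep` are names made up for the example):

```r
# Toy data frame with the same column names as in the thread (values are illustrative).
orig.df <- data.frame(num1.10 = 1:3, num11.20 = 11:13,
                      lc1.10 = letters[1:3], lc11.20 = letters[11:13],
                      uc1.10 = LETTERS[1:3], uc11.20 = LETTERS[11:13])
drop <- c("num1.10", "lc1.10", "uc1.10")

# Unary minus is not defined for character vectors, so this signals an error.
res <- try(orig.df[, -drop], silent = TRUE)
failed <- inherits(res, "try-error")

# Logical route: %in% is evaluated before !, so TRUE marks the columns to keep.
keep <- !names(orig.df) %in% drop
new.df <- orig.df[keep]
names(new.df)
```

Note that orig.df itself is unchanged; only new.df holds the reduced set
of columns.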

These are equivalent for your column names (the grep version treats each
name as a regular expression):

orig.df[ , !names(orig.df) %in% c('num1.10', 'lc1.10', 'uc1.10')]

orig.df[,-match( c("num1.10", "lc1.10", "uc1.10"), names(orig.df))]

orig.df[ , -sapply(c('num1.10', 'lc1.10', 'uc1.10'), grep,  
x=names(orig.df)) ]
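
A quick way to convince yourself that the three forms agree is to compare
the results with identical(). This is only a sketch on a toy orig.df
(again a stand-in, not your data; `drop`, `a`, `b` and `d` are
hypothetical names). The forms can diverge on other inputs: for instance,
match() returns NA for a name that is not present, whereas %in% just
ignores it.

```r
# Toy data frame with the same column names as in the thread (values are illustrative).
orig.df <- data.frame(num1.10 = 1:3, num11.20 = 11:13,
                      lc1.10 = letters[1:3], lc11.20 = letters[11:13],
                      uc1.10 = LETTERS[1:3], uc11.20 = LETTERS[11:13])
drop <- c("num1.10", "lc1.10", "uc1.10")

a <- orig.df[, !names(orig.df) %in% drop]                  # logical index
b <- orig.df[, -match(drop, names(orig.df))]               # negative positions via match()
d <- orig.df[, -sapply(drop, grep, x = names(orig.df))]    # negative positions via grep()

identical(a, b)
identical(a, d)
```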


And when there is a pattern, as with your wanting to drop all of the
".10" names, then grep can be quite efficient:

orig.df[ , -grep(".10",  names(orig.df), fixed=TRUE)]
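
One thing to watch with the -grep() idiom, sketched here on a toy orig.df
(a stand-in for yours; `hits` and `none` are hypothetical names, and
".99" is just a pattern chosen to match nothing): when no name matches,
grep() returns integer(0), and negating that still selects zero columns
rather than all of them. Negating grepl() does not have that problem.

```r
# Toy data frame with the same column names as in the thread (values are illustrative).
orig.df <- data.frame(num1.10 = 1:3, num11.20 = 11:13,
                      lc1.10 = letters[1:3], lc11.20 = letters[11:13],
                      uc1.10 = LETTERS[1:3], uc11.20 = LETTERS[11:13])

# fixed = TRUE makes ".10" a literal substring instead of a regular expression.
hits <- grep(".10", names(orig.df), fixed = TRUE)
names(orig.df[, -hits])

# No match: -integer(0) is still integer(0), so every column is dropped.
none <- grep(".99", names(orig.df), fixed = TRUE)
ncol(orig.df[, -none])

# The logical form keeps all columns when nothing matches.
ncol(orig.df[, !grepl(".99", names(orig.df), fixed = TRUE)])
```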


-- 
David

>
>
>
>
> On Jan 6, 9:49 am, David Winsemius <dwinsem... at comcast.net> wrote:
>> On Jan 6, 2012, at 10:00 AM, Mike Harwood wrote:
>>
>>> How does R do it, and should I ever be worried?  I always remove
>>> columns by index, and it works exactly as I would naively expect, but
>>> HOW?  The second illustration, which deletes non-contiguous columns,
>>> represents what I do all the time and have some trepidation about
>>> because I don't know the mechanics (e.g. why doesn't the column
>>> formerly-known-as-4 become 3 after column 1 is dropped: doesn't vector
>>> removal from a df/list invoke a loop in C?).
>>
>> You are NOT "removing columns". You are returning (to `head` and then
>> to `print`) an extract from the dataframe, but that does not change
>> the original dataframe. To effect a change you would need to assign
>> the value back to the same name as the original dataframe.
>>
>> --
>> David
>>
>>>  Can I delete a named
>>> list of columns, which are examples 4 and 5 and which generate the
>>> "unary error" messages, without resorting to "orig.df$num1.10 <-  
>>> NULL"?
>>
>>> Thanks!
>>
>>> orig.df <- data.frame(cbind(
>>>    1:10
>>>    ,11:20
>>>    ,letters[1:10]
>>>    ,letters[11:20]
>>>    ,LETTERS[1:10]
>>>    ,LETTERS[11:20]
>>>    ))
>>> names(orig.df) <- c(
>>>    'num1.10'
>>>    ,'num11.20'
>>>    ,'lc1.10'
>>>    ,'lc11.20'
>>>    ,'uc1.10'
>>>    ,'uc11.20'
>>>    )
>>> # Illustration 1: contiguous columns at beginning of data frame
>>> head(orig.df[,-c(1:3)])
>>
>>> # Illustration 2: non-contiguous columns
>>> head(orig.df[,-c(1,3,5)])
>>
>>> # Illustration 3: contiguous columns at end of data frame
>>> head(orig.df[,-c(4:6)])    ## as expected
>>
>>> # Illustrations 4-5: unary errors
>>> head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))])
>>> head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')])
>>
>>> Mike
>>
>>> ______________________________________________
>>> R-h... at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>

David Winsemius, MD
West Hartford, CT


