[R] Reply: Reply: Re: selecting columns from a data frame or data table by type, ie, numeric, integer

G.Maubach at weinwolf.de G.Maubach at weinwolf.de
Wed May 4 10:05:54 CEST 2016


Hi Martin,

many thanks for your answer and your broad explanation. 

I am a newbie to "R", got help on this list, and thought I could give
back something that looked OK to me.

regarding 0)
You're right, it's pseudo code. I assumed that anybody on the list would 
be able to adapt the code to their needs so that it worked. Next time I 
will post runnable code.

regarding 1)
You're right: "[, i]" is missing. My fault. Sorry.

regarding 3)
I got your point and will do better in the future.

One question: What books do you recommend to read to get to know "R" 
better?

Kind regards

Georg




From:    Martin Maechler <maechler at stat.math.ethz.ch>
To:      <G.Maubach at weinwolf.de>,
Cc:      Carl Sutton <suttoncarl at ymail.com>, "r-help at r-project.org"
<r-help at r-project.org>
Date:    04.05.2016 09:05
Subject: [R] Reply: Re: selecting columns from a data frame or data table
by type, ie, numeric, integer



>>>>>   <G.Maubach at weinwolf.de>
>>>>>     on Wed, 4 May 2016 08:30:50 +0200 writes:

> Hi All,
> Hi Carl,
> 
> I am not sure if this is useful to you, but I followed your conversation
> and thought of you when I read this:
> 
> for (i in 1:ncol(dataset)) {
>   if(class(dataset) == "character|numeric|factor|or whatsoever") {
>     dataset[, i] <- as.factor(dataset[, i])
>   }
> }

Ouch -- so many problems in such a short piece of R code !!!

> Source: Zumel, Nina / Mount, John: Practical Data Science with R,
> Manning Publications: Shelter Island, 2014, Chapter 2: Loading data
> into R, p. 25

Sorry, but after reading the above, I'd strongly recommend getting
better books about R...
       {{maybe do not take those containing "data science" ;-)}}

Compared to the nice and efficient solution of Bill Dunlap,
the above is really bad-bad-bad  in at least four ways :

0) The way you wrote it above, you cannot use it:
     <string> == "variant1|variant2|..."
   is pseudocode and does not really work

1) Note the missing "[, i]"  in the 2nd line: It should be
     if(class(dataset[, i]) ...

2) A for loop changing each column at a time is really slow for
   largish data sets

3) [last but not at all least!]
   Please ... many of you readers, do learn:
 
 Using checks such as
       if ( class(x) == "numeric" )
 are (almost) always wrong by design !!!

 Instead you really should (almost) always use

                  if(inherits(x, "numeric"))

Why?  Because classes in R (S3 or S4) can *extend* other classes.
Example: Many of you know that after   fm <- glm(...)
class(fm) is   c("glm", "lm")   and so

    > if(class(fm) == "lm")
    + "yes"
    Warning message:
    In if (class(fm) == "lm") "yes" :
      the condition has length > 1 and only the first element will be used
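
A minimal sketch of the difference, using a small made-up data set (the data frame `d` and its columns `x` and `y` are illustrative assumptions, not from the thread). Since R 4.2.0 a condition of length greater than one in `if()` is an error rather than a warning, so the comparison is evaluated on its own here instead of inside `if()`:

```r
# Toy data, made up purely for illustration
set.seed(1)
d <- data.frame(x = 1:20, y = rpois(20, lambda = 3))
fm <- glm(y ~ x, family = poisson, data = d)

class(fm)            # "glm" "lm"
class(fm) == "lm"    # FALSE TRUE: length 2, not a valid if() condition
inherits(fm, "lm")   # TRUE: a single logical, safe inside if()

msg <- if (inherits(fm, "lm")) "yes" else "no"
msg                  # "yes"
```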

Similarly, in your case

y <- 1:10
class(y) <- c("myNumber", "numeric")

when that 'y' is a column in your data frame,
the test for  if(class(dataset[,i]) == "numeric")  will *not*
work but actually produce the above warning.
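
As a rough sketch of that situation (the data frame `dataset` and its `id` column are hypothetical stand-ins), the classed vector can be placed in a column and the three kinds of test compared side by side:

```r
y <- 1:10
class(y) <- c("myNumber", "numeric")

# Hypothetical data frame holding 'y' as a column
dataset <- data.frame(id = 1:10)
dataset$y <- y

class(dataset$y) == "numeric"    # FALSE TRUE: length 2, first element is FALSE
inherits(dataset$y, "numeric")   # TRUE
is.numeric(dataset$y)            # TRUE: looks at the underlying storage type
```

For the original question of picking numeric columns, `is.numeric()` is probably the most direct test, since it does not depend on how the class attribute is spelled.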

However, one could also have had

Num <- setClass("Num", contains="numeric")
N <- Num(1:10)

     > Num <- setClass("Num", contains="numeric")
     > N <- Num(1:10)
     > N
     An object of class "Num"
      [1]  1  2  3  4  5  6  7  8  9 10
     > if(class(N) == "numeric") "yes" else "no"
     [1] "no"
     > 
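
For comparison, a short sketch of the inheritance-aware checks on the same S4 object. `is()` comes from the methods package; attaching it explicitly is only an assumption for sessions where it is not already attached:

```r
library(methods)  # normally attached already; harmless if so

Num <- setClass("Num", contains = "numeric")
N <- Num(1:10)

class_test <- any(class(N) == "numeric")   # FALSE: class(N) is just "Num"
is_test    <- is(N, "numeric")             # TRUE: follows S4 inheritance

# The check recommended above; it should agree with is() for S4 objects
inherits(N, "numeric")
```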

I hope that many of the readers --- including *MANY* authors of
R packages !! --- have understood the above and will fix their R
code -- and even more their books where applicable !!
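
Pulling points 0) to 3) together, here is one possible loop-free rewrite of the quoted snippet. It is only a sketch under assumptions: the example `dataset` is made up, and converting character columns to factors is taken to be what the book code intended.

```r
# Hypothetical example data; 'dataset' stands in for the poster's data frame
dataset <- data.frame(
  id    = 1:4,
  score = c(1.5, 2.5, 3.5, 4.5),
  group = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# One logical per column: TRUE where the column is character
is_chr <- vapply(dataset, is.character, FUN.VALUE = NA)

# Convert all character columns to factors in a single assignment
dataset[is_chr] <- lapply(dataset[is_chr], as.factor)

# Keep only the numeric (integer or double) columns, as in Bill Dunlap's answer
num_only <- dataset[vapply(dataset, is.numeric, FUN.VALUE = NA)]
names(num_only)   # "id" "score"
```

Predicate functions such as `is.character()` and `is.numeric()` sidestep the `class(x) == "..."` comparison entirely, and `vapply()` with `FUN.VALUE = NA` guarantees one logical value per column.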

Martin Maechler,
ETH Zurich & R Core Team 
 
> 


> This way you can select variables of a certain class only and do
> transformations. I found that this approach is not applicable if used
> with statistical functions like head(). Transformations worked fine for me.
> 
> I found reading the above given source worthwhile.
> 
> Kind regards
> 
> Georg
> 
> PS: I am not related to the above given authors. I am just a reader 
> reporting on - at least to me - a valuable resource.
> 
> 
> 
> From:    Carl Sutton via R-help <r-help at r-project.org>
> To:      William Dunlap <wdunlap at tibco.com>,
> Cc:      "r-help at r-project.org" <r-help at r-project.org>
> Date:    29.04.2016 22:08
> Subject: Re: [R] selecting columns from a data frame or data table
> by type, ie, numeric, integer
> Sent by: "R-help" <r-help-bounces at r-project.org>
> 
> 
> 
> Thank you Bill Dunlap.  So simple I never tried that approach. Tried
> dozens of others though, read manuals till I was getting headaches, and
> of course the answer was simple when one is competent.   Learning, it's a
> struggle, but slowly getting there.
> Thanks again
>  Carl Sutton CPA
> 
> 
>     On Friday, April 29, 2016 10:50 AM, William Dunlap
> <wdunlap at tibco.com> wrote:
> 
> 
> 
>  > dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
>      a   c
>  1   1 1.1
>  2   2 1.0
>  ...
>  10 10 0.2
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help 
> <r-help at r-project.org> wrote:
> 
> Good morning RGuru's
> I have a data frame of 575 columns.  I want to extract only those columns
> that are numeric(double) or integer to do some machine learning with.  I
> have searched the web for a couple of days (off and on) and have not found
> anything that shows how to do this.   Lots of ways to extract rows, but
> not columns.  I have attempted to use "(x == y)" indices extraction method
> but that threw error that == was for atomic vectors and lists, and I was
> doing this on a data frame.
> 
> My test code is below
> 
> #  a technique to get column classes
> library(data.table)
> a <- 1:10
> b <- c("a","b","c","d","e","f","g","h","i","j")
> c <- seq(1.1, .2, length = 10)
> dt1 <- data.table(a,b,c)
> str(dt1)
> col.classes <- sapply(dt1, class)
> head(col.classes)
> dt2 <- subset(dt1, typeof = "double" | "numeric")
> str(dt2)
> dt2   #  not subset
> dt2 <- dt1[, list(typeof = "double")]
> str(dt2)
> class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
> class_data
> sum(class_data)
> typeof(class_data)
> names(class_data)
> str(class_data)
>  Any help is appreciated
> Carl Sutton CPA


