[R] Very confused with class

Dan Davison davison at stats.ox.ac.uk
Thu Aug 21 17:11:22 CEST 2008


Hi Robin,

You haven't said where you're getting the data from. But if the answer
is that you're using read.table, read.csv or similar to read the data
into R, then I advise you to go back to that stage and get it right
from the outset. It's very, very common to see people who are
relatively new to R splattering their code with calls to as.numeric,
just because they haven't read the data in properly in the first
place. It's also common in those who aren't new to R... So e.g. if you
are using read.table, then use the colClasses argument to specify the
classes of your columns, and use str() on the result until you're
happy with the data frame produced.
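
As a minimal sketch of that workflow (the column names and values below are made up for illustration, and the data are read from an inline string via the text argument rather than from your actual file):

```r
# Hypothetical weather data standing in for the real file.
txt <- "pressure maxtemp
1012.5 18.2
1040.1 21.7
998.3 15.4"

# colClasses declares up front that both columns are numeric,
# so neither can silently end up as a factor.
southwest <- read.table(text = txt, header = TRUE,
                        colClasses = c("numeric", "numeric"))

# Inspect the result before doing anything else with it.
str(southwest)
```

With colClasses set like this, a stray non-numeric entry in the file should make read.table stop with an error instead of quietly changing the column's class, which is usually what you want.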

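It's not entirely clear why you would have ended up with factors if
your data are numeric. That often happens when a column contains some
non-numeric entries alongside the numbers, so that the whole column is
read as character data and converted to a factor. Perhaps you have
mixed the header row up with the data?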
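
Here is a small sketch of the header mix-up, using made-up values. Note that stringsAsFactors = TRUE is passed explicitly as an assumption about your setup: it was the default at the time of writing, but newer versions of R default to FALSE, in which case you would get a character column rather than a factor.

```r
# Hypothetical single-column file with a header line.
txt <- "pressure
1012.5
1040.1
998.3"

# Header treated as data: the word "pressure" lands in the column,
# so the column cannot be numeric and becomes a factor.
wrong <- read.table(text = txt, header = FALSE, stringsAsFactors = TRUE)
class(wrong$V1)

# Header treated as a header: the column is read as numeric.
right <- read.table(text = txt, header = TRUE)
class(right$pressure)
```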

Anyway, what you are seeing are the internal integer codes of the
factor levels, not the original values. E.g.

> f <- factor(11:20)
> str(f)
 Factor w/ 10 levels "11","12","13",..: 1 2 3 4 5 6 7 8 9 10
> as.numeric(f)
 [1]  1  2  3  4  5  6  7  8  9 10

But don't mess with them. Just make sure that things which shouldn't
be factors never become factors.
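
That said, if a factor has already crept in and re-reading the data really isn't an option, the original numbers can usually be recovered from the level labels rather than from the codes. A sketch with made-up values:

```r
# Hypothetical factor whose labels are really numbers.
f <- factor(c("998.3", "1040.1", "1012.5"))

# This gives the integer codes, not the measurements.
as.numeric(f)

# Converting via the labels recovers the actual values.
vals <- as.numeric(as.character(f))

# Equivalent approach, as suggested in ?factor: convert the
# levels once, then index them by the factor.
vals2 <- as.numeric(levels(f))[f]

max(vals)
```

Any label that isn't a valid number comes back as NA with a warning, which at least tells you which entries caused the problem in the first place.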

Dan

On Thu, Aug 21, 2008 at 03:40:58PM +0100, Williams, Robin wrote:
> Hi all,
>   I am very confused with class.
>   I am looking at some weather data which I want to use as explanatory
> variables in an lm. R has treated these variables as factors (i.e. with
> different levels), whereas I want them treated as discretely measured
> continuous variables. So I need to reassign the class of these
> variables, right?
> Indeed, doing 
> class(southwest$pressure)
> (pressure being air pressure), I get 
> #> factor.
>   Now what class should I use to reassign them so that my model fitting
> process goes as I want it to? I have obviously done something wrong. I
> did 
> southwest$pressure <- as(southwest$pressure,"numeric")
> numeric seeming like a reasonable class to assign to this variable.
> However, doing some summary stats like 
> mean(southwest$pressure)
> #> 341,
> max(southwest$pressure)
> #> 761,
> which is clearly nonsense, as my maximum value is around 1040. Something
> similar has happened to maxtemp (maximum temperature), which I also
> reassigned from a factor to class numeric, which now apparently has a
> maximum value of 147! 
>   Clearly it must be the reassignment of class that has caused these
> problems, as summary stats on the data before I reassigned the classes
> were fine. What is wrong with the class numeric? Reading the numeric
> help page didn't reveal anything to me. Can someone suggest the correct
> class?
> Many thanks for any help.  
> Robin Williams
> Met Office summer intern - Health Forecasting
> robin.williams at metoffice.gov.uk 

-- 
http://www.stats.ox.ac.uk/~davison


