[R] question about categorical variables in R
drjimlemon at gmail.com
Sat Sep 12 07:12:36 CEST 2015
Given that this is such a common question and the R FAQ doesn't really
answer it, perhaps a brief explanation will help. In R, a factor combines
the literal representation of the data (its levels) with a sequence of
integer codes beginning at 1. By default the levels are sorted
alphabetically and the codes follow that order. For example, suppose you
read in what you think is a set of numbers like this:
x<-read.table(text="1 2 3
4 5 6
7 . 9")
x
  V1 V2 V3
1  1  2  3
2  4  5  6
3  7  .  9
Now look at the classes of the columns:
sapply(x,class)
       V1        V2        V3
"integer"  "factor" "integer"
Somehow that second column has become a factor. This is because "." cannot
be represented as a number and I didn't tell R that it should be regarded
as a missing value (na.strings="."). R has taken the literal values in that
column as the levels of the factor:
levels(x$V2)
[1] "." "2" "5"
and attached integer codes to those values in their alphabetical order:
as.numeric(x$V2)
[1] 2 3 1
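To see both parts of a factor at once, you can strip the class with
unclass(). A minimal sketch, rebuilding that second column by hand with
factor() rather than read.table() (the name v2 is a hypothetical stand-in
for x$V2):

```r
# Hypothetical stand-in for x$V2: the same three values, made a factor by hand
v2 <- factor(c("2", "5", "."))

levels(v2)        # the literal values, sorted: "." "2" "5"
as.integer(v2)    # the integer codes pointing into levels(): 2 3 1
unclass(v2)       # the codes, with the levels kept as an attribute

# Looking the codes up in the levels gives back the original strings
levels(v2)[as.integer(v2)]   # "2" "5" "."
```

That lookup of codes in the levels is essentially what as.character() does
to a factor, which is why the as.character() step is needed before
converting the values to numbers.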
You can get the original numbers back like this:
as.numeric(as.character(x$V2))
[1]  2  5 NA
Warning message:
NAs introduced by coercion
and R helpfully tells you that it couldn't coerce "." to a number.
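The cleaner fix is to tell read.table() that "." means missing, via
na.strings=".". A sketch on the same data; note that since R 4.0.0
read.table() defaults to stringsAsFactors = FALSE, so on current R the
unfixed column comes back as "character" rather than "factor" unless
stringsAsFactors = TRUE is passed, as done here to reproduce the behaviour
described above:

```r
txt <- "1 2 3
4 5 6
7 . 9"

# Reproduce the older default, where non-numeric columns became factors
x_bad <- read.table(text = txt, stringsAsFactors = TRUE)
sapply(x_bad, class)   # V2 is a factor

# Declare "." as the missing-value marker instead
x_ok <- read.table(text = txt, na.strings = ".")
sapply(x_ok, class)    # all three columns are integer
x_ok$V2                # 2 5 NA
```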
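In your example, the factor is created for you:
factor(c("male","female"))
[1] male   female
Levels: female male
but as you can see, the default order of the levels may not be what you
want, since "female" gets the code 1 and "male" gets the code 2:
as.numeric(factor(c("male","female")))
[1] 2 1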
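If the alphabetical order is not the one you want, you can set it
yourself. A minimal sketch of two base R ways to do that (the names gender,
gender2 and gender3 are only illustrative):

```r
gender <- factor(c("male", "female"))
levels(gender)        # "female" "male" (alphabetical default)
as.integer(gender)    # 2 1

# Option 1: state the level order when creating the factor
gender2 <- factor(c("male", "female"), levels = c("male", "female"))
as.integer(gender2)   # 1 2

# Option 2: move a chosen level to the front of an existing factor
gender3 <- relevel(gender, ref = "male")
levels(gender3)       # "male" "female"
```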
For a more complete account of factors, see "An Introduction to R" section
4 "Ordered and unordered factors".
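Applied to the original question: a modelling function such as lm() turns
a factor into 0/1 dummy columns itself (treatment contrasts by default,
with the first level as the reference), so coding by hand is not required
there. A sketch on invented toy data (y and sex are hypothetical) comparing
the factor with a hand-made male = 0, female = 1 variable. It covers only
lm(); whether other functions, such as those in the package mentioned in
the question, accept factors is up to that package.

```r
# Hypothetical toy data, invented for illustration only
set.seed(1)
sex <- factor(rep(c("male", "female"), each = 5))
y <- rnorm(10, mean = ifelse(sex == "male", 10, 12))

# R builds the dummy column itself; "female" (first level) is the reference
fit_factor <- lm(y ~ sex)
head(model.matrix(fit_factor))   # columns "(Intercept)" and "sexmale"

# The same model with the variable coded by hand: male = 0, female = 1
female01 <- ifelse(sex == "female", 1, 0)
fit_manual <- lm(y ~ female01)

# Identical fitted values; the slope only flips sign because the
# reference group differs between the two codings
all.equal(unname(fitted(fit_factor)), unname(fitted(fit_manual)))
coef(fit_factor)
coef(fit_manual)
```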
On Sat, Sep 12, 2015 at 12:45 AM, Lida Zeighami <lid.zigh at gmail.com> wrote:
> Hi dear experts,
> I have a general question in R about categorical variables such as
> Gender (Male or Female).
> If I have this column in my data and want to fit a regression model or feed
> the data to the seqmeta packages (singlesnp, skat meta), should I code it
> first (male=0 and female=1), or does R do it for me?
> When I didn't code it, R still ran the analysis without any error, but
> I'm not sure whether the result is correct.