[R] factor : how does it work ?

Duncan Murdoch murdoch at stats.uwo.ca
Thu Oct 6 16:57:45 CEST 2005


On 10/6/2005 10:50 AM, Florence Combes wrote:
> a last question, and thanks a million for your patience and your
> explanations ...
> 
> 
> I tried with a df called "merged" and a column named "Pcc_0h_A" (which is
> numeric values):
> 
>> length(as.vector(merged$Pcc_0h_A))
> [1] 12202
>> as.numeric(as.vector(merged$Pcc_0h_A)[1:10])
> [1] 12.276 11.958 14.098 13.843 12.451 11.745 NA NA NA NA
>> ord<-ordered(merged$Pcc_0h_A)
>> length(ord)
> [1] 12202
>> ord[1:10]
> [1] 12.276 11.958 14.098 13.843 12.451 11.745 <NA> <NA> <NA> <NA>
> 5386 Levels: 10.001 < 10.002 < 10.003 < 10.005 < 10.006 < 10.010 < ... < 9.999
> 
> here I have <NA> instead of NA because ord is a factor and the notation is
> different ?

I can't tell what's going on here.  Since you are only showing me 
converted values of each column (as.vector(), as.numeric(), ordered(), 
etc.) I can't tell what the original looked like.

A useful way to get an overview of a dataframe is to look at the results 
of three function calls:

head(merged)    # list the first few rows
str(merged)     # describe the structure of the dataframe
summary(merged) # summarize the data in each of the columns.
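
As a rough illustration of what these calls can reveal, here is a minimal sketch using a made-up data frame called demo (not your data; the column name v and its values are assumptions). It assumes a numeric-looking column that has ended up stored as a factor, which is one common way to get output like the above:

```r
# Hypothetical data: numeric-looking values stored as a factor.
demo <- data.frame(v = factor(c("12.276", "9.999", "10.001", NA)))

str(demo)        # reports v as a Factor rather than num
levels(demo$v)   # levels are sorted as strings, so "9.999" comes last

# as.numeric() on a factor returns the internal integer codes,
# i.e. each value's position in levels(), not the printed value.
codes <- as.numeric(demo$v)

# The idiom from ?factor recovers the numbers, provided the
# level names can be converted to numbers.
values <- as.numeric(levels(demo$v))[demo$v]
```

If str() shows a factor where you expected num, the second conversion is the one that gives back the original numbers.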

Duncan Murdoch
> 
>> length(as.numeric(merged$Pcc_0h_A))
> [1] 12202
>> as.numeric(merged$Pcc_0h_A[1:10])
> [1] 1812 1547 3308 3114 1960 1370 NA NA NA NA
> 
> are these the levels names converted into numbers ? I don't think because
> levels are like 10.001, 10.002 etc and 1812, 1547 etc are not in this form.
> 
> thanks a million
> 
> florence;
> 
> 
> 
> 
> On 10/6/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>
>> On 10/6/2005 10:20 AM, Florence Combes wrote:
>> >> > > 2d I can't manage to deal with factors, so when I have some, I transform
>> >> > > them in vectors (with levels()), but I think I miss the power and utility of
>> >> > > the factor type ?
>> >> >
>> >> > levels() is not the conversion you want.
>> >
>> >
>> > in fact I use
>> > 'as.numeric(levels(f))[f]'
>> > (from the ?factor description)
>>
>> That will only work if the levels have names that can be converted to
>> numbers. In the example below, the levels are "a" and "b", so you'll
>> get NA values if you try this.
>> >
>> > That lists all the levels, but
>> >> > it doesn't tell you how they correspond to individual observations. For
>> >> > example,
>> >> >
>> >> > > df <- data.frame(x=1:3, y=c('a','b','a'))
>> >> > > df
>> >> > x y
>> >> > 1 1 a
>> >> > 2 2 b
>> >> > 3 3 a
>> >> > > levels(df$y)
>> >> > [1] "a" "b"
>> >> >
>> >> > If you need to convert back to character values, use as.character():
>> >> >
>> >> > > as.character(df$y)
>> >> > [1] "a" "b" "a"
>> >
>> >
>> > got it.
>> >
>> >
>> >> > 1. You can't compare the levels of a factor unless you declared it to
>> >> > be ordered:
>> >> >
>> >> > > df$y[1] > df$y[2]
>> >> > [1] NA
>> >> > Warning message:
>> >> > > not meaningful for factors in: Ops.factor(df$y[1], df$y[2])
>> >> >
>> >> > but
>> >> >
>> >> > > df$y <- ordered(df$y)
>> >> > > df$y[1] > df$y[2]
>> >> > [1] FALSE
>> >> >
>> >> > However, you need to watch out here: the comparison is done by the order
>> >> > of the factors
>> >
>> >
>> > I am sorry I don't understand this.
>> > here you compare the position of a in the factor and the position of b in
>> > the factor ?
>>
>> It's the position of "a" in the levels() vector that is being compared.
>> I declared that the factor had ordered levels, and R interprets that
>> to mean that the first level is less than the second level, etc. This
>> is useful if you want to use meaningful names for ordered categories.
>> Comparison will be by the order of the categories, not by the name you
>> chose.
>>
>> Duncan Murdoch
>>
>> >
>> > , not an alphabetic comparison of their names:
>> >> >
>> >> > > levels(df$y) <- c("before", "after")
>> >> > > df
>> >> > x y
>> >> > 1 1 before
>> >> > 2 2 after
>> >> > 3 3 before
>> >> > > df$y[1] > df$y[2]
>> >> > [1] FALSE
>> >
>> >
>> > best regards,
>> >
>> > florence.
>> >
>>
>>
>



