[R] Discretize factors?

Sun May 16 20:24:55 CEST 2010

I could, but with close to 100 columns, it's messy.
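Rather than renaming by hand, the names can be generated while the indicators are built. The sketch below uses a hypothetical helper, `one_hot_all()` (not a base R function). It loops over every factor or character column, expands each one with `model.matrix(~ 0 + x)` so that every level gets its own 0/1 column, and names the results `<column><level>`. It assumes the columns contain no `NA` values, because `model.matrix()` drops incomplete rows under the default `na.action`. It also assumes every factor has at least two levels.

```r
# Hypothetical helper (not part of base R): expand every factor or
# character column of a data.frame into 0/1 indicator columns named
# <column><level>; all other columns are kept unchanged.
# Assumes no NAs and at least two levels per factor.
one_hot_all <- function(df) {
  pieces <- lapply(names(df), function(col) {
    x <- df[[col]]
    if (is.factor(x) || is.character(x)) {
      x <- factor(x)
      # ~ 0 + x: no intercept, so every level gets its own column
      m <- model.matrix(~ 0 + x)
      colnames(m) <- paste0(col, levels(x))
      as.data.frame(m)
    } else {
      # non-factor columns pass through under their original name
      setNames(data.frame(x), col)
    }
  })
  do.call(cbind, pieces)
}

d <- data.frame(a = c(1, 4, 3, 4, 5, 6),
                b = c(5, 4, 5, 3, 4, 5),
                group = c("A", "B", "B", "C", "C", "C"))
out <- one_hot_all(d)
out
```

Numeric columns pass through unchanged, so `one_hot_all(d)` returns the columns `a`, `b`, `groupA`, `groupB` and `groupC`, however many factor columns the data frame has.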

On 5/16/10 11:22 AM, Peter Ehlers wrote:
> On 2010-05-16 11:06, Noah Silverman wrote:
>> Update,
>>
>> I have it working, but now it's producing really ugly labels.  There must be a
>> small adjustment to the code.  Any ideas?
>>
>> ##Create example data.frame
>> group<- c("A", "B","B","C","C","C")
>> a<- c(1,4,3,4,5,6)
>> b<- c(5,4,5,3,4,5)
>> d<- data.frame(a, b, group)  # cbind() first would coerce a and b to character
>>
>> #create new frame with discretized group
>>> cbind(d[,1:2], model.matrix(~0+d[,3]) )
>>    a b d[, 3]A d[, 3]B d[, 3]C
>> 1 1 5       1       0       0
>> 2 4 4       0       1       0
>> 3 3 5       0       1       0
>> 4 4 3       0       0       1
>> 5 5 4       0       0       1
>> 6 6 5       0       0       1
>>
>>
>> So, as you can see, it works, but the labels for the groups are ugly.
>>
>> I then tried using the column name instead of the number and still got ugly
>> results:
>>
>>> cbind(d[,1:2], model.matrix(~0+d[,"group"]) )
>>    a b d[, "group"]A d[, "group"]B d[, "group"]C
>> 1 1 5             1             0             0
>> 2 4 4             0             1             0
>> 3 3 5             0             1             0
>> 4 4 3             0             0             1
>> 5 5 4             0             0             1
>> 6 6 5             0             0             1
>>
>>
>>
>> Any ideas?
>>
>
> Can't you just use names(...) <- c() on your final dataframe?
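Peter's suggestion can be done programmatically. The odd labels appear because `model.matrix()` names each column by deparsing the formula term (here `d[, "group"]`) and appending the level. The minimal sketch below assumes the example data from the thread and shows two options. The first overwrites the names with `paste0()`. The second lets `model.matrix()` look up `group` in its `data` argument, so the names come out clean in the first place.

```r
group <- c("A", "B", "B", "C", "C", "C")
a <- c(1, 4, 3, 4, 5, 6)
b <- c(5, 4, 5, 3, 4, 5)
d <- data.frame(a, b, group)

# Option 1: build as before, then rename to "group" + level
dummies <- model.matrix(~ 0 + d[, "group"])
colnames(dummies) <- paste0("group", levels(factor(d$group)))
out1 <- cbind(d[, 1:2], dummies)

# Option 2: refer to the column by name through the data argument;
# model.matrix() then labels the columns groupA, groupB, groupC
out2 <- cbind(d[, 1:2], model.matrix(~ 0 + group, data = d))

names(out1)
names(out2)
```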
>
>  -Peter Ehlers
>
>> -N
>>
>>
>>
>> On 5/15/10 11:02 AM, Noah Silverman wrote:
>>> Hi,
>>>
>>> I'm looking for an easy way to discretize factors in R.
>>>
>>> I've noticed that the lm function does this automatically with a nice
>>> result.
>>>
>>> If I have
>>>
>>> group<- c("A", "B","B","C","C","C")
>>>
>>> and run:
>>>
>>> lm(result ~ x1 + group)
>>>
>>> The lm function has split the group into separate binary variables
>>> {0,1}
>>> before performing the regression.  I now have:
>>> groupA
>>> groupB
>>> groupC
>>>
>>> Some of the other models that I want to try won't accept factors, so
>>> they need to be discretized this way.
>>>
>>> Is there a command in R for this, or some easy shortcut?  (I tried
>>> digging into the lm code, but couldn't find where this is being done.)
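The expansion happens in `model.matrix()`. `lm()` builds a model frame and then calls `model.matrix()` on it, so that is the function to call directly. With an intercept in the model (the default), R's default treatment contrasts drop the first level. The regression therefore actually receives `groupB` and `groupC`, with `A` as the baseline. Removing the intercept with `0 +` gives one column per level. The `x1` values below are made up for illustration.

```r
group <- factor(c("A", "B", "B", "C", "C", "C"))
x1 <- c(1.2, 0.7, 3.1, 2.2, 1.9, 0.4)  # made-up numeric predictor

# With an intercept (lm's default), treatment contrasts drop the
# first level: columns are (Intercept), x1, groupB, groupC
X_lm <- model.matrix(~ x1 + group)

# Without an intercept, the factor gets one 0/1 column per level:
# groupA, groupB, groupC
X_full <- model.matrix(~ 0 + group)

colnames(X_lm)
colnames(X_full)
```

`X_full` is an ordinary numeric matrix, so it can be passed to modelling functions that do not accept factors.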
>>>
>>> Thanks!
>>>
>>> -N
>>>