[R] Can any R package handle factor levels not in the training set?

HelponR suncertain at gmail.com
Tue Jan 13 18:14:10 CET 2015


Sorry, I noticed that the email subject is not accurate.

To be specific, when I call predict(), I get error messages like

factor x has new levels 1, 2

Here x is an attribute (independent variable), not the outcome.

I wonder whether the incremental learning packages (if any) solve this
problem? Maybe it is time to write my own package.
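
A minimal sketch of one possible workaround, assuming a plain lm() fit: recode
factor values the model never saw as NA before calling predict(), then fill
the resulting NA predictions with a fallback. The data, the names train, test,
test_safe and fallback, and the choice of the mean training response as the
fallback are all illustrative assumptions, not something a package provides.

```r
# Toy data (hypothetical): the test set contains levels "g" and "i"
# that never appear in the training set.
train <- data.frame(
  x = factor(c("a", "b", "c", "a", "b", "c")),
  y = c(1.0, 2.0, 3.0, 1.1, 2.1, 3.1)
)
test <- data.frame(x = factor(c("a", "c", "g", "i")))

fit <- lm(y ~ x, data = train)
# predict(fit, newdata = test) stops here with "factor x has new levels g, i"

# Levels the model saw during fitting.
known <- fit$xlevels$x

# Recode unseen levels as NA so predict() returns NA instead of stopping.
x_chr <- as.character(test$x)
x_chr[!(x_chr %in% known)] <- NA
test_safe <- data.frame(x = factor(x_chr, levels = known))

pred <- predict(fit, newdata = test_safe)

# Assumed fallback: the mean training response for rows with unseen levels.
fallback <- mean(train$y)
pred[is.na(pred)] <- fallback
```

Rows with known levels keep their usual fitted values; only the rows with
unseen levels receive the fallback. Whether the same recoding works with gbm
or other packages is not checked here.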

On Tue, Jan 13, 2015 at 8:59 AM, HelponR <suncertain at gmail.com> wrote:

> Thanks for your reply. But I cannot control the data.
> I am dealing with real-world stream data. It is very normal for the test
> data (the data the model is applied to for prediction) to contain values
> that were not seen in the training data.
> If I coded this myself, I would give a random guess or just the intercept
> in such a situation. But it seems most R packages return an error and exit.
>
>
> On Mon, Jan 12, 2015 at 6:08 PM, Richard M. Heiberger <rmh at temple.edu>
> wrote:
>
>> You need to define the levels of the training set to include all
>> levels that you might see.
>> Something like this
>>
>> > A <- factor(letters[1:5])
>> > B <- factor(letters[c(1,3,5,7,9)])
>> > A
>> [1] a b c d e
>> Levels: a b c d e
>> > B
>> [1] a c e g i
>> Levels: a c e g i
>> > training <- factor(A, levels=unique(c(levels(A), levels(B))))
>> > training
>> [1] a b c d e
>> Levels: a b c d e g i
>> >
>>
>> In the future please "provide commented, minimal, self-contained,
>> reproducible code."
>>
>> On Mon, Jan 12, 2015 at 9:00 PM, HelponR <suncertain at gmail.com> wrote:
>> > It looks like gbm and glm both have this issue.
>> >
>> > I wonder if any R package is immune to this?
>> >
>> > In reality, it is very normal for test data to contain values unseen in
>> > the training data. It looks like I have to give up R?
>> >
>> > Thanks!
>> >
>>
>
>
