[R] GLM: What is a good way for dealing with new factor levels in the test set?

thuksu toby at huksu.com
Thu Apr 30 00:05:03 CEST 2015


My training set and my test set differ in some factor levels: occasionally
the test set contains a level that never appeared in the training set.

What is a good way to deal with this?

I don't want to throw away the entire row from the data frame, because there
is some valuable information in there.

Is there some way to say something like "use the weighted average of the
coefficients for this factor's levels"?
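
A minimal sketch of that idea, not a definitive answer. It assumes a
logistic glm() with default treatment contrasts, and it assumes the weights
are the level frequencies in the training set (the question does not say
which weights to use). The data and the names train, test, f, x, y are made
up for illustration. Rows with an unseen level get the training means of the
factor's dummy columns, which amounts to averaging the level coefficients by
training frequency; every other column of the row is used as-is.

```r
set.seed(1)

# Hypothetical training data: factor f with levels a, b, c
train <- data.frame(
  f = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
  x = rnorm(200)
)
train$y <- rbinom(200, 1,
                  plogis(0.5 * train$x + (train$f == "b") - (train$f == "c")))

# Hypothetical test data: level "d" never occurs in training
test <- data.frame(f = factor(c("a", "b", "d")), x = c(0.1, -0.3, 0.7))

fit <- glm(y ~ f + x, family = binomial, data = train)
# predict(fit, newdata = test) stops with an error about new levels in f

# Flag the rows whose level is unknown to the fitted model
known  <- fit$xlevels$f
unseen <- !(test$f %in% known)

# Temporarily map unseen levels to a known one so the design matrix can be built
test2   <- test
test2$f <- factor(ifelse(unseen, known[1], as.character(test$f)),
                  levels = known)
X <- model.matrix(delete.response(terms(fit)), data = test2,
                  contrasts.arg = fit$contrasts)

# Columns of the design matrix that belong to the factor f
train_X <- model.matrix(fit)
f_term  <- match("f", attr(terms(fit), "term.labels"))
f_cols  <- which(attr(train_X, "assign") == f_term)

# Overwrite the placeholder dummies with the training-set column means
avg <- colMeans(train_X[, f_cols, drop = FALSE])
X[unseen, f_cols] <- rep(avg, each = sum(unseen))

# Linear predictor and predicted probabilities for all test rows
eta <- drop(X %*% coef(fit))
p   <- plogis(eta)
```

Rows with known levels get the same predictions as predict() would give; only
the row with level "d" is affected by the averaging. Whether a
frequency-weighted average is a sensible stand-in for an unseen level depends
on the data, and other choices (for example pooling rare levels into an
"other" category before fitting) may work better.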


