[R] party for prediction [REPOST]
Ed
icelus2k5 at gmail.com
Fri Oct 12 10:37:02 CEST 2012
Apologies for re-posting, my original message seems to have been
overlooked by the moderators.
---------- Forwarded message ----------
From: Ed <icelus2k5 at gmail.com>
Date: 11 October 2012 19:03
Subject: party for prediction
To: R-help at r-project.org
Hi there
I'm experiencing some problems using the party package (specifically
mob) for prediction. I have a real scalar y that I want to predict from
a real-valued vector x and an integer-valued vector z. From the
documentation, mob seemed the ideal choice.
The first problem I had was that at some nodes in the partitioning
tree, the components of x may be extremely highly correlated or
effectively constant (that is, the components of x are not linearly
independent for every choice of z). When the resulting fit is fed into
predict(), the result is NA. This is not the same behaviour as models
returned by, say, lm(), which ignore missing (NA) coefficients when
predicting. I have fixed this by defining my own StatModel
(myLinearModel - imaginative) which also ignores such coefficients
when predicting.
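
To illustrate what I mean by "ignores missing coefficients", here is a
minimal sketch using plain lm() on made-up data (nothing here is party
code, and the variable names are only for illustration). It fits a
model with two perfectly collinear predictors, so one coefficient comes
back NA, and then predicts by treating NA coefficients as zero, which
should agree with what predict() does for an lm fit.

```r
# Toy data: x2 is an exact multiple of x1, so the design is rank deficient.
set.seed(1)
x1 <- rnorm(20)
x2 <- 2 * x1
y  <- 1 + 3 * x1 + rnorm(20, sd = 0.1)

fit <- lm(y ~ x1 + x2)
b   <- coef(fit)            # the coefficient for x2 is NA (aliased)

# Predict by hand, treating aliased (NA) coefficients as zero.
newdata <- data.frame(x1 = c(-1, 0, 1), x2 = c(-2, 0, 2))
b0 <- b
b0[is.na(b0)] <- 0
X <- model.matrix(~ x1 + x2, data = newdata)
manual <- drop(X %*% b0)

# predict() on an lm fit drops the aliased column as well; it may warn
# about the rank-deficient fit, hence suppressWarnings().
via_lm <- suppressWarnings(predict(fit, newdata))
```

Both manual and via_lm are finite here, rather than NA.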
The second problem I have is that I get "Cholesky not positive
definite" errors at some nodes. I guess this is because of numerical
error and degeneracy in the covariance matrix? Any thoughts on how to
avoid having this happen would be welcome; it is ignorable though for
now.
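
For what it's worth, this is the kind of degeneracy I have in mind, as
a small sketch outside party. I am assuming (I have not confirmed it)
that the error comes from a Cholesky factorisation of a covariance
matrix that is singular, or nearly so. The matrix S and the ridge size
eps below are made up for illustration.

```r
# Covariance matrix of two perfectly correlated variables: singular.
S <- matrix(c(1, 1,
              1, 1), nrow = 2, byrow = TRUE)

# chol() needs a positive definite matrix, so this attempt fails.
res    <- try(chol(S), silent = TRUE)
failed <- inherits(res, "try-error")

# The smallest eigenvalue shows how close to singular the matrix is.
min_eig <- min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)

# Adding a small ridge to the diagonal makes the factorisation go
# through, at the cost of slightly perturbing the matrix.
eps <- 1e-8
R   <- chol(S + eps * diag(nrow(S)))
```

Whether such a ridge is statistically sensible inside mob is a separate
question; the sketch only shows the numerical effect.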
The third and really big problem I have is that when I apply mob to
large datasets (say hundreds of thousands of elements) I get a
"logical subscript too long" error inside mob_fit_fluctests. It's
caught in a try(), and mob just gives up and treats the node as
terminal. This is really hurting me though; with 1% of my data I can
get a good fit and a worthwhile tree, but with the whole dataset I get
a very stunted tree with a pretty useless prediction ability.
I guess what I really want to know is:
(a) has anyone else had this problem, and if so how did they overcome it?
(b) is there any way to get a line or stack trace out of a try()
without source modification?
(c) failing all of that, does anyone know of an alternative to mob
that does the same thing; for better or worse I'm now committed to
recursive partitioning over linear models, as per mob?
(d) failing all of this, does anyone have a link to a way to rebuild,
or locally modify, an R package (preferably windows, but anything
would do)?
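
Regarding (b), the closest I have got is the sketch below. It only
helps when the failing call can be re-run by hand, because the calling
handler has to sit inside the try(); a handler wrapped around code that
already contains its own try() never sees the error. The functions
with_trace, inner_step and outer_step are hypothetical names of mine,
and the matrix subsetting is just a toy way of producing an error.

```r
# Evaluate expr; if it fails, record the call stack at the point of error.
with_trace <- function(expr) {
  calls <- NULL
  result <- try(
    withCallingHandlers(
      expr,
      # Runs before the stack is unwound, so sys.calls() still
      # contains the frames leading to the failing call.
      error = function(e) calls <<- sys.calls()
    ),
    silent = TRUE
  )
  list(result = result, calls = calls)
}

# Toy failure: a logical row index longer than the matrix has rows.
inner_step <- function(m, keep) m[keep, , drop = FALSE]
outer_step <- function(m) inner_step(m, rep(TRUE, nrow(m) + 1))

out <- with_trace(outer_step(matrix(1:4, nrow = 2)))

# First line of each recorded call, outermost first.
stack <- vapply(out$calls, function(cl) deparse(cl)[1], character(1))
```

Printing stack shows outer_step and inner_step among the recorded
calls, and out$result is the usual "try-error" object.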
Sorry for the length of this post. If I should RTFM, please point me
at any relevant manual by all means. I've spent a few days on this as
you can maybe tell, but I'm far from being an R expert.
Thanks for any help you can give.
Best wishes,
Ed