[R] party for prediction [REPOST]

Fri Oct 12 17:58:03 CEST 2012

On Oct 12, 2012, at 1:37 AM, Ed wrote:

> Apologies for re-posting, my original message seems to have been
> overlooked by the moderators.
> 
No. Your original post _was_ forwarded to the list. On my machine it appeared on October 11, 2012 at 11:03:08 AM PDT. No one responded. It seems possible that its lack of data or code is the reason for that state of affairs.

-- 
David.

> ---------- Forwarded message ----------
> From: Ed <icelus2k5 at gmail.com>
> Date: 11 October 2012 19:03
> Subject: party for prediction
> To: R-help at r-project.org
> 
> 
> Hi there
> 
> I'm experiencing some problems using the party package (specifically
> mob) for prediction. I have a real scalar y I want to predict from a
> real valued vector x and an integral vector z. mob seemed the ideal
> choice from the documentation.
> 
> The first problem I had was that at some nodes in a partitioning tree,
> the components of x may be extremely highly correlated or effectively
> constant (that is, the components of x are not independent for all
> choices of components of z). When the resulting fit is fed into
> predict() the result is NA - this is not the same behaviour as models
> returned by, say, lm, which ignore missing coefficients. I have fixed
> this by defining my own
> statsModel (myLinearModel - imaginative) which also ignores such
> coefficients when predicting.
> 
> The second problem I have is that I get "Cholesky not positive
> definite" errors at some nodes. I guess this is because of numerical
> error and degeneracy in the covariance matrix? Any thoughts on how to
> avoid having this happen would be welcome; it is ignorable though for
> now.
> 
> The third and really big problem I have is that when I apply mob to
> large datasets (say hundreds of thousands of elements) I get a
> "logical subscript too long" error inside mob_fit_fluctests. It's
> caught in a try(), and mob just gives up and treats the node as
> terminal. This is really hurting me though; with 1% of my data I can
> get a good fit and a worthwhile tree, but with the whole dataset I get
> a very stunted tree with pretty useless predictive ability.
> 
> I guess what I really want to know is:
> (a) has anyone else had this problem, and if so how did they overcome it?
> (b) is there any way to get a line or stack trace out of a try()
> without source modification?
> (c) failing all of that, does anyone know of an alternative to mob
> that does the same thing? For better or worse I'm now committed to
> recursive partitioning over linear models, as per mob.
> (d) failing all of this, does anyone have a link to a way to rebuild,
> or locally modify, an R package (preferably on Windows, but anything
> would do)?
> 
> Sorry for the length of this post. If I should RTFM, please point me
> at any relevant manual by all means. I've spent a few days on this as
> you can maybe tell, but I'm far from being an R expert.
> 
> Thanks for any help you can give.
> 
> Best wishes,
> 
> Ed

David Winsemius, MD
Alameda, CA, USA