[R] party for prediction [REPOST]

Fri Oct 12 18:10:41 CEST 2012

Sorry, my mistake. I didn't get a notification or see it sent. Thanks
for clearing that up.

Best wishes

Ed

On 12 October 2012 16:58, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Oct 12, 2012, at 1:37 AM, Ed wrote:
>
>> Apologies for re-posting, my original message seems to have been
>> overlooked by the moderators.
>>
> No. Your original post _was_ forwarded to the list. On my machine it appeared on October 11, 2012 at 11:03:08 AM PDT. No one responded. It seems possible that its lack of data or code is the reason for that.
>
> --
> David.
>
>> ---------- Forwarded message ----------
>> From: Ed <icelus2k5 at gmail.com>
>> Date: 11 October 2012 19:03
>> Subject: party for prediction
>> To: R-help at r-project.org
>>
>>
>> Hi there
>>
>> I'm experiencing some problems using the party package (specifically
>> mob) for prediction. I have a real scalar y I want to predict from a
>> real valued vector x and an integral vector z. mob seemed the ideal
>> choice from the documentation.
>>
>> The first problem I had was that at some nodes in a partitioning tree,
>> the components of x may be extremely highly correlated or effectively
>> constant (that is, the components of x are not independent for every
>> choice of the components of z). When the resulting fit is fed into
>> predict(), the result is NA. This differs from the behaviour of models
>> returned by, say, lm, which ignore missing (NA) coefficients when
>> predicting. I have fixed this by defining my own statsModel
>> (myLinearModel - imaginative) which also ignores such coefficients
>> when predicting.
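
For illustration, the workaround described above (skipping coefficients that come back as NA) might look roughly like the sketch below. The toy data and the helper name predict_dropping_na are made up for this example; this is not the poster's myLinearModel and not part of party.

```r
# Toy data: x2 is an exact multiple of x1, so the design matrix is rank deficient.
set.seed(1)
d <- data.frame(x1 = rnorm(20))
d$x2 <- 2 * d$x1
d$y  <- 1 + 3 * d$x1 + rnorm(20, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = d)
b <- coef(fit)   # lm reports the aliased coefficient (x2) as NA

# Hypothetical helper: predict using only the coefficients that were estimated.
predict_dropping_na <- function(beta, X) {
  keep <- !is.na(beta)
  drop(X[, keep, drop = FALSE] %*% beta[keep])
}

X <- model.matrix(~ x1 + x2, data = d)
p_manual <- predict_dropping_na(b, X)

# predict.lm may warn about the rank-deficient fit, but it still returns numbers.
p_lm <- suppressWarnings(predict(fit, newdata = d))
```

On this toy data the two prediction vectors agree, which is the lm-like behaviour the post refers to.
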
>>
>> The second problem I have is that I get "Cholesky not positive
>> definite" errors at some nodes. I guess this is because of numerical
>> error and degeneracy in the covariance matrix? Any thoughts on how to
>> avoid having this happen would be welcome; it is ignorable though for
>> now.
>>
>> The third and really big problem I have is that when I apply mob to
>> large datasets (say hundreds of thousands of elements) I get a
>> "logical subscript too long" error inside mob_fit_fluctests. It's
>> caught in a try(), and mob just gives up and treats the node as
>> terminal. This is really hurting me though; with 1% of my data I can
>> get a good fit and a worthwhile tree, but with the whole dataset I get
>> a very stunted tree with pretty useless predictive ability.
>>
>> I guess what I really want to know is:
>> (a) has anyone else had this problem, and if so how did they overcome it?
>> (b) is there any way to get a line or stack trace out of a try()
>> without source modification?
>> (c) failing all of that, does anyone know of an alternative to mob
>> that does the same thing; for better or worse I'm now committed to
>> recursive partitioning over linear models, as per mob?
>> (d) failing all of this, does anyone have a link to a way to rebuild,
>> or locally modify, an R package (preferably on Windows, but anything
>> would do)?
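
On question (b), a small sketch of why an error swallowed by try() is hard to trace from outside. The functions inner and outer are stand-ins invented for this example, not party code. A calling handler wrapped around the outer call never sees the error, because try()'s own handler sits closer to the failure and consumes it first.

```r
# Stand-ins (hypothetical): inner() fails, outer() hides the failure in try().
inner <- function() stop("logical subscript too long")
outer <- function() try(inner(), silent = TRUE)

seen <- FALSE
res <- withCallingHandlers(
  outer(),
  error = function(e) seen <<- TRUE   # never runs: try() handles the error first
)

# If the try() result is reachable, it still carries the condition object,
# including the message and the call that failed.
cond <- attr(res, "condition")
msg  <- conditionMessage(cond)
call <- deparse(conditionCall(cond))
```

When the try() result is not returned to the caller, as seems to be the case here, one option that needs no source changes is to step through the enclosing function interactively, for example with debug() on the unexported function (something like debug(party:::mob_fit_fluctests), assuming that is where it lives).
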
>>
>> Sorry for the length of this post. If I should RTFM, please point me
>> at any relevant manual by all means. I've spent a few days on this as
>> you can maybe tell, but I'm far from being an R expert.
>>
>> Thanks for any help you can give.
>>
>> Best wishes,
>>
>> Ed
>
> David Winsemius, MD
> Alameda, CA, USA
>