[R] rpart results - problem after oversampling

Jonathan P Daily jdaily at usgs.gov
Thu Dec 2 14:14:03 CET 2010


Short answer: Not really.

Slightly longer answer: from what I remember of partitioning methods, a 
given split is placed either at a single observation or between two 
consecutive observations. In your case, my guess is that rpart wanted to 
split at one particular point, except that there are now 6 copies of it, 
so which of the 6 copies fall on each side of the split is arbitrary. 
(Someone with more knowledge of rpart's guts, feel free to correct me.)
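
One way to check this rather than guess is to look at the `where` 
component of the fitted rpart object, which records the terminal node 
(as a row of `fit$frame`) that each training row landed in. The sketch 
below is only an illustration on made-up data: the columns `id`, `x1`, 
`x2` and `y` are assumptions, not the variables from the original 
question, and the 6-fold oversampling is done by plain row replication.

```r
library(rpart)

set.seed(1)

# Hypothetical toy data with a rare positive class (not the poster's data)
n <- 200
d <- data.frame(id = seq_len(n), x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(as.integer(d$x1 + rnorm(n) > 1.5))

# Oversample the positives by replicating each positive row 6 times
pos  <- d[d$y == "1", ]
over <- rbind(d[d$y == "0", ],
              pos[rep(seq_len(nrow(pos)), each = 6), ])

fit <- rpart(y ~ x1 + x2, data = over, method = "class")

# Class counts per terminal node (rows are indices into fit$frame)
print(table(leaf = fit$where, y = over$y))

# For each original id, count how many distinct leaves its copies reached
leaves_per_id <- tapply(fit$where, over$id, function(w) length(unique(w)))

# ids whose copies did not all end up in the same terminal node
split_ids <- names(leaves_per_id)[leaves_per_id > 1]
print(split_ids)
```

If `split_ids` comes back empty, every copy followed its original down 
the tree. Any ids it lists are the ones whose copies ended up in 
different leaves, and those rows are the place to look first, for 
example for copies that are not actually identical on every predictor.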
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly

r-help-bounces at r-project.org wrote on 12/02/2010 05:34:23 AM:

> From: Mederos, Vicente (Santander)
> To: r-help at r-project.org
> Date: 12/02/2010 05:36 AM
> Subject: [R] rpart results - problem after oversampling
> Sent by: r-help-bounces at r-project.org
> 
> Hi all,
> 
> I am trying to predict a target variable that takes values 0 or 1 
> using the rpart command. In my initial dataset I have few positive 
> observations of the target variable; therefore I have oversampled 
> the rare event by a multiple of 6 (i.e. from 762 to 4572).
> 
> However, in my results, I end up with a number of positives in one 
> of the terminal nodes that is not divisible by 6. As I have the same
> observation repeated 6 times, shouldn't all of them follow the same 
> branch of the tree and go to the same terminal node?
> 
> Thanks for your help,
> 
> Vicente