[R] Recursive partitioning algorithms in R vs. alia

Tobias Verbeke tobias.verbeke at openanalytics.be
Sat Jun 20 08:04:21 CEST 2009


Wensui Liu wrote:

> Well, how difficult would it be to code a random forest with a SAS macro + PROC SPLIT?
> If you lack SAS programming skills, then you are right that
> you would have to wait 8 years :-)

It is true that one can use the macro language to obtain some of the control
flow that the plain SAS language and its PROCs lack, and for manipulating
matrices there is even a third language (IML). But my customers prefer to
build on community-tested open source implementations rather than spend
unnecessary resources writing things from scratch in their own corner.
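For readers wondering what "coding a random forest" around a tree procedure involves: in outline, a classification random forest is a bootstrap loop around a tree grower that, at each node, searches only a random subset of `mtry` predictors (the name is borrowed from the corresponding argument of R's randomForest), with predictions made by majority vote. The sketch below is in Python purely for illustration. It is not the SAS PROC SPLIT or R randomForest code, every function name in it is hypothetical, it handles numeric predictors only, and it omits out-of-bag error estimates and variable importance.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(X, y, features):
    """Search the given feature indices for the threshold with lowest weighted Gini.

    Returns (score, feature, threshold), or None if no split separates the rows.
    """
    best = None
    n = len(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(X, y, mtry, max_depth, rng, depth=0):
    """Grow a classification tree; every node draws `mtry` candidate features.

    A leaf is a class label; an internal node is (feature, threshold, left, right).
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    candidates = rng.sample(range(len(X[0])), mtry)  # requires mtry <= n_features
    split = best_split(X, y, candidates)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, max_depth, rng, depth + 1),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, max_depth, rng, depth + 1))


def predict_tree(node, x):
    """Follow thresholds down to a leaf (labels must not themselves be tuples)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, mtry=1, max_depth=5, seed=0):
    """Bagging plus per-node feature subsampling: one tree per bootstrap sample."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                mtry, max_depth, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

For example, on toy data where the class depends only on the first of two predictors, `forest = random_forest(X, y, n_trees=25, mtry=1, max_depth=4)` followed by `predict_forest(forest, [0.9, 0.1])` should vote for class 1. The loop itself is short; what a tested package adds is speed, categorical handling, missing-value handling, out-of-bag estimates and importance measures.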

> I don't know how much SAS experience you have. As far as I know, both
> bagging and boosting have been implemented in SAS EM for a while,
> together with other cutting-edge modeling tools such as SVMs and neural networks.

Fair enough, but whenever you need ensemble methods for survival data,
or want to avoid biased variable importance in the presence of categorical
predictors, you will (1) not be able to take anything off the shelf and
(2) not be able to tweak SAS EM procedures programmatically either (they
are not exposed but locked in the GUI), so there again your only option
is to implement things from scratch.
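The variable-importance bias mentioned above has a simple mechanism that a small simulation can illustrate: when a tree searches exhaustively over all splits, a predictor with many categories offers many more candidate partitions than a binary one, so it tends to win the search even when it is pure noise, and that selection bias carries over into importance scores. The sketch below is a hypothetical Python illustration, not code from any package. It relies on the standard CART shortcut for two-class responses (ordering categories by their share of 1s and scanning only the cut points along that order finds the best binary partition) and assumes 0/1 labels.

```python
import random
from collections import Counter, defaultdict


def gini_binary(n0, n1):
    """Gini impurity of a node holding n0 zeros and n1 ones (n0 + n1 > 0)."""
    n = n0 + n1
    return 1.0 - (n0 / n) ** 2 - (n1 / n) ** 2


def best_gain(x, y):
    """Largest Gini decrease over all binary splits of categorical predictor x.

    y must contain 0/1 labels. Categories are ordered by their share of 1s and
    only the len(categories) - 1 cut points along that order are checked.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [count of 0s, count of 1s]
    for xi, yi in zip(x, y):
        counts[xi][yi] += 1
    total0 = sum(c[0] for c in counts.values())
    total1 = sum(c[1] for c in counts.values())
    n = total0 + total1
    parent = gini_binary(total0, total1)
    ordered = sorted(counts, key=lambda c: counts[c][1] / sum(counts[c]))
    best = 0.0
    left0 = left1 = 0
    for cat in ordered[:-1]:  # every cut leaves at least one category on each side
        left0 += counts[cat][0]
        left1 += counts[cat][1]
        right0, right1 = total0 - left0, total1 - left1
        n_left, n_right = left0 + left1, right0 + right1
        weighted = (n_left * gini_binary(left0, left1)
                    + n_right * gini_binary(right0, right1)) / n
        best = max(best, parent - weighted)
    return best


def selection_frequency(n_obs=100, n_levels=10, n_reps=200, seed=1):
    """Count which uninformative predictor wins the exhaustive split search.

    The response is pure noise, so neither predictor is related to it; an
    unbiased selection procedure would pick each about half the time.
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_reps):
        y = [rng.randrange(2) for _ in range(n_obs)]
        x_binary = [rng.randrange(2) for _ in range(n_obs)]
        x_many = [rng.randrange(n_levels) for _ in range(n_obs)]
        # Ties go to the binary predictor, which can only understate the bias.
        if best_gain(x_many, y) > best_gain(x_binary, y):
            wins["many_levels"] += 1
        else:
            wins["binary"] += 1
    return wins
```

Calling `selection_frequency()` should show the ten-level predictor winning well over half of the replications even though neither predictor carries any signal, which is the kind of bias that makes importance rankings from such trees hard to trust with mixed predictor types.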

Best,
Tobias

> On Fri, Jun 19, 2009 at 4:18 PM, Tobias
> Verbeke <tobias.verbeke at openanalytics.be> wrote:
>> Wensui Liu wrote:
>>
>>> In terms of richness of features and the ability to handle large
>>> data (which is normal in banking), SAS EM should be ahead of the others.
>> Should be? That is not at all my experience.
>> SAS EM is very much lagging behind current
>> research. You will find variants of random forests
>> in R that will not be in SAS for the next 8 years,
>> to give just one example.
>>
>>> However, it is not cheap.
>>> In terms of algorithms, the split procedure in SAS EM can do
>>> CHAID/CART/C4.5, if I remember correctly.
>> These are techniques of the 80s and 90s
>> (which proves my point). CART is in rpart and
>> an implementation of C4.5 can be accessed
>> through RWeka. For the oldest one (CHAID, 1980),
>> there might be an implementation soon:
>>
>> http://r-forge.r-project.org/projects/chaid/
>>
>> but again, there have been considerable improvements
>> in the last decade as well:
>>
>> http://cran.r-project.org/web/views/MachineLearning.html
>>
>> HTH,
>> Tobias
>>
>>> On Fri, Jun 19, 2009 at 2:35 PM, Carlos J. Gil
>>> Bellosta <cgb at datanalytics.com> wrote:
>>>> Dear R-helpers,
>>>>
>>>> I had a conversation with a guy working in a "business intelligence"
>>>> department at a major Spanish bank. They rely on recursive partitioning
>>>> methods to rank customers according to certain criteria.
>>>>
>>>> They use both SAS EM and Salford Systems' CART. I have used the rpart
>>>> package in the past, but I could not provide any kind of feature comparison
>>>> or the like, as I have no access to an installation of either of those
>>>> proprietary products.
>>>>
>>>> Does anybody have experience with them? Is there any public benchmark
>>>> available? Is there any very good (albeit purely technical) reason
>>>> to pay hefty software licences? How would the algorithms implemented in
>>>> rpart compare to those in SAS and/or CART?
>>>>
>>>> Best regards,
>>>>
>>>> Carlos J. Gil Bellosta
>>>> http://www.datanalytics.com
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.