[R] CART for 0/1 data

Dave Roberts droberts at montana.edu
Fri Sep 23 19:19:42 CEST 2005


Martin,

     Sorry, I don't think I read your message carefully enough.

     When you say the error message is "+", that would seem to indicate 
that you still had an unclosed parenthesis and that R was waiting for 
more input ("+" is R's continuation prompt, not an error message).

     With a smaller data set (160 samples, 169 rows, only 5 classes), it 
worked fine for me. Here pa is the presence/absence data frame and 
opt.5$clustering holds the cluster IDs.

*********************************************************************

 > test <- tree(factor(opt.5$clustering)~pa)
 > test
node), split, n, deviance, yval, (yprob)
       * denotes terminal node

  1) root 160 371.000 3 ( 0.23750 0.08750 0.57500 0.07500 0.02500 )
    2) pa.symore < 0.5 79 216.500 1 ( 0.48101 0.17722 0.15190 0.13924 0.05063 )
      4) pa.artarb < 0.5 42 123.600 2 ( 0.07143 0.33333 0.26190 0.23810 0.09524 )
        8) pa.macgri < 0.5 31  75.280 2 ( 0.09677 0.45161 0.00000 0.32258 0.12903 )
    .        .         .
    .        .         .
    .        .         .
    3) pa.symore > 0.5 81  10.780 3 ( 0.00000 0.00000 0.98765 0.01235 0.00000 )
      6) pa.carrss < 0.5 11   6.702 3 ( 0.00000 0.00000 0.90909 0.09091 0.00000 ) *
      7) pa.carrss > 0.5 70   0.000 3 ( 0.00000 0.00000 1.00000 0.00000 0.00000 ) *

************************************************************************

I'll try again with a larger dataset and see whether it's a memory limitation.
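A minimal sketch of the 0/1 coding point raised in this thread, assuming simulated data: it uses rpart, which ships with R, in place of the tree package, and the names pa, cluster, n_sites and n_species are hypothetical stand-ins. rpart documents that, when method is not given, it assumes "class" for a factor response and "anova" (regression) for a numeric one, so the same integer cluster IDs produce a regression tree unless they are stored as a factor. Numeric 0/1 predictors can only be split between 0 and 1, which is presumably why the splits above read "pa.symore < 0.5".

```r
# Minimal sketch: rpart (bundled with R) stands in for the tree package here.
library(rpart)

set.seed(1)
n_sites   <- 160  # hypothetical number of samples
n_species <- 20   # hypothetical number of species

# Hypothetical presence/absence matrix: one 0/1 column per species.
pa <- as.data.frame(matrix(rbinom(n_sites * n_species, 1, 0.4),
                           nrow = n_sites,
                           dimnames = list(NULL, paste0("sp", seq_len(n_species)))))

# Hypothetical cluster IDs 1..5, stored as plain integers.
cluster <- sample(1:5, n_sites, replace = TRUE)

d_int <- data.frame(cluster = cluster, pa)          # integer response
d_fac <- data.frame(cluster = factor(cluster), pa)  # factor response

# Integer response: rpart guesses a regression ("anova") tree.
fit_reg <- rpart(cluster ~ ., data = d_int)
# Factor response: rpart fits a classification ("class") tree.
fit_cls <- rpart(cluster ~ ., data = d_fac)

print(fit_reg$method)  # "anova"
print(fit_cls$method)  # "class"
```

With the tree package, wrapping the response in factor(), as in the tree() call above, serves the same purpose.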

Dave Roberts

Martin Wegmann wrote:
> On Friday 23 September 2005 17:08, Dave Roberts wrote:
> 
>>Martin,
>>
>>     If the data are actually coded 0/1, the tree function would
>>probably interpret them as integers and try a regression instead of a
>>classification.  If the dependent variable is called "var", try
> 
> 
> Thanks, but I think I provided too little information. 
> My dependent variable is the location, given as names (I could transform 
> them to numbers from 1 to n). The independent variables are 0/1 
> presence/absence data (species). 
> If I run 
> tree(locations~factor(species1)+factor(species2)+.....+factor(speciesn), 
> sp_data) 
> I get the same results as without the factor() calls. 
> BTW, only a subset of the locations is displayed, which is pretty weird 
> considering that I included all locations in the analysis.
> 
> Martin 
> 
> 
> 
>>x <- tree(factor(var)~species)
>>
>>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>David W. Roberts                                     office 406-994-4548
>>Professor and Head                                      FAX 406-994-3190
>>Department of Ecology                         email droberts at montana.edu
>>Montana State University
>>Bozeman, MT 59717-3460
>>
>>Martin Wegmann wrote:
>>
>>>Dear R-user,
>>>
>>>I tried to generate a classification/regression tree from an
>>>absence/presence matrix of species (400) in different locations (50) to
>>>visualise which species are important for splitting two locations apart.
>>>rpart and tree did not work with more than 10 species, which is logical
>>>given the limited number of locations (n=50). However, the only error
>>>prompt is a "+" with no specific message, and I am pretty sure that I did
>>>not enter a wrong character by mistake.
>>>Is it valid at all to use 0/1 data for this statistical technique, and
>>>if so, is there a way or a different method to use all 400 species?
>>>Otherwise I would apply a PCA beforehand, but I would prefer to keep the
>>>raw species information.
>>>
>>>using R 2.1.1-1 (debian repos.)
>>>
>>>regards, Martin
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
>>http://www.R-project.org/posting-guide.html