[R] CART for 0/1 data

Dave Roberts droberts at montana.edu
Fri Sep 23 19:25:28 CEST 2005


     I should have tried this before my last post and saved a round of 
postings, but on my machine I ran it with samples = 1224, species = 962, 
clusters = 10 and had no problems at all.

 > summary(test)

Classification tree:
tree(formula = factor(opt.10$clustering) ~ pa)
Variables actually used in tree construction:
  [1] "pa.PICENG" "pa.ARTTSV" "pa.PSEMEN" "pa.AGRSPI" "pa.DESCES" 
  [7] "pa.FESIDA" "pa.POLBIS" "pa.CAREXX" "pa.PINCON" "pa.GEUMAC"
Number of terminal nodes:  16
Residual mean deviance:  1.551 = 1873 / 1208
Misclassification error rate: 0.2435 = 298 / 1224
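
For anyone who wants to try the same kind of run without my data, here is a
minimal sketch on simulated presence/absence data. Everything in it is an
assumption for illustration: the data are random, the sizes (n_samples,
n_species, n_clusters) are much smaller than above, pam() from the cluster
package stands in for whatever produced the clustering, and rpart() is used in
place of tree() only because it ships with R. It is not the analysis that
produced the summary above.

```r
library(cluster)  # pam()
library(rpart)    # rpart()

set.seed(42)
n_samples  <- 200   # hypothetical sizes, far smaller than 1224 x 962
n_species  <- 60
n_clusters <- 4

# Simulated data: each sample belongs to a latent group, and each group
# has its own occurrence probability for every species.
group <- sample(n_clusters, n_samples, replace = TRUE)
prob  <- matrix(runif(n_clusters * n_species, 0.05, 0.6),
                nrow = n_clusters, ncol = n_species)
pa <- matrix(rbinom(n_samples * n_species, 1, prob[group, ]),
             nrow = n_samples, ncol = n_species)
colnames(pa) <- paste0("sp", seq_len(n_species))

# Cluster the samples on a binary (Jaccard-type) dissimilarity.
d  <- dist(pa, method = "binary")
cl <- pam(d, k = n_clusters)$clustering

# The response must be a factor to get a classification tree.
dat <- data.frame(cluster = factor(cl), pa)
fit <- rpart(cluster ~ ., data = dat, method = "class")

# Species actually used for splitting, and the resubstitution error rate.
used <- unique(as.character(fit$frame$var[fit$frame$var != "<leaf>"]))
err  <- mean(predict(fit, type = "class") != dat$cluster)
print(used)
print(err)
```

The error printed here is a resubstitution rate, like the one in the summary
above, so it is optimistic compared with what you would see on new samples.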

You may want to reclassify to fewer than 50 locations, but I think it 
should work.
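
On the factor() question from the earlier messages, a small sketch may help.
It assumes rpart() rather than tree(), and the names (sp, loc_id, species1 to
species3) are made up for illustration. The point is that the response
decides whether you get a regression or a classification, while wrapping a 0/1
predictor in factor() should change nothing, because a two-valued variable
allows only one possible split either way.

```r
library(rpart)

set.seed(1)
n  <- 120
sp <- data.frame(species1 = rbinom(n, 1, 0.5),
                 species2 = rbinom(n, 1, 0.5),
                 species3 = rbinom(n, 1, 0.5))

# Hypothetical location codes 1..4, determined by two of the species.
loc_id <- 1 + sp$species1 + 2 * sp$species2

# Numeric response: rpart picks a regression tree.
fit_num <- rpart(loc_id ~ ., data = sp)
fit_num$method   # "anova"

# Factor response: rpart picks a classification tree.
fit_fac <- rpart(factor(loc_id) ~ ., data = sp)
fit_fac$method   # "class"

# Same classification tree, but with the 0/1 predictors converted to factors.
sp_f     <- as.data.frame(lapply(sp, factor))
fit_fac2 <- rpart(factor(loc_id) ~ ., data = sp_f)

# Compare the fitted classes of the two classification trees.
same <- identical(as.vector(predict(fit_fac,  type = "class")),
                  as.vector(predict(fit_fac2, type = "class")))
print(same)
```

So if the locations are stored as names or as a factor, a classification is
what you should already be getting, and factor() on the species columns is
harmless but unnecessary.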

Good luck, Dave Roberts

Martin Wegmann wrote:
> On Friday 23 September 2005 17:08, Dave Roberts wrote:
>>     If the data are actually coded 0/1, the tree function would
>>probably interpret them as integers and try a regression instead of a
>>classification.  If the dependent variable is called "var", try
> Thanks, but I think I provided too little information. 
> My dependent variable is the location, which is a name (I could transform 
> the names to numbers from 1 to n). The independent variables consist of 0/1 
> data (species). 
> If I do 
> tree(locations~factor(species1)+factor(species2)+.....+factor(speciesn), 
> sp_data) 
> I get the same results as without the factor() part. 
> BTW, only a subset of the locations is displayed, which is pretty weird 
> considering that I included all locations in the analysis.
> Martin 
>>x <- tree(factor(var)~species)
>>David W. Roberts                                     office 406-994-4548
>>Professor and Head                                      FAX 406-994-3190
>>Department of Ecology                         email droberts at montana.edu
>>Montana State University
>>Bozeman, MT 59717-3460
>>Martin Wegmann wrote:
>>>Dear R-user,
>>>I tried to generate a classification/regression tree from an
>>>absence/presence matrix of species (400) in different locations (50) to
>>>visualise which species are important for separating two locations.
>>>Rpart and tree did not work for more than 10 species, which is logical
>>>given the limited number of locations (n=50). However, the error prompt is
>>>a "+" with no specific message, and I am pretty sure that I did not enter
>>>a wrong character by mistake.
>>>Is it allowed at all to use 0/1 data for this statistical technique, and
>>>if yes, is there a way or a different method to use all 400 species?
>>>Otherwise I would apply a PCA beforehand, but I would prefer to keep the
>>>raw species information.
>>>using R 2.1.1-1 (debian repos.)
>>>regards, Martin

