[R] tree problem

Richard Valliant rvalliant at survey.umd.edu
Wed Oct 31 19:16:35 CET 2007


This is a repeat posting from 28 Oct that generated no replies. I'm
hoping someone has some advice since I'm still stuck ...

I am trying to use tree to partition a data set. The data set has 3924
observations. Partitioning works for small subsets of the data, but when
I use the entire data set, no partitioning occurs. The variables are:

RESP respondent to a survey (0 = not a respondent, 1 = respondent)
AGE_P Age (continuous)
ORIGIN_I Hispanic Ethnicity (1 = Hispanic, 2 = non-Hispanic)
RACRECI2 Race Recode (1 = White, 2 = Black, 3 = Other)
parents Parent(s) present in the family (1 = Yes, 2 = No)
educ Education Recode (1 = HS, GED, or less, 5 = some college, 6 =
Bachelor's or AA degree, 9 = Master's & higher)

Here are 2 calls to tree and a snip of summary results:

### Use a sample of 100 ####
> set.seed(331)
> nsize = 100
> sam <- sample(1:nrow(nhis), nsize)
> 
> t1 <- tree( RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + educ, 
+ method = "class",
+ control = tree.control(nobs = nsize, minsize = 10),
+ data = nhis[sam,])
> summary(t1)

Classification tree:
tree(formula = RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + 
educ, data = nhis[sam, ], control = tree.control(nobs = nsize, 
minsize = 10), method = "class")
Number of terminal nodes: 13              ##### All vars were used


#### Use entire data set ####
> nsize = 3924
> sam <- sample(1:nrow(nhis), nsize)
> 
> t1 <- tree( RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + educ, 
+ method = "class",
+ control = tree.control(nobs = nsize, minsize = 10),
+ data = nhis[sam,])
> summary(t1)

Classification tree:
.
.
.
Variables actually used in tree construction:
character(0)                                       ##### No vars were used
Number of terminal nodes: 1 

It doesn't matter whether I use the categorical vars as factors or not;
I still get the same results. As I increase the subsample from 100
incrementally up to 1200, fewer vars are used in tree construction. At
1200 the point is reached where none are used.
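
For anyone who wants to try the same incremental experiment without the
survey file, here is a minimal sketch. It is an illustration under
assumptions, not a diagnosis: the data frame sim is simulated (it is NOT
the NHIS data), it carries only two of the predictors, and the helper
vars_used() is a made-up name. The response is coded as a factor so that
tree fits a classification tree, and mincut and mindev in tree.control
are left at their defaults, as in the calls above.

```r
library(tree)  # assumes the tree package is installed

# Simulated stand-in for nhis (hypothetical data, weak signal on purpose).
set.seed(331)
n_full <- 3924
sim <- data.frame(
  AGE_P   = sample(18:85, n_full, replace = TRUE),
  parents = factor(sample(1:2, n_full, replace = TRUE))
)
# Response probability depends only slightly on age.
p <- plogis(-0.3 + 0.005 * (sim$AGE_P - 50))
sim$RESP <- factor(rbinom(n_full, 1, p))

# Hypothetical helper: names of the variables that appear in any split.
vars_used <- function(fit) {
  setdiff(unique(as.character(fit$frame$var)), "<leaf>")
}

# Refit on growing subsamples and report how much of a tree comes back.
for (n in c(100, 400, 1200, n_full)) {
  d <- sim[sample(n_full, n), ]
  fit <- tree(RESP ~ AGE_P + parents, data = d,
              control = tree.control(nobs = n, minsize = 10))
  cat(n, ":", length(vars_used(fit)), "variable(s) used,",
      sum(fit$frame$var == "<leaf>"), "terminal node(s)\n")
}
```

Replacing sim with nhis and restoring the full formula gives the loop I
ran by hand. Reading the split variables from fit$frame avoids having to
parse the printed summary.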

Is there a way to force tree to do something with the larger sample
sizes and the whole data set?

Package tree version 1.0-26 
R 2.6.0
Windows XP, v.5.1, service pack 2

Thanks
Richard Valliant
U of Maryland, US


