[R] tree problem

Richard Valliant rvalliant at survey.umd.edu
Sun Oct 28 22:49:58 CET 2007


I am trying to use tree to partition a data set. The data set has 3924
observations.  Partitioning seems to work for small subsets of the data,
but when I use the entire data set, no partitioning occurs.  The
variables are:

RESP      Respondent to a survey (0 = not a respondent, 1 = respondent)
AGE_P     Age (continuous)
ORIGIN_I  Hispanic Ethnicity (1 = Hispanic, 2 = non-Hispanic)
RACRECI2  Race Recode (1 = White, 2 = Black, 3 = Other)
parents   Parent(s) present in the family (1 = Yes, 2 = No)
educ      Education Recode (1 = HS, GED, or less; 5 = some college;
          6 = Bachelor's or AA degree; 9 = Master's & higher)

Here are two calls to tree and a snippet of the summary output:

###          Use a sample of 100      ####
> set.seed(331)
> nsize = 100
> sam <- sample(1:nrow(nhis), nsize)
> 
> t1 <- tree( RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents +  educ, 
+       method = "class",
+       control = tree.control(nobs = nsize, minsize = 10),
+       data = nhis[sam,])
> summary(t1)

Classification tree:
tree(formula = RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + 
    educ, data = nhis[sam, ], control = tree.control(nobs = nsize, 
    minsize = 10), method = "class")
Number of terminal nodes:  13             ##### All vars were used


####     Use entire data set      ####
> nsize = 3924
> sam <- sample(1:nrow(nhis), nsize)
> 
> t1 <- tree( RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents +  educ, 
+       method = "class",
+       control = tree.control(nobs = nsize, minsize = 10),
+       data = nhis[sam,])
> summary(t1)

Classification tree:
.
.
.
Variables actually used in tree construction:
character(0)                       ##### No  vars were used
Number of terminal nodes:  1 

It doesn't matter whether I treat the categorical variables as factors
or not; I still get the same results.  As I increase the subsample
incrementally from 100 up to 1200, fewer variables are used in tree
construction.  At 1200, none are used.
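
The subsample experiment described above could be scripted roughly as
follows.  This is only a sketch under assumptions: vars_used() and
fit_by_size() are hypothetical helper names, not part of the tree
package, and RESP is assumed to be stored as a factor already (tree fits
a classification tree when the response is a factor), so the method
argument from the calls above is left out.

```r
# Hypothetical helper: names of the variables a fitted tree actually split on.
# A "tree" object keeps one row per node in fit$frame; leaves have var == "<leaf>".
vars_used <- function(fit) {
  v <- as.character(fit$frame$var)
  unique(v[v != "<leaf>"])
}

# Hypothetical helper: refit on random subsamples of increasing size.
# Assumes the 'tree' package is installed and that 'data' contains the
# columns named in the formula, with RESP stored as a factor.
fit_by_size <- function(data, sizes, seed = 331) {
  set.seed(seed)
  out <- lapply(sizes, function(n) {
    sam <- sample(nrow(data), n)          # row indices of the subsample
    fit <- tree::tree(RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + educ,
                      data = data[sam, ],
                      control = tree::tree.control(nobs = n, minsize = 10))
    data.frame(n      = n,
               leaves = sum(fit$frame$var == "<leaf>"),
               used   = paste(vars_used(fit), collapse = ", "))
  })
  do.call(rbind, out)                     # one row per subsample size
}
```

With the data frame from this post, a call such as
fit_by_size(nhis, seq(100, 1200, by = 100)) would give one row per
subsample size, showing the number of terminal nodes and which
variables were used in the splits.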

Is there a way to force tree to do something with the larger sample
sizes and the whole data set?

Package tree version 1.0-26 
R 2.6.0
Windows XP, v.5.1, service pack 2

Thanks
Richard Valliant
U of Maryland US


