Weiwei Shi
helprhelp at gmail.com
Thu Jul 7 21:47:16 CEST 2005
It works, thanks.
But, just out of curiosity: why didn't it work before? When I checked
earlier, I got
> is.vector(sample.size)
[1] TRUE
I also tried as.vector(sample.size) and assigned the result to sampsz, but
it still did not work.
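A likely explanation: in R a list is itself a (generic) vector, so is.vector() returns TRUE for it, and as.vector() hands a list back unchanged. The check that separates a list from the numeric vector sum() expects is is.atomic() or is.numeric(). The sketch below uses only base R; the sample.size list is rebuilt from the counts posted further down, so it is an assumption standing in for the real data.

```r
# Stand-in for sample.size: a list of counts, as lapply() returns
sample.size <- lapply(c(36, 12, 120, 36, 30, 30), identity)

is.vector(sample.size)   # TRUE: a list counts as a (generic) vector
is.list(sample.size)     # TRUE
is.atomic(sample.size)   # FALSE: not an atomic (numeric) vector

# as.vector() leaves a list as a list, so it does not fix the problem
is.list(as.vector(sample.size))   # TRUE

# sum() only accepts atomic vectors, so summing the list raises an error
res <- tryCatch(sum(sample.size), error = function(e) e)
inherits(res, "error")   # TRUE

# unlist() flattens the list into the numeric vector sum() expects
sampsz <- unlist(sample.size)
is.numeric(sampsz)       # TRUE
sum(sampsz)              # 264
```

The exact wording of the error message differs between R versions, which is why the sketch only checks that an error occurs.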
On 7/7/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> > Hi there:
> > I have a question about random forests:
> >
> > Recently I helped a friend with her random forest and ran into this problem:
> > her dataset has 6 classes and the sample size is pretty small (264).
> > The class distribution is as follows (Diag is the response variable):
> > sample.size <- lapply(1:6, function(i) sum(Diag==i))
> >> sample.size
> > [[1]]
> > [1] 36
> >
> > [[2]]
> > [1] 12
> >
> > [[3]]
> > [1] 120
> >
> > [[4]]
> > [1] 36
> >
> > [[5]]
> > [1] 30
> >
> > [[6]]
> > [1] 30
> >
> > I assigned this sample.size to sampsz for stratified sampling
> > and got the following error:
> > Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
> >
> > If I use sampsz=c(36, 12, 120, 36, 30, 30) instead, it works fine. Could you
> > tell me why?
>
> The sum() function knows what to do with a numeric vector, but not with a
> list. You can turn your sample.size variable into a numeric vector using
>
> unlist(sample.size)
>
> Duncan Murdoch
>
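As an alternative to calling unlist() afterwards, the per-class counts can be produced as a numeric vector in the first place, either with sapply() (which simplifies its result) or with table(). A minimal sketch, assuming Diag is coded 1 to 6 as in the original post; the Diag vector here is synthetic, built only to reproduce the posted counts.

```r
# Synthetic response reproducing the posted class counts (assumption: codes 1..6)
Diag <- rep(1:6, times = c(36, 12, 120, 36, 30, 30))

# sapply() simplifies the per-class counts to an integer vector
sampsz <- sapply(1:6, function(i) sum(Diag == i))
sampsz                     # 36  12 120  36  30  30

# table() computes the same counts; as.vector() drops the table attributes
counts <- as.vector(table(Diag))
identical(sampsz, counts)  # TRUE
```

Either vector can then be used where the thread used c(36, 12, 120, 36, 30, 30).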
> > By the way, for a classification problem like this one, with unbalanced
> > classes, do you have any suggestions for improving accuracy? I used the
> > c() form to make sampsz work, but the results are similar.
> >
> > Thanks,
> >
> > weiwei
> >
> > On 6/30/05, Liaw, Andy <andy_liaw at merck.com> wrote:
> >> The limitation comes from the way categorical splits are represented in the
> >> code: For a categorical variable with k categories, the split is
> >> represented by k binary digits: 0=right, 1=left. So it takes k bits to
> >> store each split on k categories. To save storage, this is `packed' into a
> >> 4-byte integer (32-bit), thus the limit of 32 categories.
> >>
> >> The current Fortran code (version 5.x) by Breiman and Cutler gets around
> >> this limitation by storing the split in an integer array. While this lifts
> >> the 32-category limit, it takes much more memory to store the splits. I'm
> >> still trying to figure out a more memory efficient way of storing the splits
> >> without imposing the 32-category limit. If anyone has suggestions, I'm all
> >> ears.
> >>
> >> Best,
> >> Andy
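The one-bit-per-category encoding described above can be illustrated in plain R arithmetic. This is a hypothetical sketch, not the randomForest package's Fortran/C code: pack_split() and is_left() are made-up helper names. Category j sends cases left when bit j-1 is 1 (1 = left, 0 = right), and a 32-bit word leaves room for at most 32 categories. The packed code is held in a double because R's integers are signed, so bit 31 would overflow.

```r
# Hypothetical helpers, NOT part of the randomForest package
max_categories <- 32  # bits available in a 4-byte integer

pack_split <- function(left) {
  # left: logical vector, one entry per category (TRUE = left, FALSE = right)
  if (length(left) > max_categories) {
    stop("cannot pack more than 32 categories into one 32-bit word")
  }
  # Category j contributes bit j - 1; summed as doubles to avoid
  # overflowing R's signed 32-bit integer type at bit 31
  sum(2^(which(left) - 1))
}

is_left <- function(code, category) {
  # Extract bit (category - 1) of the packed code: 1 = left, 0 = right
  (code %/% 2^(category - 1)) %% 2 == 1
}

# Example: 5 categories, with categories 1, 3 and 4 going left
split_code <- pack_split(c(TRUE, FALSE, TRUE, TRUE, FALSE))
split_code                  # 13 (binary 01101)
is_left(split_code, 1:5)    # TRUE FALSE TRUE TRUE FALSE

# A 41-level factor, as in the question below, does not fit in one word
res <- tryCatch(pack_split(rep(TRUE, 41)), error = function(e) e)
inherits(res, "error")      # TRUE
```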
> >>
> >> > From: Arne.Muller at sanofi-aventis.com
> >> >
> >> > Hello,
> >> >
> >> > I'm using the randomForest package. One of my factors in the
> >> > data set contains 41 levels (I can't code this as a numeric
> >> > value - in terms of linear models this would be a random
> >> > factor). The randomForest call comes back with an error
> >> > telling me that the limit is 32 categories.
> >> >
> >> > Is there any reason for this particular limit? Maybe it's
> >> > possible to recompile the module with a different cutoff?
> >> >
> >> > thanks a lot for your help,
> >> > kind regards,
> >> >
> >> >
> >> > Arne
> >> >
> >>
> >
> >
>
>
