[R] boot package question: sampling on factor, not row

Thomas W Blackwell tblackw at umich.edu
Tue Nov 11 04:43:04 CET 2003


Scott  -

The second argument to  boot(),  called 'statistic', can be
any user-written function you want to cook up, with additional
arguments being passed to it through the '...' mechanism after
all of the named arguments.  (See "An Introduction to R", section
"Writing your own functions", subsection "The ellipsis argument",
for details.)
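A minimal illustration of the '...' mechanism on its own, without boot(). The names run.stat() and weighted.total() are made up for this sketch and are not part of any package. run.stat() passes any extra named arguments on to the function it is given, which is what boot() does with its 'statistic':

```r
## Hypothetical helpers, for illustration only.
## run.stat() forwards whatever extra arguments it receives through '...'
## to 'statistic', just as boot() forwards them to its statistic function.
run.stat <- function(data, statistic, ...) {
    statistic(data, ...)
}

## A toy statistic that needs one extra argument, 'w'
weighted.total <- function(x, w) sum(w * x)

run.stat(1:3, weighted.total, w = c(1, 10, 100))   # 1 + 20 + 300 = 321
```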

To carry out your example, I would do something like the following:
(Not tested!  Use at your own risk.)

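library("boot")
my.summary <- function(data, groups, ix, value) {
    ## 'groups' holds boot()'s resampled indices, i.e. level numbers of ix;
    ## take the first three, look up each level's mean of 'value', then the median
    median(aggregate(value, list(ix), mean)$x[groups[1:3]])
}
result <- boot(seq_along(levels(ix)), my.summary, 10000, ix=ix, value=value)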

You will note that what  boot()  thinks is the "data" in the
example here is only a vector of sequential integers the same
length as  levels(ix).  Its values are never used by  my.summary();
boot()  needs it only to draw the resampled indices, which arrive
in  my.summary()  as 'groups' and serve as level numbers of "ix".
The two columns which you show as "ix" and "value" supply the
actual numbers.  Furthermore, unless I misunderstand your example,
the mean within each level of "ix" is invariant to which three
levels have been chosen for this particular bootstrap replicate.
Therefore, you could call  aggregate()  only once rather than
10000 times, if you rewrite  my.summary()  to use the result of
aggregate()  rather than call it afresh on every iteration.
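A sketch of that "compute the means once" idea, using the data from the question below. Two choices here are assumptions, not part of the original suggestion: tapply() stands in for aggregate() because it returns the per-level means directly as a vector indexed by level, and the means vector itself is handed to boot() as the "data". Each replicate keeps the first three of the five resampled positions, i.e. a with-replacement draw of three levels such as (1,1,4):

```r
library(boot)

## Data transcribed from the question: ix labels the level, value the observation
ix <- factor(rep(1:5, times = c(5, 4, 5, 5, 3)))
value <- c(5.73, 6.99, 0.32, 4.64, 8.39,
           8.47, 1.04, 0.73, 0.29,
           6.82, 8.81, 1.33, 9.17, 9.84,
           8.57, 5.04, 7.18, 4.54, 4.37,
           7.36, 4.97, 2.66)

## The per-level means do not depend on the resample, so compute them once
level.means <- tapply(value, ix, mean)

## boot() resamples positions 1..5 with replacement; keep the first three
## (three levels drawn with replacement) and take the median of their means
median.of.3 <- function(means, idx) median(means[idx[1:3]])

## Hand check for levels (1, 3, 5): median(5.214, 7.194, 4.9967) = 5.214
median.of.3(level.means, c(1, 3, 5))

set.seed(1)
result <- boot(as.vector(level.means), median.of.3, R = 10000)
```

With only five levels there are just 5^3 = 125 ordered draws of three, so the resampling distribution could also be enumerated exactly instead of simulated.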

I've given you the reference for the '...' mechanism, because
that reference is almost impossible to find using  help.search().
For the rest of the functions I've used, you're on your own to
look up their help pages.

I *will* comment that I can't see why this particular statistic
is of interest . . . but, I assume you have your own reasons.

HTH  -  tom blackwell  -  u michigan medical school  -  ann arbor  -

On Mon, 10 Nov 2003, Scott Norton wrote:

> Hi all:
>
> I've been looking at the boot package to "bootstrap" sample
> my data in a particular way.  I haven't figured out how to
> set this up using the boot() command and thus have resorted
> to trying to write my own script (although I'd prefer if I
> could get boot() to work for this problem!)
>
> The dataset is set up in the following way:
>
> ix(factor)  value
> 1		5.73
> 1		6.99
> 1		0.32
> 1		4.64
> 1		8.39
> 2		8.47
> 2		1.04
> 2		0.73
> 2		0.29
> 3		6.82
> 3		8.81
> 3		1.33
> 3		9.17
> 3		9.84
> 4		8.57
> 4		5.04
> 4		7.18
> 4		4.54
> 4		4.37
> 5		7.36
> 5		4.97
> 5		2.66
>
> What I would like to do is repeatedly sample the ix (a factor),
> not the individual rows.  For example, say I wanted to repeatedly
> sample (at a sample size of 3) the ix value - e.g. 1,3,5 - then
> average the "value"s within each of those levels, and then, let's
> say, take the median across them.
>
> So for a random sample of (1,3,5) that would be:
>
>    median(c(mean(c(5.73,6.99,0.32,4.64,8.39)),
>             mean(c(6.82,8.81,1.33,9.17,9.84)),
>             mean(c(7.36,4.97,2.66))))
>
> Then repeat this over combinations of 3 ix factors e.g. (1,2,3),
> (1,1,4), etc...
>
> Is it possible to subsample a factor using boot() and then use
> that sample of factors to access rows, rather than directly sample
> rows?
>
> Thanks!!!
> -Scott
>



