[R] Bootstrapping in R

ruipbarradas at sapo.pt
Mon Oct 3 09:56:15 CEST 2016


Hello,

I've just run your code and it all went well.
So my question is: if you have 1269 rows, why choose only 100 and
bootstrap those? It doesn't seem to make much sense to me.
Try running the entire df through DataSummary and compare the results
with the bootstrap results.
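A minimal sketch of the comparison suggested above, assuming simulated stand-in data (the real 1269-row data frame is not available here) and a hypothetical, simplified statistic `summary_stat` that plays the role of DataSummary with only means and standard deviations. The full-data value is what `boot()` stores in `t0`; the bootstrap replicates in `t` should be centred near it, and their spread estimates the sampling error.

```r
library(boot)

# Simulated stand-in for the poster's data (hypothetical values, 1269 rows)
set.seed(2016)
NAR <- round(runif(1269, 1, 10), 1)
full_df <- data.frame(NAR = NAR, SQRTNAR = sqrt(NAR))

# Hypothetical, simplified stand-in for DataSummary: mean and sd of every
# column, computed only on the rows picked by 'indices'
summary_stat <- function(df, indices) {
  d <- df[indices, , drop = FALSE]
  out <- c(sapply(d, mean), sapply(d, sd))
  names(out) <- c(paste0(names(d), "_mean"), paste0(names(d), "_sd"))
  out
}

# 1. The statistic on the entire data frame (all rows, each used once)
full_result <- summary_stat(full_df, seq_len(nrow(full_df)))

# 2. Nonparametric bootstrap on all rows
boot_full <- boot(full_df, statistic = summary_stat, R = 500)

# 3. Side by side: full-data value, mean of the replicates, bootstrap SE
comparison <- cbind(full_data = full_result,
                    boot_mean = colMeans(boot_full$t),
                    boot_se   = apply(boot_full$t, 2, sd))
print(comparison)
```

With all rows available, the standard errors in the last column come from the full sample rather than from a 100-row subsample.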

Rui Barradas
 

Quoting Bryan Mac <bryanmac.24 at gmail.com>:

> Hi all,
> Here are the first six rows of my data. In total I have 1269 rows.
> My goal is to conduct a nonparametric bootstrap with case resampling.
> I would like to randomly select 100 of the 1269 rows. After that, I
> wish to bootstrap that randomly selected sample of 100.
>
> I assume I need to set the seed so that this randomization is
> reproducible, since bootstrapping gives different results each time
> the code is run.
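The two random steps described above can be sketched as follows, assuming a simulated stand-in data frame `all_data` and a hypothetical statistic `mean_stat`; `n_data` is assumed to be the 100-row subsample that is passed to `boot()` in the post. Note that `set.seed()` is not what makes the draws random; it makes them reproducible, so re-running with the same seed gives the same subsample and the same bootstrap replicates.

```r
library(boot)

# Simulated stand-in for the 1269-row data (hypothetical values)
set.seed(1)
NAR <- round(runif(1269, 1, 10), 1)
all_data <- data.frame(NAR = NAR, SQRTNAR = sqrt(NAR))

# Hypothetical statistic: column means of the resampled rows
mean_stat <- function(d, indices) colMeans(d[indices, , drop = FALSE])

subsample_then_boot <- function(seed, R = 200) {
  set.seed(seed)                               # fixes both random steps below
  rows <- sample(nrow(all_data), 100)          # step 1: 100 rows, no replacement
  n_data <- all_data[rows, ]
  boot(n_data, statistic = mean_stat, R = R)   # step 2: resample rows with replacement
}

b1 <- subsample_then_boot(1007)
b2 <- subsample_then_boot(1007)
identical(b1$t, b2$t)   # TRUE: same seed, same subsample, same replicates
```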
>  
> ##   NAR  SQRTNAR NIC  SQRTNIC
> ## 1 2.6 1.612452 5.6 2.366432
> ## 2 8.1 2.846050 9.9 3.146427
> ## 3 5.7 2.387467 7.1 2.664583
> ## 4 8.3 2.880972 8.1 2.846050
> ## 5 7.3 2.701851 9.9 3.146427
> ## 6 4.9 2.213594 8.6 2.932576
> Here is my definition of the DataSummary function.
>  
> DataSummary <- function(df, indices){
>   sample <- df[indices, ]
>   sumry_for_NAR <- summary(sample$NAR)
>   nms <- names(sumry_for_NAR)
>   nms <- c(nms, 'std')
>   out_for_NAR <- c(sumry_for_NAR, sd(sample$NAR))
>   names(out_for_NAR) <- nms
>
>   sumry_for_SQRTNAR <- summary(sample$SQRTNAR)
>   nms <- names(sumry_for_SQRTNAR)
>   nms <- c(nms, 'std')
>   out_for_SQRTNAR <- c(sumry_for_SQRTNAR, sd(sample$SQRTNAR))
>   names(out_for_SQRTNAR) <- nms
>
>   sumry_for_NIC <- summary(sample$NIC)
>   nms <- names(sumry_for_NIC)
>   nms <- c(nms, 'std')
>   out_for_NIC <- c(sumry_for_NIC, sd(sample$NIC))
>   names(out_for_NIC) <- nms
>
>   sumry_for_SQRTNIC <- summary(sample$SQRTNIC)
>   nms <- names(sumry_for_SQRTNIC)
>   nms <- c(nms, 'std')
>   out_for_SQRTNIC <- c(sumry_for_SQRTNIC, sd(sample$SQRTNIC))
>   names(out_for_SQRTNIC) <- nms
>
>   OUT <- c(out_for_NAR, out_for_SQRTNAR, out_for_NIC, out_for_SQRTNIC)
>   return(OUT)
> }
>
> Again, here is my attempt at bootstrapping.
>
>  
> library(boot)
> result <- boot(n_data, statistic = DataSummary, R = 100)
> result
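For comparison, here is a more compact sketch of a statistic in the same spirit as DataSummary: the six values from `summary()` plus the standard deviation for each of the four columns, with names prefixed by the column. `DataSummary2` and its `cols` argument are hypothetical. It assumes the columns are numeric with no missing values: with NAs, `summary()` adds an extra "NA's" element, and `boot()` needs the statistic to return a vector of the same length every time. The six rows from the post are used only to check that it runs; bootstrapping six rows is not meaningful.

```r
library(boot)

# Hypothetical compact variant of DataSummary: summary() values plus "std"
# for each column in 'cols', computed on the rows selected by 'indices'
DataSummary2 <- function(df, indices,
                         cols = c("NAR", "SQRTNAR", "NIC", "SQRTNIC")) {
  d <- df[indices, ]
  out <- lapply(cols, function(v) {
    x <- d[[v]]
    s <- c(summary(x), std = sd(x))            # Min. ... Max., then std
    names(s) <- paste(v, names(s), sep = ".")  # e.g. "NAR.Mean"
    s
  })
  unlist(out)
}

# The six rows shown in the post
n_data <- data.frame(
  NAR     = c(2.6, 8.1, 5.7, 8.3, 7.3, 4.9),
  SQRTNAR = c(1.612452, 2.846050, 2.387467, 2.880972, 2.701851, 2.213594),
  NIC     = c(5.6, 9.9, 7.1, 8.1, 9.9, 8.6),
  SQRTNIC = c(2.366432, 3.146427, 2.664583, 2.846050, 3.146427, 2.932576)
)

set.seed(1007)
result <- boot(n_data, statistic = DataSummary2, R = 100)
result
```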
>  
> Following the suggestions, is this the code I should go with to
> achieve my goal? So the best reference is the boot help page. I
> found code on various sites and got really confused because the
> examples were very different from each other.
>  
>
>> library(boot)
>>
>> set.seed(1007)
>>
>> x <- rnorm(100)
>> y <- x + rnorm(100)
>> dat <- data.frame(x, y)
>>
>> stat2 <- function(DF, f){
>> model <- lm(y ~ x, data = DF[f,])
>> coef(model)
>> }
>>
>> boot(dat, stat2, R = 100)
>
>   Bryan Mac
> bryanmac.24 at gmail.com
>
>> On Oct 2, 2016, at 5:37 AM, ruipbarradas at sapo.pt wrote:
>> Right.
>> To see it in action, just compare the results of the two calls to boot.
>>
>> library(boot)
>>
>> set.seed(1007)
>>
>> x <- rnorm(100)
>> y <- x + rnorm(100)
>> dat <- data.frame(x, y)
>>
>> # Wrong: DF$y and DF$x are taken from the full DF, not from DF[f,],
>> # so the resampling indices f are effectively ignored
>> stat1 <- function(DF, f){
>> model <- lm(DF$y ~ DF$x, data = DF[f,])
>> coef(model)
>> }
>>
>> # Correct: y and x are looked up in the resampled data DF[f,]
>> stat2 <- function(DF, f){
>> model <- lm(y ~ x, data = DF[f,])
>> coef(model)
>> }
>>
>> boot(dat, stat1, R = 100)
>> boot(dat, stat2, R = 100)
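One way to see the difference numerically, reusing the `dat`, `stat1` and `stat2` from the message above (rewritten on one line each so the block runs on its own), is to compare the spread of the bootstrap replicates. Because `DF$y` and `DF$x` in `stat1` refer to the full data frame in the function's environment, every replicate refits the same data, so its bootstrap standard errors collapse to essentially zero.

```r
library(boot)

set.seed(1007)
x <- rnorm(100)
y <- x + rnorm(100)
dat <- data.frame(x, y)

# Wrong: DF$y and DF$x come from the full DF, so the resampled rows are unused
stat1 <- function(DF, f) coef(lm(DF$y ~ DF$x, data = DF[f, ]))
# Correct: y and x are looked up in the resampled rows DF[f, ]
stat2 <- function(DF, f) coef(lm(y ~ x, data = DF[f, ]))

b1 <- boot(dat, stat1, R = 100)
b2 <- boot(dat, stat2, R = 100)

# Bootstrap standard errors of intercept and slope
se1 <- apply(b1$t, 2, sd)   # essentially zero: every replicate refits the full data
se2 <- apply(b2$t, 2, sd)   # clearly positive: replicates follow the resampled rows
rbind(stat1 = se1, stat2 = se2)
```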
>>
>> Rui Barradas
>>
>> Quoting peter dalgaard <pdalgd at gmail.com>:
>>  
>>>> On 01 Oct 2016, at 16:11, Daniel Nordlund <djnordlund at gmail.com> wrote:
>>>>
>>>> You haven't told us anything about the structure of your data, or  
>>>> the definition of the DataSummary function.
>>>
>>> Yes. Just let me add that a common error with boot() is not to pay  
>>> attention to the required form of the statistic= function  
>>> argument. It should depend on the data and a set of indices and  
>>> (for nonparametric bootstrap) it is the indices that are random.
>>>
>>> Typical mistakes are to completely ignore the index argument, or  
>>> to write clumsy code that ignores the data specification, as in
>>> coef(lm(df$y~df$x, data=df[f, ])).
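To make the point about the indices concrete, here is a simplified, hand-rolled version of ordinary case-resampling bootstrap. It is a sketch of the idea, not the source of `boot()`; `manual_boot` and `coef_stat` are hypothetical names. The data never change: each replicate only draws a new vector of row indices with replacement and hands it to the statistic, which must use those indices to subset the data.

```r
# Simplified sketch of ordinary (case-resampling) nonparametric bootstrap.
# 'statistic' must have the form function(data, indices), as boot() requires.
manual_boot <- function(data, statistic, R) {
  n <- nrow(data)
  t0 <- statistic(data, seq_len(n))            # statistic on the original data
  reps <- do.call(rbind, lapply(seq_len(R), function(r) {
    idx <- sample.int(n, n, replace = TRUE)    # only the indices are random
    statistic(data, idx)
  }))
  list(t0 = t0, t = reps,
       bias = colMeans(reps) - t0,
       std.error = apply(reps, 2, sd))
}

# Example data as in Rui's message
set.seed(1007)
x <- rnorm(100)
y <- x + rnorm(100)
dat <- data.frame(x, y)

# A statistic that uses its index argument correctly
coef_stat <- function(DF, f) coef(lm(y ~ x, data = DF[f, ]))

res <- manual_boot(dat, coef_stat, R = 200)
res$t0
res$std.error
```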
>>>
>>> --
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide  
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>