[R] Reproducibility issue in gbm (32 vs 64 bit)

Patrick Connolly p_connolly at slingshot.co.nz
Thu Mar 3 09:50:43 CET 2011


On Sat, 26-Feb-2011 at 08:46AM -0800, Ridgeway, Greg wrote:

|> I have heard about this before happening on other
|> platforms. Frankly I'm not positive how this happens. My best guess
|> is that there's a tiny bit of numeric instability in the 9+ decimal
|> place so that on a given iteration one variable, essentially at random,
|> looks better than the other. Any other ideas?  Greg
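
A rough illustration of the kind of last-digit instability Greg describes, not taken from the gbm source: two sums that are mathematically equal but accumulated in a different order can differ in the final bits, and a "pick the larger" rule then decides on that difference alone. The names gain_x1 and gain_x2 are hypothetical stand-ins for the split improvement computed for two identical predictors.

```r
# Two mathematically equal quantities, accumulated in a different order.
# (Hypothetical stand-ins for the improvement scores of two tied predictors.)
gain_x1 <- (0.1 + 0.2) + 0.3
gain_x2 <- (0.3 + 0.2) + 0.1

# Equal up to a numerical tolerance ...
isTRUE(all.equal(gain_x1, gain_x2))

# ... but not necessarily bit-for-bit identical; this can depend on the
# platform and on how intermediate results are rounded.
identical(gain_x1, gain_x2)
gain_x1 - gain_x2

# A rule that simply takes the maximum resolves the tie on the last bits
# (or, if they are exactly equal, on which candidate comes first).
gains  <- c(x1 = gain_x1, x2 = gain_x2)
winner <- names(which.max(gains))
winner
```

If the two builds round intermediate results differently, the winner of such a tie could flip, which would be consistent with one build splitting the importance and the other giving it all to one variable.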

I played around with this some time ago and noticed that it happens
only when there's perfect or very nearly perfect correlation.  I even
tried a third variable and it was ignored almost completely.  I
concluded it's highly unlikely to cause a problem since real data
wouldn't have perfectly correlated variables -- or if they did, they'd
be easy enough to detect.
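
For what it's worth, a minimal sketch of the sort of check I have in mind, using the data from Axel's example. The cut-off tol is an arbitrary value I picked for illustration, not anything gbm uses.

```r
set.seed(12345)
xc <- matrix(rnorm(100 * 20), 100, 20)
xc[, 2] <- xc[, 1]               # duplicate column, as in Axel's example

# Pairwise correlations among the three predictors used in the model.
r <- cor(xc[, 1:3])

# Blank out the diagonal and lower triangle so each pair appears once.
r[lower.tri(r, diag = TRUE)] <- NA

# Flag pairs whose absolute correlation is within tol of 1.
tol <- 1e-8                      # hypothetical cut-off for "nearly perfect"
dup_pairs <- which(abs(r) > 1 - tol, arr.ind = TRUE)
dup_pairs
```

With this data the check should flag only columns 1 and 2; one of them can then be dropped before fitting.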



|> 
|> ----- Original Message -----
|> From: Joshua Wiley <jwiley.psych at gmail.com>
|> To: Axel Urbiz <axel.urbiz at gmail.com>
|> Cc: R-help at r-project.org <R-help at r-project.org>; Ridgeway, Greg
|> Sent: Fri Feb 25 22:16:02 2011
|> Subject: Re: [R] Reproducibility issue in gbm (32 vs 64 bit)
|> 
|> Hi Axel,
|> 
|> I do not have a nice explanation why the results differ off the top of
|> my head.  I can say I can replicate what you get on 32/64 (both
|> Windows 7) bit with the development version of R and gbm_1.6-3.1.
|> 
|> Here is an even simpler example that shows the difference:
|> 
|> gbmfit <- gbm(1:50 ~ I(50:1) + I(60:11), distribution = "gaussian")
|> summary(gbmfit)
|> 
|> I copied the package maintainer.
|> 
|> Cheers,
|> 
|> Josh
|> 
|> On Fri, Feb 25, 2011 at 7:29 PM, Axel Urbiz <axel.urbiz at gmail.com> wrote:
|> > Dear List,
|> >
|> > The gbm package on Win 7 produces different results for the
|> > relative importance of input variables in R 32-bit relative to R 64-bit. Any
|> > idea why? Any idea which one is correct?
|> >
|> > Based on this example, it looks like the relative importance of 2 perfectly
|> > correlated predictors is "diluted" by half in 32-bit, whereas in 64-bit, one
|> > of these predictors gets all the importance and the other gets none. I found
|> > this interesting.
|> >
|> > ### Sample code
|> >
|> > library(gbm)
|> > set.seed(12345)
|> > xc=matrix(rnorm(100*20),100,20)
|> > y=sample(1:2,100,replace=TRUE)
|> > xc[,2] <- xc[,1]
|> > gbmfit <- gbm(y~xc[,1]+xc[,2] +xc[,3], distribution="gaussian")
|> > summary(gbmfit)
|> >
|> > ### Results on R 2.12.0 (32-bit)
|> >
|> >      var  rel.inf
|> > 1 xc[, 3] 49.76143
|> > 2 xc[, 1] 27.27432
|> > 3 xc[, 2] 22.96425
|> >
|> > ### Results on R 2.12.0 (64-bit)
|> > > summary(gbmfit)
|> >      var  rel.inf
|> > 1 xc[, 1] 50.23857
|> > 2 xc[, 3] 49.76143
|> > 3 xc[, 2]  0.00000
|> >
|> > Thanks,
|> > Axel.
|> >
|> > ______________________________________________
|> > R-help at r-project.org mailing list
|> > https://stat.ethz.ch/mailman/listinfo/r-help
|> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
|> > and provide commented, minimal, self-contained, reproducible code.
|> >
|> 
|> 
|> 
|> -- 
|> Joshua Wiley
|> Ph.D. Student, Health Psychology
|> University of California, Los Angeles
|> http://www.joshuawiley.com/
|> 

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_  	         Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)  	                      ..... Eleanor Roosevelt
	  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.


