[R] randomForest warning: The response has five or fewer unique values. Are you sure you want to do regression?

Sean Porter sporter at ori.org.za
Tue Mar 25 08:34:17 CET 2014


Dear Andy,

Thank you for your help! Below are the full details of what I am doing in R,
along with the data structure, so hopefully this will help. So the warning is
just a warning and nothing to worry about when doing regression. But why is
randomForest producing regression trees for only 3 species when I have 100
species in the matrix? Surely this is not correct. What am I doing wrong?
Also, what did you mean when you said that by using the code I am not using
randomForest directly?

Many thanks, Sean 

> # For Andy
> # get biological data into R
>  biological <- read.table (file = "C:/bio1.txt", header = TRUE) 
> dim(biological)
[1]  14 100
> # get environmental data into R
> enviro <- read.table (file = "C:/abio1.txt", header = TRUE)
> dim(enviro)
[1] 14  8
> # data structure of biological data
> str(biological)
'data.frame':   14 obs. of  100 variables:
 $ a : num  0 0 0 0 0 0 0 0 0 0 ...
 $ b : num  0 0 0 0 257 ...
 $ c : int  0 0 0 0 0 0 441 0 0 0 ...
 $ d : num  179 0 1430 0 0 ...
 $ e : num  100 0 601 0 123 ...
 $ f : num  0 0 3 0 1.5 0 0 0 0 4.5 ...
 $ g : num  0 0 0 0 0 0 0 0 0 0 ...
 $ h : int  0 0 0 0 0 0 0 0 1 0 ...
 $ i : num  0 0 0 0 0 0 0 0 0 3.85 ...
 $ j : num  0 0 0 27.6 3.6 ...
 $ k : num  0 0 0 0 0 0 0 0 0 1.8 ...
 $ l : num  0 0 0 0 0 0 0 0 0 0 ...
 $ m : num  0 0 0 0 0 0 0 0 0 0 ...
 $ n : num  0 0 0 0 0 0 0 1.1 0 0 ...
 $ o : num  0 0 0 0 0 0.2 0 0 0 0 ...
 $ p : num  0 0.15 0 0 0.35 0.9 0 0 0 0 ...
 $ q : num  0 0 0 0 0 0 0 9.4 0 0 ...
 $ r : num  0 41 0 0 1.75 0 0 0 0 0 ...
 $ s : num  0 0 0 0 0 ...
 $ t : num  0 0 22.1 0 0 ...
 $ u : num  0 0 0 0 0 0 0 0 0 0 ...
 $ v : num  0 0 0 0 0.12 0 0 0 0 0 ...
 $ w : num  0 0 0 0 4.95 6.6 0 3.3 0 3.3 ...
 $ x : num  0 0 0 0 7.9 ...
 $ y : int  0 0 0 0 0 1 0 0 0 0 ...
 $ z : num  0 0 0 0 0 0 0 0 0 0.8 ...
 $ aa: num  0 0 0 0 0 0 0 0 0 0 ...
 $ ab: num  0 47 0 136.3 9.4 ...
 $ ac: num  0 0 0 0 0 0 0 0 0 0 ...
 $ ad: num  0 4.2 0 8.4 0 0 0 0.7 0 0 ...
 $ ae: int  0 0 0 2 0 1 0 0 0 0 ...
 $ af: num  0 92.4 720.7 0 554.4 ...
 $ ag: int  0 0 0 0 0 0 0 0 0 0 ...
 $ ah: int  0 0 0 0 0 0 0 0 0 0 ...
 $ ai: num  43.4 3.4 26.4 0 1.7 ...
 $ aj: num  0 0 0 0 0 ...
 $ ak: num  0 0 0.25 0 0 0 0 0 0 0 ...
 $ al: num  0 0 0 0 0 ...
 $ am: num  561.6 0 93.6 0 374.4 ...
 $ an: num  234 0 562 0 187 ...
 $ ao: num  15.92 2.16 0 0 1.08 ...
 $ ap: num  31.84 0 1.08 0 3.24 ...
 $ aq: num  0 0 0 37.8 29.4 0 92.4 0 0 0 ...
 $ ar: int  0 72 0 76 16 49 0 8 0 0 ...
 $ as: num  0 0 0 0 0 0 0 0 0 0 ...
 $ at: num  0 0 0 0 0 0 0 0 0 0 ...
 $ au: num  0 0 0 0 0 0 0 0 0 0 ...
 $ av: num  0 31.8 0 25.4 0 ...
 $ aw: num  0 0 0 0 0 0 0 0 0 0 ...
 $ ax: num  0 2.7 0 0 0 0 0 2.7 2.7 0 ...
 $ ay: int  0 0 0 0 0 1 0 0 0 0 ...
 $ az: num  2.7 0 0 0 0 0 0 0 0 0 ...
 $ ba: num  7.72 0 0 0 0 0 0 0 0 0 ...
 $ bb: num  262 0 0 0 0 ...
 $ bc: num  0 1.6 0 13.6 0 ...
 $ bd: num  0 0 7.96 0 0 0 0 0 0 0 ...
 $ be: num  2493 0 1254 0 988 ...
 $ bf: num  0 46.4 0 72.5 45 ...
 $ bg: num  218 0 265 0 884 ...
 $ bh: num  0 0 0 0 0 0 0 2.8 0 0 ...
 $ bi: num  0 0 0 0 0 ...
 $ bj: num  0 0 0 0 0 0 0 0 0 0 ...
 $ bk: num  0 0 0 0 0 1.4 0 0 0 0 ...
 $ bl: num  0 0 0 0 0 0 3.2 0 0 0 ...
 $ bm: num  0 2.6 0 72.8 0 ...
 $ bn: num  0 0 82.8 0 0 0 0 0 0 0 ...
 $ bo: num  0 0 0 0 0 ...
 $ bp: int  0 0 0 0 0 0 0 288 0 0 ...
 $ bq: num  28.4 530.5 433.4 473.9 615.6 ...
 $ br: num  0 0 0 0 0 0 0 0 0 0 ...
 $ bs: num  0 0 0 0 0 0 0 0 0 14.5 ...
 $ bt: num  56.2 0 1125 0 78.8 ...
 $ bu: num  205.4 7.9 130.3 0 0 ...
 $ bv: num  1353.2 0 119.4 0 79.6 ...
 $ bw: num  0 0 0 2.45 0.7 2.1 0 0 0 0 ...
 $ bx: num  0 0 0 0 0 ...
 $ by: num  0 0 0 0 0 0 26.4 0 0 0 ...
 $ bz: num  208 1806 3727 208 8427 ...
 $ ca: num  49.2 0 32.8 0 57.4 ...
 $ cb: num  0 7.15 0 0 0 0 1.65 0 0 0 ...
 $ cc: num  0 590 0 419 0 ...
 $ cd: num  0 0 0 0 0 0 0 0 1.5 0 ...
 $ ce: num  1390 0 1394 0 552 ...
 $ cf: num  75.6 0 0 0 0 ...
 $ cg: num  3.86 0 0 0 0 0 0 0 0 0 ...
 $ ch: num  81.3 0 0 0 0 ...
 $ ci: num  0 0 0 0 12.2 ...
 $ cj: num  0 1.2 0 0.8 0 0.8 0.8 3.6 0 0 ...
 $ ck: num  0 0 0 0 0 17.4 0 0 0 0 ...
 $ cl: int  0 0 0 0 0 0 0 0 0 435 ...
 $ cm: num  0 0 0 0 0 0 31.2 0 0 0 ...
 $ cn: num  0 0 0 16.8 0 0 0 0 0 0 ...
 $ co: num  11.61 0 2.11 0 10.55 ...
 $ cp: num  15.05 1.4 0.35 0 0 ...
 $ cq: num  0 0 0 0 0 0 0 4.2 0 0 ...
 $ cr: int  0 0 0 0 1 0 0 0 0 0 ...
 $ cs: num  0 0 0 0 0 0 17.1 0 0 0 ...
 $ ct: num  2.7 0 0 0 0 0 0 0 0 0 ...
 $ cu: num  0 0 30.9 0 41.2 ...
  [list output truncated]
> # data structure of environmental data
> str(enviro)
'data.frame':   14 obs. of  8 variables:
 $ Temperature         : num  24.8 24.4 24.3 23 24.6 24.6 24.8 24.9 24.3 24.5 ...
 $ Turbidity           : num  0.047 0.046 0.052 0.058 0.049 0.047 0.047 0.049 0.049 0.051 ...
 $ Chlorophyll         : num  0.24 0.23 0.29 0.26 0.25 0.23 0.23 0.28 0.3 0.29 ...
 $ Waveheight          : num  2.14 2.13 2.12 2.12 2.12 2.12 2.11 2.12 2.11 2.12 ...
 $ nLw551              : num  0.231 0.228 0.228 0.236 0.226 ...
 $ nLw667              : num  1e-04 8e-04 1e-03 1e-03 1e-03 1e-04 1e-04 1e-03 1e-03 1e-04 ...
 $ Sediment.nlw551.667.: num  0.231 0.229 0.229 0.237 0.227 ...
 $ Depth               : num  4.8 4.1 5 4 6.2 7.7 10.1 4.3 5.1 7.9 ...
> # conduct randomForest regression
> gf <- gradientForest(cbind(enviro, biological),
+     predictor.vars = colnames(enviro),
+     response.vars = colnames(biological), ntree = 500,
+     transform = NULL, compact = T, nbin = 201, maxLevel = 5,
+     corr.threshold = 0.5)
There were 50 or more warnings (use warnings() to see the first 50)
> gf
A forest of 500 regression trees for each of 3 species

Call:

gradientForest(data = cbind(enviro, biological), predictor.vars =
colnames(enviro), 
    response.vars = colnames(biological), ntree = 500, transform = NULL, 
    maxLevel = 5, corr.threshold = 0.5, compact = T, nbin = 201)



Important variables:
[1] Sediment.nlw551.667. Depth                nLw551               nLw667               Chlorophyll

> # End


         


-----Original Message-----
From: Liaw, Andy [mailto:andy_liaw at merck.com] 
Sent: 25 March 2014 02:37 AM
To: Sean Porter; r-help at r-project.org
Subject: RE: [R] randomForest warning: The response has five or fewer unique
values. Are you sure you want to do regression?

If you are using that code, you are not really using randomForest directly.  I
don't understand the data structure you have (since you did not show
anything), so I can't really tell you much.  In any case, that warning comes
from randomForest() when it is run in regression mode but the response has
five or fewer distinct values.  It may be legitimate regression data, and
if so you can safely ignore the warning (that's why it's not an error).
It's there to catch the cases where people try to do classification with
class labels 1, 2, ..., k and forget to make the response a factor.
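
To see which response columns would meet the condition named in the warning, something like the following base R sketch may help. It is only an illustration: the data frame `toy` and its values are made up to stand in for `biological`, and the threshold of 5 is taken from the wording of the warning itself, not from inspecting any package source.

```r
# Toy stand-in for the 'biological' data frame (hypothetical values, 14 sites).
set.seed(1)
toy <- data.frame(
  a  = rep(0, 14),                    # absent everywhere: 1 unique value
  h  = c(rep(0, 8), 1, rep(0, 5)),    # a single occurrence: 2 unique values
  bq = round(runif(14, 0, 600), 1)    # abundant species: many unique values
)

# Number of distinct values in each response column.
n_unique <- sapply(toy, function(x) length(unique(x)))

# Columns matching the "five or fewer unique values" wording of the warning.
flagged <- names(n_unique)[n_unique <= 5]

print(n_unique)
print(flagged)

# If a column really held class labels (1, 2, ..., k), converting it to a
# factor is what marks it as a classification response rather than regression.
labels <- c(1, 2, 3, 1, 2)
labels_f <- factor(labels)
print(is.factor(labels_f))
```

Run against the real data (for example `sapply(biological, function(x) length(unique(x)))`), this would show how many of the 100 species columns are mostly zeros with only a handful of distinct values.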

Best,
Andy Liaw

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Sean Porter
Sent: Thursday, March 20, 2014 3:27 AM
To: r-help at r-project.org
Subject: [R] randomForest warning: The response has five or fewer unique
values. Are you sure you want to do regression?

Hello everyone,

I'm relatively new to R and new to the randomForest package, and have scoured
the archives for help with no luck. I am trying to perform a regression on a
set of predictors and response variables to determine the most important
predictors. I have 100 response variables collected from 14 sites and 8
predictor variables from the same 14 sites. I ran the code to perform the
randomForest regression given by Pitcher et al. 2011
(http://gradientforest.r-forge.r-project.org/biodiversity-survey.pdf).

However, after running the code I get the warning:

"In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do
regression?"

It also produces a set of 500 regression trees for each of only 3 species,
although the response file contains 100 species. I noticed that in the
example by Pitcher et al. they get 500 trees for only 90 species even though
they input 110 species in the response data.

Why am I getting the warning and how do I solve it, and why is randomForest
producing trees for only 3 species when I am looking at 100 species
(response variables)?

Many thanks

Sean

