[R] Problem while predicting in regression trees

Muhammad Bilal Muhammad2.Bilal at live.uwe.ac.uk
Mon May 9 01:14:27 CEST 2016


Hi All,

I have the following script, which raises an error at the last command. I am new to R and would appreciate some clarification on what is going wrong.

#Creating the training and testing data sets
library(caTools)  #provides sample.split()
splitFlag <- sample.split(pfi_v3, SplitRatio = 0.7)
trainPFI <- subset(pfi_v3, splitFlag==TRUE)
testPFI <- subset(pfi_v3, splitFlag==FALSE)
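
A note on the split: as far as I know, sample.split() from caTools expects a vector of outcome labels rather than a whole data frame, so the flags may not line up one-to-one with the rows. A minimal base-R sketch of a row-level 70/30 split follows; the data frame df and the names flag, train_df and test_df are made-up stand-ins for illustration and are not part of the script.

```r
# Hypothetical stand-in for pfi_v3 (made-up values, for illustration only)
set.seed(100)
df <- data.frame(id = 1:100, y = rnorm(100))

# Pick 70% of the row indices at random and turn them into one flag per row
n_train <- round(0.7 * nrow(df))
flag <- seq_len(nrow(df)) %in% sample(nrow(df), n_train)

# Rows flagged TRUE go to training, the rest to testing
train_df <- df[flag, ]
test_df  <- df[!flag, ]
```

With caTools, the analogous call would presumably pass a single column, for example sample.split(pfi_v3$project_delay, SplitRatio = 0.7).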


#Structure of the trainPFI data frame
> str(trainPFI)
*******
'data.frame': 491 obs. of  16 variables:
 $ project_id             : int  1 2 3 6 7 9 10 12 13 14 ...
 $ project_lat            : num  51.4 51.5 52.2 51.9 52.5 ...
 $ project_lon            : num  -0.642 -1.85 0.08 -0.401 -1.888 ...
 $ sector                 : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
 $ contract_type          : chr  "Turnkey" "Turnkey" "Turnkey" "Turnkey" ...
 $ project_duration       : int  1826 3652 121 730 730 790 522 819 998 372 ...
 $ project_delay          : int  -323 0 -60 0 0 0 -91 0 0 7 ...
 $ capital_value          : num  6.7 5.8 21.8 24.2 40.7 10.7 70 24.5 60.5 78 ...
 $ project_delay_pct      : num  -17.7 0 -49.6 0 0 0 -17.4 0 0 1.9 ...
 $ delay_type             : Ord.factor w/ 9 levels "7 months early & beyond"<..: 1 5 3 5 5 5 2 5 5 6 ...

library(caret)
library(e1071)

set.seed(100)

tr.control <- trainControl(method="cv", number=10)
cp.grid <- expand.grid(.cp = (0:10)*0.001)

#Fitting the model using regression tree
tr_m <- train(project_delay ~ project_lon + project_lat + project_duration +
                sector + contract_type + capital_value,
              data = trainPFI, method = "rpart",
              trControl = tr.control, tuneGrid = cp.grid)

tr_m

CART
491 samples
15 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 443, 442, 441, 442, 441, 442, ...
Resampling results across tuning parameters:
  cp     RMSE      Rsquared
  0.000  441.1524  0.5417064
  0.001  439.6319  0.5451104
  0.002  437.4039  0.5487203
  0.003  432.3675  0.5566661
  0.004  434.2138  0.5519964
  0.005  431.6635  0.5577771
  0.006  436.6163  0.5474135
  0.007  440.5473  0.5407240
  0.008  441.0876  0.5399614
  0.009  441.5715  0.5401718
  0.010  441.1401  0.5407121
RMSE was used to select the optimal model using  the smallest value.
The final value used for the model was cp = 0.005.

#Fetching the best tree
best_tree <- tr_m$finalModel

All of the commands above ran without problems.

However, the next command raises an error when the fitted model is used to make predictions on the test set:
best_tree_pred <- predict(best_tree, newdata = testPFI)
Error in eval(expr, envir, enclos) : object 'sectorHospitals' not found
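
The same pattern can be sketched on a built-in data set. In the minimal example below, iris stands in for pfi_v3 and Species plays the role of the factor sector; the names fit, via_final and via_train are hypothetical and not part of the script. It assumes, as I understand it, that train() with a formula dummy-codes factor predictors before handing the data to rpart, so the stored finalModel expects columns such as 'Speciesversicolor' that the raw data frame does not have.

```r
library(caret)   # the rpart package must also be installed

set.seed(100)

# Fit through the formula interface, as in the script above
fit <- train(Sepal.Length ~ Species + Petal.Length, data = iris,
             method = "rpart",
             trControl = trainControl(method = "cv", number = 5),
             tuneGrid = expand.grid(cp = c(0.001, 0.01)))

# Predicting from the bare rpart object: capture the error message
# instead of stopping, so the two approaches can be compared
via_final <- tryCatch(predict(fit$finalModel, newdata = iris),
                      error = function(e) conditionMessage(e))

# Predicting from the train object: caret applies the formula to newdata first
via_train <- predict(fit, newdata = iris)

print(via_final)        # an error message about a missing dummy-coded column
print(head(via_train))  # numeric predictions
```

If this sketch is representative, the analogous call for the script above would be predict(tr_m, newdata = testPFI) rather than predict(best_tree, newdata = testPFI).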

Could someone advise me on how to resolve this issue?

Any help will be highly appreciated.

Many thanks and kind regards,

--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bilal at live.uwe.ac.uk

