[R] Help to improve prediction from supervised mapping using kohonen package

Ben Harrison harb at student.unimelb.edu.au
Wed Jul 24 11:05:26 CEST 2013


I would really appreciate any advice on how I can improve (or fix??)
the following analysis. I hope I have provided completely runnable
code - it doesn't produce any errors for me.

The resulting plot at the end shows pretty poor agreement (just
speaking visually here) between the predictions and the test set. How
can I improve the performance of the mapping and prediction?
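To go beyond a purely visual judgement, the agreement could also be put into numbers. The following is only a minimal sketch: `observed` and `predicted` are hypothetical stand-in vectors generated at random, not the real MEAS_TC values or the output of the map.

```r
# Stand-in vectors (assumption): in the real analysis these would be the
# observed MEAS_TC values of the test set and the matching predictions.
set.seed(1)
observed  <- runif(50, min = 1, max = 3)       # hypothetical observed values
predicted <- observed + rnorm(50, sd = 0.3)    # hypothetical predictions

# Root-mean-square error, in the units of the response
rmse <- sqrt(mean((observed - predicted)^2))

# Pearson correlation between observed and predicted values
r <- cor(observed, predicted)

# Observed-vs-predicted scatter with the 1:1 line for reference
plot(observed, predicted, xlab = "Observed", ylab = "Predicted")
abline(0, 1, lty = 2)

cat("RMSE:", round(rmse, 3), " r:", round(r, 3), "\n")
```

A scatter of observed against predicted values with a 1:1 line is often easier to read than two overlaid line plots, because the row order of the test set has no meaning here.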

Here are some of the data (continuous, numerical):

> head(somdata)
   MEAS_TC        SP        LN        SN       GR     NEUT
1 2.780000 59.181090  33.74364  19.75361 66.57665 257.0368
2 1.490000 49.047750 184.14598 139.07980 54.75052 326.8001
3 1.490000 49.128902 183.58853 138.02768 55.54114 327.4739
4 2.201276 18.240331  19.20386  10.74748 62.04492 494.4161
5 2.201276 18.215522  19.18009  10.72446 61.87448 494.7409
6 1.276476  9.337769  14.16061  19.06902 14.99612 363.0020

Complete data set is at the following link if you fancy it:
https://gist.github.com/ottadini/6068259

The first variable, MEAS_TC, is the dependent variable. I wish to train
a SOM using this data, and then be able to predict MEAS_TC for a new set
of data in which the MEAS_TC values are missing. Below I'm simply
splitting somdata into a training and a testing set for evaluation
purposes.

# ===== #
library(kohonen)

somdata <- read.csv("somdata.csv")

# Create test and training sets from data:
inTrain <- sample(nrow(somdata), nrow(somdata)*(2/3))
training <- somdata[inTrain, ]
testing <- somdata[-inTrain, ]

# Supervised kohonen map, where the dependent variable is MEAS_TC.
# Attempting to follow the examples in Wehrens and Buydens, 2007,
# 21(5), J Stat Soft.
# somdata[1] is the MEAS_TC variable
somX <- scale(training[-1])
somY <- training[[1]]  # Needs to return a vector
# Train the map (not sure this is how it should be done):
tc.xyf <- xyf(data=somX, Y=somY, xweight=0.5,
              grid=somgrid(6, 6, "hexagonal"), contin=TRUE)

# Prediction with test set:
tc.xyf.prediction <- predict(tc.xyf, newdata = scale(testing[-1]))

# Basic plot:
x <- seq(nrow(testing))
plot(x, testing[, "MEAS_TC"], type="l", col="black", ylim=c(0, 3.5),
     ylab="MEAS_TC")
# Overlay the predictions on the same axes (avoids redrawing the axes):
lines(x, tc.xyf.prediction$prediction, col="red")

# Wow, that's terrible. Do I have something wrong?
# ===== #
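
One thing that may be worth checking in the code above: `scale(testing[-1])` standardises the test set with its own column means and standard deviations, so the test predictors are not on the same scale as the ones the map was trained on. Whether this accounts for the poor prediction is not established here. A minimal sketch of reusing the training constants follows; `train_x` and `test_x` are synthetic, hypothetical stand-ins for `training[-1]` and `testing[-1]`, and only base R is used.

```r
# Synthetic stand-ins for the predictor columns (assumption: not the
# real somdata; 4 columns of random numbers).
set.seed(42)
train_x <- matrix(rnorm(200, mean = 50, sd = 10), ncol = 4)
test_x  <- matrix(rnorm(100, mean = 55, sd = 12), ncol = 4)

# Scale the training predictors and keep the centring/scaling constants
train_scaled <- scale(train_x)
ctr <- attr(train_scaled, "scaled:center")
scl <- attr(train_scaled, "scaled:scale")

# Apply the *training* constants to the test predictors
test_scaled <- scale(test_x, center = ctr, scale = scl)

# Scaling the test set on its own gives different values
test_self <- scale(test_x)
max(abs(test_scaled - test_self))
```

In the analysis itself this would amount to passing `center` and `scale` taken from `somX` when scaling `testing[-1]` before calling `predict()`.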


