[R] Help to improve prediction from supervised mapping using kohonen package

Wed Jul 24 11:25:51 CEST 2013

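Try rescaling your data prior to splitting it up into a training and test set. Otherwise each set is centred and scaled with its own means and standard deviations, so the same raw value ends up with a different scaled value in the training set than in the test set.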
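
A minimal sketch of the idea, in base R only. The data frame below is simulated as a stand-in for somdata (the values are made up, and only two predictor columns are used). Option 1 is the suggestion above. Option 2 is a common alternative that is not part of the suggestion above: split first, then scale the test set with the centre and spread of the training set, which scale() stores as attributes.

```r
# Simulated stand-in for somdata (hypothetical values, not the real data set)
set.seed(1)
somdata <- data.frame(
  MEAS_TC = runif(30, 1, 3),
  SP      = rnorm(30, 40, 15),
  GR      = rnorm(30, 55, 20)
)

# Random 2/3 training split, as in the original code
inTrain <- sample(nrow(somdata), floor(nrow(somdata) * 2 / 3))

# Option 1: scale the predictors once, then split
X      <- scale(somdata[-1])
trainX <- X[inTrain, ]
testX  <- X[-inTrain, ]

# Option 2 (assumption, not from the suggestion above): split first, then
# reuse the training centre and spread when scaling the test set
trainX2 <- scale(somdata[inTrain, -1])
testX2  <- scale(somdata[-inTrain, -1],
                 center = attr(trainX2, "scaled:center"),
                 scale  = attr(trainX2, "scaled:scale"))
```

Either way, the matrix passed as newdata to predict() is on the same scale as the matrix used to train the map.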

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Ben Harrison
Sent: Wednesday 24 July 2013 11:05
To: r-help at r-project.org
Subject: [R] Help to improve prediction from supervised mapping using kohonen package

I would really like some advice on how I can improve (or fix??) the following analysis. I hope I have provided completely runnable code - it doesn't produce any errors for me.

The resulting plot at the end shows a pretty poor correlation (just speaking visually here) to the test set. How can I improve the performance of the mapping and prediction?

Here are some of the data (continuous, numerical):

> head(somdata)
   MEAS_TC        SP        LN        SN       GR     NEUT
1 2.780000 59.181090  33.74364  19.75361 66.57665 257.0368
2 1.490000 49.047750 184.14598 139.07980 54.75052 326.8001
3 1.490000 49.128902 183.58853 138.02768 55.54114 327.4739
4 2.201276 18.240331  19.20386  10.74748 62.04492 494.4161
5 2.201276 18.215522  19.18009  10.72446 61.87448 494.7409
6 1.276476  9.337769  14.16061  19.06902 14.99612 363.0020

Complete data set is at the following link if you fancy it:
https://gist.github.com/ottadini/6068259

The first variable is the dependent variable. I wish to train a SOM using this data, and then be able to predict MEAS_TC for a new set of data in which the values of MEAS_TC are missing. Below I'm simply splitting somdata into a training and a testing set for evaluation purposes.

# ===== #
library(kohonen)

somdata <- read.csv("somdata.csv")

# Create test and training sets from data:
inTrain <- sample(nrow(somdata), nrow(somdata)*(2/3))
training <- somdata[inTrain, ]
testing <- somdata[-inTrain, ]

# Supervised kohonen map, where the dependent variable is MEAS_TC.
# Attempting to follow the examples in Wehrens and Buydens, 2007, 21(5), J Stat Soft.
# somdata[1] is the MEAS_TC variable
somX <- scale(training[-1])
somY <- training[[1]]  # Needs to return a vector

# Train the map (not sure this is how it should be done):
tc.xyf <- xyf(data=somX, Y=somY, xweight=0.5, grid=somgrid(6, 6, "hexagonal"), contin=TRUE)

# Prediction with test set:
tc.xyf.prediction <- predict(tc.xyf, newdata = scale(testing[-1]))

# Basic plot:
x <- seq(nrow(testing))
plot(x, testing[, "MEAS_TC"], type="l", col="black", ylim=c(0, 3.5))
par(new=TRUE)
plot(x, tc.xyf.prediction$prediction, type="l", col="red", ylim=c(0, 3.5))

# Wow, that's terrible. Do I have something wrong?
# ===== #
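
The visual comparison above could be backed up with a number or two. A minimal sketch, assuming two hypothetical vectors observed and predicted (made-up values for illustration; in the code above they would correspond to testing[, "MEAS_TC"] and the predicted values):

```r
# Hypothetical observed and predicted values, for illustration only
observed  <- c(2.78, 1.49, 1.49, 2.20, 2.20, 1.28)
predicted <- c(2.50, 1.70, 1.60, 2.00, 2.30, 1.50)

# Root mean squared error and Pearson correlation
rmse <- sqrt(mean((observed - predicted)^2))
r    <- cor(observed, predicted)

# Observed against predicted; points on the dashed 1:1 line are perfect predictions
plot(observed, predicted,
     xlab = "Observed MEAS_TC", ylab = "Predicted MEAS_TC")
abline(0, 1, lty = 2)
```

A scatter plot against the 1:1 line is often easier to read than two overlaid line plots, because the row order of the test set carries no meaning after random sampling.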
