[R] project test data into principal components of training dataset

olsen o.o.wolf at qmul.ac.uk
Wed Apr 20 19:33:54 CEST 2016


For the record, a slightly hacky answer, which modifies the ggbiplot
function, is now available here:
http://stackoverflow.com/questions/36603268/how-to-plot-training-and-test-validation-data-in-r-using-ggbiplot
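
For anyone who would rather not patch ggbiplot, here is a minimal sketch
of the same idea in plain ggplot2. It is not the code from the linked
answer. It assumes the built-in iris data as a stand-in for wine, and the
names train, test, scores and g are made up for illustration. The test
rows are projected with the centre, scale and rotation of the training
PCA, so the training coordinates stay fixed regardless of which test set
is added.

```r
library(ggplot2)

## Stand-in data (assumption): iris instead of wine, split into
## training and test rows
set.seed(1)
test.idx <- sample(nrow(iris), 40)
train <- iris[-test.idx, ]
test  <- iris[test.idx, ]

## PCA on the numeric training columns only
train.pca <- prcomp(train[, 1:4], center = TRUE, scale. = TRUE)

## Project the test rows using the training centre, scale and rotation
test.scores <- predict(train.pca, newdata = test[, 1:4])

## The same projection written out by hand, as a cross-check
manual <- scale(test[, 1:4], center = train.pca$center,
                scale = train.pca$scale) %*% train.pca$rotation

## One data frame holding both sets of scores, so they share the axes
scores <- rbind(
  data.frame(train.pca$x[, 1:2], set = "training", class = train$Species),
  data.frame(test.scores[, 1:2], set = "test", class = test$Species)
)

## Ellipses are drawn for the training data only, as the fixed "ruler"
g <- ggplot(scores, aes(PC1, PC2, colour = class, shape = set)) +
  geom_point() +
  stat_ellipse(data = subset(scores, set == "training")) +
  coord_fixed()
g
```

Because both sets of scores live in one data frame, the scaling problem
of joining two separate plots does not arise. The variable arrows and the
other ggbiplot extras are not reproduced here.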

On 18/04/16 17:20, olsen wrote:
> Hi there,
> 
> I have a training dataset and a test dataset. My aim is to visually
> locate the test data within the calibrated space spanned by the PCs
> of the training dataset, while keeping the training dataset
> coordinates fixed, so that they can serve as a ruler for measuring
> additional test datasets as they come up.
> 
> Please find a minimal working example using the wine dataset below.
> Ideally I would like to use ggbiplot, as it comes with elegant
> features, but it only accepts objects of class prcomp, princomp, PCA, or
> lda, a requirement the predicted test data does not fulfil.
> 
> I'm still slightly wet behind my R ears, and the only solution I can
> think of is to plot the calibrated space in ggbiplot and the test
> data in ggplot and then join them, in the worst case by exporting them
> as SVG and importing them into Inkscape. That is slightly complicated,
> and the scaling of the two plots differs.
> 
> Any pointers on how this can be accomplished are very welcome!
> 
> Thanks and greets
> Olsen
> 
> I started a thread on Stack Overflow on this issue but have had no
> relevant pointers so far.
> http://stackoverflow.com/questions/36603268/how-to-plot-training-and-test-validation-data-in-r-using-ggbiplot
> 
> ##MWE
> library(ggbiplot)
> data(wine)
> 
> ##pca on the wine dataset used as training data
> wine.pca <- prcomp(wine, center = TRUE, scale. = TRUE)
> 
> wine$class <- wine.class
> 
> ##simulate test data by generating three new wine classes
> wine.new.1 <- wine[c(sample(1:nrow(wine), 25)),]
> wine.new.2 <- wine[c(sample(1:nrow(wine), 43)),]
> wine.new.3 <- wine[c(sample(1:nrow(wine), 36)),]
> 
> ##Predict PCs for the new classes by transforming
> #them using the predict.prcomp function
> pred.new.1 <- predict(wine.pca, newdata = wine.new.1)
> pred.new.2 <- predict(wine.pca, newdata = wine.new.2)
> pred.new.3 <- predict(wine.pca, newdata = wine.new.3)
> 
> #simulate the classes for the new sorts
> wine.new.1$class <- rep("new.wine.1", nrow(wine.new.1))
> wine.new.2$class <- rep("new.wine.2", nrow(wine.new.2))
> wine.new.3$class <- rep("new.wine.3", nrow(wine.new.3))
> wine.new.bind <- rbind(wine.new.1, wine.new.2, wine.new.3)
> 
> ##compose the plot by joining the PCA ggbiplot training data with the
> ##testing data from ggplot
> #plot the calibrated space resulting from the training data
> g.train <- ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups =
> wine$class, ellipse = TRUE, circle = TRUE)
> g.train
> #plot the test data resulting from the prediction
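> #use the predicted scores, not the raw variables in wine.new.bind
> pred.new.bind <- rbind(pred.new.1, pred.new.2, pred.new.3)
> df.pred = data.frame(PC1 = pred.new.bind[,1], PC2 = pred.new.bind[,2],
>                     PC3 = pred.new.bind[,3], PC4 = pred.new.bind[,4],
>                     classes = wine.new.bind$class)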
> g.test <- ggplot(df.pred, aes(PC1, PC2, color = classes, shape =
> classes)) +  geom_point() +  stat_ellipse()
> g.test

-- 
Our solar system is the cream of the crop
http://hasa-labs.org


