[R] LDA Predict - Seems to be predicting on the Training Data

Tony Plate tplate at acm.org
Tue Oct 20 17:23:13 CEST 2009


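Maybe you're getting strange results because you're not supplying a data object to lda() when you build your fit. With a formula written as train$c1 ~ train$v1 + ..., the terms are the expressions train$v1, train$v2 and train$v3, so predict() most likely evaluates them against the 'train' object in your workspace rather than against the columns of the new data, and you get the 10 training rows back.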
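
Here is a minimal sketch of that failure mode, assuming the MASS package is installed. It rebuilds the 16 posted rows by hand (the object names 'train', 'outOfSample' and 'fitBad' are only for illustration) and fits the model the way the original code does, without a data argument:

```r
library(MASS)  # provides lda() and its predict method

# Rows 1-10 of the posted data (training set), typed in by hand.
train <- data.frame(
  v1 = c(0.714472361, 0.512158919, 0.470693282, 0.24236898, 0.785619735,
         0.718718387, 0.327331186, 0.632011743, 0.302804404, 0.545284813),
  v2 = c(0.902552278, 0.770451596, 0.129200065, 0.472219638, 0.628511593,
         0.697257275, 0.01715109, 0.599040196, 0.475166304, 0.967196462),
  v3 = c(0.783353694, 0.111853346, 0.800973877, 0.486599763, 0.106868172,
         0.690326648, 0.861421706, 0.320741634, 0.907143632, 0.945163717),
  c1 = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a"))
)

# Rows 11-16 of the posted data (out-of-sample predictors).
outOfSample <- data.frame(
  v1 = c(0.563720418, 0.357614427, 0.154971203, 0.935080173, 0.363069339,
         0.862889297),
  v2 = c(0.024862018, 0.417490445, 0.425227967, 0.488659307, 0.334206603,
         0.821752532),
  v3 = c(0.970685281, 0.415162276, 0.856866993, 0.194967973, 0.639795596,
         0.549552875)
)

# Fit as in the original post: no data argument, variables spelled train$...
fitBad <- lda(train$c1 ~ train$v1 + train$v2 + train$v3)

# The stored terms are the literal expressions, not the column names.
badLabels <- attr(fitBad$terms, "term.labels")
badLabels

# newdata has no object called 'train', so the training vectors are used;
# the original post reports a length of 10 here instead of 6.
nBad <- length(predict(fitBad, outOfSample)$class)
nBad
```

The term labels are the thing to look at: as long as they read train$v1 and so on, no column of the new data frame can match them.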

When I do it the "standard" way, predict.lda() uses the new data and produces a result of length 6 as expected:

> library(MASS)   # lda() lives in MASS
> myDat <- read.csv("clipboard", sep="\t")
> fit <- lda(c1 ~ v1 + v2 + v3, data=myDat[1:10,])
> predict(fit, myDat[11:16,])
$class
[1] c c c b c a
Levels: a b c
...
> 
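
One more thing that may bite once the formula is fixed: the new data has to carry the same column names as the formula (v1, v2, v3). A small sketch, using the six out-of-sample rows from the post and illustrative object names ('stripped', 'kept'), of how the cbind() step in the original code loses those names and how plain subsetting keeps them:

```r
# Rows 11-16 of the posted data, as they would look after myDat[11:16, ].
outOfSample <- data.frame(
  v1 = c(0.563720418, 0.357614427, 0.154971203, 0.935080173, 0.363069339,
         0.862889297),
  v2 = c(0.024862018, 0.417490445, 0.425227967, 0.488659307, 0.334206603,
         0.821752532),
  v3 = c(0.970685281, 0.415162276, 0.856866993, 0.194967973, 0.639795596,
         0.549552875),
  c1 = c("a", "a", "b", "a", "b", "a")
)

# cbind() on the extracted vectors gives a matrix without column names,
# so data.frame() has to invent some.
stripped <- data.frame(cbind(outOfSample$v1, outOfSample$v2, outOfSample$v3))
names(stripped)  # "X1" "X2" "X3"

# Selecting the columns by name keeps v1, v2, v3, which is what a fit built
# with lda(c1 ~ v1 + v2 + v3, data = ...) will look up in newdata.
kept <- outOfSample[, c("v1", "v2", "v3")]
names(kept)      # "v1" "v2" "v3"
```

Passing the subset (or simply the whole of myDat[11:16,], as in the transcript above) avoids the renaming altogether.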

-- Tony Plate


BostonR wrote:
> When I import a simple dataset, run LDA, and then try to use the model to
> forecast out of sample data, I get a forecast for the training set not the
> out of sample set.  Others have posted this question, but I do not see the
> answers to their posts.
> 
> Here is some sample data:
> 
> Date	Names	v1	v2	v3	c1
> 1/31/2009	Name1	0.714472361	0.902552278	0.783353694	a
> 1/31/2009	Name2	0.512158919	0.770451596	0.111853346	a
> 1/31/2009	Name3	0.470693282	0.129200065	0.800973877	a
> 1/31/2009	Name4	0.24236898	0.472219638	0.486599763	b
> 1/31/2009	Name5	0.785619735	0.628511593	0.106868172	b
> 1/31/2009	Name6	0.718718387	0.697257275	0.690326648	b
> 1/31/2009	Name7	0.327331186	0.01715109	0.861421706	c
> 1/31/2009	Name8	0.632011743	0.599040196	0.320741634	c
> 1/31/2009	Name9	0.302804404	0.475166304	0.907143632	c
> 1/31/2009	Name10	0.545284813	0.967196462	0.945163717	a
> 1/31/2009	Name11	0.563720418	0.024862018	0.970685281	a
> 1/31/2009	Name12	0.357614427	0.417490445	0.415162276	a
> 1/31/2009	Name13	0.154971203	0.425227967	0.856866993	b
> 1/31/2009	Name14	0.935080173	0.488659307	0.194967973	a
> 1/31/2009	Name15	0.363069339	0.334206603	0.639795596	b
> 1/31/2009	Name16	0.862889297	0.821752532	0.549552875	a
> 
> Attached is the code:
> 
> myDat <-read.csv(file="f:\\Systematiq\\data\\TestData.csv",
> header=TRUE,sep=",")
> myData <- data.frame(myDat)
> 
> length(myDat[,1])
> 
> train <- myDat[1:10,]
> outOfSample <- myDat[11:16,]
> outOfSample <- (cbind(outOfSample$v1,outOfSample$v2,outOfSample$v3))
> outOfSample <-data.frame(outOfSample)
> 
> length(train[,1])
> length(outOfSample[,1])
> 
> fit <- lda(train$c1~train$v1+train$v2+train$v3)
> 
> forecast <- predict(fit,outOfSample)$class
> 
> length(forecast)  ##### I am expecting this to be the same as
> ##### length(outOfSample[,1]), which is 6
> 
> Output:
> 
> length(forecast)  ##### I am expecting this to be the same as
> ##### length(outOfSample[,1]), which is 6
> [1] 10