[R] LDA Predict - Seems to be predicting on the Training Data

Gabriela Cendoya gcendoya at balcarce.inta.gov.ar
Tue Oct 20 17:32:01 CEST 2009


This is not an explanation, but it gives you a solution.

Instead of calling lda with a formula, pass the predictor variables and the 
classification factor as arguments. Based on your example and data:



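library(MASS)  # provides lda() and its predict method

train <- myDat[1:10,]
outOfSample <- myDat[11:16,]

# keep only the predictor columns v1, v2, v3 (columns 3 to 5)
train2 <- train[,3:5]
outOfSample <- outOfSample[,3:5]

# predictors first, grouping factor second (no formula)
fit <- lda(train2,train$c1)

forecast <- predict(fit,outOfSample)$class

length(forecast)
[1] 6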



It seems that the problem arises when predict.lda is applied to an lda fit 
that was created from a formula.
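
A possible explanation, which this thread does not confirm: a formula written as train$c1 ~ train$v1 + train$v2 + train$v3 names its terms "train$v1" and so on. If predict cannot find terms with those names in the new data frame, the variables are looked up where the formula was created, which yields the 10-row train again. The sketch below rebuilds the posted data inline (the read.table(text = ...) step is an assumption standing in for the original CSV file) and fits the formula with bare column names plus data =, so that v1, v2 and v3 can be matched in the new data:

```r
library(MASS)  # provides lda() and its predict method

# Posted sample data, entered inline instead of read from the CSV file
myDat <- read.table(header = TRUE, text = "
Date Names v1 v2 v3 c1
1/31/2009 Name1 0.714472361 0.902552278 0.783353694 a
1/31/2009 Name2 0.512158919 0.770451596 0.111853346 a
1/31/2009 Name3 0.470693282 0.129200065 0.800973877 a
1/31/2009 Name4 0.24236898 0.472219638 0.486599763 b
1/31/2009 Name5 0.785619735 0.628511593 0.106868172 b
1/31/2009 Name6 0.718718387 0.697257275 0.690326648 b
1/31/2009 Name7 0.327331186 0.01715109 0.861421706 c
1/31/2009 Name8 0.632011743 0.599040196 0.320741634 c
1/31/2009 Name9 0.302804404 0.475166304 0.907143632 c
1/31/2009 Name10 0.545284813 0.967196462 0.945163717 a
1/31/2009 Name11 0.563720418 0.024862018 0.970685281 a
1/31/2009 Name12 0.357614427 0.417490445 0.415162276 a
1/31/2009 Name13 0.154971203 0.425227967 0.856866993 b
1/31/2009 Name14 0.935080173 0.488659307 0.194967973 a
1/31/2009 Name15 0.363069339 0.334206603 0.639795596 b
1/31/2009 Name16 0.862889297 0.821752532 0.549552875 a
")
myDat$c1 <- factor(myDat$c1)  # make the grouping variable a factor explicitly

train <- myDat[1:10, ]
outOfSample <- myDat[11:16, ]  # keeps the column names v1, v2, v3

# Bare column names in the formula; the data frame is supplied via data =
fit <- lda(c1 ~ v1 + v2 + v3, data = train)

# The terms v1, v2, v3 are now found in newdata
forecast <- predict(fit, newdata = outOfSample)$class

length(forecast)  # one prediction per out-of-sample row
```

The new data must keep the column names v1, v2 and v3. The cbind() followed by data.frame() step in the original code renames them to X1, X2 and X3, so that step should be dropped when using the formula interface this way.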



Hope this helps,

Gabriela.



______________________________
Lic. María Gabriela Cendoya
Magíster en Biometría
Profesor Adjunto
Cátedra de Estadística y Diseño
Facultad de Ciencias Agrarias
Universidad Nacional de Mar del Plata
______________________________

----- Original Message ----- 
From: "BostonR" <dpope at capitaliq.com>
To: <r-help at r-project.org>
Sent: Tuesday, October 20, 2009 11:31 AM
Subject: [R] LDA Predict - Seems to be predicting on the Training Data


>
> When I import a simple dataset, run LDA, and then try to use the model to
> forecast out of sample data, I get a forecast for the training set not the
> out of sample set.  Others have posted this question, but I do not see the
> answers to their posts.
>
> Here is some sample data:
>
> Date Names v1 v2 v3 c1
> 1/31/2009 Name1 0.714472361 0.902552278 0.783353694 a
> 1/31/2009 Name2 0.512158919 0.770451596 0.111853346 a
> 1/31/2009 Name3 0.470693282 0.129200065 0.800973877 a
> 1/31/2009 Name4 0.24236898 0.472219638 0.486599763 b
> 1/31/2009 Name5 0.785619735 0.628511593 0.106868172 b
> 1/31/2009 Name6 0.718718387 0.697257275 0.690326648 b
> 1/31/2009 Name7 0.327331186 0.01715109 0.861421706 c
> 1/31/2009 Name8 0.632011743 0.599040196 0.320741634 c
> 1/31/2009 Name9 0.302804404 0.475166304 0.907143632 c
> 1/31/2009 Name10 0.545284813 0.967196462 0.945163717 a
> 1/31/2009 Name11 0.563720418 0.024862018 0.970685281 a
> 1/31/2009 Name12 0.357614427 0.417490445 0.415162276 a
> 1/31/2009 Name13 0.154971203 0.425227967 0.856866993 b
> 1/31/2009 Name14 0.935080173 0.488659307 0.194967973 a
> 1/31/2009 Name15 0.363069339 0.334206603 0.639795596 b
> 1/31/2009 Name16 0.862889297 0.821752532 0.549552875 a
>
> Attached is the code:
>
> myDat <-read.csv(file="f:\\Systematiq\\data\\TestData.csv",
> header=TRUE,sep=",")
> myData <- data.frame(myDat)
>
> length(myDat[,1])
>
> train <- myDat[1:10,]
> outOfSample <- myDat[11:16,]
> outOfSample <- (cbind(outOfSample$v1,outOfSample$v2,outOfSample$v3))
> outOfSample <-data.frame(outOfSample)
>
> length(train[,1])
> length(outOfSample[,1])
>
> fit <- lda(train$c1~train$v1+train$v2+train$v3)
>
> forecast <- predict(fit,outOfSample)$class
>
> length(forecast)  ##### I am expecting this to be the same as
> ##### length(outOfSample[,1]), which is 6
>
> Output:
>
> length(forecast)  ##### I am expecting this to be the same as
> ##### length(outOfSample[,1]), which is 6
> [1] 10
