[R] help with lda function

David Winsemius dwinsemius at comcast.net
Tue Sep 29 00:41:50 CEST 2009


Your results are the same (after scaling and sign reversal) out to the  
4th decimal place as those from lda (which by the way is almost  
certainly from the MASS package and not from an impossible to find  
"lda package".)

 > read.table(textConnection(txt))
         V1
1 164.4283
2 166.2492
3 170.5232
4 156.5622
5 127.7540
6 136.7704
7 136.3436
 > est <-read.table(textConnection(txt))
 > scale(est)
              V1
[1,]  0.7656185
[2,]  0.8712707
[3,]  1.1192567
[4,]  0.3092117
[5,] -1.3622976
[6,] -0.8391481
[7,] -0.8639119
attr(,"scaled:center")
      V1
151.233
attr(,"scaled:scale")
       V1
17.23484

 > LD1est <- read.table(textConnection(" LD1
+ 1 -2.3769280
+ 2 -2.7049437
+ 3 -3.4748309
+ 4 -0.9599825
+ 5  4.2293774
+ 6  2.6052193
+ 7  2.6820884"), header=T)


 > scale(LD1est)
          LD1
1 -0.7656170
2 -0.8712721
3 -1.1192555
4 -0.3092138
5  1.3622976
6  0.8391505
7  0.8639103
attr(,"scaled:center")
           LD1
-3.172066e-17
attr(,"scaled:scale")
      LD1
3.104591

On Sep 28, 2009, at 5:54 PM, Pete Shepard wrote:

> I am having a problem understanding the lda package. I have a  
> dataset here:
>
>    [,1] [,2] [,3]
> [1,] 2.95 6.63    0
> [2,] 2.53 7.79    0
> [3,] 3.57 5.65    0
> [4,] 3.16 5.47    0
> [5,] 2.58 4.46    1
> [6,] 2.16 6.22    1
> [7,] 3.27 3.52    1
>
> If I do the following;
>
> "names(d)<-c("y","x1","x2")
> d$x1 = d$x1 * 100
> d$x2 = d$x2 * 100
> g<-lda( y ~ x1 + x2, data=d)
> v2 <- predict(g, d)",
>
> I get;
>        LD1
> 1 -2.3769280
> 2 -2.7049437
> 3 -3.4748309
> 4 -0.9599825
> 5  4.2293774
> 6  2.6052193
> 7  2.6820884
>
> However, If I do it manually,
>
> "rawdata<-matrix(scan("tab1_1.
>>
>> dat"),ncol=3,byrow=T)
>> group <- rawdata[,1]
>> X <- 100 * rawdata[,2:3]
>> Apf <- X[group==1,]
>> Af <- X[group==0,]
>> xbar1 <- apply(Af, 2, mean)
>> S1 <- var(Af)
>> N1 <- dim(Af)[1]
>> xbar2 <- apply(Apf, 2, mean)
>> S2 <- var(Apf)
>> N2 <- dim(Apf)[1]
>> S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
>> Sinv=solve(S)
>> d<-xbar1-xbar2
>> b <- Sinv %*% d
>> v <- X %*% b",
>>
>> I get;
>>
>>        [,1]
>> [1,] 164.4283
>> [2,] 166.2492
>> [3,] 170.5232
>> [4,] 156.5622
>> [5,] 127.7540
>> [6,] 136.7704
>> [7,] 136.3436
>>
>
>
>
>
>
>
>>
>> I am having a problem understanding the lda package. I have a  
>> dataset here:
>>
>>    [,1] [,2] [,3]
>> [1,] 2.95 6.63    0
>> [2,] 2.53 7.79    0
>> [3,] 3.57 5.65    0
>> [4,] 3.16 5.47    0
>> [5,] 2.58 4.46    1
>> [6,] 2.16 6.22    1
>> [7,] 3.27 3.52    1
>>
>> If I do the following;
>>
>> "names(d)<-c("y","x1","x2")
>> d$x1 = d$x1 * 100
>> d$x2 = d$x2 * 100
>> g<-lda( y ~ x1 + x2, data=d)
>> v2 <- predict(g, d)",
>>
>> I get;
>>        LD1
>> 1 -2.3769280
>> 2 -2.7049437
>> 3 -3.4748309
>> 4 -0.9599825
>> 5  4.2293774
>> 6  2.6052193
>> 7  2.6820884
>>
>> However, If I do it manually,
>>
>> "rawdata<-matrix(scan("tab1_1.dat"),ncol=3,byrow=T)
>> group <- rawdata[,1]
>> X <- 100 * rawdata[,2:3]
>> Apf <- X[group==1,]
>> Af <- X[group==0,]
>> xbar1 <- apply(Af, 2, mean)
>> S1 <- var(Af)
>> N1 <- dim(Af)[1]
>> xbar2 <- apply(Apf, 2, mean)
>> S2 <- var(Apf)
>> N2 <- dim(Apf)[1]
>> S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
>> Sinv=solve(S)
>> d<-xbar1-xbar2
>> b <- Sinv %*% d
>> v <- X %*% b",
>>
>> I get;
>>
>>        [,1]
>> [1,] 164.4283
>> [2,] 166.2492
>> [3,] 170.5232
>> [4,] 156.5622
>> [5,] 127.7540
>> [6,] 136.7704
>> [7,] 136.3436
>>
>>
>> It seems there is an extra step that I am missing? The predict step  
>> that
>> adds a constant to the second set of values? Can anyone clear this  
>> up for
>> me?
>



David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list