[R] Discriminant function analysis

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Feb 7 17:24:11 CET 2008


On Thu, 7 Feb 2008, Tyler Smith wrote:

> On 2008-02-07, Birgit Lemcke <birgit.lemcke at systbot.uzh.ch> wrote:
>>
>> On 06.02.2008 at 21:00, Tyler Smith wrote:
>>>
>>>> My dataset contains variables of the classes factor and numeric. Is
>>>> there another function that is able to handle this?
>>>
>>> The numeric variables are fine. The factor variables may have to be
>>> recoded into dummy binary variables; I'm not sure whether lda() will
>>> deal with them properly otherwise.
>>
>> But aren't binary variables also factors? Or is there another
>> variable class besides factor and numeric?
>> Do I have to set the class of the binaries to numeric?
>>
>
> There is no binary class in R, so you would have to use a numeric
> field. For example:

Then what do you consider the logical type to be?

(Strictly it is not binary because of NAs, but it is used for binary 
variables in model formulae.)
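A minimal R sketch of this point. The vectors `y` and `present` are hypothetical and exist only for illustration. A logical vector is already R's natural type for a binary variable, `NA` is itself a logical value, and a model formula expands a logical predictor into a single 0/1 column.

```r
# Hypothetical response y and logical predictor 'present' (illustration only).
y       <- c(2.1, 3.9, 2.4, 4.2)
present <- c(FALSE, TRUE, FALSE, TRUE)

print(class(present))   # "logical"
print(is.logical(NA))   # TRUE: NA is also a logical value, so not strictly binary

# In a model formula the logical is expanded like a two-level factor,
# giving one 0/1 column named "presentTRUE" next to the intercept.
X <- model.matrix(y ~ present)
print(colnames(X))
```

No conversion to numeric is needed before using such a variable in a formula.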

>
> | sample | factor_1 |
> |--------+----------|
> | A      | red      |
> | B      | green    |
> | C      | blue     |
>
> becomes:
>
> | sample | dummy_1 | dummy_2 |
> |--------+---------+---------|
> | A      |       1 |       0 |
> | B      |       0 |       1 |
> | C      |       0 |       0 |
>
> R can deal with dummy_1 and dummy_2 as numeric vectors. The details
> should be explained in a good reference on multivariate statistics
> (I'm looking at Legendre and Legendre (1998), sections 1.5.7 and 11.5).
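The recoding in the table can be sketched in R with `model.matrix()`, which applies R's default treatment contrasts to a factor. The objects `samples`, `factor_1` and `X` are hypothetical. The factor levels are set explicitly so that `blue` is the reference level and the two remaining columns line up with `dummy_1` (red) and `dummy_2` (green) above.

```r
# Hypothetical data mirroring the table: three samples, one colour factor.
# Levels are set explicitly so that "blue" comes first and is the reference level.
samples  <- c("A", "B", "C")
factor_1 <- factor(c("red", "green", "blue"), levels = c("blue", "red", "green"))

# With the default treatment contrasts, model.matrix() keeps an intercept plus
# one 0/1 column per non-reference level ("factor_1red", "factor_1green").
X <- model.matrix(~ factor_1)

# Drop the intercept and rename rows/columns to match the table.
X <- X[, -1]
colnames(X) <- c("dummy_1", "dummy_2")
rownames(X) <- samples
print(X)
```

Formula-based fitting functions such as `lm()` and `glm()` perform this expansion themselves. Hand-coding the dummies is mostly useful when a function expects a plain numeric matrix.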

The issue is rather a statistical one: the theory behind LDA assumes 
continuous variables, indeed a multivariate normal distribution.  You can 
apply LDA to binary explanatory variables, but there are much more 
appropriate methods (as indeed there are for factor explanatory 
variables).

> HTH,
>
> Tyler
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

