[R] exploratory analysis of large categorical datasets

Kjetil Halvorsen kjetilbrinchmannhalvorsen at gmail.com
Sat Nov 13 17:38:36 CET 2010


You can also look at correspondence analysis, which is implemented
in several CRAN packages, for instance MASS and ade4.
See the Multivariate task view on CRAN.
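
A minimal sketch of what this could look like with MASS, assuming a
simulated 80x100 binary matrix in the form Lara describes. The object
names x and ca and the seed are made up for illustration; corresp() is
the simple correspondence analysis function in MASS.

```r
library(MASS)

set.seed(1)  # arbitrary seed, only so the simulated data are reproducible

# Simulated stand-in for the real 80 x 100 binary matrix
x <- matrix(sample(c(0, 1), 80 * 100, replace = TRUE),
            nrow = 80, ncol = 100,
            dimnames = list(paste0("group", 1:80), paste0("V.", 1:100)))

# corresp() stops on empty rows or columns, so drop them first
x <- x[rowSums(x) > 0, colSums(x) > 0, drop = FALSE]

# Simple correspondence analysis, keeping the first two dimensions
ca <- corresp(x, nf = 2)

ca$cor            # canonical correlations for the two dimensions
head(ca$rscore)   # row (group) scores
head(ca$cscore)   # column (variable) scores

# Joint display of rows and columns (skipped in non-interactive runs)
if (interactive()) biplot(ca)
```

Rows or columns that sit close together in the biplot have similar
profiles. For the multistate matrix, mca() in MASS (multiple
correspondence analysis on a data frame of factors) may be the closer
analogue, though I have not tried it at that size.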

Kjetil

On Thu, Nov 11, 2010 at 10:39 PM, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> A good place to start would be package vcd and its suite of demos and
> vignettes, as well as the vcdExtra package, which adds a few more goodies
> and a very nice introductory vignette by Michael Friendly. You can't fault
> the package for a lack of documentation :)
>
> You might also find the following link useful:  http://www.datavis.ca/R/
> Scroll down to 'vcd and vcdExtra', and further down to 'tableplot', which
> was recently released on CRAN.
>
> HTH,
> Dennis
>
> On Thu, Nov 11, 2010 at 2:09 PM, Lara Poplarski <larapoplarski at gmail.com> wrote:
>
>> Dear List,
>>
>>
>> I am looking to perform exploratory analyses of two (relatively) large
>> datasets of categorical data. The first one is a binary 80x100 matrix, in
>> the form:
>>
>>
>> matrix(sample(c(0, 1), 25, replace = TRUE), nrow = 5, ncol = 5,
>>        dimnames = list(c("group1", "group2", "group3", "group4", "group5"),
>>                        c("V.1", "V.2", "V.3", "V.4", "V.5")))
>>
>>
>> and the second one is a multistate 750x1500 matrix, with up to 15
>> *unordered* states per variable, in the form:
>>
>>
>> matrix(sample(1:15, 25, replace = TRUE), nrow = 5, ncol = 5,
>>        dimnames = list(c("group1", "group2", "group3", "group4", "group5"),
>>                        c("V.1", "V.2", "V.3", "V.4", "V.5")))
>>
>>
>> Specifically, I am looking to see which pairs of variables are correlated.
>> For continuous data, I would use cor() and cov() to generate the correlation
>> matrix and the variance-covariance matrix, which I would then visualize with
>> symnum() or image(). However, it is not clear to me whether this approach is
>> suitable for categorical data of this kind.
>>
>>
>> Since I am new to R, I would greatly appreciate any input on how to approach
>> this task and on efficient visualization of the results.
>>
>>
>> Many thanks in advance,
>>
>> Lara
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>


