[R] Information_content_test

francois fauteux ffauteux at gmail.com
Mon Oct 9 21:59:53 CEST 2006


Hi,

I have a matrix of 154 elements by 66241 sub-elements. The elements
are character strings; the sub-elements are simply substrings of a
certain length. For each element, I counted the occurrences of each
sub-element (by scanning the strings). I thus have a matrix of numerical
values (between 0 and the maximum number of occurrences).

On the other hand, I computed distances and a hierarchical clustering
of all elements using another, information-content-based methodology. I
would like to test, for a cluster of elements (e.g. elements 1 to
10 versus 11 to 154), the significance of the difference in counts for
each of the 66241 sub-elements.

I could test them one by one like this:

sub1<-c(0,2,0,6,3,2,5,4,3,...
sub1_C<-c(sub1[1],sub1[2],sub1[3],...
sub1_O<-c(sub1[11],sub1[12],sub1[13],...
t.test(sub1_C, sub1_O,
       alternative = c("greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)

QUESTION 1: How could this be done in batch for all sub-elements, loading
the data into a table, matrix or data.frame and testing the significance of
the count means of cluster (1-10) versus cluster (11-154)? The elements
(clusters) to test are not ordered (e.g. elements 1,15,4,7,9,11,12
against 2,150,40,...).
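
One way to run such a batch is sketched below, under stated assumptions:
the counts sit in a numeric matrix with one row per element and one column
per sub-element, and the second group is simply every element not in the
cluster. The names `counts`, `cluster`, `others` and `test_one` are
hypothetical, and the data are simulated stand-ins, not the real counts.

```r
# Hypothetical stand-in data: 154 elements (rows) by 5 sub-elements (columns);
# the real matrix would have 66241 columns.
set.seed(1)
counts <- matrix(rpois(154 * 5, lambda = 3), nrow = 154,
                 dimnames = list(NULL, paste0("sub", 1:5)))

# Row indices of the cluster to test; they need not be contiguous or ordered.
cluster <- c(1, 15, 4, 7, 9, 11, 12)
# Assumption: the comparison group is every remaining element.
# Replace with an explicit index vector if it is not.
others <- setdiff(seq_len(nrow(counts)), cluster)

# One-sided Welch t-test for one column of counts; returns NA when t.test()
# cannot be computed (for example, when the data are essentially constant).
test_one <- function(x) {
  tryCatch(
    t.test(x[cluster], x[others], alternative = "greater",
           mu = 0, paired = FALSE, var.equal = FALSE)$p.value,
    error = function(e) NA_real_
  )
}

# Apply the test to every column; the result is a named vector of p-values.
p_values <- apply(counts, 2, test_one)

# With thousands of tests, an adjustment for multiple testing is usually
# advisable; Benjamini-Hochberg is one common choice.
p_adjusted <- p.adjust(p_values, method = "BH")
```

Because the groups are given as index vectors, the same call works for any
cluster taken from the hierarchical clustering.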

Can anyone suggest better statistics to use in such a context
[STRING CONTENT ANALYSIS]? I thought of using Bayesian-type analyses,
but I don't know how.
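
As one possible alternative for count data, which are often skewed and full
of ties, a rank-based test could replace the t-test. The sketch below uses
`wilcox.test()` from base R on simulated stand-in data; the names `counts`,
`cluster` and `others` are hypothetical, and whether this test suits the
real data is an assumption to check, not a recommendation.

```r
# Hypothetical stand-in data: 154 elements (rows) by 5 sub-elements (columns).
set.seed(2)
counts <- matrix(rpois(154 * 5, lambda = 3), nrow = 154)

cluster <- 1:10     # elements in the cluster of interest
others  <- 11:154   # all remaining elements

# One-sided Wilcoxon rank-sum test per column. exact = FALSE requests the
# normal approximation, since exact p-values are not available with ties.
p_wilcox <- apply(counts, 2, function(x) {
  wilcox.test(x[cluster], x[others],
              alternative = "greater", exact = FALSE)$p.value
})
```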

Thanks for any hints. Regards,

François

PS. Please provide "for newbies" details.


