I: [R] Problem

Thu May 23 16:26:19 CEST 2002

Let me explain my situation more clearly.

I'm doing a study on the behaviour of Web pages. Given a query, every Web
page is either relevant to the query or not. I have a collection of Web
pages and I know all the links between them. My aim was to build a set of
clusters using some properties of the links (the number of links between
pages, ...). For every cluster I can compute two values: precision, which
is (relevant pages in the cluster)/(number of pages in the cluster), and
recall, which is (relevant pages in the cluster)/(relevant pages in the
whole collection). I computed these values for all the clusters. My aim is
to find out whether the relevant pages tend to be concentrated in some
clusters. If my theory is right, I should have clusters that contain many
relevant pages and clusters that contain no relevant pages.
If we consider the entire collection as one big cluster, we have recall=1
and the precision depends on the situation.
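
To make the two definitions concrete, here is a small sketch in Python. The
page names, the cluster labels and the set of relevant pages are invented toy
data, and precision_recall is only a hypothetical helper name, not part of
any library.

```python
# Toy data (hypothetical): page id -> cluster label, and the relevant pages.
cluster_of = {
    "p1": "A", "p2": "A", "p3": "A",
    "p4": "B", "p5": "B",
    "p6": "C", "p7": "C", "p8": "C",
}
relevant = {"p1", "p2", "p6"}


def precision_recall(pages, relevant):
    """Return (precision, recall) of a group of pages for one query."""
    pages = set(pages)
    hits = len(pages & relevant)          # relevant pages in the group
    precision = hits / len(pages)         # hits / number of pages in the group
    recall = hits / len(relevant)         # hits / relevant pages in collection
    return precision, recall


# Group the pages by cluster label.
clusters = {}
for page, label in cluster_of.items():
    clusters.setdefault(label, set()).add(page)

per_cluster = {label: precision_recall(pages, relevant)
               for label, pages in clusters.items()}

# The whole collection treated as one big cluster: recall is always 1.
global_precision, global_recall = precision_recall(cluster_of.keys(), relevant)
```

With this toy data cluster B contains no relevant page at all, while the
whole collection has recall 1 and a precision of 3/8.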
I want to find a test that tells me whether there is a marked difference in
the concentration of relevant pages between individual clusters and their
complements. Another way is to compare the precision of each cluster with
the global precision of the collection. My teacher told me that maybe a
two-sample Student's t test is OK for this work, but I don't understand how
to apply it. Is it possible to build a vector of the precisions of the
clusters and a vector of the precisions of their complements, compute the
means of the two vectors and use the t test to compare the means? Would
that work?
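
To show what I mean by the two vectors, here is a rough sketch in Python. The
counts are invented, welch_t is a hypothetical helper written only for this
example, and I use the unequal-variance (Welch) form of the two-sample t
statistic as an assumption; I am not claiming this is the right test.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical counts per cluster: (relevant pages, total pages).
counts = {"A": (8, 10), "B": (1, 12), "C": (0, 9), "D": (6, 9)}

total_rel = sum(rel for rel, n in counts.values())
total_n = sum(n for rel, n in counts.values())

cluster_prec = []      # precision of each cluster
complement_prec = []   # precision of everything outside that cluster
for rel, n in counts.values():
    cluster_prec.append(rel / n)
    complement_prec.append((total_rel - rel) / (total_n - n))


def welch_t(x, y):
    """Two-sample t statistic (unequal variances) and its approximate df."""
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(cluster_prec, complement_prec)
```

One thing that worries me is that each complement value is computed from the
same pages as the cluster values, so the two vectors are not independent
samples.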
Thank you for your time.
Alessandro Ambrosini
