[R] data mining for R

Philippe Grosjean phgrosje at ulb.ac.be
Thu Sep 5 16:16:37 CEST 2002


At the risk of being heavily criticized, one could see data mining mainly as a
pseudo-new concept invented to sell new (and sometimes expensive) software
to industry. Data mining is nothing other than existing statistical
analyses optimized for speed in order to deal with millions of entries, or
even more, in a reasonable period of time. So, as was suggested earlier
in this thread, the methods probably already exist somewhere in R. On the
other hand, R may not be optimized enough to deal with the huge datasets
usually handled by data mining software.
Best,

Philippe Grosjean

-----Original Message-----
From: owner-r-help at stat.math.ethz.ch
[mailto:owner-r-help at stat.math.ethz.ch] On Behalf Of Peter Dalgaard BSA
Sent: Thursday, September 5, 2002 14:37
To: Prof Brian Ripley
Cc: Pgoodr1 at aol.com; r-help at stat.math.ethz.ch
Subject: Re: [R] data mining for R


Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

> Well, R does not have a `statistics' plug in either!
>
> In the words of Witten & Frank's book, Data Mining is `statistics plus
> marketing', and R can do a lot of data mining.
>
> If you could be more specific about what techniques you want to use, we
> may be able to help you further.
>
> On Thu, 5 Sep 2002 Pgoodr1 at aol.com wrote:
>
> > I was wondering if R had a data mining component and how I could get it.
> > If not, do you know anyone who is developing a data mining "plug in" for R?
> > Phillip Goodreid

Another possible definition is "statistics with massive amounts of
incidental data". A large part of DM practice seems to be
"quarrying". The actual statistical methodology is only a part of a
complicated process of getting data out of databases on a, say, weekly
schedule, roughly preprocessed, then fed to a statistics engine, and
postprocessed to something that can end up on the manager's desk.

My impression is that this is essentially what SPSS's Clementine product
does, using a GUI to draw arrows between pretty little hexagonal
cells. It is not at all unthinkable that something like that could be
coded up in R too. I think we have most of the pieces to do it.
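
As a rough illustration of the extract / preprocess / analyse / postprocess
chain described above, here is a minimal sketch in base R. Everything in it
is an assumption made for the example: the function names (extract_data,
preprocess, analyse, postprocess) are hypothetical, the "database extract"
is simulated rather than read from a real database, and a plain linear
model stands in for whatever method the statistics engine would run.

```r
# Hypothetical stand-in for a weekly database extract; a real pipeline
# would pull this through a database interface instead of simulating it.
extract_data <- function(n = 200, seed = 1) {
  set.seed(seed)
  spend <- runif(n, 0, 100)
  sales <- 5 + 0.3 * spend + rnorm(n, sd = 2)
  sales[sample(n, 10)] <- NA   # incidental data are rarely clean
  data.frame(spend = spend, sales = sales)
}

# Rough preprocessing: drop incomplete records.
preprocess <- function(d) d[complete.cases(d), , drop = FALSE]

# The "statistics engine": here just an ordinary linear model.
analyse <- function(d) lm(sales ~ spend, data = d)

# Postprocessing: reduce the fit to a small table for a report.
postprocess <- function(fit) {
  est <- coef(summary(fit))
  data.frame(term = rownames(est),
             estimate = round(est[, "Estimate"], 2),
             std_error = round(est[, "Std. Error"], 2),
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Chain the four stages, as a scheduled job might.
report <- postprocess(analyse(preprocess(extract_data())))
print(report)
```

Each stage is an ordinary function, so swapping the simulated extract for a
real query, or the linear model for another method, changes one step and
leaves the rest of the chain alone.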

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._





