[R] Statistical analysis of a large database

Vito Ricci vito_ricci at yahoo.com
Tue Oct 12 10:11:48 CEST 2004


Hi,

For your analysis, you can use the package:

ROracle: Oracle database interface for R

http://microarrays.unife.it/CRAN/src/contrib/Descriptions/ROracle.html
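
As a rough sketch of how the data could be pulled from Oracle into R with ROracle through the DBI interface (I have not run this against your database): the environment variable names ORA_USER, ORA_PASSWORD and ORA_DBNAME, the table name SURVEY_DATA and the helper names as_factors() and fetch_categorical() are placeholders of mine, not anything from your setup. Since your variables are all categorical, the sketch converts character columns to factors after the query.

```r
# Sketch only: credentials, database name and table name are hypothetical.

# Convert every character column of a data frame to a factor, because
# most R classification and clustering functions expect categorical
# variables to be stored as factors.
as_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

# Fetch a whole table through an open DBI connection and return it
# as a data frame of factors. 'table_name' is inserted into the SQL
# text as-is, so only pass trusted names.
fetch_categorical <- function(con, table_name) {
  df <- DBI::dbGetQuery(con, paste("SELECT * FROM", table_name))
  as_factors(df)
}

# Only attempt a connection when DBI and ROracle are installed and the
# (hypothetical) environment variables are set.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("ROracle", quietly = TRUE) &&
    nzchar(Sys.getenv("ORA_USER"))) {
  drv <- ROracle::Oracle()
  con <- DBI::dbConnect(drv,
                        username = Sys.getenv("ORA_USER"),
                        password = Sys.getenv("ORA_PASSWORD"),
                        dbname   = Sys.getenv("ORA_DBNAME"))
  dat <- fetch_categorical(con, "SURVEY_DATA")  # hypothetical table
  str(dat, list.len = 10)                       # inspect the first columns
  DBI::dbDisconnect(con)
}
```

A table of about 40,000 rows by 500 columns should normally fit in memory on a current machine, so fetching it in one query is probably fine. If it does not, you could select a subset of columns in the SQL statement instead of using SELECT *.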

See also:

Diego Kuonen, Introduction au data mining avec R :
vers la reconquête du `knowledge discovery in
databases' par les statisticiens. Bulletin of the
Swiss Statistical Society, 40:3-7, 2001.
http://www.statoo.com/en/publications/2001.R.SSS.40/

Diego Kuonen and Reinhard Furrer, Data mining avec R
dans un monde libre. Flash Informatique Spécial Été,
pages 45-50, September 2001.
http://sawww.epfl.ch/SIC/SA/publications/FI01/fi-sp-1/sp-1-page45.html


R Development Core Team, R Data Import/Export,
version 1.9.0, April 2004, pp. 11-18
http://cran.r-project.org/doc/manuals/R-data.pdf

Brian D. Ripley, Datamining: Large Databases and
Methods, in Proceedings of "useR! 2004 - The R User
Conference", May 2004
http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Ripley.pdf

Brian D. Ripley, Using Databases with R, R News,
January 2001, pp. 18-20
http://cran.r-project.org/doc/Rnews/Rnews_2001-1.pdf

B. D. Ripley, R. M. Ripley, Applications of R Clients
and Servers, in Proceedings of the Distributed
Statistical Computing 2001 Workshop, 2001, Vienna
University of Technology.
http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/Ripley.pdf


Torsten Hothorn, David A. James, Brian D. Ripley, R/S
Interfaces to Databases, in Proceedings of the
Distributed Statistical Computing 2001 Workshop,
2001, Vienna University of Technology.
http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/HothornJamesRipley.pdf

Luís Torgo, Data Mining with R. Learning by case
studies, May 2003
http://www.liacc.up.pt/~ltorgo/DataMiningWithR/

I hope this helps a little.
Best
Vito




You wrote:

Dear all,
We need to perform a statistical analysis of a large
database (40,000 entries with approximately 500 fields
in each entry) currently handled in Oracle. The data
contains categorical variables only. 
At the current stage we suggest classification and
clustering analysis. 
We are planning to perform the analysis in R and
would be very grateful for any
recommendations/suggestions/references regarding the
packages/tools appropriate for this task. 
Thank you in advance for your attention, 
Vicky Landsman


=====
Becoming builders of solutions

"The business of the statistician is to catalyze 
the scientific learning process."  
George E. P. Box


Visit the portal http://www.modugno.it/
and in particular the section on Palese http://www.modugno.it/archivio/cat_palese.shtml



