[Rd] tm and e1071 question

Jeszenszky Peter jeszenszky.peter at inf.unideb.hu
Fri Dec 4 13:24:43 CET 2009


Dear Developers,

I would like to use the svm function of the e1071 package for text
classification tasks. Preprocessing can be carried out by using the
excellent tm text mining package.

TermDocumentMatrix and DocumentTermMatrix objects of the package tm
are currently implemented based on the sparse matrix data structures
provided by the slam package.

Unfortunately, the svm function of the e1071 package accepts only sparse
matrices of class Matrix provided by the Matrix package, or of class
matrix.csr as provided by the package SparseM.

In order to train an SVM with a DocumentTermMatrix object, the latter
must first be converted to a matrix.csr sparse matrix. However, none
of the packages publicly available on CRAN provides such a conversion
function. Writing one is quite straightforward, but it would be much
more comfortable to pass slam sparse matrix objects directly to the
svm function.
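
For illustration, the conversion could be sketched roughly as follows.
This is only a minimal sketch in base R, not an existing function of
any package: triplet_to_csr is a hypothetical helper name, and it
assumes the input exposes the i, j, v, nrow and ncol fields of slam's
simple_triplet_matrix, and that ra, ja, ia and dimension are the slots
expected by SparseM's matrix.csr class.

```r
# Hypothetical helper (not part of slam, SparseM, or e1071): turn the
# triplet fields of a slam-style matrix into compressed sparse row arrays.
# `x` is assumed to be a list-like object with fields i, j, v, nrow, ncol.
triplet_to_csr <- function(x) {
  o <- order(x$i, x$j)                     # sort entries by row, then column
  counts <- tabulate(x$i, nbins = x$nrow)  # number of stored entries per row
  list(
    ra = as.numeric(x$v[o]),               # stored values in row-major order
    ja = as.integer(x$j[o]),               # column index of each stored value
    ia = as.integer(c(1, cumsum(counts) + 1)),  # 1-based start of each row in ra, plus end marker
    dimension = as.integer(c(x$nrow, x$ncol))
  )
}

# Toy 2 x 3 matrix [0 5 0; 7 0 9], with the entries given out of order.
m <- list(i = c(2L, 1L, 2L), j = c(3L, 2L, 1L), v = c(9, 5, 7),
          nrow = 2L, ncol = 3L)
csr <- triplet_to_csr(m)

# With SparseM loaded, the arrays could then be wrapped as:
# new("matrix.csr", ra = csr$ra, ja = csr$ja, ia = csr$ia,
#     dimension = csr$dimension)
```

The sort by row and column is needed because the triplet format does
not guarantee any particular ordering of the entries, whereas the
compressed row format stores each row's values contiguously.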

Do you plan to add slam sparse matrix support to the e1071 package?

Best regards,

Peter Jeszenszky


