[R] Help-Multi class classification for large datasets

Ranjana Girish ranjanagirish30 at gmail.com
Tue Jul 18 09:37:18 CEST 2017


Hi all,

We are working on multi-class classification. Currently, the ranger package
in R can handle up to 1.1 million records, but training takes 12 days on a
machine with 128 GB of RAM, which is not practically feasible.

In the future we will have datasets of about 10 million records, so we are
searching for a package or framework that can handle 10 million records with
at least 12,000 features.
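
A rough sense of that scale can be had from a back-of-the-envelope estimate.
The sketch below assumes dense storage with 8-byte doubles; the sparse figure
assumes about 100 non-zero terms per document, which is an illustrative guess
and not a measurement from the data described here.

```python
# Memory estimate for a 10 million x 12,000 document-term matrix.
rows = 10_000_000
cols = 12_000
bytes_per_double = 8  # assumption: dense matrix of 8-byte doubles

dense_gb = rows * cols * bytes_per_double / 1e9

# Assumption (hypothetical): ~100 non-zero terms per document, stored in a
# CSR-like layout with an 8-byte value and a 4-byte column index per entry,
# plus one 8-byte row pointer per row (and one extra at the end).
nnz_per_row = 100
sparse_bytes = rows * nnz_per_row * (8 + 4) + (rows + 1) * 8
sparse_gb = sparse_bytes / 1e9

print(f"dense:  {dense_gb:.0f} GB")
print(f"sparse: {sparse_gb:.2f} GB")
```

Under these assumptions the dense matrix alone would far exceed 128 GB of
RAM, so a sparse representation seems to be a precondition for any candidate
package.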


The package or framework we are searching for should handle all of the tasks
below:

1. Pre-processing of words in the corpus (stopword removal, stemming, removal
of special characters)
2. Construction of the document-term matrix
3. Feature selection, e.g. chi-square, information gain, gain ratio
4. Random forest classification, etc.
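
To make the first three tasks concrete, here is a minimal sketch on toy data
using only the Python standard library. The stopword list, the example
documents, and the function names (preprocess, document_term_matrix,
chi_square) are all hypothetical. Stemming and the random forest step are
left out, since they would need a dedicated library.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "of", "and", "to"}  # tiny illustrative list

def preprocess(text):
    """Lowercase, replace special characters with spaces, drop stopwords."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]

def document_term_matrix(docs):
    """Return one sparse row (term -> count) per document."""
    return [Counter(preprocess(d)) for d in docs]

def chi_square(dtm, labels, term, target):
    """Chi-square statistic of the 2x2 table:
    term present/absent versus class == target / class != target."""
    n = len(dtm)
    a = sum(1 for row, y in zip(dtm, labels) if term in row and y == target)
    b = sum(1 for row, y in zip(dtm, labels) if term in row and y != target)
    c = sum(1 for row, y in zip(dtm, labels) if term not in row and y == target)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

# Made-up example documents and class labels.
docs = ["The engine is loud!", "Engine and brakes",
        "The goal of the match", "A late goal"]
labels = ["auto", "auto", "sport", "sport"]

dtm = document_term_matrix(docs)
score = chi_square(dtm, labels, "engine", "auto")
print(dtm[0])
print(score)
```

Terms would then be ranked by this score per class and only the top ones
kept before training a classifier. Storing each row as a term-to-count
mapping keeps the matrix sparse, which matters at the sizes discussed here.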

Kindly let us know of a package or framework that can scale up to 10
million rows and 12,000 columns.



