[R] Looking for package for data generation for classification and regression

Paul Smith phh@80 @end|ng |rom gm@||@com
Fri Mar 4 18:45:01 CET 2022


On Fri, Mar 4, 2022 at 5:03 PM Ranjan Maitra <mlmaitra using gmx.com> wrote:
>
> > > > I am in need of generating artificial data for machine learning
> > > > classification and regression analysis. What I am looking for is
> > > > something similar to Python sklearn.datasets.make_classification and
> > > > sklearn.datasets.make_regression:
> > > >
> > > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
> > > >
> > > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
> > > >
> > > > I have searched CRAN for something similar, but found nothing. Could
> > > > someone please help me with this?
> > >
> > > Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.
> >
> > Thanks, Ranjan, that is also quite helpful, since clustering is also a
> > topic of the course!
>
> The Clustering Algorithms Referee Package (CARP) uses the same codebase but is more general.
>
> https://jmlr.org/papers/v12/melnykov11a.html
>
> Unfortunately, it is written in C, so may not help.
>
> It is on www.mloss.org at:
>
> https://mloss.org/software/view/248/
>
> but perhaps should also be moved to github.

That is quite interesting, Ranjan! I hope you will have that on GitHub
as an R package ready for installation.
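
For anyone following the thread, here is a minimal sketch of how the MixSim
package mentioned above might be used to generate a labelled classification
dataset with a chosen average overlap between classes. It assumes MixSim is
installed from CRAN. The seed, number of classes, dimension, overlap, and
sample size are arbitrary values picked for illustration, not
recommendations.

```r
# Assumes the MixSim package is installed: install.packages("MixSim")
library(MixSim)

set.seed(1)  # arbitrary seed, only for reproducibility

# Draw the parameters of a Gaussian mixture with K = 3 components in
# p = 2 dimensions and an average pairwise overlap of 0.05
# (all three values are illustrative assumptions).
Q <- MixSim(BarOmega = 0.05, K = 3, p = 2)

# Simulate 500 observations from that mixture.
sim <- simdataset(n = 500, Pi = Q$Pi, Mu = Q$Mu, S = Q$S)

X <- sim$X    # 500 x 2 numeric matrix of features
y <- sim$id   # integer class label for each row of X
```

A smaller BarOmega should give better separated classes, and a larger one a
harder classification problem. This covers only the classification and
clustering case, not regression.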

Best wishes, Paul


