[R] Looking for package for data generation for classification and regression

Ranjan Maitra mlmaitra using gmx.com
Fri Mar 4 18:03:18 CET 2022


On Fri Mar04'22 10:41:24AM, Paul Smith wrote:
> From: Paul Smith <phhs80 using gmail.com>
> Date: Fri, 4 Mar 2022 10:41:24 +0000
> To: Ranjan Maitra <mlmaitra using gmx.com>
> Cc: "r-help using r-project.org" <r-help using r-project.org>
> Subject: Re: [R]  Looking for package for data generation for
>  classification and regression
>
> On Fri, Mar 4, 2022 at 8:07 AM Ranjan Maitra <mlmaitra using gmx.com> wrote:
> >
> > > I am in need of generating artificial data for machine learning
> > > classification and regression analysis. What I am looking for is
> > > something similar to Python sklearn.datasets.make_classification and
> > > sklearn.datasets.make_regression:
> > >
> > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
> > >
> > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
> > >
> > > I have searched CRAN for something similar, but found nothing. Could
> > > someone please help me with this?
> >
> > Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.
>
> Thanks, Ranjan, that is also quite helpful, since clustering is also a
> topic of the course!
>
> Paul
>

The Clustering Algorithms Referee Package (CARP) uses the same codebase but is more general.

https://jmlr.org/papers/v12/melnykov11a.html

Unfortunately, it is written in C, so it may not help.

It is on www.mloss.org at:

https://mloss.org/software/view/248/

but perhaps it should also be moved to GitHub.
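
For the R side, here is a minimal sketch of how MixSim (mentioned above) might be used to generate a labelled dataset with a chosen average overlap between classes. The values of BarOmega, K, p and n, and the seed, are only illustrative assumptions, and this is not an exact equivalent of sklearn's make_classification.

```r
# Assumes the MixSim package from CRAN is installed:
#   install.packages("MixSim")
library(MixSim)

set.seed(1234)  # arbitrary seed, only for reproducibility

# Parameters of a 3-component Gaussian mixture in 2 dimensions with
# average pairwise overlap 0.05 (all of these values are illustrative).
Q <- MixSim(BarOmega = 0.05, K = 3, p = 2)

# Draw 500 observations from that mixture:
# A$X holds the features, A$id the class labels.
A <- simdataset(n = 500, Pi = Q$Pi, Mu = Q$Mu, S = Q$S)

str(A$X)
table(A$id)
```

Lowering BarOmega should give better-separated (easier) classes, and raising it should give harder ones. A$X and A$id can then be passed to whichever classifier or clustering routine is being taught.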

Best wishes,
Ranjan


