[R] Chi2 algorithm - R

peter dalgaard pdalgd at gmail.com
Wed Nov 23 22:21:36 CET 2016


Notice that this relates to an R _package_, which has a maintainer. You cannot expect general R users or developers to know the details of the package. There doesn't seem to be any documentation beyond the help pages, so you may need to contact the maintainer or study the actual code.
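
As a rough sketch of what "contact the maintainer or study the actual code" could look like in practice, the lines below use only base R tools (utils::maintainer, getExportedValue, getNamespaceExports) and assume the discretization package is installed from CRAN. The names chi2, cutp and alp/del are taken from the code in the original post; nothing here says anything about how the package computes its cut points.

```r
# Minimal sketch: ways to look inside a contributed package.
# Assumption: the 'discretization' package is installed from CRAN.
pkg <- "discretization"

if (requireNamespace(pkg, quietly = TRUE)) {
  # Who to contact: the Maintainer field of the package DESCRIPTION file
  print(utils::maintainer(pkg))

  # The R source of the function, as installed
  chi2_fun <- getExportedValue(pkg, "chi2")
  print(chi2_fun)

  # Names of the other exported functions, to look for helpers
  print(getNamespaceExports(pkg))

  # Structure of the returned cut points (same call as in the original post)
  res <- chi2_fun(iris, alp = 0.5, del = 0.05)
  str(res$cutp)
} else {
  message("Package '", pkg, "' is not installed; run install.packages(\"",
          pkg, "\") first.")
}
```

Looking at str(res$cutp) before passing it to cut() may already show whether it is a single vector of breaks or something with one element per attribute.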

-pd 

> On 23 Nov 2016, at 17:08 , Luke Skywalker <mattered91 at gmail.com> wrote:
> 
> Good evening,
> 
> I'm getting a discretization that differs from the one described in Liu and
> Setiono's 1997 paper when using the Chi2 algorithm for feature selection
> with discretization.
> 
> As stated in the R documentation (discretization - R (from CRAN)
> <https://cran.r-project.org/web/packages/discretization/discretization.pdf>),
> the R package discretization offers the function chi2, which is based on the
> following papers:
> 
> Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization
> of numeric attributes, Tools with Artificial Intelligence, 388–391.
> 
> Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE
> transactions on knowledge and data engineering, Vol.9, no.4, 642–645.
> 
> I wrote the following R code, with alpha and delta set to the values used
> in the papers above; it prints the discretized data frame. I used the iris
> data frame, as in one of the examples in the two papers. The first paper
> states that alpha = 0.5 and delta = 5%, and that "the originally odd numbered
> data are selected for training (75 patterns) and rest for testing (75
> patterns)". With this setup, the Sepal attributes should be removed.
> 
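> library(discretization)
> data(iris)
> # Start from an empty data frame with the same columns as iris
> df1 <- iris[FALSE,]
> # Keep the odd-numbered rows (75 training patterns)
> for(i in 1:nrow(iris)){
>    if(i %% 2 != 0){
>        df1 <- rbind(df1, iris[i,])
>    }
> }
> chi2(df1, alp=0.5, del=0.05)$Disc.data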
> 
> The point is that the data frame printed by the last instruction shows
> that no attribute is removed. The discretized data frame still has 4
> discretized attributes: if I understood the above papers correctly, Sepal
> Length and Sepal Width should both have been discretized into just one
> interval by the Chi2 algorithm.
> 
> I have posted a question here:
> http://stats.stackexchange.com/questions/247499/why-does-not-r-chi2-algorithm-discretize-in-the-same-manner-as-in-the-paper-by-l?noredirect=1#comment470974_247499
> 
> 
> Moreover, it's really hard to understand the cut points produced by the
> Chi2 algorithm as implemented in R. For example:
> 
> res <- chi2(iris, 0.5, 0.05)
> 
> cut(iris$Sepal.Length, res$cutp, labels=FALSE) is different from
> res$Disc.data$Sepal.Length
> 
> Help me understand, please
> 
> Best regards
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


