[R] Chi2 algorithm - R

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Thu Nov 24 16:39:12 CET 2016


As I understand it, you or I could write a package, and if it passes the automated testing designed to ferret out basic R language compatibility and operating system independence (and the maintainer accepts responsibility for it and releases the code as open source), then it will usually be accepted for distribution through CRAN. Peer review (in particular of the algorithms employed) by anyone involved with the R language development team is not part of this... this is why the distinction between R and contributed packages is important.
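
A minimal sketch of how that distinction can be checked on your own installation, using only functions from the utils package. It relies on the "Priority" field of each package's DESCRIPTION file, which is "base" for packages that are part of R, "recommended" for packages bundled with standard R distributions, and normally absent for contributed packages. The variable names `pkgs` and `origin` are just illustrative.

```r
# Classify installed packages by the "Priority" field of their DESCRIPTION.
# Contributed packages from CRAN normally have no Priority (NA).
pkgs <- installed.packages()[, c("Package", "Priority"), drop = FALSE]
origin <- ifelse(is.na(pkgs[, "Priority"]), "contributed", pkgs[, "Priority"])
table(origin)

# The maintainer is also recorded in the DESCRIPTION file.
maintainer("stats")              # a base package
# maintainer("discretization")   # a contributed package (must be installed)
```

Anything that shows up as "contributed" is the responsibility of the person returned by `maintainer()`, not of the R developers.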

So yes, while most maintainers are trying to do the right thing, they are only human (and volunteers). Using their code is very much at your own risk and not the responsibility of "R".
-- 
Sent from my phone. Please excuse my brevity.

On November 23, 2016 1:26:45 PM PST, Luke Skywalker <mattered91 at gmail.com> wrote:
>What does it mean to "have a maintainer"? Is he a third party? Is he an
>individual developer whose package you install at your own risk? Are
>the packages created by maintainers not tested?
>
>Anyway, I wrote him. I'm waiting for response.
>
>Regards
>
>On 23 Nov 2016 22:21, "peter dalgaard" <pdalgd at gmail.com> wrote:
>
>> Notice that this relates to an R _package_, which has a maintainer. You
>> cannot expect general R users or developers to know about the details of
>> the package. It doesn't look like there is documentation beyond the help
>> pages, so you may need to contact the maintainer or study the actual code.
>>
>> -pd
>>
>> > On 23 Nov 2016, at 17:08 , Luke Skywalker <mattered91 at gmail.com> wrote:
>> >
>> > Good evening,
>> >
>> > I'm getting a discretization that differs from the one described in
>> > Liu and Setiono's 1997 paper when using the Chi2 algorithm for feature
>> > selection with discretization.
>> >
>> > As stated in the R documentation (discretization - R (from CRAN)
>> > <https://cran.r-project.org/web/packages/discretization/discretization.pdf>),
>> > the R package discretization offers the function chi2, which is based on
>> > the following papers:
>> >
>> > Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization
>> > of numeric attributes, Tools with Artificial Intelligence, 388–391.
>> >
>> > Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE
>> > transactions on knowledge and data engineering, Vol.9, no.4, 642–645.
>> >
>> > I wrote the following R code, in which I set alpha and delta to the values
>> > used in the papers above. The code prints out the discretized data frame.
>> > I used the iris data set, as in one of the examples in the two papers. The
>> > first paper above states that alpha = 0.5 and delta = 5%, and that "the
>> > originally odd numbered data are selected for training (75 patterns) and
>> > rest for testing (75 patterns)". With this setup, the Sepal attributes
>> > should be removed.
>> >
>> > library(discretization)
>> > data(iris)
>> > # Keep the odd-numbered rows (75 patterns) as the training set
>> > df1 <- iris[seq(1, nrow(iris), by = 2), ]
>> > chi2(df1, alp = 0.5, del = 0.05)$Disc.data
>> >
>> > The point is that, looking at the data frame printed by the last
>> > instruction, you can see that no attribute is removed. The discretized
>> > data frame still has all 4 attributes discretized: if I understood the
>> > above papers correctly, Sepal Length and Sepal Width should both have been
>> > discretized into just one interval by the Chi2 algorithm.
>> >
>> > I have posted a question here:
>> > http://stats.stackexchange.com/questions/247499/why-does-not-r-chi2-algorithm-discretize-in-the-same-manner-as-in-the-paper-by-l?noredirect=1#comment470974_247499.
>> >
>> >
>> > Moreover, it's really hard to understand the cut points that the Chi2
>> > algorithm implemented in R produces. For example:
>> >
>> > res <- chi2(iris, 0.5, 0.05)
>> >
>> > cut(iris$Sepal.Length, res$cutp, labels=FALSE) is different from
>> > res$Disc.data$Sepal.Length
>> >
>> > Help me understand, please
>> >
>> > Best regards
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>

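On the cut-point comparison in the original question: one possible reason for the mismatch (an assumption, not verified against the package source) is how the cut points are passed to cut(). If `res$cutp` is a list with one element per attribute and holds only interior cut points, then cut() needs a single numeric vector padded with outer limits, otherwise values outside the outermost cut points come back as NA. The sketch below uses base R only, with made-up cut points in a hypothetical vector `cutp`; it does not reproduce what chi2() returns.

```r
x <- iris$Sepal.Length

# Hypothetical interior cut points, for illustration only
# (not the values returned by chi2()).
cutp <- c(5.45, 6.15)

# Interior points only: everything outside (5.45, 6.15] becomes NA.
a <- cut(x, cutp, labels = FALSE)

# Padding with -Inf/Inf assigns every value an interval code 1, 2 or 3.
b <- cut(x, c(-Inf, cutp, Inf), labels = FALSE, right = FALSE)

# findInterval() + 1 yields the same codes without padding.
d <- findInterval(x, cutp) + 1L

sum(is.na(a))      # number of values left unassigned by the first call
identical(b, d)    # do the two padded approaches agree?
```

Whether the package closes its intervals on the left or on the right is a further assumption here; the `right` argument of cut() would need to match whatever convention the package uses.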