[R] Bug in by() function which works for some FUN argument and does not work for others

David Winsemius dwinsemius at comcast.net
Sun Apr 17 04:22:44 CEST 2016


> On Apr 16, 2016, at 2:03 AM, Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
> 
> Dear All, 
> 
> I have got your core message, that it is my responsibility to determine whether any particular function in my version of R satisfies the language requirements at the time of my use. Jim Albert and Maria Rizzo must have used code that was permitted in the R of their time (2012).
> 
> Therefore, I have now modified my R code for R version 3.2.4 as follows. It works for my 'brain' data set, and the output is reproduced below for your information:
> 
> > by(brain[,-1], INDICES=list(Gender=brain$Gender), FUN=function(x, na.rm=FALSE) sapply(x, mean, na.rm=na.rm), na.rm=TRUE)
> Gender: Female
>       FSIQ        VIQ        PIQ     Weight     Height  MRI_Count 
>    111.900    109.450    110.450    137.200     65.765 862654.600 
> -------------------------------------------------------------------------------------------------- 
> Gender: Male
>         FSIQ          VIQ          PIQ       Weight       Height    MRI_Count 
>    115.00000    115.25000    111.60000    166.44444     71.43158 954855.40000 

Yes, that is certainly a workable alternative, although I thought the question of "how to do it" had been effectively answered by Adrian Dusa's suggestion to use colMeans. It, too, accepts `na.rm=TRUE`.
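
For the record, here is a minimal sketch of the colMeans approach with missing values removed. The `toy` data frame and its values are made up for illustration (they are not the 'brain' data); the only thing relied on is that by() forwards extra arguments such as `na.rm` to FUN.

```r
# Hypothetical stand-in for the 'brain' data: one grouping column and
# two numeric columns that contain missing values.
toy <- data.frame(
  Gender = c("Female", "Female", "Male", "Male"),
  Weight = c(118, NA, 143, 172),
  Height = c(64.5, 66.3, 73.3, NA)
)

# colMeans() has its own na.rm argument; by() passes it through via '...'.
res <- by(toy[, -1], INDICES = list(Gender = toy$Gender),
          FUN = colMeans, na.rm = TRUE)
print(res)

# Equivalent per-group column means using sapply() over the columns.
res2 <- by(toy[, -1], INDICES = list(Gender = toy$Gender),
           FUN = function(x) sapply(x, mean, na.rm = TRUE))
print(res2)
```

Both calls should give the same per-group column means; colMeans is simply the shorter spelling when every remaining column is numeric.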

I was only responding to your plaintive complaint that the current version of R had a "bug" because it was not behaving as promised by an introductory text with a three-year-old publication date.

-- 
David.


> 
> With best regards,
> 
> Dr. A.K. Singh
> Head, Department of Agril. Statistics
> Indira Gandhi Krishi Vishwavidyalaya, Raipur
> Chhattisgarh, India, PIN-492012
> Mobile: +919752620740
> Email: akhileshsingh.igkv at gmail.com
> 
> On Fri, Apr 15, 2016 at 2:24 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> 
> > On Apr 15, 2016, at 1:16 AM, Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
> >
> > Dear All,
> >
> > Thanks for your help. However, I would like to draw your attention to the
> > following:
> >
> > Actually, I was replicating Example 2.3, using the dataset
> > "brainsize.txt" given in Section 2.3.3 ("Summarize by group") on page 55
> > of the well-known book "R by Example" by Jim Albert and Maria Rizzo,
> > published by Springer (2012) in the Use R! series. The output of the by()
> > function printed in the book is reproduced below for the information of
> > all:
> >
> >> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE)
> > brain$Gender: Female
> >       FSIQ        VIQ        PIQ     Weight     Height  MRI_Count
> >    111.900    109.450    110.450    137.200     65.765 862654.600
> > ------------------------------------------------------------
> > brain$Gender: Male
> >         FSIQ          VIQ          PIQ       Weight       Height    MRI_Count
> >    115.00000    115.25000    111.60000    166.44444     71.43158 954855.40000
> >
> >
> > I do not know how the writers of the book could have produced the above
> > results with the by() function.
> 
> 
> There was in the not-so-distant past a function named `mean.data.frame` which would have "worked" in that instance. That function was removed. I thought I could find the exact date of that action by searching the NEWS, but I failed. Reviewing the citations of `mean.data.frame` in the r-help archives, I see that users were being warned in mid-2012 that its use was deprecated. It's very possible that the authors of a book published in 2012 were using an earlier version of R that still had that facility available, before it was deprecated. With a newer-than-current version of R (3.3.0) and a modest number of loaded packages, I see this:
> 
> > methods(mean)
>  [1] mean,ANY-method          mean,Matrix-method       mean,Raster-method
>  [4] mean,sparseMatrix-method mean,sparseVector-method mean.Date
>  [7] mean.default             mean.difftime            mean.POSIXct
> [10] mean.POSIXlt             mean.yearmon*            mean.yearqtr*
> [13] mean.zoo*
> 
> It is your responsibility to determine whether any particular function in your version of R satisfies the language requirements at the time of your use. Jim Albert and Maria Rizzo do not set the standards for what is an evolving piece of software.
> 
> --
> David.
> 
> 
> > But when I could not reproduce these results, I thought that this
> > might be due to the missing values (NAs) in the Weight and Height
> > variables. I then tried the above code on the "mtcars" dataset with
> > INDICES=mtcars$am. When I found the same results there too, I reported
> > the case to "r-help at R-project.org".
> >
> > With best regards,
> >
> > Dr. A.K. Singh
> > Head, Department of Agril. Statistics
> > Indira Gandhi Krishi Vishwavidyalaya, Raipur
> > Chhattisgarh, India, PIN-492012
> > Mobile: +919752620740
> > Email: akhileshsingh.igkv at gmail.com
> >
> > On Fri, Apr 15, 2016 at 3:06 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> >
> >> I think you are not using the best function for what your intentions are.
> >> Try:
> >>
> >>> by(data=mtcars, INDICES=list(as.factor(mtcars$am)), FUN=colMeans)
> >> : 0
> >>        mpg         cyl        disp          hp        drat          wt
> >>     qsec          vs
> >> 17.1473684   6.9473684 290.3789474 160.2631579   3.2863158   3.7688947
> >> 18.1831579   0.3684211
> >>         am        gear        carb
> >>  0.0000000   3.2105263   2.7368421
> >>
> >> ---------------------------------------------------------------------------
> >> : 1
> >>        mpg         cyl        disp          hp        drat          wt
> >>     qsec          vs
> >> 24.3923077   5.0769231 143.5307692 126.8461538   4.0500000   2.4110000
> >> 17.3600000   0.5384615
> >>         am        gear        carb
> >>  1.0000000   4.3846154   2.9230769
> >>
> >> See the difference between colMeans() and mean() in their respective help
> >> files.
> >> Hth,
> >> Adrian
> >>
> >> On Thu, Apr 14, 2016 at 11:14 PM, Akhilesh Singh <
> >> akhileshsingh.igkv at gmail.com> wrote:
> >>
> >>> Dear Sirs,
> >>>
> >>> I am Professor at Indira Gandhi Krishi Vishwavidyalaya, Raipur,
> >>> Chhattisgarh, India.
> >>>
> >>> While taking classes, I found the by() function producing the following
> >>> error when I use FUN=mean or median and some other functions; however,
> >>> FUN=summary works.
> >>>
> >>> Given below is the output of the example I used on a built-in dataset
> >>> "mtcars", along with error message reproduced herewith:
> >>>
> >>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=mean)
> >>> : 0
> >>> [1] NA
> >>> ------------------------------------------------------------
> >>> : 1
> >>> [1] NA
> >>> Warning messages:
> >>> 1: In mean.default(data[x, , drop = FALSE], ...) :
> >>>  argument is not numeric or logical: returning NA
> >>> 2: In mean.default(data[x, , drop = FALSE], ...) :
> >>>  argument is not numeric or logical: returning NA
> >>>
> >>> However, the same by() function works for FUN=summary, given below is the
> >>> output:
> >>>
> >>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=summary)
> >>> : 0
> >>>      mpg             cyl             disp             hp
> >>> Min.   :10.40   Min.   :4.000   Min.   :120.1   Min.   : 62.0
> >>> 1st Qu.:14.95   1st Qu.:6.000   1st Qu.:196.3   1st Qu.:116.5
> >>> Median :17.30   Median :8.000   Median :275.8   Median :175.0
> >>> Mean   :17.15   Mean   :6.947   Mean   :290.4   Mean   :160.3
> >>> 3rd Qu.:19.20   3rd Qu.:8.000   3rd Qu.:360.0   3rd Qu.:192.5
> >>> Max.   :24.40   Max.   :8.000   Max.   :472.0   Max.   :245.0
> >>>      drat             wt             qsec             vs               am
> >>> Min.   :2.760   Min.   :2.465   Min.   :15.41   Min.   :0.0000   Min.   :0
> >>> 1st Qu.:3.070   1st Qu.:3.438   1st Qu.:17.18   1st Qu.:0.0000   1st Qu.:0
> >>> Median :3.150   Median :3.520   Median :17.82   Median :0.0000   Median :0
> >>> Mean   :3.286   Mean   :3.769   Mean   :18.18   Mean   :0.3684   Mean   :0
> >>> 3rd Qu.:3.695   3rd Qu.:3.842   3rd Qu.:19.17   3rd Qu.:1.0000   3rd Qu.:0
> >>> Max.   :3.920   Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :0
> >>>
> >>>      gear            carb
> >>> Min.   :3.000   Min.   :1.000
> >>> 1st Qu.:3.000   1st Qu.:2.000
> >>> Median :3.000   Median :3.000
> >>> Mean   :3.211   Mean   :2.737
> >>> 3rd Qu.:3.000   3rd Qu.:4.000
> >>> Max.   :4.000   Max.   :4.000
> >>> ------------------------------------------------------------
> >>> : 1
> >>>      mpg             cyl             disp             hp             drat
> >>> Min.   :15.00   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :3.54
> >>> 1st Qu.:21.00   1st Qu.:4.000   1st Qu.: 79.0   1st Qu.: 66.0   1st Qu.:3.85
> >>> Median :22.80   Median :4.000   Median :120.3   Median :109.0   Median :4.08
> >>> Mean   :24.39   Mean   :5.077   Mean   :143.5   Mean   :126.8   Mean   :4.05
> >>> 3rd Qu.:30.40   3rd Qu.:6.000   3rd Qu.:160.0   3rd Qu.:113.0   3rd Qu.:4.22
> >>> Max.   :33.90   Max.   :8.000   Max.   :351.0   Max.   :335.0   Max.   :4.93
> >>>       wt             qsec             vs               am         gear
> >>> Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :1   Min.   :4.000
> >>> 1st Qu.:1.935   1st Qu.:16.46   1st Qu.:0.0000   1st Qu.:1   1st Qu.:4.000
> >>> Median :2.320   Median :17.02   Median :1.0000   Median :1   Median :4.000
> >>> Mean   :2.411   Mean   :17.36   Mean   :0.5385   Mean   :1   Mean   :4.385
> >>> 3rd Qu.:2.780   3rd Qu.:18.61   3rd Qu.:1.0000   3rd Qu.:1   3rd Qu.:5.000
> >>> Max.   :3.570   Max.   :19.90   Max.   :1.0000   Max.   :1   Max.   :5.000
> >>>
> >>>      carb
> >>> Min.   :1.000
> >>> 1st Qu.:1.000
> >>> Median :2.000
> >>> Mean   :2.923
> >>> 3rd Qu.:4.000
> >>> Max.   :8.000
> >>>>
> >>>
> >>> I am using the latest version, *R-3.2.4 on Windows*; however, this error
> >>> is also generated in the previous version.
> >>>
> >>> Hope this reporting will get serious attention in debugging.
> >>>
> >>> With best regards,
> >>>
> >>> Dr. A.K. Singh
> >>> Head, Department of Agril. Statistics
> >>> Indira Gandhi Krishi Vishwavidyalaya, Raipur
> >>> Chhattisgarh, India, PIN-492012
> >>> Mobile: +919752620740
> >>> Email: akhileshsingh.igkv at gmail.com
> >>>
> >>>        [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >>
> >>
> >> --
> >> Adrian Dusa
> >> University of Bucharest
> >> Romanian Social Data Archive
> >> Soseaua Panduri nr.90
> >> 050663 Bucharest sector 5
> >> Romania
> >>
> >
> 
> David Winsemius
> Alameda, CA, USA
> 
> 

David Winsemius
Alameda, CA, USA
