[Rd] Documentation examples for lm and glm

Heinz Tuechler tuechler using gmx.at
Mon Dec 17 17:36:09 CET 2018


Dear John,

Fully agreed! In the global environment I always keep my
"data-variables" in a data.frame. However, when I look at a help page I
like examples that start with the particular aspects of the function. It
is important to know whether a function offers a data argument, but I
don't need the first line of every example to demonstrate the data
argument each time I look in help.
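
To make the two styles concrete, here is a minimal sketch (not a proposed
help-page example). It reuses the df0 construction from the trivial example
quoted below; the names fit_vec and fit_df are only illustrative.

```r
## Sketch only: same simulated data as in the quoted trivial example
set.seed(25)
df0 <- data.frame(x = rnorm(20), y = rnorm(20))

## (a) focus on the function itself: plain vectors, no data argument
x <- df0$x
y <- df0$y
fit_vec <- lm(y ~ x)

## (b) the same model, with the variables looked up in the data frame
fit_df <- lm(y ~ x, data = df0)

## both calls estimate the same coefficients
all.equal(coef(fit_vec), coef(fit_df))
```

The fits are identical; the difference is only where lm() finds x and y,
which is why a help page can reasonably show either form first.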

best,
Heinz

Fox, John wrote/hat geschrieben on/am 17.12.2018 16:23:
> Dear Heinz,
>
>   ----------------------------------------------
>> On Dec 17, 2018, at 10:19 AM, Heinz Tuechler <tuechler using gmx.at> wrote:
>>
>> Dear All,
>>
>> do you think that use of a data argument is best practice in the example below?
>
> No, but it is *normally* or *usually* the best option, in my opinion.
>
> Best,
>  John
>
>>
>> regards,
>>
>> Heinz
>>
>> ### trivial example
>> plotwithline <- function(x, y) {
>>    plot(x, y)
>>    abline(lm(y~x)) ## data argument?
>> }
>>
>> set.seed(25)
>> df0 <- data.frame(x=rnorm(20), y=rnorm(20))
>>
>> plotwithline(df0[['x']], df0[['y']])
>>
>>
>>
>> Fox, John wrote/hat geschrieben on/am 17.12.2018 15:21:
>>> Dear Martin,
>>>
>>> I think that everyone agrees that it’s generally preferable to use the data argument to lm() and I have nothing significant to add to the substance of the discussion, but I think that it’s a mistake not to add to the current examples, for the following reasons:
>>>
>>> (1) Relegating examples using the data argument to “see also” doesn’t suggest that using the argument is a best practice. Most users won’t bother to click the links.
>>>
>>> (2) In my opinion, a new initial example using the data argument would more clearly suggest that this is normally the best option.
>>>
>>> (3) I think that it would also be desirable to add a remark to the explanation of the data argument, something like, “Although the argument is optional, it's generally preferable to specify it explicitly.” And similarly on the help page for glm().
>>>
>>> My two (or three) cents.
>>>
>>> John
>>>
>>>  -------------------------------------------------
>>>  John Fox, Professor Emeritus
>>>  McMaster University
>>>  Hamilton, Ontario, Canada
>>>  Web: http://socserv.mcmaster.ca/jfox
>>>
>>>> On Dec 17, 2018, at 3:05 AM, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>>>>
>>>>>>>>> David Hugh-Jones
>>>>>>>>>   on Sat, 15 Dec 2018 08:47:28 +0100 writes:
>>>>
>>>>> I would argue examples should encourage good
>>>>> practice. Beginners ought to learn to keep data in data
>>>>> frames and not to overuse attach().
>>>>
>>>> Note there's no attach() there in any of these examples!
>>>>
>>>>> otherwise at their own risk, but they have less need of
>>>>> explicit examples.
>>>>
>>>> The glm examples are nice insofar as they show both uses.
>>>>
>>>> I agree the lm() example(s) are  "didactically misleading" by
>>>> not using data frames at all.
>>>>
>>>> I disagree that only data frame examples should be shown.
>>>> If  lm()  is one of the first R functions a beginneR must use --
>>>> because they are in a basic stats class, say --  it may be
>>>> *better* didactically to focus on lm()  in the very first
>>>> example, and use data frames in a next one ...
>>>> ... and instead of a next one, we have the pretty clear comment
>>>>
>>>> ### less simple examples in "See Also" above
>>>>
>>>> I'm not convinced (but you can try more) we should change those
>>>> examples or add more there.
>>>>
>>>> Martin
>>>>
>>>>> On Fri, 14 Dec 2018 at 14:51, S Ellison
>>>>> <S.Ellison using lgcgroup.com> wrote:
>>>>
>>>>>> FWIW, before all the examples are changed to data frame
>>>>>> variants, I think there's fairly good reason to have at
>>>>>> least _one_ example that does _not_ place variables in a
>>>>>> data frame.
>>>>>>
>>>>>> The data argument in lm() is optional. And there is more
>>>>>> than one way to manage data in a project. I personally
>>>>>> don't much like lots of stray variables lurking about,
>>>>>> but if those are the only variables out there and we can
>>>>>> be sure they aren't affected by other code, it's hardly
>>>>>> essential to create a data frame to hold something you
>>>>>> already have.  Also, attach() is still part of R, for
>>>>>> those folk who have a data frame but want to reference
>>>>>> the contents across a wider range of functions without
>>>>>> using with() a lot. lm() can reasonably omit the data
>>>>>> argument there, too.
>>>>>>
>>>>>> So while there are good reasons to use data frames, there
>>>>>> are also good reasons to provide examples that don't.
>>>>>>
>>>>>> Steve Ellison
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: R-devel [mailto:r-devel-bounces using r-project.org] On Behalf Of Ben Bolker
>>>>>>> Sent: 13 December 2018 20:36
>>>>>>> To: r-devel using r-project.org
>>>>>>> Subject: Re: [Rd] Documentation examples for lm and glm
>>>>>>>
>>>>>>>
>>>>>>> Agree.  Or just create the data frame with those variables in it
>>>>>>> directly ...
>>>>>>>
>>>>>>> On 2018-12-13 3:26 p.m., Thomas Yee wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> something that has been on my mind for a decade or two has
>>>>>>>> been the examples for lm() and glm(). They encourage poor style
>>>>>>>> because of mismanagement of data frames. Also, having the
>>>>>>>> variables in a data frame means that predict()
>>>>>>>> is more likely to work properly.
>>>>>>>>
>>>>>>>> For lm(), the variables should be put into a data frame.
>>>>>>>> As 2 vectors are assigned first in the general workspace they
>>>>>>>> should be deleted afterwards.
>>>>>>>>
>>>>>>>> For the glm(), the data frame d.AD is constructed but not used. Also,
>>>>>>>> its 3 components were assigned first in the general workspace, so they
>>>>>>>> float around dangerously afterwards like in the lm() example.
>>>>>>>>
>>>>>>>> Rather than attaching improved .Rd files here, I have put them at
>>>>>>>> www.stat.auckland.ac.nz/~yee/Rdfiles
>>>>>>>> You are welcome to use them!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
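
The remark in the original post that predict() is more likely to work
properly with a data frame can be illustrated with a small sketch. The data
and the names d, fit_df, fit_dollar and new are made up for illustration;
they are not taken from the lm() or glm() help pages.

```r
## Hypothetical data, for illustration only
set.seed(1)
d <- data.frame(x = rnorm(20))
d$y <- 1 + 2 * d$x + rnorm(20)

## Model fitted via the data argument: the term is called "x"
fit_df <- lm(y ~ x, data = d)

## Model fitted without it: the term is literally called "d$x"
fit_dollar <- lm(d$y ~ d$x)

new <- data.frame(x = c(-1, 0, 1))

## newdata is matched by name, so this gives one prediction per new row
pred_df <- predict(fit_df, newdata = new)
length(pred_df)

## Here the term d$x is re-evaluated against the original d, so the x
## values in newdata are not used; predict() warns and returns one value
## per original row
pred_dollar <- suppressWarnings(predict(fit_dollar, newdata = new))
length(pred_dollar)
```

The second fit is not wrong as a fit; it just cannot be reused sensibly
with newdata, which is the practical point behind preferring the data
argument.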


