[Rd] Regression stars

Hervé Pagès hpages at fhcrc.org
Tue Feb 12 21:27:13 CET 2013


Hi Duncan,

On 02/12/2013 11:19 AM, Duncan Murdoch wrote:
> On 12/02/2013 1:47 PM, Hervé Pagès wrote:
>> On 02/12/2013 08:20 AM, peter dalgaard wrote:
>> >
>> > On Feb 12, 2013, at 17:05, Brian Lee Yung Rowe wrote:
>> >
>> >>
>> >> I thought that the default was the way it was for performance
>> >> reasons. For large data.frames or repeated applications, using factors
>> >> should be faster for non-trivial strings.
>> >
>> > I think not. Historically, it's more like "In statistics we have two
>> > kinds of variables, numerical and categorical. OK, so we have the
>> > occasional truly character-type variables like name and address, let's
>> > handle those as a special case".
>>
>> <sarcasm>
>>
>> Since character vectors are sooooo bad and people use them where
>> they should instead use a factor, I propose to go all the way and
>> add the stringsAsFactors arg to character() too. That way
>> people are put on the right track from the very start.
>>
>> </sarcasm>
>
> I think you are misreading what Peter wrote.  He wasn't defending that
> point of view, he was describing it.

I was replying to the thread, not to Peter in particular. Sorry if it
sounded otherwise.

>>
>> No, seriously, if my variable is categorical, it's already in a factor
>> and that's how I pass it to data.frame(). But if I have it in a
>> character vector, it's because that's how I want it. It's my choice.
>> How could anybody ever think that having data.frame() alter his/her
>> data is a good thing?
>>
>> Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
>> You'll do a big favor to your user base.
>
> That's a really bad suggestion -- it would break code for people who set
> stringsAsFactors=FALSE as well as those who rely on the current default
> behaviour.   We certainly won't do that.

But since there seems to be a discussion about making some changes to
the stringsAsFactors "feature", I was hoping you would consider that
one too.  Doing the right thing sometimes requires breaking people's
code, sadly!

Cheers,
H.

>
> Duncan Murdoch
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
