[R] variable (column) in a data frame

Peter Ehlers ehlers at ucalgary.ca
Mon Jul 16 00:12:19 CEST 2012


On 2012-07-15 10:01, Paulo Barata wrote:
>
> Dear Peter,
>
> Thank you. I will try to modify my programming habits.
> But it seems there is a flaw in R, when it accepts a reference
> to a non-existent variable inside a data frame with the df$var
> notation. This should be corrected somehow.
>
> Paulo Barata
>

Paulo,

I understand your concerns and I do think that the "best"
thing would be to excise the $ shortcut from the language
or, at least, make y$x equivalent to
y[["x", exact = TRUE]]. But, as has been pointed out
before, that might not be easy. Nevertheless, even y[["x"]]
may not be the ultimate panacea. Consider your own
example:

df <- data.frame(a = 1:3, b = 11:13)
sum(df[["aaa"]] == 2)
#[1] 0

which results from

df[["aaa"]] == 2
#logical(0)
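
To spell out the steps behind that 0, here is a minimal sketch (the
intermediate names x and cmp are only for illustration):

```r
df <- data.frame(a = 1:3, b = 11:13)

x <- df[["aaa"]]    # no column of that name, so [[ returns NULL
is.null(x)          # TRUE

cmp <- x == 2       # comparing NULL with a number gives logical(0)
length(cmp)         # 0

sum(cmp)            # 0, the sum over an empty logical vector
```

So no error is raised at any step; the misspelled name simply
propagates as a zero-length vector.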

The safest extraction is y[ , "x"], which signals an error for an
undefined column:

sum(df[ , "aaa"] == 2)
#Error in `[.data.frame`(df, , "aaa") : undefined columns selected

But then, this comes down to whether one thinks that
addressing a nonexistent variable should result in an
error or should return NULL.
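
If one prefers the error, one possible sketch is a small wrapper around
[[ that checks the name first. The function name col_strict and its
error message are made up for this example; nothing like it is part of
base R.

```r
# Hypothetical helper (not in base R): extract a single column by its
# exact name and stop with an error if no such column exists.
col_strict <- function(data, name) {
  stopifnot(is.data.frame(data), is.character(name), length(name) == 1L)
  if (!name %in% names(data)) {
    stop("no column named '", name, "' in data frame", call. = FALSE)
  }
  data[[name, exact = TRUE]]
}

df <- data.frame(a = 1:3, b = 11:13)

# Existing column: behaves like df[["a"]]
n_ok <- sum(col_strict(df, "a") == 2)

# Misspelled column: the error is caught here only to show its message
msg <- tryCatch(col_strict(df, "aaa"),
                error = function(e) conditionMessage(e))
```

Unlike df[ , "aaa"], such a wrapper also works on other list-like
objects if the is.data.frame() check is relaxed, but that is a design
choice rather than a recommendation.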

The bottom line probably is that the $ behaviour will not change
in the near future and one would simply be well advised to be
aware of its behaviour. Every language has its quirks. Just be
thankful that the R language isn't as big a mess as the English
language (which I do love dearly).
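
For reference, a short sketch of the difference between $ and [[ that
is worth keeping in mind. The column name alpha is chosen only so that
there is something to abbreviate.

```r
df <- data.frame(alpha = 1:3, b = 11:13)

by_dollar  <- df$al                       # partial match: the alpha column
by_exact   <- df[["al"]]                  # exact matching by default: NULL
by_partial <- df[["al", exact = FALSE]]   # partial matching on request
```

As far as I know, options(warnPartialMatchDollar = TRUE), described in
?options, makes $ warn about partial matches like the first one, but
it does not help with a name that matches nothing at all.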

Peter Ehlers

> ---------------------------------------------------------------------
>
>
> ---------- Original Message -----------
> From: Peter Ehlers<ehlers at ucalgary.ca>
> To: Paulo Barata<paulo.barata at ensp.fiocruz.br>
> Cc: "r-help at r-project.org"<r-help at r-project.org>, peter dalgaard
> <pdalgd at gmail.com>
> Sent: Sun, 15 Jul 2012 09:29:11 -0700
> Subject: Re: [R] variable (column) in a data frame
>
>> On 2012-07-15 08:41, Paulo Barata wrote:
>>>
>>> Dr. Dalgaard,
>>>
>>> Thank you. But pre-checking with is.null() or using with()
>>> doesn't solve the problem of catching spelling mistakes
>>> in the name of a variable inside a data frame, when using
>>> the df$var notation often in a program.
>>>
>>> Is there some way for R to behave, in relation to a variable
>>> inside a data frame, the same way it behaves for a variable
>>> not in a data frame? For example:
>>>
>>> ##----------------------------------------
>>> a <- c(1, 2, 3)
>>>
>>> ## the variable exists, we get a correct answer
>>> a==1
>>>
>>> ## the variable does not exist, R rightly points this out
>>> aaa==1
>>> ##----------------------------------------
>>>
>>> My point is, if we make a spelling mistake in a program when referring
>>> to a variable inside a data frame, using the df$var notation,
>>> there seems to be no way of getting warned about that.
>>
>> You could wean yourself from the $-habit. It's convenient but can
>> lead to the problems you're experiencing (and this has been
>> discussed before). For programming, if you're prone to make
>> spelling errors, you should prefer df[, "aaa"]. See ?Extract.
>>
>> Peter Ehlers
>>
>>>
>>> Thank you once again.
>>>
>>> Paulo Barata
>>>
>>> ---------------------------------------------------------------------
>>>
>>>
>>> ---------- Original Message -----------
>>> From: peter dalgaard<pdalgd at gmail.com>
>>> To: "Paulo Barata"<paulo.barata at ensp.fiocruz.br>
>>> Sent: Sun, 15 Jul 2012 16:47:35 +0200
>>> Subject: Re: [R] variable (column) in a data frame
>>>
>>>> On Jul 15, 2012, at 16:30 , Paulo Barata wrote:
>>>>
>>>>>
>>>>> To the R help list,
>>>>>
>>>>> When using a data frame, there is no warning or error message
>>>>> when I refer to a non-existent variable inside the data frame.
>>>>>
>>>>> Example:
>>>>>
>>>>> ##----------------------------------------------
>>>>>
>>>>> a <- c(1, 2, 3)
>>>>> b <- c(11, 22, 33)
>>>>> df <- data.frame(a, b)
>>>>> df
>>>>>
>>>>> ## correct: there is a column in df named 'a'
>>>>> ## the sum is correctly performed
>>>>> sum(df$a==2)
>>>>>
>>>>> ## incorrect: there is no column in df named 'aaa',
>>>>> ## but the sum is performed anyway without either warning or error
>>>>> sum(df$aaa==2)
>>>>>
>>>>> ##----------------------------------------------
>>>>>
>>>>> Is there some way to make R issue either a warning or an error
>>>>> message in such a situation?
>>>>>
>>>>
>>>> You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)).
>>>>
>>>> --
>>>> Peter Dalgaard, Professor,
>>>> Center for Statistics, Copenhagen Business School
>>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>>> Phone: (+45)38153501
>>>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>>>
>>>> --
>>>> This message has been scanned for viruses and
>>>> dangerous content by MailScanner, and is
>>>> believed to be clean.
>>> ------- End of Original Message -------
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
> ------- End of Original Message -------
>


