[R] Column name containing "-"

R. Michael Weylandt michael.weylandt at gmail.com
Tue Jan 24 17:34:12 CET 2012


Sorry, I meant check.names = FALSE in the data.frame() call below (d'oh!)
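
For concreteness, here is a minimal sketch of the corrected call, using the same values as the example quoted below. The variable names minus_result and quoted_result are only illustrative.

```r
# With check.names = FALSE the non-syntactic column name "a-2" is kept as given.
d <- data.frame(a = 3, `a-2` = 3, check.names = FALSE)

print(names(d))               # both names survive unchanged: "a" and "a-2"

# Without backticks the parser reads this as (d$a) - 2, i.e. 3 - 2.
minus_result <- d$a - 2

# Backticks force `a-2` to be treated as a single name, so this is the column.
quoted_result <- d$`a-2`

print(c(minus = minus_result, quoted = quoted_result))
```

So the unquoted form is ordinary subtraction, and only the backticked form reaches the column.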

Michael

On Tue, Jan 24, 2012 at 11:33 AM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
> I've usually understood the restrictions on syntactic names as being
> tied to the parser.
>
> E.g., how could R tell the difference between
>
> d <- data.frame(a = 3, `a-2` = 3, check.names = TRUE)
> d$a-2 ## Equal to 1 or 3 ?
>
> One of those strange eval things that makes a lot of sense for an
> interactive language, but might not be the best for a formal
> programming language (though I don't think it causes any serious
> restrictions).
>
> When you force it to be a name, e.g., d$`a-2`, there's no confusion,
> so it's allowed because it's potentially useful for formatting
> output. (One case that comes to mind: delisted stocks are given
> tickers that begin with numbers. R wants to stick an X on the front
> of the name, but then you lose compatibility with your data source.)
>
> Michael
>
> On Tue, Jan 24, 2012 at 11:25 AM, Ivan Calandra
> <ivan.calandra at u-bourgogne.fr> wrote:
>> Bert,
>>
>> Thank you for correcting my inaccuracy. A quick look at the original
>> question might help you understand what I meant:
>>
>> d<- data.frame(x = c(0, 1))
>> d<- data.frame(d, y = c(0,1))
>> names(d)[2]<- "a.-5"
>> d
>>   x a.-5
>> 1 0    0
>> 2 1    1
>> d1<- data.frame(d, y = c(0,1))
>> d1
>>   x a..5 y
>> 1 0    0 0
>> 2 1    1 1
>> d2<- data.frame(d, y = c(0,1), check.names=FALSE)
>> d2
>>   x a.-5 y
>> 1 0    0 0
>> 2 1    1 1
>>
>> With check.names=TRUE, the dash is converted to a period. With
>> check.names=FALSE, the dash is preserved. So the dash is not a problem per
>> se, because data.frame() doesn't throw an error or warning in this case.
>>
>> My question, then, is why it is converted. To avoid problems with other
>> functions? To avoid confusion and mischief, as you mentioned, because it is
>> the symbol for subtraction? If it can be that problematic, why not disallow
>> it altogether? I guess there are reasons for these behaviors, and I am
>> curious to learn more about the logic behind them.
>>
>> Actually, I find that data.frame() can be confusing. On the one hand, it
>> accepts unquoted names for columns, as in your first example. On the other
>> hand, it rejects them when they could be ambiguous, as in your second
>> example. I am definitely not experienced enough to judge whether the
>> behavior makes sense, but I am curious to know why quoted strings are not
>> required in data.frame(). Requiring them would be consistent, and
>> therefore easier for beginners to understand, I think.
>>
>> Thank you for your insights,
>> Ivan
>>
>>
>>
>> Le 24/01/12 16:53, Bert Gunter a écrit :
>>>
>>> Ivan:
>>>
>>> On Tue, Jan 24, 2012 at 6:47 AM, Ivan Calandra
>>> <ivan.calandra at u-bourgogne.fr>  wrote:
>>>>
>>>> By "it works anyway", I mean that you can have a dash in a column name,
>>>> there is no error or even warning.
>>>> I guess that some functions would throw an error or warning, depending on
>>>> the requirements, but data.frame() doesn't.
>>>
>>> This is false. Please don't guess. Read the Help pages.
>>>
>>>> data.frame(a = 1:3)  #fine
>>>> data.frame(a-3 = 1:3) # Error: unexpected '=' in "data.frame(a-3 ="
>>>
>>> The name is **NOT** OK. However,
>>>>
>>>> data.frame("a-3" = 1:3) # fine
>>>
>>>   a.3
>>> 1   1
>>> 2   2
>>> 3   3
>>>
>>> ## A quoted  character string can be used as a column name
>>> ## The name will be changed to a legal name unless:
>>>
>>>> data.frame("a-3" = 1:3,check.names=FALSE)
>>>
>>>   a-3
>>> 1   1
>>> 2   2
>>> 3   3
>>>
>>> However, as is obvious, there is much mischief possible from such
>>> practices, so that they are best avoided.
>>>
>>> -- Bert
>>>
>>>
>>>> Ivan
>>>>
>>>> Le 24/01/12 15:35, David Winsemius a écrit :
>>>>>
>>>>>
>>>>> On Jan 24, 2012, at 4:44 AM, Ivan Calandra wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> I cannot tell you why (maybe someone else can), but the check.names
>>>>>> argument to data.frame() interprets "a.-5" as an invalid name and
>>>>>> converts it to a valid one. What I don't understand is why it isn't
>>>>>> "valid", since it works anyway.
>>>>>
>>>>>
>>>>> The dash is not a valid character for column names. What do you mean by
>>>>> "it works anyway"?
>>>>>
>>>> --
>>>> Ivan CALANDRA
>>>> Université de Bourgogne
>>>> UMR CNRS/uB 6282 Biogéosciences
>>>> 6 Boulevard Gabriel
>>>> 21000 Dijon, FRANCE
>>>> +33(0)3.80.39.63.06
>>>> ivan.calandra at u-bourgogne.fr
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>
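
The renaming seen in the quoted output ("a.-5" becoming "a..5", "a-3" becoming "a.3") can be reproduced directly with make.names(), which ?data.frame names as the function used to adjust names when check.names = TRUE. A small sketch follows; the vector of candidate names is illustrative, and the "1abc" entry is made up to show the X prefix mentioned for names that start with a digit.

```r
# Candidate column names: one already valid, two with a dash, one with a leading digit.
candidates <- c("x", "a.-5", "a-3", "1abc")

# make.names() replaces invalid characters with "." and prepends "X" where needed.
fixed <- make.names(candidates)
print(fixed)

# The same adjustment happens inside data.frame() unless check.names = FALSE.
d_checked   <- data.frame(`a-3` = 1:3)                       # default check.names = TRUE
d_unchecked <- data.frame(`a-3` = 1:3, check.names = FALSE)
print(names(d_checked))
print(names(d_unchecked))
```

Calling make.names() on incoming names first shows what a data source's columns would be renamed to, before deciding whether to keep the originals with check.names = FALSE.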


