[R] Re: variable (column) in a data frame

Paulo Barata paulo.barata at ensp.fiocruz.br
Tue Jul 17 17:27:36 CEST 2012


Dear Bert and Sarah,

Thank you very much for your clarifications on this matter. I will
have to study more closely how subsets of data structures are
extracted, and I will change my programming habits accordingly.

Best regards,

Paulo Barata

---------------------------------------------------------------------


---------- Original Message -----------
From: Bert Gunter <gunter.berton at gene.com>
To: Paulo Barata <paulo.barata at ensp.fiocruz.br>
Cc: Frans Marcelissen <frans.marcelissen at digipsy.nl>, r-help at r-project.org,
ehlers at ucalgary.ca
Sent: Tue, 17 Jul 2012 08:06:57 -0700
Subject: Re: [R] variable (column) in a data frame

> Inline below.
> 
> -- Bert
> 
> On Tue, Jul 17, 2012 at 7:40 AM, Paulo Barata
> <paulo.barata at ensp.fiocruz.br> wrote:
> 
> >
> > Dear Frans and Peter,
> >
> > Yes, the notation df[,'var'] is able to catch a non-existent
> > variable var inside a data frame df. But the notation df$var
> > isn't.
> >
> > So we have this situation, where two different notations, which
> > (as far as I understand) perform the same action, have different
> > kinds of response.
> >
> You don't understand far enough. Your assumption is simply not true. For
> example, from ?"[" :
> 
> "The most important distinction between [, [[ and $ is that the [ can
> select more than one element whereas the other two select a single element.
> 
> The default methods work somewhat differently for atomic vectors,
> matrices/arrays and for recursive (list-like, see is.recursive)
> objects. $ is only valid for recursive objects, and is only
> discussed in the section below on recursive objects."
> 
> So the Help page already notes that there are differences among them.
> 
> Nevertheless, your discomfort is, imo, understandable.
> Extraction/replacement for data structures is a complex business,
> and R's approach to the issues has "evolved" over time, with
> "inconsistencies," especially for edge cases, baked in. Because
> these issues are at the very core of R's behavior, I think it likely 
> that except for egregious inconsistencies and outright bugs -- which 
> at this point are most unlikely to exist -- it is well nigh 
> impossible to change them. I see no recourse but to always check 
> such edge cases carefully and to be as consistent as possible in 
> your own programming usage (e.g. always using [,".."] for extracting 
> columns). As Peter has pointed out several times, the $ extractor is 
> convenient syntactic sugar that can get one into a lot of trouble, 
> and is probably best avoided.
> 
> Cheers,
> 
> Bert
> 
> > Couldn't this situation be fixed? Isn't it possible to make the
> > df$var notation to issue an error when referring to a non-existent
> > variable inside the data frame?
> >
> > Thank you very much.
> >
> > Paulo Barata
> >
> > ---------------------------------------------------------------------
> >
> >
> > ---------- Original Message -----------
> > From: "Frans Marcelissen" <frans.marcelissen at digipsy.nl>
> > To: "'Paulo Barata'" <paulo.barata at ensp.fiocruz.br>, <r-help at r-project.org>
> > Sent: Mon, 16 Jul 2012 14:25:21 +0200
> > Subject: RE: [R] variable (column) in a data frame
> >
> > > Hi Paulo,
> > > There is a difference between two ways of accessing columns in a data frame:
> > > > df$aaa
> > > NULL
> > > > df["AAA"]
> > > Error in `[.data.frame`(df, "AAA") : undefined columns selected
> > > So df["AAA"] or df[,"AAA"] gives the error message you expect.
> > > -------------------
> > > Frans
> > >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> > > On Behalf Of Paulo Barata
> > > Sent: Sunday, 15 July 2012 16:31
> > > To: r-help at r-project.org
> > > Subject: [R] variable (column) in a data frame
> > >
> > > To the R help list,
> > >
> > > When using a data frame, there is no warning or error message when I
> > > refer to a non-existent variable inside the data frame.
> > >
> > > Example:
> > >
> > > ##----------------------------------------------
> > >
> > > a <- c(1,2,3)
> > > b <- c(11,22,33)
> > > df <- data.frame(a,b)
> > > df
> > >
> > > ## correct: there is a column in df named 'a'
> > > ## the sum is correctly performed
> > > sum(df$a==2)
> > >
> > > ## incorrect: there is no column in df named 'aaa',
> > > ## but the sum is performed anyway without either warning or error
> > > sum(df$aaa==2)
> > >
> > > ##----------------------------------------------
> > >
> > > Is there some way to make R issue either a warning or an error
> > > message in such a situation?
> > >
> > > I am using R version 2.15.1 64-bit on Windows 7 Professional.
> > >
> > > Thank you very much.
> > >
> > > Paulo Barata
> > >
> > > ---------------------------------------------------------------------
> > > Paulo Barata
> > >
> > > ENSP - Fundação Oswaldo Cruz
> > > Rua Leopoldo Bulhões 1480 - 8A
> > > 21041-210  Rio de Janeiro - RJ
> > > Brazil
> > > E-mail: paulo.barata at ensp.fiocruz.br
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > --
> > > This message has been scanned for viruses and
> > > dangerous content by MailScanner, and is
> > > believed to be clean.
> > ------- End of Original Message -------
> >
> 
> --
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 
------- End of Original Message -------
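
For readers following the thread, the behaviour discussed above can be
reproduced with the short sketch below. It rebuilds the df from the
original question, shows why sum(df$aaa == 2) runs silently, and then
shows one possible way to get an error for a non-existent column. The
helper name get_col is hypothetical (it is not part of base R and was
not proposed by anyone in the thread); it is only an illustration of
the "check explicitly" approach, not a recommended standard.

```r
## Data frame from the original question
a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- data.frame(a, b)

## Existing column: works as expected
sum(df$a == 2)        # 1

## Non-existent column: `$` silently returns NULL ...
is.null(df$aaa)       # TRUE
## ... and NULL == 2 is logical(0), whose sum is 0, so nothing fails
sum(df$aaa == 2)      # 0

## `[[` with a missing name also returns NULL on a data frame
is.null(df[["aaa"]])  # TRUE

## `[` with a missing column name signals an error instead;
## tryCatch() is used here only so the script keeps running
res <- tryCatch(df[, "aaa"], error = function(e) e)
inherits(res, "error")  # TRUE

## Hypothetical helper (an assumption, not base R): extract a single
## column by name and stop if the name is not present
get_col <- function(data, name) {
  stopifnot(is.data.frame(data), is.character(name), length(name) == 1L)
  if (!name %in% names(data)) {
    stop("no column named '", name, "' in data frame", call. = FALSE)
  }
  data[[name]]
}

sum(get_col(df, "a") == 2)   # 1
strict <- tryCatch(get_col(df, "aaa"), error = function(e) e)
inherits(strict, "error")    # TRUE
```

With such a helper, a misspelled column name stops the computation
instead of quietly contributing a sum of 0. Using df[, "aaa"] directly,
as suggested in the replies, gives the same protection without any
extra function.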


