[R] Getting subsets of a data frame

Fernando Saldanha fsaldan1 at gmail.com
Sat Apr 16 19:49:45 CEST 2005


I am reading as fast as I can! Just started with R five days ago.

I found the following in the documentation:

"Although the default for 'drop' is 'TRUE', the default behaviour when
only one _row_ is left is equivalent to specifying 'drop = FALSE'.  To
drop from a data frame to a list, 'drop = FALSE' has to (sic)
specified explicitly."

I think the exception mentioned in the first sentence is the reason
for my confusion.

I also think the second sentence is wrong and should have 'TRUE'
instead of 'FALSE'.

While it is true that a data frame is a list, it is not a list of
numbers, but rather a list of columns, which, if I understand
correctly, can be either vectors or matrices. So regardless of the
value assigned to 'drop' the returned object is a list.
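
A minimal sketch of that point, assuming the built-in swiss data set
used elsewhere in this thread; the small data frame d and its matrix
column m are hypothetical names made up for illustration:

```r
# A data frame is a list with one element per column.
sw <- swiss
df_is_list   <- is.list(sw)                 # TRUE
one_per_col  <- length(sw) == ncol(sw)      # TRUE

# Hypothetical data frame with a matrix stored as a single column.
d <- data.frame(id = 1:3)
d$m <- matrix(1:6, nrow = 3)                # a 3 x 2 matrix assigned as one column
n_cols        <- ncol(d)                    # 2: 'id' and the matrix column 'm'
m_is_matrix   <- is.matrix(d$m)             # TRUE

print(c(df_is_list, one_per_col, m_is_matrix))
print(n_cols)
```

The matrix column counts as one list element, which is why ncol(d) is 2
and not 3.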

When I asked "why isn't sw[1, ] a list?" I should have asked instead
"why isn't sw[1, ] a list of vectors?"

I did some experiments with a data frame a, where the columns are
vectors (no matrix columns):

> is.data.frame(a) # just checking
[1] TRUE

> a1<- a[3, ]
> (is.data.frame(a1))
[1] TRUE                     (did not stop being a data frame)
> (is.list(a1))
[1] TRUE                     (but it is a list)

> a2 <- a[3, , drop = TRUE]
> (is.data.frame(a2))
[1] FALSE                   (no longer a data frame)
> (is.list(a2))
[1] TRUE                     (but it is a list)

> a3 <- a[3, , drop = FALSE]
> (is.data.frame(a3))
[1] TRUE                    (still a data frame)
> (is.list(a3)) 
[1] TRUE                    (but it is a list)

I also tried:

> a2[1]
$dates.num
[1] 477032400

> a3[1]
  dates.num
3 477032400  (notice the row name)

> attributes(a3[1])
$names
[1] "dates.num"

$class
[1] "data.frame"

$row.names
[1] "3"

> attributes(a2[1])
$names
[1] "dates.num"
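
Since my data frame a is not available to anyone else, here is a sketch
of the same experiments using the built-in swiss data set instead. The
variable names are mine and only illustrative:

```r
sw <- swiss

# Selecting a single row
row_default <- sw[1, ]                  # default: stays a data frame
row_drop    <- sw[1, , drop = TRUE]     # explicit drop = TRUE: plain list
row_keep    <- sw[1, , drop = FALSE]    # explicit drop = FALSE: data frame

# Selecting a single column
col_default <- sw[, 1]                  # default: drops to a numeric vector
col_keep    <- sw[, 1, drop = FALSE]    # one-column data frame

print(c(default = is.data.frame(row_default),
        drop    = is.data.frame(row_drop),
        keep    = is.data.frame(row_keep)))
print(is.list(row_drop))
print(c(is.numeric(col_default), is.data.frame(col_keep)))
```

So for a single row the result becomes a plain list only when
drop = TRUE is given explicitly, while for a single column dropping
happens by default.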

FS

On 4/16/05, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Sat, 16 Apr 2005, Prof Brian Ripley wrote:
> 
> > Perhaps Fernando will also note that this is documented in ?"[.data.frame",
> > a slightly more appropriate reference than Bill's.
> >
> > It would be a good idea to read a good account of R's indexing: Bill Venables
> > and I know of a couple you will find in the R FAQ.
> 
> BTW,
> 
> sw <- swiss
> sw[1,,drop=TRUE] *is* a list (not as claimed, but as documented)
> sw[1, ]          is a data frame
> sw[, 1]          is a numeric vector.
> 
> I should have pointed out that "[.data.frame" is in the See Also of Bill's
> reference.
> 
> BTW to Andy: a list is a vector, and Kurt and I recently have been trying
> to correct documentation that means `atomic vector' when it says `vector'.
> (Long ago lists in R were pairlists and not vectors.)
> 
> > is.vector(list(a=1))
> [1] TRUE
> 
> 
> > On Sat, 16 Apr 2005, Liaw, Andy wrote:
> >
> >> Because a data frame can hold different data types (even matrices) in
> >> different variables, one row of it can not be converted to a vector in
> >> general (where all elements need to be of the same type).
> >>
> >> Andy
> >>
> >>> From: Fernando Saldanha
> >>>
> >>> Thanks, it's interesting reading.
> >>>
> >>> I also noticed that
> >>>
> >>> sw[, 1, drop = TRUE] is a vector (coerces to the lowest dimension)
> >>>
> >>> but
> >>>
> >>> sw[1, , drop = TRUE] is a one-row data frame (does not convert it into
> >>> a list or vector)
> >>>
> >>> FS
> >>>
> >>>
> >>> On 4/16/05, Bill.Venables at csiro.au <Bill.Venables at csiro.au> wrote:
> >>>> You should look at
> >>>>
> >>>>> ?"["
> >>>>
> >>>> and look very carefully at the "drop" argument.  For your example
> >>>>
> >>>>> sw[, 1]
> >>>>
> >>>> is the first component of the data frame, but
> >>>>
> >>>>> sw[, 1, drop = FALSE]
> >>>>
> >>>> is a data frame consisting of just the first component, as
> >>>> mathematically fastidious people would expect.
> >>>>
> >>>> This is a convention, and like most arbitrary conventions it can be
> >>>> very useful most of the time, but some of the time it can be a very
> >>>> nasty trap.  Caveat emptor.
> >>>>
> >>>> Bill Venables.
> >>>>
> >>>> -----Original Message-----
> >>>> From: r-help-bounces at stat.math.ethz.ch
> >>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Fernando Saldanha
> >>>> Sent: Saturday, 16 April 2005 1:07 PM
> >>>> To: Submissions to R help
> >>>> Subject: [R] Getting subsets of a data frame
> >>>>
> >>>> I was reading in the Reference Manual about Extract.data.frame.
> >>>>
> >>>> There is a list of examples of expressions using [ and [[, with the
> >>>> outcomes. I was puzzled by the fact that, if sw is a data frame, then
> >>>>
> >>>> sw[, 1:3]
> >>>>
> >>>> is also a data frame,
> >>>>
> >>>> but
> >>>>
> >>>> sw[, 1]
> >>>>
> >>>> is just a vector.
> >>>>
> >>>> Since R has no scalars, it must be the case that 1 and 1:1 are the same:
> >>>>
> >>>>> 1 == 1:1
> >>>> [1] TRUE
> >>>>
> >>>> Then why isn't sw[,1] = sw[, 1:1] a data frame?
> >>>>
> >>>> FS
> >>>>
> >>>> ______________________________________________
> >>>> R-help at stat.math.ethz.ch mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide!
> >>>> http://www.R-project.org/posting-guide.html
> >>>>
> >>>
> >>
> >
> > --
> > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
