[R] Sorting a Data Frame

Bert Gunter bgunter.4567 at gmail.com
Wed Jan 27 04:53:53 CET 2016


...

>> mydf[2]   # ???
>   B
> 1 4
> 2 5
> 3 6
>
> A data frame is "really" a list of columns, so giving a single value
> returns that column.

False. It returns a data frame consisting of a single column, i.e. a list
containing a single component.

mydf[[2]]
  returns a single component/column.

While these differences may seem subtle, they are essential (I have
certainly suffered bad consequences when I have been careless about
them).
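
A minimal sketch of the distinction, reusing the mydf from the quoted
example. The variable names one_col and column are only illustrative and
are not from the original posts:

```r
mydf <- data.frame(A = 1:3, B = 4:6)

one_col <- mydf[2]     # single bracket: still a data frame (a list of length 1)
column  <- mydf[[2]]   # double bracket: the component itself, an integer vector

class(one_col)                # "data.frame"
class(column)                 # "integer"
identical(column, mydf$B)     # TRUE
identical(column, mydf[, 2])  # TRUE
```

Code that expects a vector (say, mean()) behaves differently when it is
handed the one-column data frame instead, which is where carelessness
tends to bite.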

The OP should study ?"[" -- or, if that is too dense (it is pretty
dense!), a suitable R tutorial. Indexing is fundamental to effective
use of R, and anyone who needs to make effective use of the language
needs to put in the time to learn it. I would say that this is the case
even for those who prefer to use the tools provided by Hadley Wickham's
plyr packages or similar tools in other packages (e.g.
data.table).
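
Applied to the OP's original question, here is a hedged sketch of why the
comma matters. The data frame dat and its values are made up for
illustration (and named dat rather than df, per the advice further down
the thread). Without the comma, the result of order() is treated as a set
of column indices, not row indices:

```r
dat <- data.frame(x = c(3, 1, 2), y = c("c", "a", "b"))

# With the comma: order(dat$x) indexes rows, so the rows are sorted by x.
sorted <- dat[order(dat$x), ]
sorted$x   # 1 2 3

# Without the comma: order(dat$x) is c(2, 3, 1), which is used to pick
# columns. There is no column 3, so this signals an error; tryCatch()
# captures the condition object instead of stopping the script.
res <- tryCatch(dat[order(dat$x)], error = function(e) e)
inherits(res, "error")   # TRUE
```

If the data frame happened to have at least as many columns as rows, the
comma-less form would silently reorder the columns rather than fail.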

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jan 26, 2016 at 1:35 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> On Tue, Jan 26, 2016 at 4:24 PM, Robert Sherry <rsherry8 at comcast.net> wrote:
>>
>> Thank you for the response. As expected, the following expression worked:
>>     df[order(df$x),]
>
> This says to sort the rows, and leave the columns alone.
> Subsetting a 2-dimensional object is via
> [rows, columns]
>
>> I would expect the following expression to work also:
>>         df[order(df$x)]
>
> This does something a bit unexpected, and what it does depends on
> whether you have a data frame or matrix.
>
>
>> mydf <- data.frame(A=1:3, B=4:6)
>
>> mydf[2, ] # row 2
>   A B
> 2 2 5
>
>> mydf[, 2] # col 2
> [1] 4 5 6
>
>> mydf[2]   # ???
>   B
> 1 4
> 2 5
> 3 6
>
> A data frame is "really" a list of columns, so giving a single value
> returns that column.
>
>
>> mymat <- as.matrix(mydf)
>> mymat[2, ] # row 2
> A B
> 2 5
>
>> mymat[, 2] # col 2
> [1] 4 5 6
>
>> mymat[2]   # ???
> [1] 2
>
> But for a matrix, it returns that single element, counting from the top
> left and working down each column in turn (column-major order).
>
> So it's a really good idea to not subset your rectangular objects that
> way, as it may eventually bite you.
>
>
>> However it does not. That is, the comma is needed. Please tell me why the
>> comma is there.
>>
>> Thanks
>> Bob
>> On 1/26/2016 8:19 AM, S Ellison wrote:
>>>>
>>>> On 23.01.2016 01:21, Robert Sherry wrote:
>>>>>
>>>>> In R, I run the following commands:
>>>>>       df = data.frame( x=runif(10), y=runif(10) )
>>>>>       df2 = df[order(x),]
>>>>
>>>> You are using another x from your workspace; you actually want
>>>>
>>>>
>>>>    df2 = df[order(df[,"x"]),]
>>>
>>> or
>>> df[order(df$x),]
>>>
>>> And just to prevent yet more confusion, you might also want to avoid 'df'
>>> as a name. 'df' is the function that returns the density of the F
>>> distribution ...
>>>
>>> S Ellison
>>>
>>>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


