[R] Basic question for subset of dataframe

David Carlson dcarlson at tamu.edu
Thu Feb 27 18:34:41 CET 2014


You have discovered two features of R with your example. Don
told you about the first. A data frame is a list, so if you
provide only one index, you select columns (the list elements),
which are what you see listed when you type

> str(leadership)
'data.frame':   5 obs. of  10 variables:
 $ manager: num  1 2 3 4 5
 $ date   : chr  "10/24/08" "10/28/08" "10/1/08" "10/12/08" ...
 $ country: chr  "US" "US" "UK" "UK" ...
 $ gender : chr  "M" "F" "F" "M" ...
 $ age    : num  32 45 25 39 99
 $ q1     : num  5 3 3 3 2
 $ q2     : num  4 5 5 3 2
 $ q3     : num  5 2 5 4 1
 $ q4     : num  5 5 5 NA 2
 $ q5     : num  5 5 2 NA 1
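
A quick way to check the list behaviour yourself. This is only a
sketch: it rebuilds the data frame from the original post, and the
comments give what base R returns for it.

```r
# Rebuild the poster's data frame (values copied from the original post)
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2), q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2), q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE
)

is.list(leadership)      # TRUE: a data frame is a list of columns
length(leadership)       # 10: the number of columns, not the number of rows
names(leadership[1:2])   # "manager" "date": a single index picks list elements
```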

The second is that when you give R less than it is expecting, it
often recycles what you gave it. You gave it a logical vector of
five values:

> leadership$country == "US"
[1]  TRUE  TRUE FALSE FALSE FALSE

But there are 10 list elements, so R recycled your vector to
length 10, the number of variables: TRUE TRUE FALSE FALSE FALSE
TRUE TRUE FALSE FALSE FALSE. As a result you got variables 1 and
2, skipped the next three, then got 6 and 7, and skipped the
last three.
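
The recycling can be made visible with rep(). This is an
illustrative sketch only; the names us and recycled are introduced
here and are not part of the original post.

```r
# Same data as in the original post
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2), q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2), q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE
)

us <- leadership$country == "US"   # logical vector of length 5

# Spell out the recycling by hand: repeat the 5 values to cover 10 columns
recycled <- rep(us, length.out = ncol(leadership))
which(recycled)                    # 1 2 6 7

# The single-index subset keeps exactly those columns, and all five rows
names(leadership[us])              # "manager" "date" "q1" "q2"
nrow(leadership[us])               # 5
```

The output quoted in the original post also shows an agecat
column, which would fit the same pattern if that data frame had an
11th variable.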

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Ivan Calandra
Sent: Thursday, February 27, 2014 9:46 AM
To: r-help at r-project.org
Subject: Re: [R] Basic question for subset of dataframe

Hi,

Thanks for the example!

I cannot really tell you why you get what you get when you type 
leadership[leadership$country == "US"]

But what I know (or think I know) is that when you don't write
the comma, R will take it as a condition for the columns.
It means that leadership[1:2] is identical to leadership[,1:2]
identical(leadership[1:2],leadership[,1:2])
[1] TRUE

If you want all rows where "US" is present in "country", then
you did it fine using leadership[leadership$country == "US", ]
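
For reference, a small sketch of the row selection next to
subset(), plus one case where the forms with and without the comma
differ. The names us_rows and via_subset are made up for this
example.

```r
# Same data as in the original post
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2), q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2), q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE
)

# With the comma, the logical vector selects rows and all columns are kept
us_rows <- leadership[leadership$country == "US", ]
nrow(us_rows)                      # 2
ncol(us_rows)                      # 10

# subset() expresses the same row filter without repeating the name
via_subset <- subset(leadership, country == "US")
via_subset$manager                 # 1 2

# One difference between the two forms: with a single column,
# the comma form drops to a plain vector by default
is.data.frame(leadership[1])       # TRUE
is.data.frame(leadership[, 1])     # FALSE
```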

HTH,
Ivan

--
Ivan Calandra, ATER
Université de Franche-Comté
UFR STGI - UMR 6249 Chrono-Environnement
4 Place Tharradin - BP 71427
25211 Montbéliard Cedex, FRANCE
ivan.calandra at univ-fcomte.fr
http://biogeosciences.u-bourgogne.fr/calandra

On 27/02/14 16:00, Kapil Shukla wrote:
> All - first, apologies if this is a very basic question, but I
> tried myself and could not find a satisfactory answer.
>
> I know that I can subset a data frame using dataframe[row,column],
> that dataframe[row,] returns that specific row, and that
> dataframe[,column] returns the entire column.
>
> What I don't understand is what is returned if I use
> dataframe[<conditional expression>] and don't provide the comma.
>
> e.g. i have the below code:
>
> manager <- c(1, 2, 3, 4, 5)
> date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
> country <- c("US", "US", "UK", "UK", "UK")
> gender <- c("M", "F", "F", "M", "F")
> age <- c(32, 45, 25, 39, 99)
> q1 <- c(5, 3, 3, 3, 2)
> q2 <- c(4, 5, 5, 3, 2)
> q3 <- c(5, 2, 5, 4, 1)
> q4 <- c(5, 5, 5, NA, 2)
> q5 <- c(5, 5, 2, NA, 1)
> leadership <- data.frame(manager, date, country, gender, age,
>                          q1, q2, q3, q4, q5, stringsAsFactors=FALSE)
>
> Now if I do
>
> leadership[leadership$country == "US",]
>
> two rows are returned:
>
>
>
>    managerID JoinDate country gender age q1 q2 q3 q4 q5 agecat
> 1         1 10/24/08      US      M  32  5  4  5  5  5  Young
> 2         2 10/28/08      US      F  45  3  5  2  5  5  Young
>
>
> But if I do
>
> leadership[leadership$country == "US"]
>
> to get the entire data frame where country is US, I get the
> following:
>
>
>    managerID JoinDate q1 q2 agecat
> 1         1 10/24/08  5  4  Young
> 2         2 10/28/08  3  5  Young
> 3         3  10/1/08  3  5  Young
> 4         4 10/12/08  3  3  Young
> 5         5   5/1/09  2  2   <NA>
>
>
>
> Please tell me what I am doing wrong.
>
>
> Thanks
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
