[R] Basic question for subset of dataframe

MacQueen, Don macqueen1 at llnl.gov
Thu Feb 27 16:45:32 CET 2014


Try a simpler example:

> ick <- data.frame(x=1:5, a=letters[1:5], c=month.abb[1:5], y=11:15)
> ick
  x a   c  y
1 1 a Jan 11
2 2 b Feb 12
3 3 c Mar 13
4 4 d Apr 14
5 5 e May 15
> ick[2]
  a
1 a
2 b
3 c
4 d
5 e
> 
> ick[3]
    c
1 Jan
2 Feb
3 Mar
4 Apr
5 May

If you use [] without a comma, it returns the specified columns.

  ick[ c(FALSE,TRUE,TRUE,FALSE) ]

will return the second and third columns, those where the logical vector
is TRUE.
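A minimal sketch, reusing the `ick` data frame from above, showing that a single index without a comma picks columns whether it is logical, positional, or by name. The comma form is included for contrast. The variable names `by_logical`, `by_position` and `by_name` are illustrative only.

```r
# Rebuild the small example data frame
ick <- data.frame(x = 1:5, a = letters[1:5], c = month.abb[1:5], y = 11:15)

# One index, no comma: the index selects columns and keeps every row
by_logical  <- ick[c(FALSE, TRUE, TRUE, FALSE)]
by_position <- ick[c(2, 3)]
by_name     <- ick[c("a", "c")]

# All three forms select the same two columns
identical(by_logical, by_position)  # TRUE
identical(by_logical, by_name)      # TRUE

# With a comma, the first index selects rows and the second selects columns
ick[c(FALSE, TRUE, TRUE, FALSE, FALSE), ]  # rows 2 and 3, all columns
ick[, c(FALSE, TRUE, TRUE, FALSE)]         # the same two columns as by_logical
```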

This is because data frames are actually lists in disguise, with one list element per column.

> is.list(ick)
[1] TRUE
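Because the no-comma form follows list rules, a logical index shorter than the number of columns is recycled. In the question, `leadership$country == "US"` has one element per row (5), but `leadership` has 10 columns, so the pattern TRUE TRUE FALSE FALSE FALSE is repeated across the columns and selects columns 1, 2, 6 and 7. A sketch using the question's own data; `is_us` is an illustrative name, and `rep_len()` is used only to make the recycling visible. The output in the question also shows `agecat`, which presumably means the asker's copy had an 11th column: position 11 lines up with TRUE again under recycling.

```r
# Rebuild the leadership data frame from the question (5 rows, 10 columns)
manager <- c(1, 2, 3, 4, 5)
date    <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
country <- c("US", "US", "UK", "UK", "UK")
gender  <- c("M", "F", "F", "M", "F")
age     <- c(32, 45, 25, 39, 99)
q1 <- c(5, 3, 3, 3, 2)
q2 <- c(4, 5, 5, 3, 2)
q3 <- c(5, 2, 5, 4, 1)
q4 <- c(5, 5, 5, NA, 2)
q5 <- c(5, 5, 2, NA, 1)
leadership <- data.frame(manager, date, country, gender, age,
                         q1, q2, q3, q4, q5, stringsAsFactors = FALSE)

# A data frame is a list of columns, so length() counts columns
length(leadership)     # 10

# The comparison has one element per ROW, not per column
is_us <- leadership$country == "US"
length(is_us)          # 5

# Without a comma, the 5-element index is recycled over the 10 columns
recycled <- rep_len(is_us, length(leadership))
names(leadership)[recycled]   # "manager" "date" "q1" "q2"
names(leadership[is_us])      # the same four columns

# With a comma, the same vector selects rows instead
leadership[is_us, ]    # rows 1 and 2, all 10 columns
```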


-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 2/27/14 7:00 AM, "Kapil Shukla" <shukla.kapil at gmail.com> wrote:

>All - first, apologies if this is a very basic question, but I tried myself
>and could not find a satisfactory answer.
>
>I know that I can subset a data frame using dataframe[row,column], and if I
>give dataframe[row,] that specific row is returned; similarly, I can do
>dataframe[,column] to get the entire column.
>
>What I don't understand is what is returned if I do dataframe[<conditional
>expression>] and don't provide the comma.
>
>For example, I have the code below:
>
>manager <- c(1, 2, 3, 4, 5)
>date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
>country <- c("US", "US", "UK", "UK", "UK")
>gender <- c("M", "F", "F", "M", "F")
>age <- c(32, 45, 25, 39, 99)
>q1 <- c(5, 3, 3, 3, 2)
>q2 <- c(4, 5, 5, 3, 2)
>q3 <- c(5, 2, 5, 4, 1)
>q4 <- c(5, 5, 5, NA, 2)
>q5 <- c(5, 5, 2, NA, 1)
>leadership <- data.frame(manager, date, country, gender, age, q1, q2, q3,
>q4, q5, stringsAsFactors=FALSE)
>
>Now if I do
>
>leadership[leadership$country == "US",]
>
>two rows are returned:
>
>
>
>  managerID JoinDate country gender age q1 q2 q3 q4 q5 agecat
>1         1 10/24/08      US      M  32  5  4  5  5  5  Young
>2         2 10/28/08      US      F  45  3  5  2  5  5  Young
>
>
>but if I do
>
>leadership[leadership$country == "US"] to get all columns of the rows
>where country is US, I get the following instead:
>
>
>  managerID JoinDate q1 q2 agecat
>1         1 10/24/08  5  4  Young
>2         2 10/28/08  3  5  Young
>3         3  10/1/08  3  5  Young
>4         4 10/12/08  3  3  Young
>5         5   5/1/09  2  2   <NA>
>
>
>
>Please tell me what I am doing wrong.
>
>
>Thanks
>
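The indexing forms mentioned in the question can be compared side by side on the small `ick` data frame from the reply. A minimal sketch; the point is which forms return a data frame and which return a bare vector (`drop = TRUE` is the default when a single column is selected with the comma form).

```r
ick <- data.frame(x = 1:5, a = letters[1:5], c = month.abb[1:5], y = 11:15)

ick[2, ]                 # row 2, all columns, still a data frame
ick[, 2]                 # column 2 as a plain vector (drop = TRUE by default)
ick[, 2, drop = FALSE]   # column 2 kept as a one-column data frame
ick[2]                   # no comma: list-style, always a data frame
ick[[2]]                 # double brackets: the column vector itself

# Filtering rows with a condition needs the comma
ick[ick$y > 12, ]        # rows 3, 4 and 5
```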



