[R] subsetting and NAs

P Ehlers ehlers at math.ucalgary.ca
Mon Mar 20 20:06:34 CET 2006



Eric Archer wrote:
> R-help,
> 
> I'm getting some unexpected behavior with subsetting a data frame 
> (aircraft flight data) that I can't sort out.
> Here is a simplified version of my data frame and problem:
> 
>  > flight
>       FlightID TailNo FlightDate HobbsTime FlightCost       Date year
> 1         4497  6009K       <NA>       2.2      330.0       <NA>   NA
> 2         4498  6009K       <NA>       0.8      120.0       <NA>   NA
> 3         4499  6009K       <NA>       0.9      135.0       <NA>   NA
> 4         4500  6009K       <NA>       1.1      165.0       <NA>   NA
> 5         4501  6009K       <NA>       1.5      225.0       <NA>   NA
> 2587      7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
> 2588      7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
> 2589      7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
> 2590      7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
> 2591      7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
> 29793    35208  91630  1/21/2006       1.4      107.8 2006-01-21 2006
> 29794    35209  91630  1/21/2006       0.7       53.9 2006-01-21 2006
> 29795    35210  9725B  1/21/2006       1.4      138.6 2006-01-21 2006
> 29796    35212  91630  1/28/2006       1.0       77.0 2006-01-28 2006
> 29797    35213  91630  1/28/2006       1.6      123.2 2006-01-28 2006
> 29798    35214  3386E   1/5/2006       1.1       86.9 2006-01-05 2006
> 
> I then try to extract the error years:
> 
>  > errors <- flight[flight$year > 2006,]
>  > errors
>      FlightID TailNo FlightDate HobbsTime FlightCost       Date year
> NA         NA   <NA>       <NA>        NA         NA       <NA>   NA
> NA.1       NA   <NA>       <NA>        NA         NA       <NA>   NA
> NA.2       NA   <NA>       <NA>        NA         NA       <NA>   NA
> NA.3       NA   <NA>       <NA>        NA         NA       <NA>   NA
> NA.4       NA   <NA>       <NA>        NA         NA       <NA>   NA
> 2587     7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
> 2588     7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
> 2589     7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
> 2590     7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
> 2591     7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
> 
> Would someone please explain to me why the new data frame has all 
> columns (and row names) replaced with NA where year was NA and how to 
> avoid this behavior?
> Thanks in advance.
> 
> I am using R v2.2.1 on Windows XP.
> 
> Cheers,
> eric

  [snip]

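flight$year > 2006 returns a logical vector (TRUE/FALSE), not row
numbers, and it is NA wherever year is NA. Indexing a data frame with
an NA in a logical index gives a row filled with NAs, which is what you
are seeing. subset() drops the rows where the condition is NA. Try this: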

errors <- subset(flight, subset = year > 2006)
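
To see what is going on, here is a minimal sketch on a small made-up
data frame. The values below are a hypothetical stand-in for your
'flight' data (only two columns are kept), and the names idx, e1, e2
and e3 are just for illustration. It shows the NA entries in the
logical index and three ways of leaving them out that should give the
same rows:

```r
# Hypothetical toy data standing in for the poster's 'flight' data frame
flight <- data.frame(
  FlightID = c(4497, 4498, 7083, 7084, 35208),
  year     = c(NA, NA, 2009, 2009, 2006)
)

# The comparison is NA wherever year is NA
idx <- flight$year > 2006
print(idx)

# Each NA in a logical index yields a row of NAs,
# so this has 4 rows: 2 all-NA rows plus the 2 rows from 2009
print(nrow(flight[idx, ]))

# Three ways to keep only the rows where the condition is TRUE
e1 <- subset(flight, year > 2006)                         # drops NA cases
e2 <- flight[which(flight$year > 2006), ]                 # which() ignores NA
e3 <- flight[!is.na(flight$year) & flight$year > 2006, ]  # explicit NA test
```

subset() is the most readable of the three at the prompt; the which()
and is.na() forms may be preferable inside functions, where the
non-standard evaluation used by subset() can be awkward.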

Peter Ehlers



