[R] subset data frame with condition

Petr Savicky savicky at praha1.ff.cuni.cz
Fri Mar 18 19:19:58 CET 2011


On Fri, Mar 18, 2011 at 10:48:44AM -0700, Nicolas Gutierrez wrote:
> Hello,
> 
> One more question.. I have the data.frame "pop":
> 
>     xloc yloc  gonad  ind    Ene    W   Area
> 1    23  20   516.74   1     0.02 20.21  1
> 2    23  20  1143.20   1     0.02 20.21  1
> 3    23  20   250.00   1     0.02 20.21  1
> 4    22  15   251.98   1     0.02 18.69  2
> 5    22  15   598.08   1     0.02 18.69  2
> 6    21  19   250.00   1     0.02 20.21  3
> 7    22  20   251.98   1     0.02 18.69  4
> 8    22  20   598.08   1     0.02 18.69  4
> 
> and I need to extract 50% (or rounded) of the rows for each Area (from 
> Area 1 to 3 only):
> 
>     xloc yloc  gonad  ind    Ene    W   Area
> 1    23  20   516.74   1     0.02 20.21  1
> 2    23  20  1143.20   1     0.02 20.21  1
> 4    22  15   251.98   1     0.02 18.69  2
> 6    21  19   250.00   1     0.02 20.21  3
> 
> I did this within a loop, but considering my data.frame has more than 
> 10,000 rows and within other loops it makes my code run forever! Any 
> hints? Thanks!!

Hello.

Let me use a data frame with one column only, but more rows. The
following code contains a loop over the areas 1:3, but is otherwise
vectorized.

  pop <- data.frame(Area=c(1,1,1,1,1,2,2,2,2,3,3,3,4,4))
  final <- rep(FALSE, times=nrow(pop))   # rows selected so far
  for (k in 1:3) {
      is.k <- pop$Area == k              # rows belonging to Area k
      # keep the first ceiling(n/2) rows of Area k, where n = sum(is.k)
      accept <- is.k & (cumsum(is.k) <= ceiling(sum(is.k)/2))
      final <- final | accept
  }
  pop[final, , drop=FALSE] # "drop=" is not needed if there are more columns
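
If the loop over the areas is also unwanted, the same selection can
probably be written without it. The following is only a sketch under
the same assumptions as above (the first ceiling(n/2) rows of each of
the Areas 1 to 3 are kept). It relies on ave() from base R; the names
"pos", "n", "keep" and "result" are introduced here for illustration
and are not part of the original code.

```r
# Same toy data as above: Areas 1..4 with 5, 4, 3 and 2 rows
pop <- data.frame(Area = c(1,1,1,1,1,2,2,2,2,3,3,3,4,4))

idx <- seq_len(nrow(pop))
# position of each row within its Area group (1, 2, 3, ...)
pos <- ave(idx, pop$Area, FUN = seq_along)
# number of rows in the Area group that each row belongs to
n <- ave(idx, pop$Area, FUN = length)

# keep the first ceiling(n/2) rows, restricted to Areas 1 to 3
keep <- pop$Area %in% 1:3 & pos <= ceiling(n / 2)
result <- pop[keep, , drop = FALSE]
result
```

On this toy data, "keep" should agree with "final" from the loop. The
loop version may still be preferable when only a few areas are needed,
since it does no work for the other groups.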

Hope this helps.

Petr Savicky.


