[R] select rows by criteria

Rui Barradas rui1174 at sapo.pt
Thu Mar 1 19:07:08 CET 2012


Hello, again.


Petr Savicky wrote
> 
> On Thu, Mar 01, 2012 at 05:42:48PM +0100, Petr Savicky wrote:
>> On Thu, Mar 01, 2012 at 04:27:45AM -0800, syrvn wrote:
>> > Hello,
>> > 
>> > I am stuck with selecting the right rows from a data frame. I think the
>> > problem is rather how to select them
>> > than how to implement the R code.
>> > 
>> > Consider the following data frame:
>> > 
>> > df <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10), value =
>> > c(34,12,23,25,34,42,48,29,30,27))
>> > 
>> > What I want to achieve is to select 7 rows (values) so that the mean
>> > value of those rows is closest to 35, and the mean value of the
>> > remaining 3 rows (values) is closest to 45.
>> > However, each value is only
>> > allowed to be sampled once!
>> 
>> Hi.
>> 
>> If some 3 rows have mean close to 45, then they have sum close
>> to 3*45, so the remaining 7 rows have sum close to
>> 
>>   sum(df$value) - 3*45 # [1] 169
>> 
>> and they have mean close to 169/7 = 24.14286. In other words,
>> the two criteria cannot be optimized together.
>> 
>> For this reason, let me choose the criterion on 3 rows.
>> The closest solution may be found as follows.
>> 
>>   # generate all triples and compute their means
>>   tripleMeans <- colMeans(combn(df$value, 3))
>> 
>>   # select the index of the triple with mean closest to 35
>>   indClosest <- which.min(abs(tripleMeans - 35))
> 
> I am sorry. There should be 45 and not 35.
> 
>   indClosest <- which.min(abs(tripleMeans - 45))
> 
>   # generate the indices, which form the closest triple in df$value
>   tripleInd <- combn(seq_along(df$value), 3)[, indClosest]
>   tripleInd # [1] 1 6 7
> 
>   # check the mean of the triple
>   mean(df$value[tripleInd]) # [1] 41.33333
> 
> Petr Savicky.
> 

There are two solutions for the 3-row criterion; 'which.min' finds only
one of them, the first in the order given by 'combn'.
(I corrected my first post, but the correction still contained an error.)
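
To illustrate the point about ties, here is a minimal sketch with made-up distances (the vector `d` is hypothetical and not taken from the thread). It contrasts 'which.min', which stops at the first minimum, with a comparison against `min(d)`, which returns every position that attains it.

```r
# Toy distances with a tie for the minimum (made-up numbers, not from the thread)
d <- c(5, 2, 7, 2)

first_min <- which.min(d)        # index of the first minimum only
all_min   <- which(d == min(d))  # indices of every element equal to the minimum

print(first_min)
print(all_min)
```

Note that the `==` comparison is exact; when the distances come from floating-point arithmetic on different values, a small tolerance may be a safer way to detect ties.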

# Forgot to change the index matrix
meansDist2 <- apply(inxmat2, 2, function(jnx) f(jnx, DF$value, 45))

# Two solutions
(i2 <- which(meansDist2 == min(meansDist2)))
inxmat2[, i2]

mean(DF$value[inxmat2[, i2][, 1]])
[1] 41.33333
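
The code above relies on objects from my earlier post. For readers who only see this message, the following is a self-contained sketch of the same steps. The definitions of `f` and `inxmat2` here are assumptions made for illustration (a distance-to-target helper and the matrix of all 3-row index combinations); they may differ from what the earlier post used.

```r
# Data from the original question (named DF here, as in the code above)
DF <- data.frame(ID = 1:10,
                 value = c(34, 12, 23, 25, 34, 42, 48, 29, 30, 27))

# Assumed helper: distance between the mean of the selected values and a target
f <- function(jnx, x, target) abs(mean(x[jnx]) - target)

# Assumed index matrix: one column per combination of 3 row indices
inxmat2 <- combn(nrow(DF), 3)

# Distance to 45 for every combination
meansDist2 <- apply(inxmat2, 2, function(jnx) f(jnx, DF$value, 45))

# All columns that attain the minimum distance
i2 <- which(meansDist2 == min(meansDist2))
sols <- inxmat2[, i2, drop = FALSE]
print(sols)

# Mean of each tied solution
sol_means <- apply(sols, 2, function(jnx) mean(DF$value[jnx]))
print(sol_means)
```

With this data the two tied solutions differ only in which of the two rows with value 34 is used (row 1 or row 5), which is why both give the same mean.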

Petr's solution and mine give the same mean value.
But use either one only for small values of (n, k): both enumerate all
choose(n, k) combinations, and that number grows very quickly.
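
As a rough illustration of why this only suits small (n, k), the sketch below counts how many combinations an exhaustive search would have to evaluate. Only the first case (n = 10, k = 3) comes from this thread; the larger sizes are hypothetical examples.

```r
# Number of index combinations that an exhaustive search must evaluate
n_combinations <- c(
  "n=10, k=3"  = choose(10, 3),   # the case in this thread
  "n=30, k=10" = choose(30, 10),  # hypothetical larger problem
  "n=50, k=25" = choose(50, 25)   # hypothetical larger problem
)
print(n_combinations)
```

The thread's data needs only 120 combinations, while the second case is already about 30 million, so 'combn' would have to build a matrix with that many columns.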

Rui Barradas


