[R] Selecting elements

Thu Aug 26 01:38:43 CEST 2021

Hi Silvano,
Just add the selected elements to the return value:

set.seed(123)
Var.1 <- rep(LETTERS[1:4], 10)
Var.2 <- sample(1:40, replace=FALSE)
data <- data.frame(Var.1, Var.2)
(Order <- data[order(data$Var.2, decreasing=TRUE), ])
allowed<-matrix(c(3,3,2,2,2,5,0,3,3,4,2,1),nrow=3,byrow=TRUE)
colnames(allowed)<-LETTERS[1:4]
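select_largest<-function(x,allowed,n=10) {
 # x: data frame sorted by decreasing value (column 1 = letter, column 2 = value)
 # allowed: one row per scheme, one named column per letter
 # n: number of elements each scheme selects (the row sums of allowed)
 totals<-rep(0,nrow(allowed))
 indices<-matrix(0,ncol=n,nrow=nrow(allowed))
 for(i in 1:nrow(allowed)) {
  ii<-1
  for(j in 1:ncol(allowed)) {
   if(allowed[i,j]) {
    # rows holding this letter, already in decreasing order of value
    indx<-which(x[,1] == colnames(allowed)[j])
    totals[i]<-totals[i]+sum(x[indx[1:allowed[i,j]],2])
    indices[i,ii:(ii+allowed[i,j]-1)]<-indx[1:allowed[i,j]]
    ii<-ii+allowed[i,j]
   }
  }
 }
 # the scheme with the largest total wins
 largest<-which.max(totals)
 # sort the indices here
 indices<-sort(indices[largest,])
 return(list(scheme=largest,total=totals[largest],
  indices=indices,elements=x[indices,]))
}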
select_largest(Order,allowed)
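
As a quick sanity check (a sketch only, not part of the function above), the letters of the selected elements can be tabulated and compared with the counts of the chosen scheme. The helper name check_scheme and the toy data are made up for illustration:

```r
# Hypothetical helper (not from the original post): TRUE if the selected
# rows hold exactly the per-letter counts that a scheme asks for.
check_scheme <- function(elements, scheme_counts) {
  # Tabulate over all scheme letters so that absent letters count as 0
  observed <- table(factor(elements[, 1], levels = names(scheme_counts)))
  all(as.vector(observed) == scheme_counts)
}

# Toy data, invented for this example: 2 A, 1 B, 0 C, 1 D
toy <- data.frame(Var.1 = c("A", "A", "B", "D"), Var.2 = c(9, 7, 8, 5))
ok_match    <- check_scheme(toy, c(A = 2, B = 1, C = 0, D = 1))
ok_mismatch <- check_scheme(toy, c(A = 1, B = 1, C = 1, D = 1))
print(c(ok_match, ok_mismatch))
```

With the objects defined above, res <- select_largest(Order, allowed) followed by check_scheme(res$elements, allowed[res$scheme, ]) should return TRUE as long as every letter has at least as many rows as the scheme requests.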

Jim

On Thu, Aug 26, 2021 at 12:46 AM Silvano Cesar da Costa <silvano using uel.br> wrote:
>
> Wow,
>
> That's exactly what I want. But, if possible, I would like a list to be created with the selected elements (variable and value).
> Is it possible to add it to the output?
> Thank you very much.
>
> Prof. Dr. Silvano Cesar da Costa
> Universidade Estadual de Londrina
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Phone: (43) 3371-4346
>
>
> On Wed, Aug 25, 2021 at 03:12, Jim Lemon <drjimlemon using gmail.com> wrote:
>>
>> Hi Silvano,
>> I was completely stumped by your problem until I looked through Petr's
>> response and guessed that you wanted the largest sum of 'Var.2'
>> constrained by the specified numbers in your three schemes. I think
>> this is what you want, but I haven't checked it exhaustively.
>>
>> set.seed(123)
>> Var.1 <- rep(LETTERS[1:4], 10)
>> Var.2 <- sample(1:40, replace=FALSE)
>> data <- data.frame(Var.1, Var.2)
>> (Order <- data[order(data$Var.2, decreasing=TRUE), ])
>> allowed<-matrix(c(3,3,2,2,2,5,0,3,3,4,2,1),nrow=3,byrow=TRUE)
>> colnames(allowed)<-LETTERS[1:4]
>> select_largest<-function(x,allowed,n=10) {
>>  totals<-rep(0,nrow(allowed))
>>  indices<-matrix(0,ncol=n,nrow=nrow(allowed))
>>  for(i in 1:nrow(allowed)) {
>>   ii<-1
>>   for(j in 1:ncol(allowed)) {
>>    if(allowed[i,j]) {
>>     indx<-which(x[,1] == colnames(allowed)[j])
>>     totals[i]<-totals[i]+sum(x[indx[1:allowed[i,j]],2])
>>     indices[i,ii:(ii+allowed[i,j]-1)]<-indx[1:allowed[i,j]]
>>     ii<-ii+allowed[i,j]
>>    }
>>   }
>>  }
>>  largest<-which.max(totals)
>>  return(list(scheme=largest,total=totals[largest],
>>   indices=sort(indices[largest,])))
>> }
>> select_largest(Order,allowed)
>>
>> Jim
>>
>> On Tue, Aug 24, 2021 at 7:11 PM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>> >
>> > Hi.
>> >
>> > Now it is understandable.  However, the solution is not clear to me.
>> >
>> > table(Order$Var.1[1:10])
>> > A B C D
>> > 4 1 2 3
>> >
>> > should give you a hint which scheme could be acceptable, but how to do it programmatically I do not know.
>> >
>> > Maybe start with a lower value in the table call and gradually increase it to check which scheme ends up being the chosen one:
>> >
>> > > table(Order$Var.1[1]) # scheme 2 is out
>> > C
>> > 1
>> > ...
>> > > table(Order$Var.1[1:5]) # scheme 3
>> > A B C D
>> > 1 1 2 1
>> >
>> > > table(Order$Var.1[1:6]) # scheme 3
>> > A B C D
>> > 2 1 2 1
>> >
>> > > table(Order$Var.1[1:7]) # scheme 1
>> > A B C D
>> > 2 1 2 2
>> >
>> > > table(Order$Var.1[1:8]) # no such scheme, so scheme 1 is the chosen one
>> > A B C D
>> > 2 1 2 3
>> >
>> > #Now you need to select values based on scheme 1.
>> > # 3A - 3B - 2C - 2D
>> >
>> > sss <- split(Order, Order$Var.1)
>> > selection <- c(3,3,2,2)
>> > result <- vector("list", 4)
>> >
>> > #I would use loop
>> >
>> > for(i in 1:4) {
>> > result[[i]] <- sss[[i]][1:selection[i],]
>> > }
>> >
>> > Maybe someone will come up with another, more ingenious solution.
>> >
>> > Cheers
>> > Petr
>> >
>> > From: Silvano Cesar da Costa <silvano using uel.br>
>> > Sent: Monday, August 23, 2021 7:54 PM
>> > To: PIKAL Petr <petr.pikal using precheza.cz>
>> > Cc: r-help using r-project.org
>> > Subject: Re: [R] Selecting elements
>> >
>> > Hi,
>> >
>> > I apologize for the confusion. I will try to be clearer in my explanation. I believe that with the R script it becomes clearer.
>> >
>> > I have 4 variables with 10 repetitions and each one receives a value, randomly.
>> > I order the dataset from largest to smallest value. I have to select 10 elements in
>> > descending order of values, according to one of three schemes:
>> >
>> > # 3A - 3B - 2C - 2D
>> > # 2A - 5B - 0C - 3D
>> > # 3A - 4B - 2C - 1D
>> >
>> > If the first 3 elements (out of the 10 to be selected) are of the letter D, the second scheme
>> > is adopted automatically. So I then have to choose 2A, 5B and 0C.
>> > How to make the selection automatically?
>> >
>> > I created two selection examples, with different schemes:
>> >
>> >
>> >
>> > set.seed(123)
>> >
>> > Var.1 = rep(LETTERS[1:4], 10)
>> > Var.2 = sample(1:40, replace=FALSE)
>> >
>> > data = data.frame(Var.1, Var.2)
>> >
>> > (Order = data[order(data$Var.2, decreasing=TRUE), ])
>> >
>> > # I must select the 10 highest values,
>> > # but which follow a certain scheme:
>> > #
>> > #  3A - 3B - 2C - 2D     or
>> > #  2A - 5B - 0C - 3D     or
>> > #  3A - 4B - 2C - 1D
>> > #
>> > # In this case, I started with the highest value that refers to the letter C.
>> > # Next comes only 1 of the letters B, A and D. All are selected once.
>> > # The fifth observation is the letter C, completing 2 C values. In this case,
>> > # following the 3 adopted schemes, note that the second scheme has 0C,
>> > # so this scheme is out.
>> > # Therefore, it can be the first scheme (3A - 3B - 2C - 2D) or the
>> > # third scheme (3A - 4B - 2C - 1D).
>> > # The next letter to be completed is the D (fourth and seventh elements),
>> > # among the 10 elements being selected. Therefore, the scheme adopted is the
>> > # first one (3A - 3B - 2C - 2D).
>> > # Therefore, it is necessary to select 2 values with the letter B and 1 value
>> > # with the letter A.
>> > #
>> > # Manual Selection -
>> > # The end result is:
>> > (Selected.data = Order[c(1,2,3,4,5,6,7,9,13,16), ])
>> >
>> > # Scheme: 3A - 3B - 2C - 2D
>> > sort(Selected.data$Var.1)
>> >
>> >
>> > #------------------
>> > # Second example: -
>> > #------------------
>> > set.seed(4)
>> >
>> > Var.1 = rep(LETTERS[1:4], 10)
>> > Var.2 = sample(1:40, replace=FALSE)
>> >
>> > data = data.frame(Var.1, Var.2)
>> > (Order = data[order(data$Var.2, decreasing=TRUE), ])
>> >
>> > # The end result is:
>> > (Selected.data.2 = Order[c(1,2,3,4,5,6,7,8,9,11), ])
>> >
>> > # Scheme: 3A - 4B - 2C - 1D
>> > sort(Selected.data.2$Var.1)
>> >
>> > How to make the selection of the 10 elements automatically?
>> >
>> > Thank you very much.
>> >
>> > Prof. Dr. Silvano Cesar da Costa
>> > Universidade Estadual de Londrina
>> > Centro de Ciências Exatas
>> > Departamento de Estatística
>> >
>> > Phone: (43) 3371-4346
>> >
>> >
>> > On Mon, Aug 23, 2021 at 05:05, PIKAL Petr <petr.pikal using precheza.cz> wrote:
>> > Hi
>> >
>> > Only I got your HTML-formatted mail; the rest of the world got a complete mess. Do not use HTML formatting.
>> >
>> > If I got it right, I wonder why in your second example you did not follow
>> > 3A - 3B - 2C - 2D
>> >
>> > as D was positioned 1st and 4th.
>> >
>> > I hope that you could use something like
>> >
>> > sss <- split(data$Var.2, data$Var.1)
>> > lapply(sss, cumsum)
>> > $A
>> >  [1]  38  73 105 136 166 188 199 207 209 210
>> >
>> > $B
>> >  [1]  39  67  92 115 131 146 153 159 164 168
>> >
>> > $C
>> >  [1]  40  76 105 131 152 171 189 203 213 222
>> >
>> > $D
>> >  [1]  37  71 104 131 155 175 192 205 217 220
>> >
>> > Now you need to evaluate this result according to your sets. Here the highest value (76) is in C, so the set with 2C is the one you should choose, and you select your values according to this set.
>> >
>> > With
>> > > set.seed(666)
>> > > Var.1 = rep(LETTERS[1:4], 10)
>> > > Var.2 = sample(1:40, replace=FALSE)
>> > > data = data.frame(Var.1, Var.2)
>> > > data <- data[order(data$Var.2, decreasing=TRUE), ]
>> > > sss <- split(data$Var.2, data$Var.1)
>> > > lapply(sss, cumsum)
>> > $A
>> >  [1]  36  70 102 133 163 182 200 207 212 213
>> >
>> > $B
>> >  [1]  35  57  78  95 108 120 131 140 148 150
>> >
>> > $C
>> >  [1]  40  73 102 130 156 180 196 211 221 225
>> >
>> > $D
>> >  [1]  39  77 114 141 166 189 209 223 229 232
>> >
>> > The highest value is in D, so either 3A - 3B - 2C - 2D or 3A - 3B - 2C - 2D should be appropriate. And here I am lost again, as both sets are the same. Maybe you need to reconsider your statements.
>> >
>> > Cheers
>> > Petr
>> >
>> > From: Silvano Cesar da Costa <silvano using uel.br>
>> > Sent: Friday, August 20, 2021 9:28 PM
>> > To: PIKAL Petr <petr.pikal using precheza.cz>
>> > Cc: r-help using r-project.org
>> > Subject: Re: [R] Selecting elements
>> >
>> > Hi, thank you for the answer.
>> > Sorry, English is not my native language.
>> >
>> > But you got it right.
>> > > As C is first and fourth biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?
>> >
>> > I must select the 10 (not 15) highest values, but which follow a certain order:
>> > 3A - 3B - 2C - 2D     or
>> > 2A - 5B - 0C - 3D     or
>> > 3A - 3B - 2C - 2D
>> > I'll put the example in Excel for a better understanding (with 20 elements only).
>> > I must select 10 elements (the highest values of variable Var.2), which fit one of the 3 options above.
>> >
>> > Number Position Var.1 Var.2
>> >      1       27 C        40
>> >      2       30 B        39   Selected:
>> >      3        5 A        38   Number Position Var.1 Var.2
>> >      4       16 D        37        1       27 C        40
>> >      5       23 C        36        2       30 B        39   3A - 3B - 2C - 2D
>> >      6       13 A        35        3        5 A        38
>> >      7       20 D        34        4       16 D        37   3A - 3B - 1C - 3D
>> >      8       12 D        33        5       23 C        36
>> >      9        9 A        32        6       13 A        35   2A - 5B - 0C - 3D
>> >     10        1 A        31        7       20 D        34
>> >     11       21 A        30       10        9 A        32
>> >     12       35 C        29       13       14 B        28
>> >     13       14 B        28       17        6 B        25
>> >     14        8 D        27
>> >     15        7 C        26
>> >     16        6 B        25
>> >     17       40 D        24
>> >     18       26 B        23
>> >     19       29 A        22
>> >     20       31 C        21
>> >
>> > Second option (other data set):
>> >
>> > Number Position Var.1 Var.2
>> >      1       36 D        20
>> >      2       11 B        19   Selected:
>> >      3       39 A        18   Number Position Var.1 Var.2
>> >      4       24 D        17        1       36 D        20
>> >      5       34 B        16        2       11 B        19   3A - 3B - 2C - 2D
>> >      6        2 B        15        3       39 A        18
>> >      7        3 A        14        4       24 D        17   3A - 3B - 1C - 3D
>> >      8       32 D        13        5       34 B        16
>> >      9       28 D        12        6        2 B        15   2A - 5B - 0C - 3D
>> >     10       25 A        11        7        3 A        14
>> >     11       19 B        10        8       32 D        13
>> >     12       15 B         9        9       25 A        11
>> >     13       17 A         8       10       18 C         7
>> >     14       18 C         7
>> >     15       38 B         6
>> >     16       10 B         5
>> >     17       22 B         4
>> >     18        4 D         3
>> >     19       33 A         2
>> >     20       37 A         1
>> >
>> > How to make the selection of these 10 elements that fit one of the 3 options using R?
>> >
>> > Thanks,
>> >
>> > Prof. Dr. Silvano Cesar da Costa
>> > Universidade Estadual de Londrina
>> > Centro de Ciências Exatas
>> > Departamento de Estatística
>> >
>> > Phone: (43) 3371-4346
>> >
>> >
>> > On Fri, Aug 20, 2021 at 03:28, PIKAL Petr <petr.pikal using precheza.cz> wrote:
>> > Hallo
>> >
>> > I am confused. Maybe others know what you want, but could you be more specific?
>> >
>> > Let's say you have data such as
>> > set.seed(123)
>> > Var.1 = rep(LETTERS[1:4], 10)
>> > Var.2 = sample(1:40, replace=FALSE)
>> > data = data.frame(Var.1, Var.2)
>> >
>> > What should be the desired outcome?
>> >
>> > You can sort
>> > data <- data[order(data$Var.2, decreasing=TRUE), ]
>> > and split the data
>> > > split(data$Var.2, data$Var.1)
>> > $A
>> >  [1] 38 35 32 31 30 22 11  8  2  1
>> >
>> > $B
>> >  [1] 39 28 25 23 16 15  7  6  5  4
>> >
>> > $C
>> >  [1] 40 36 29 26 21 19 18 14 10  9
>> >
>> > $D
>> >  [1] 37 34 33 27 24 20 17 13 12  3
>> >
>> > To inspect the highest values. But here I am lost. As C is first and fourth biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?
>> >
>> > Or I do not understand at all what you really want to achieve.
>> >
>> > Cheers
>> > Petr
>> >
>> > > -----Original Message-----
>> > > From: R-help <r-help-bounces using r-project.org> On Behalf Of Silvano Cesar da Costa
>> > > Sent: Thursday, August 19, 2021 10:40 PM
>> > > To: r-help using r-project.org
>> > > Subject: [R] Selecting elements
>> > >
>> > > Hi,
>> > >
>> > > I need to select 15 elements, always considering the highest values
>> > > (descending order) but obeying the following configuration:
>> > >
>> > > 3A - 4B - 0C - 3D or
>> > > 2A - 5B - 0C - 3D or
>> > > 3A - 3B - 2C - 2D
>> > >
>> > > If I have, for example, 5 A elements as the highest values, I can only choose
>> > > 3 (first and third choice) or 2 (second choice) elements.
>> > >
>> > > how to make this selection?
>> > >
>> > >
>> > > library(dplyr)
>> > >
>> > > Var.1 = rep(LETTERS[1:4], 10)
>> > > Var.2 = sample(1:40, replace=FALSE)
>> > >
>> > > data = data.frame(Var.1, Var.2)
>> > > (data = data[order(data$Var.2, decreasing=TRUE), ])
>> > >
>> > > Elements = data %>%
>> > >   arrange(desc(Var.2))
>> > >
>> > > Thanks,
>> > >
>> > > Prof. Dr. Silvano Cesar da Costa
>> > > Universidade Estadual de Londrina
>> > > Centro de Ciências Exatas
>> > > Departamento de Estatística
>> > >
>> > > Phone: (43) 3371-4346
>> > >
>> > >       [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.