[R] Selecting elements

Wed Aug 25 08:12:35 CEST 2021

Hi Silvano,
I was completely stumped by your problem until I looked through Petr's
response and guessed that you wanted the largest sum of 'Var.2'
constrained by the specified numbers of each 'Var.1' letter in your
three schemes. I think
this is what you want, but I haven't checked it exhaustively.

set.seed(123)
Var.1 <- rep(LETTERS[1:4], 10)
Var.2 <- sample(1:40, replace=FALSE)
data <- data.frame(Var.1, Var.2)
(Order <- data[order(data$Var.2, decreasing=TRUE), ])
# one row per scheme, one column per letter; each row sums to n = 10
allowed<-matrix(c(3,3,2,2,2,5,0,3,3,4,2,1),nrow=3,byrow=TRUE)
colnames(allowed)<-LETTERS[1:4]
# x must already be sorted by decreasing value (column 2)
select_largest<-function(x,allowed,n=10) {
 totals<-rep(0,nrow(allowed))
 indices<-matrix(0,ncol=n,nrow=nrow(allowed))
 for(i in seq_len(nrow(allowed))) {
  ii<-1
  for(j in seq_len(ncol(allowed))) {
   if(allowed[i,j]) {
    # row positions of this letter, largest values first
    indx<-which(x[,1] == colnames(allowed)[j])
    # take the top allowed[i,j] rows for this letter
    top<-indx[seq_len(allowed[i,j])]
    totals[i]<-totals[i]+sum(x[top,2])
    indices[i,ii:(ii+allowed[i,j]-1)]<-top
    ii<-ii+allowed[i,j]
   }
  }
 }
 # keep the scheme with the largest total
 largest<-which.max(totals)
 return(list(scheme=largest,total=totals[largest],
  indices=sort(indices[largest,])))
}
select_largest(Order,allowed)
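
An alternative reading, following the step-by-step rule Silvano describes in
the quoted message below, is to walk down the sorted rows and keep a row only
while at least one scheme still has room for its letter, discarding schemes as
they become impossible. The following is only a minimal sketch of that idea
under that assumption, not a checked solution: select_greedy is a hypothetical
name, and the small data frame ex is typed in from the 20-row example table
quoted further down rather than generated with sample().

```r
# Schemes as rows, letters as columns (same values as in the thread)
allowed <- matrix(c(3,3,2,2, 2,5,0,3, 3,4,2,1), nrow = 3, byrow = TRUE,
                  dimnames = list(NULL, LETTERS[1:4]))

# Hypothetical helper: x is sorted by decreasing value, letter in column 1
select_greedy <- function(x, allowed, n = 10) {
  counts <- setNames(rep(0, ncol(allowed)), colnames(allowed))
  viable <- rep(TRUE, nrow(allowed))   # schemes not yet ruled out
  picked <- integer(0)                 # row positions kept so far
  for (r in seq_len(nrow(x))) {
    if (length(picked) == n) break
    g <- as.character(x[r, 1])
    # schemes still viable that can take one more row of letter g
    ok <- viable & (allowed[, g] >= counts[g] + 1)
    if (any(ok)) {
      counts[g] <- counts[g] + 1
      viable <- ok
      picked <- c(picked, r)
    }
  }
  list(schemes = which(viable), rows = picked, counts = counts)
}

# First 20 sorted rows of the first example, copied from the quoted table
ex <- data.frame(
  Var.1 = c("C","B","A","D","C","A","D","D","A","A",
            "A","C","B","D","C","B","D","B","A","C"),
  Var.2 = 40:21)
res <- select_greedy(ex, allowed)
res
```

On that example the sketch keeps rows 1, 2, 3, 4, 5, 6, 7, 9, 13 and 16 and
ends with scheme 1 as the only one left, which agrees with the manual
selection quoted below. It assumes the data contain enough rows of each letter
to complete whichever scheme survives; if not, fewer than n rows come back.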

Jim

On Tue, Aug 24, 2021 at 7:11 PM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>
> Hi.
>
> Now it is understandable. However, the solution is not clear to me.
>
> table(Order$Var.1[1:10])
> A B C D
> 4 1 2 3
>
> should give you a hint as to which scheme could be acceptable, but I do not know how to do it programmatically.
>
> Maybe start with a lower value in the table call and gradually increase it to check which scheme starts to be the chosen one.
>
> > table(Order$Var.1[1]) # scheme 2 is out
> C
> 1
> ...
> > table(Order$Var.1[1:5]) # scheme 3
> A B C D
> 1 1 2 1
>
> > table(Order$Var.1[1:6]) # scheme 3
> A B C D
> 2 1 2 1
>
> > table(Order$Var.1[1:7]) # scheme 1
> A B C D
> 2 1 2 2
>
> > table(Order$Var.1[1:8]) # no such scheme, so scheme 1 is the chosen one
> A B C D
> 2 1 2 3
>
> #Now you need to select values based on scheme 1.
> # 3A - 3B - 2C - 2D
>
> sss <- split(Order, Order$Var.1)
> selection <- c(3,3,2,2)
> result <- vector("list", 4)
>
> #I would use loop
>
> for(i in 1:4) {
> result[[i]] <- sss[[i]][1:selection[i],]
> }
>
> Maybe someone will come up with another ingenious solution.
>
> Cheers
> Petr
>
> From: Silvano Cesar da Costa <silvano using uel.br>
> Sent: Monday, August 23, 2021 7:54 PM
> To: PIKAL Petr <petr.pikal using precheza.cz>
> Cc: r-help using r-project.org
> Subject: Re: [R] Selecting elements
>
> Hi,
>
> I apologize for the confusion. I will try to be clearer in my explanation. I believe that with the R script it becomes clearer.
>
> I have 4 variables with 10 repetitions and each one receives a value, randomly.
> I order the dataset from largest to smallest value. I have to select 10 elements in
> descending order of values, according to one of three schemes:
>
> # 3A - 3B - 2C - 2D
> # 2A - 5B - 0C - 3D
> # 3A - 4B - 2C - 1D
>
> If the first 3 elements (out of the 10 to be selected) are of the letter D, the
> adopted scheme will automatically be the second. So I then have to choose 2A, 5B and 0C.
> How can I make the selection automatically?
>
> I created two selection examples, with different schemes:
>
>
>
> set.seed(123)
>
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
>
> data = data.frame(Var.1, Var.2)
>
> (Order = data[order(data$Var.2, decreasing=TRUE), ])
>
> # I must select the 10 highest values,
> # but which follow a certain scheme:
> #
> #  3A - 3B - 2C - 2D     or
> #  2A - 5B - 0C - 3D     or
> #  3A - 4B - 2C - 1D
> #
> # In this case, I started with the highest value that refers to the letter C.
> # Next comes only 1 of the letters B, A and D. All are selected once.
> # The fifth observation is the letter C, completing 2 C values. In this case,
> # following the 3 adopted schemes, note that the second scheme has 0C,
> # so this scheme is out.
> # Therefore, it can be the first scheme (3A - 3B - 2C - 2D) or the
> # third scheme (3A - 4B - 2C - 1D).
> # The next letter to be completed is the D (fourth and seventh elements),
> # among the 10 elements being selected. Therefore, the scheme adopted is the
> # first one (3A - 3B - 2C - 2D).
> # Therefore, it is necessary to select 2 values with the letter B and 1 value
> # with the letter A.
> #
> # Manual Selection -
> # The end result is:
> (Selected.data = Order[c(1,2,3,4,5,6,7,9,13,16), ])
>
> # Scheme: 3A - 3B - 2C - 2D
> sort(Selected.data$Var.1)
>
>
> #------------------
> # Second example: -
> #------------------
> set.seed(4)
>
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
>
> data = data.frame(Var.1, Var.2)
> (Order = data[order(data$Var.2, decreasing=TRUE), ])
>
> # The end result is:
> (Selected.data.2 = Order[c(1,2,3,4,5,6,7,8,9,11), ])
>
> # Scheme: 3A - 4B - 2C - 1D
> sort(Selected.data.2$Var.1)
>
> How to make the selection of the 10 elements automatically?
>
> Thank you very much.
>
> Prof. Dr. Silvano Cesar da Costa
> Universidade Estadual de Londrina
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Fone: (43) 3371-4346
>
>
> On Mon, Aug 23, 2021 at 05:05, PIKAL Petr <petr.pikal using precheza.cz> wrote:
> Hi
>
> Only I got your HTML-formatted mail; the rest of the world got a complete mess. Do not use HTML formatting.
>
> If I got it right, I wonder why in your second example you did not follow
> 3A - 3B - 2C - 2D
>
> as D was positioned 1st and 4th.
>
> I hope that you could use something like
>
> sss <- split(data$Var.2, data$Var.1)
> lapply(sss, cumsum)
> $A
>  [1]  38  73 105 136 166 188 199 207 209 210
>
> $B
>  [1]  39  67  92 115 131 146 153 159 164 168
>
> $C
>  [1]  40  76 105 131 152 171 189 203 213 222
>
> $D
>  [1]  37  71 104 131 155 175 192 205 217 220
>
> Now you need to evaluate this result according to your sets. Here the highest value (76) is in C, so the set with 2C is the one you should choose, and you select your values according to this set.
>
> With
> > set.seed(666)
> > Var.1 = rep(LETTERS[1:4], 10)
> > Var.2 = sample(1:40, replace=FALSE)
> > data = data.frame(Var.1, Var.2)
> > data <- data[order(data$Var.2, decreasing=TRUE), ]
> > sss <- split(data$Var.2, data$Var.1)
> > lapply(sss, cumsum)
> $A
>  [1]  36  70 102 133 163 182 200 207 212 213
>
> $B
>  [1]  35  57  78  95 108 120 131 140 148 150
>
> $C
>  [1]  40  73 102 130 156 180 196 211 221 225
>
> $D
>  [1]  39  77 114 141 166 189 209 223 229 232
>
> The highest value is in D, so either 3A - 3B - 2C - 2D or 3A - 3B - 2C - 2D should be appropriate. And here I am lost again, as both sets are the same. Maybe you need to reconsider your statements.
>
> Cheers
> Petr
>
> From: Silvano Cesar da Costa <silvano using uel.br>
> Sent: Friday, August 20, 2021 9:28 PM
> To: PIKAL Petr <petr.pikal using precheza.cz>
> Cc: r-help using r-project.org
> Subject: Re: [R] Selecting elements
>
> Hi, thank you for the answer.
> Sorry English is not my native language.
>
> But you got it right.
> > As C is first and fourth biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?
>
> I must select the 10 (not 15) highest values, but which follow a certain order:
> 3A - 3B - 2C - 2D     or
> 2A - 5B - 0C - 3D     or
> 3A - 3B - 2C - 2D
> I'll put the example in Excel for a better understanding (with 20 elements only).
> I must select 10 elements (the highest values of variable Var.2), which fit one of the 3 options above.
>
> Number  Position  Var.1  Var.2
>      1        27      C     40
>      2        30      B     39
>      3         5      A     38
>      4        16      D     37
>      5        23      C     36
>      6        13      A     35
>      7        20      D     34
>      8        12      D     33
>      9         9      A     32
>     10         1      A     31
>     11        21      A     30
>     12        35      C     29
>     13        14      B     28
>     14         8      D     27
>     15         7      C     26
>     16         6      B     25
>     17        40      D     24
>     18        26      B     23
>     19        29      A     22
>     20        31      C     21
>
> Selected:
>
> Number  Position  Var.1  Var.2
>      1        27      C     40
>      2        30      B     39
>      3         5      A     38
>      4        16      D     37
>      5        23      C     36
>      6        13      A     35
>      7        20      D     34
>     10         9      A     32
>     13        14      B     28
>     17         6      B     25
>
> Options:
> 3A - 3B - 2C - 2D
> 3A - 3B - 1C - 3D
> 2A - 5B - 0C - 3D
>
> Second option (other data set):
>
> Number  Position  Var.1  Var.2
>      1        36      D     20
>      2        11      B     19
>      3        39      A     18
>      4        24      D     17
>      5        34      B     16
>      6         2      B     15
>      7         3      A     14
>      8        32      D     13
>      9        28      D     12
>     10        25      A     11
>     11        19      B     10
>     12        15      B      9
>     13        17      A      8
>     14        18      C      7
>     15        38      B      6
>     16        10      B      5
>     17        22      B      4
>     18         4      D      3
>     19        33      A      2
>     20        37      A      1
>
> Selected:
>
> Number  Position  Var.1  Var.2
>      1        36      D     20
>      2        11      B     19
>      3        39      A     18
>      4        24      D     17
>      5        34      B     16
>      6         2      B     15
>      7         3      A     14
>      8        32      D     13
>      9        25      A     11
>     10        18      C      7
>
> Options:
> 3A - 3B - 2C - 2D
> 3A - 3B - 1C - 3D
> 2A - 5B - 0C - 3D
>
> How to make the selection of these 10 elements that fit one of the 3 options using R?
>
> Thanks,
>
> Prof. Dr. Silvano Cesar da Costa
> Universidade Estadual de Londrina
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Fone: (43) 3371-4346
>
>
> On Fri, Aug 20, 2021 at 03:28, PIKAL Petr <petr.pikal using precheza.cz> wrote:
> Hallo
>
> I am confused; maybe others know what you want, but could you be more specific?
>
> Let's say you have data like this:
> set.seed(123)
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
> data = data.frame(Var.1, Var.2)
>
> What should be the desired outcome?
>
> You can sort
> data <- data[order(data$Var.2, decreasing=TRUE), ]
> and split the data
> > split(data$Var.2, data$Var.1)
> $A
>  [1] 38 35 32 31 30 22 11  8  2  1
>
> $B
>  [1] 39 28 25 23 16 15  7  6  5  4
>
> $C
>  [1] 40 36 29 26 21 19 18 14 10  9
>
> $D
>  [1] 37 34 33 27 24 20 17 13 12  3
>
> to inspect the highest values. But here I am lost. As C is the first and fourth biggest value, do you follow the third option and select the 3 highest A, 3B, 2C and 2D?
>
> Or I do not understand at all what you really want to achieve.
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Silvano Cesar da
> > Costa
> > Sent: Thursday, August 19, 2021 10:40 PM
> > To: r-help using r-project.org
> > Subject: [R] Selecting elements
> >
> > Hi,
> >
> > I need to select 15 elements, always considering the highest values
> > (descending order) but obeying the following configuration:
> >
> > 3A - 4B - 0C - 3D or
> > 2A - 5B - 0C - 3D or
> > 3A - 3B - 2C - 2D
> >
> > If I have, for example, 5 A elements as the highest values, I can only choose
> > 3 (first and third choice) or 2 (second choice) elements.
> >
> > how to make this selection?
> >
> >
> > library(dplyr)
> >
> > Var.1 = rep(LETTERS[1:4], 10)
> > Var.2 = sample(1:40, replace=FALSE)
> >
> > data = data.frame(Var.1, Var.2)
> > (data = data[order(data$Var.2, decreasing=TRUE), ])
> >
> > Elements = data %>%
> >   arrange(desc(Var.2))
> >
> > Thanks,
> >
> > Prof. Dr. Silvano Cesar da Costa
> > Universidade Estadual de Londrina
> > Centro de Ciências Exatas
> > Departamento de Estatística
> >
> > Fone: (43) 3371-4346
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.