# [R] Selecting elements

Silvano Cesar da Costa <silvano using uel.br>
Mon Aug 23 19:54:13 CEST 2021

```
Hi,

I apologize for the confusion. I will try to be clearer in my explanation.
I believe the R script below makes it clearer.

I have 4 letters (A, B, C and D), each repeated 10 times, and each element
receives a random value.
I sort the dataset from the largest to the smallest value. I have to select
10 elements in descending order of value, according to one of three schemes:

# 3A - 3B - 2C - 2D
# 2A - 5B - 0C - 3D
# 3A - 4B - 2C - 1D

If the first 3 elements (out of the 10 to be selected) have the letter D,
the second scheme is adopted automatically, because it is the only one that
allows 3D. I then have to choose 2A, 5B and 0C for the remaining elements.
How can I make this selection automatically?

I created two selection examples, with different schemes:

set.seed(123)

Var.1 = rep(LETTERS[1:4], 10)
Var.2 = sample(1:40, replace=FALSE)

data = data.frame(Var.1, Var.2)

(Order = data[order(data$Var.2, decreasing=TRUE), ])

# I must select the 10 highest values,
# but they must follow one of these schemes:
#
#  3A - 3B - 2C - 2D     or
#  2A - 5B - 0C - 3D     or
#  3A - 4B - 2C - 1D
#
# In this case, the highest value belongs to the letter C.
# Next comes one each of the letters B, A and D, so every letter has been
# selected once.
# The fifth observation is again a C, completing 2 C values. The second
# scheme has 0C, so this scheme is out.
# Therefore, it can be the first scheme (3A - 3B - 2C - 2D) or the
# third scheme (3A - 4B - 2C - 1D).
# The next letter to be completed is D (fourth and seventh elements).
# The third scheme allows only 1D, so the scheme adopted is the
# first one (3A - 3B - 2C - 2D).
# Therefore, it is still necessary to select 2 values with the letter B
# and 1 value with the letter A.
#
# Manual Selection -
# The end result is:
(Selected.data = Order[c(1,2,3,4,5,6,7,9,13,16), ])

# Scheme: 3A - 3B - 2C - 2D
sort(Selected.data$Var.1)
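
# A minimal sketch of one way to automate the manual walk-through above.
# Assumptions (not confirmed in this thread): rows are visited in descending
# order of Var.2, a row is kept only if at least one scheme still has room
# for its letter, and schemes that can no longer be met are dropped.
# `schemes` and `select_by_scheme()` are hypothetical names.
set.seed(123)
Var.1 <- rep(LETTERS[1:4], 10)
Var.2 <- sample(1:40, replace = FALSE)
Order <- data.frame(Var.1, Var.2)
Order <- Order[order(Order$Var.2, decreasing = TRUE), ]

schemes <- rbind(c(A = 3, B = 3, C = 2, D = 2),
                 c(A = 2, B = 5, C = 0, D = 3),
                 c(A = 3, B = 4, C = 2, D = 1))

select_by_scheme <- function(ordered, schemes, n = 10) {
  counts <- setNames(integer(ncol(schemes)), colnames(schemes))
  alive  <- rep(TRUE, nrow(schemes))   # schemes still compatible
  keep   <- integer(0)                 # positions of the rows kept so far
  for (i in seq_len(nrow(ordered))) {
    if (length(keep) == n) break
    g <- as.character(ordered$Var.1[i])
    trial <- counts
    trial[g] <- trial[g] + 1L
    # a scheme stays compatible while no letter exceeds its quota
    ok <- alive & apply(schemes, 1, function(s) all(trial <= s))
    if (any(ok)) {
      counts <- trial
      alive  <- ok
      keep   <- c(keep, i)
    }
  }
  ordered[keep, ]
}

(Selected.auto <- select_by_scheme(Order, schemes))
table(Selected.auto$Var.1)
# For the ordering listed in this thread for set.seed(123), this pass keeps
# positions 1, 2, 3, 4, 5, 6, 7, 9, 13 and 16, i.e. the manual selection.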

#------------------
# Second example: -
#------------------
set.seed(4)

Var.1 = rep(LETTERS[1:4], 10)
Var.2 = sample(1:40, replace=FALSE)

data = data.frame(Var.1, Var.2)
(Order = data[order(data$Var.2, decreasing=TRUE), ])

# The end result is:
(Selected.data.2 = Order[c(1,2,3,4,5,6,7,8,9,11), ])

# Scheme: 3A - 4B - 2C - 1D
sort(Selected.data.2$Var.1)
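
# A small check, as a sketch: tabulate the letters of a selection and compare
# the counts with each scheme. `schemes` and `matching_scheme()` are
# hypothetical names, not part of the original script.
schemes <- rbind(c(A = 3, B = 3, C = 2, D = 2),
                 c(A = 2, B = 5, C = 0, D = 3),
                 c(A = 3, B = 4, C = 2, D = 1))

matching_scheme <- function(x, schemes) {
  # factor() keeps letters with a zero count, e.g. 0C in the second scheme
  counts <- as.vector(table(factor(x, levels = colnames(schemes))))
  # row numbers of the schemes whose quotas equal the counts (may be empty)
  unname(which(apply(schemes, 1, function(s) all(s == counts))))
}

# Letters of a 3A - 3B - 2C - 2D selection, then of a 2A - 5B - 0C - 3D one
matching_scheme(rep(c("A", "B", "C", "D"), c(3, 3, 2, 2)), schemes)
matching_scheme(rep(c("A", "B", "D"), c(2, 5, 3)), schemes)
# With the objects above one could call, for instance,
# matching_scheme(Selected.data.2$Var.1, schemes)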

How to make the selection of the 10 elements automatically?
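
# Another possible reading, as a sketch built on the split()/cumsum() idea
# from the quoted reply below: for each scheme, take the n highest values of
# each letter and keep the scheme with the largest total. This is an
# assumption and need not agree with the sequential rule described above.
# `schemes` and `scheme_total()` are hypothetical names.
set.seed(123)
Var.1 <- rep(LETTERS[1:4], 10)
Var.2 <- sample(1:40, replace = FALSE)
sss <- split(Var.2, Var.1)

schemes <- rbind(c(A = 3, B = 3, C = 2, D = 2),
                 c(A = 2, B = 5, C = 0, D = 3),
                 c(A = 3, B = 4, C = 2, D = 1))

scheme_total <- function(s, groups) {
  # sum of the s[letter] highest values within each letter
  sum(mapply(function(v, n) sum(head(sort(v, decreasing = TRUE), n)),
             groups[names(s)], s))
}

(totals <- apply(schemes, 1, scheme_total, groups = sss))
schemes[which.max(totals), ]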

Thank you very much.

Prof. Dr. Silvano Cesar da Costa
Centro de Ciências Exatas
Departamento de Estatística

Fone: (43) 3371-4346

On Mon, Aug 23, 2021 at 05:05, PIKAL Petr <petr.pikal using precheza.cz>
wrote:

> Hi
>
> Only I got your HTML formatted mail; the rest of the world got a complete mess.
> Do not use HTML formatting.
>
> If I got it right, I wonder why in your second example you did not follow
> 3A - 3B - 2C - 2D
>
> as D was positioned 1st and 4th.
>
> I hope that you could use something like
>
> sss <- split(data$Var.2, data$Var.1)
> lapply(sss, cumsum)
> $A
>  [1]  38  73 105 136 166 188 199 207 209 210
>
> $B
>  [1]  39  67  92 115 131 146 153 159 164 168
>
> $C
>  [1]  40  76 105 131 152 171 189 203 213 222
>
> $D
>  [1]  37  71 104 131 155 175 192 205 217 220
>
> Now you need to evaluate this result according to your sets. Here the
> highest value (76) is in C, so the set with 2C is the one you should choose,
> and select your values according to this set.
>
> With
> > set.seed(666)
> > Var.1 = rep(LETTERS[1:4], 10)
> > Var.2 = sample(1:40, replace=FALSE)
> > data = data.frame(Var.1, Var.2)
> > data <- data[order(data$Var.2, decreasing=TRUE), ]
> > sss <- split(data$Var.2, data$Var.1)
> > lapply(sss, cumsum)
> $A
>  [1]  36  70 102 133 163 182 200 207 212 213
>
> $B
>  [1]  35  57  78  95 108 120 131 140 148 150
>
> $C
>  [1]  40  73 102 130 156 180 196 211 221 225
>
> $D
>  [1]  39  77 114 141 166 189 209 223 229 232
>
> Highest value is in D so either 3A - 3B - 2C - 2D  or 3A - 3B - 2C - 2D
> should be appropriate. And here I am again lost as both sets are same.
> Maybe you need to reconsider your statements.
>
> Cheers
> Petr
>
> From: Silvano Cesar da Costa <silvano using uel.br>
> Sent: Friday, August 20, 2021 9:28 PM
> To: PIKAL Petr <petr.pikal using precheza.cz>
> Cc: r-help using r-project.org
> Subject: Re: [R] Selecting elements
>
> Hi, thank you for the answer.
> Sorry, English is not my native language.
>
> But you got it right.
> > As C is first and fourth biggest value, you follow third option and
> > select 3 highest A, 3B 2C and 2D?
>
> I must select the 10 (not 15) highest values, but which follow a certain
> order:
> 3A - 3B - 2C - 2D     or
> 2A - 5B - 0C - 3D     or
> 3A - 3B - 2C - 2D
> I'll put the example in Excel for a better understanding (with 20 elements
> only).
> I must select 10 elements (the highest values of variable Var.2), which
> fit one of the 3 options above.
>
> Number  Position  Var.1  Var.2
>      1        27  C         40
>      2        30  B         39
>      3         5  A         38
>      4        16  D         37
>      5        23  C         36
>      6        13  A         35
>      7        20  D         34
>      8        12  D         33
>      9         9  A         32
>     10         1  A         31
>     11        21  A         30
>     12        35  C         29
>     13        14  B         28
>     14         8  D         27
>     15         7  C         26
>     16         6  B         25
>     17        40  D         24
>     18        26  B         23
>     19        29  A         22
>     20        31  C         21
>
> Selected:
>
> Number  Position  Var.1  Var.2
>      1        27  C         40
>      2        30  B         39
>      3         5  A         38
>      4        16  D         37
>      5        23  C         36
>      6        13  A         35
>      7        20  D         34
>     10         9  A         32
>     13        14  B         28
>     17         6  B         25
>
> Options:
> 3A - 3B - 2C - 2D
> 3A - 3B - 1C - 3D
> 2A - 5B - 0C - 3D
>
> Second option (other data set):
>
> Number  Position  Var.1  Var.2
>      1        36  D         20
>      2        11  B         19
>      3        39  A         18
>      4        24  D         17
>      5        34  B         16
>      6         2  B         15
>      7         3  A         14
>      8        32  D         13
>      9        28  D         12
>     10        25  A         11
>     11        19  B         10
>     12        15  B          9
>     13        17  A          8
>     14        18  C          7
>     15        38  B          6
>     16        10  B          5
>     17        22  B          4
>     18         4  D          3
>     19        33  A          2
>     20        37  A          1
>
> Selected:
>
> Number  Position  Var.1  Var.2
>      1        36  D         20
>      2        11  B         19
>      3        39  A         18
>      4        24  D         17
>      5        34  B         16
>      6         2  B         15
>      7         3  A         14
>      8        32  D         13
>      9        25  A         11
>     10        18  C          7
>
> Options:
> 3A - 3B - 2C - 2D
> 3A - 3B - 1C - 3D
> 2A - 5B - 0C - 3D
>
> How to make the selection of these 10 elements that fit one of the 3
> options using R?
>
> Thanks,
>
> Prof. Dr. Silvano Cesar da Costa
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Fone: (43) 3371-4346
>
>
> On Fri, Aug 20, 2021 at 03:28, PIKAL Petr <petr.pikal using precheza.cz> wrote:
> Hallo
>
> I am confused; maybe others know what you want, but could you be more
> specific?
>
> Let's say you have such data
> set.seed(123)
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
> data = data.frame(Var.1, Var.2)
>
> What should be the desired outcome?
>
> You can sort
> data <- data[order(data$Var.2, decreasing=TRUE), ]
> and split the data
> > split(data$Var.2, data$Var.1)
> $A
>  [1] 38 35 32 31 30 22 11  8  2  1
>
> $B
>  [1] 39 28 25 23 16 15  7  6  5  4
>
> $C
>  [1] 40 36 29 26 21 19 18 14 10  9
>
> $D
>  [1] 37 34 33 27 24 20 17 13 12  3
>
> to inspect the highest values. But here I am lost. As C is first and fourth
> biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?
>
> Or I do not understand at all what you really want to achieve.
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Silvano Cesar da Costa
> > Sent: Thursday, August 19, 2021 10:40 PM
> > To: r-help using r-project.org
> > Subject: [R] Selecting elements
> >
> > Hi,
> >
> > I need to select 15 elements, always considering the highest values
> > (descending order) but obeying the following configuration:
> >
> > 3A - 4B - 0C - 3D or
> > 2A - 5B - 0C - 3D or
> > 3A - 3B - 2C - 2D
> >
> > If I have, for example, 5 A elements as the highest values, I can only
> > choose 3 (first and third choice) or 2 (second choice) elements.
> >
> > How can I make this selection?
> >
> >
> > library(dplyr)
> >
> > Var.1 = rep(LETTERS[1:4], 10)
> > Var.2 = sample(1:40, replace=FALSE)
> >
> > data = data.frame(Var.1, Var.2)
> > (data = data[order(data$Var.2, decreasing=TRUE), ])
> >
> > Elements = data %>%
> >   arrange(desc(Var.2))
> >
> > Thanks,
> >
> > Prof. Dr. Silvano Cesar da Costa
> > Centro de Ciências Exatas
> > Departamento de Estatística
> >
> > Fone: (43) 3371-4346
> >