# [R] Selecting elements

PIKAL Petr petr.pikal using precheza.cz
Tue Aug 24 11:10:54 CEST 2021

```
Hi.

Now it is understandable. However, the solution is not clear to me.

table(Order$Var.1[1:10])
A B C D
4 1 2 3

should give you a hint about which scheme could be acceptable, but I do not know how to do it programmatically.

Maybe start with a smaller range in the table call and gradually increase it, to see at which point only one scheme remains possible:

> table(Order$Var.1[1]) # scheme 2 is out
C
1
...
> table(Order$Var.1[1:5]) # scheme 1 or scheme 3 still possible
A B C D
1 1 2 1

> table(Order$Var.1[1:6]) # scheme 1 or scheme 3 still possible
A B C D
2 1 2 1

> table(Order$Var.1[1:7]) # only scheme 1 fits
A B C D
2 1 2 2

> table(Order$Var.1[1:8]) # no such scheme, so scheme 1 is the chosen one
A B C D
2 1 2 3
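
# Sketch (not from the original thread): one way to automate the prefix
# check above. It assumes the three schemes from the question and the first
# ten letters of the set.seed(123) ordering quoted later in this thread.
# The names `schemes`, `top` and `compatible` are illustrative only.
schemes <- rbind(s1 = c(A = 3, B = 3, C = 2, D = 2),
                 s2 = c(A = 2, B = 5, C = 0, D = 3),
                 s3 = c(A = 3, B = 4, C = 2, D = 1))
top <- c("C", "B", "A", "D", "C", "A", "D", "D", "A", "A")

# For each prefix length k, list the schemes whose quotas are not exceeded
# by the letter counts of the first k elements.
compatible <- lapply(seq_along(top), function(k) {
  counts <- table(factor(top[1:k], levels = colnames(schemes)))
  rownames(schemes)[apply(schemes, 1, function(q) all(counts <= q))]
})

compatible[[1]]  # "s1" "s3": scheme 2 is out
compatible[[7]]  # "s1"
compatible[[8]]  # character(0): nothing fits, so s1 was the last scheme left
# Caveat: this only inspects unbroken prefixes. It may run out of compatible
# schemes before a single one remains, so treat it as a diagnostic.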

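# Now you need to select values based on scheme 1:
# 3A - 3B - 2C - 2D

sss <- split(Order, Order$Var.1)   # one data frame per letter, still sorted
selection <- c(3, 3, 2, 2)         # rows to take from A, B, C, D
result <- vector("list", length(selection))

# I would use a loop. head() also copes with a quota of 0,
# where 1:selection[i] would wrongly return a row.
for (i in seq_along(selection)) {
  result[[i]] <- head(sss[[i]], selection[i])
}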
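
# Sketch (not from the original thread): choose the scheme and the rows in
# one pass. Walk down the ordered rows, accept a row only if at least one
# still-feasible scheme has room for its letter, and then drop the schemes
# that have no room. `select_by_scheme`, `schemes` and `ordered_letters` are
# illustrative names; the letters are the 20 ordered rows of the
# set.seed(123) example tabulated later in this thread.
select_by_scheme <- function(groups, schemes, n = 10) {
  groups   <- as.character(groups)
  counts   <- setNames(integer(ncol(schemes)), colnames(schemes))
  feasible <- rep(TRUE, nrow(schemes))
  picked   <- integer(0)
  for (i in seq_along(groups)) {
    if (length(picked) == n) break
    g  <- groups[i]
    ok <- feasible & schemes[, g] > counts[g]  # schemes with room for g
    if (any(ok)) {
      counts[g] <- counts[g] + 1L
      feasible  <- ok
      picked    <- c(picked, i)
    }
  }
  list(rows = picked, scheme = rownames(schemes)[feasible])
}

schemes <- rbind(s1 = c(A = 3, B = 3, C = 2, D = 2),
                 s2 = c(A = 2, B = 5, C = 0, D = 3),
                 s3 = c(A = 3, B = 4, C = 2, D = 1))
ordered_letters <- c("C", "B", "A", "D", "C", "A", "D", "D", "A", "A",
                     "A", "C", "B", "D", "C", "B", "D", "B", "A", "C")
res <- select_by_scheme(ordered_letters, schemes)
res$rows    # 1 2 3 4 5 6 7 9 13 16, the same rows as the manual selection
res$scheme  # "s1", i.e. 3A - 3B - 2C - 2D
# With a data frame sorted as in the thread, Order[res$rows, ] would give the
# selected rows. This greedy rule is one reading of the question, not a
# confirmed specification.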

Maybe someone will come up with another, more ingenious solution.

Cheers
Petr

From: Silvano Cesar da Costa <silvano using uel.br>
Sent: Monday, August 23, 2021 7:54 PM
To: PIKAL Petr <petr.pikal using precheza.cz>
Cc: r-help using r-project.org
Subject: Re: [R] Selecting elements

Hi,

I apologize for the confusion. I will try to be clearer in my explanation. I believe that with the R script it becomes clearer.

I have 4 variables with 10 repetitions and each one receives a value, randomly.
I order the dataset from largest to smallest value. I have to select 10 elements in
descending order of values, according to one of three schemes:

# 3A - 3B - 2C - 2D
# 2A - 5B - 0C - 3D
# 3A - 4B - 2C - 1D

If the first 3 elements (out of the 10 to be selected) are of the letter D, the second
scheme is adopted automatically. So I then have to choose 2A, 5B and 0C.
How can I make the selection automatically?

I created two selection examples, with different schemes:

set.seed(123)

Var.1 = rep(LETTERS[1:4], 10)
Var.2 = sample(1:40, replace=FALSE)

data = data.frame(Var.1, Var.2)

(Order = data[order(data$Var.2, decreasing=TRUE), ])

# I must select the 10 highest values,
# but which follow a certain scheme:
#
#  3A - 3B - 2C - 2D     or
#  2A - 5B - 0C - 3D     or
#  3A - 4B - 2C - 1D
#
# In this case, I started with the highest value that refers to the letter C.
# Next comes only 1 of the letters B, A and D. All are selected once.
# The fifth observation is the letter C, completing 2 C values. In this case,
# following the 3 adopted schemes, note that the second scheme has 0C,
# so this scheme is out.
# Therefore, it can be the first scheme (3A - 3B - 2C - 2D) or the
# third scheme (3A - 4B - 2C - 1D).
# The next letter to be completed is the D (fourth and seventh elements),
# among the 10 elements being selected. Therefore, the scheme adopted is the
# first one (3A - 3B - 2C - 2D).
# Therefore, it is necessary to select 2 values with the letter B and 1 value
# with the letter A.
#
# Manual Selection -
# The end result is:
(Selected.data = Order[c(1,2,3,4,5,6,7,9,13,16), ])

# Scheme: 3A - 3B - 2C - 2D
sort(Selected.data$Var.1)

#------------------
# Second example: -
#------------------
set.seed(4)

Var.1 = rep(LETTERS[1:4], 10)
Var.2 = sample(1:40, replace=FALSE)

data = data.frame(Var.1, Var.2)
(Order = data[order(data$Var.2, decreasing=TRUE), ])

# The end result is:
(Selected.data.2 = Order[c(1,2,3,4,5,6,7,8,9,11), ])

# Scheme: 3A - 4B - 2C - 1D
sort(Selected.data.2$Var.1)

How to make the selection of the 10 elements automatically?

Thank you very much.

Prof. Dr. Silvano Cesar da Costa
Centro de Ciências Exatas
Departamento de Estatística

Fone: (43) 3371-4346

On Mon, Aug 23, 2021 at 05:05, PIKAL Petr <petr.pikal using precheza.cz> wrote:
Hi

Only I got your HTML-formatted mail; the rest of the world got a complete mess. Do not use HTML formatting.

If I got it right, I wonder why in your second example you did not follow
3A - 3B - 2C - 2D

as D was positioned 1st and 4th.

I hope that you could use something like

sss <- split(data$Var.2, data$Var.1)
lapply(sss, cumsum)
$A
[1]  38  73 105 136 166 188 199 207 209 210

$B
[1]  39  67  92 115 131 146 153 159 164 168

$C
[1]  40  76 105 131 152 171 189 203 213 222

$D
[1]  37  71 104 131 155 175 192 205 217 220

Now you need to evaluate this result according to your sets. Here the highest value (76) is in C, so the set with 2C is the one you should choose, and you select your values according to this set.

With
> set.seed(666)
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
> data = data.frame(Var.1, Var.2)
> data <- data[order(data$Var.2, decreasing=TRUE), ]
> sss <- split(data$Var.2, data$Var.1)
> lapply(sss, cumsum)
$A
[1]  36  70 102 133 163 182 200 207 212 213

$B
[1]  35  57  78  95 108 120 131 140 148 150

$C
[1]  40  73 102 130 156 180 196 211 221 225

$D
[1]  39  77 114 141 166 189 209 223 229 232

The highest value is in D, so either 3A - 3B - 2C - 2D or 3A - 3B - 2C - 2D should be appropriate. And here I am lost again, as both sets are the same. Maybe you need to reconsider your statements.

Cheers
Petr

From: Silvano Cesar da Costa <silvano using uel.br>
Sent: Friday, August 20, 2021 9:28 PM
To: PIKAL Petr <petr.pikal using precheza.cz>
Cc: r-help using r-project.org
Subject: Re: [R] Selecting elements

Hi, thank you for the answer.
Sorry English is not my native language.

But you got it right.
> As C is first and fourth biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?

I must select the 10 (not 15) highest values, but which follow a certain order:
3A - 3B - 2C - 2D     or
2A - 5B - 0C - 3D     or
3A - 3B - 2C - 2D
I'll put the example in Excel for a better understanding (with 20 elements only).
I must select 10 elements (the highest values of variable Var.2), which fit one of the 3 options above.

Number  Position  Var.1  Var.2
     1        27  C         40
     2        30  B         39
     3         5  A         38
     4        16  D         37
     5        23  C         36
     6        13  A         35
     7        20  D         34
     8        12  D         33
     9         9  A         32
    10         1  A         31
    11        21  A         30
    12        35  C         29
    13        14  B         28
    14         8  D         27
    15         7  C         26
    16         6  B         25
    17        40  D         24
    18        26  B         23
    19        29  A         22
    20        31  C         21

Selected:

Number  Position  Var.1  Var.2
     1        27  C         40
     2        30  B         39
     3         5  A         38
     4        16  D         37
     5        23  C         36
     6        13  A         35
     7        20  D         34
    10         9  A         32
    13        14  B         28
    17         6  B         25

Schemes listed next to the table:
3A - 3B - 2C - 2D
3A - 3B - 1C - 3D
2A - 5B - 0C - 3D

Second option (other data set):

Number  Position  Var.1  Var.2
     1        36  D         20
     2        11  B         19
     3        39  A         18
     4        24  D         17
     5        34  B         16
     6         2  B         15
     7         3  A         14
     8        32  D         13
     9        28  D         12
    10        25  A         11
    11        19  B         10
    12        15  B          9
    13        17  A          8
    14        18  C          7
    15        38  B          6
    16        10  B          5
    17        22  B          4
    18         4  D          3
    19        33  A          2
    20        37  A          1

Selected:

Number  Position  Var.1  Var.2
     1        36  D         20
     2        11  B         19
     3        39  A         18
     4        24  D         17
     5        34  B         16
     6         2  B         15
     7         3  A         14
     8        32  D         13
     9        25  A         11
    10        18  C          7

Schemes listed next to the table:
3A - 3B - 2C - 2D
3A - 3B - 1C - 3D
2A - 5B - 0C - 3D

How to make the selection of these 10 elements that fit one of the 3 options using R?

Thanks,

Prof. Dr. Silvano Cesar da Costa
Centro de Ciências Exatas
Departamento de Estatística

Fone: (43) 3371-4346

On Fri, Aug 20, 2021 at 03:28, PIKAL Petr <petr.pikal using precheza.cz> wrote:
Hello

I am confused. Maybe others know what you want, but could you be more specific?

Let's say you have data like this:
set.seed(123)
Var.1 = rep(LETTERS[1:4], 10)
Var.2 = sample(1:40, replace=FALSE)
data = data.frame(Var.1, Var.2)

What should be the desired outcome?

You can sort
data <- data[order(data$Var.2, decreasing=TRUE), ]
and split the data
> split(data$Var.2, data$Var.1)
$A
[1] 38 35 32 31 30 22 11  8  2  1

$B
[1] 39 28 25 23 16 15  7  6  5  4

$C
[1] 40 36 29 26 21 19 18 14 10  9

$D
[1] 37 34 33 27 24 20 17 13 12  3

to inspect the highest values. But here I am lost. As C is the first and fourth biggest value, do you follow the third option and select the 3 highest A, 3B, 2C and 2D?

Or I do not understand at all what you really want to achieve.

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Silvano Cesar da
> Costa
> Sent: Thursday, August 19, 2021 10:40 PM
> To: r-help using r-project.org
> Subject: [R] Selecting elements
>
> Hi,
>
> I need to select 15 elements, always considering the highest values
> (descending order) but obeying the following configuration:
>
> 3A - 4B - 0C - 3D or
> 2A - 5B - 0C - 3D or
> 3A - 3B - 2C - 2D
>
> If I have, for example, 5 A elements as the highest values, I can only choose
> 3 (first and third choice) or 2 (second choice) of them.
>
> how to make this selection?
>
>
> library(dplyr)
>
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
>
> data = data.frame(Var.1, Var.2)
> (data = data[order(data\$Var.2, decreasing=TRUE), ])
>
> Elements = data %>%
>   arrange(desc(Var.2))
>
> Thanks,
>
> Prof. Dr. Silvano Cesar da Costa
> Centro de Ciências Exatas
> Departamento de Estatística
>
> Fone: (43) 3371-4346
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help