[R] Selecting elements

Mon Aug 23 10:05:30 CEST 2021

Hi

Only I got your HTML-formatted mail; the rest of the list got a complete mess. Please do not use HTML formatting.

If I got it right, I wonder why in your second example you did not follow
3A - 3B - 2C - 2D

as D was positioned 1st and 4th.

I hope that you could use something like

sss <- split(data$Var.2, data$Var.1)
lapply(sss, cumsum)
$A
 [1]  38  73 105 136 166 188 199 207 209 210

$B
 [1]  39  67  92 115 131 146 153 159 164 168

$C
 [1]  40  76 105 131 152 171 189 203 213 222

$D
 [1]  37  71 104 131 155 175 192 205 217 220

Now you need to evaluate this result according to your sets. Here the highest value (76) is in C, so the set with 2C is the one you should choose, and you select your values according to this set.
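One possible way to evaluate the cumulative sums against the sets is sketched below. It assumes that "best" means the largest total of the selected values, which is only my assumption and not something stated in this thread. The per-group values are copied from the split() output for the set.seed(123) example, the two sets are the two distinct ones from the list, and the helper name set_total is made up for illustration.

```r
# Per-group values in descending order, copied from the split() output
# of the set.seed(123) example in this thread.
sss <- list(
  A = c(38, 35, 32, 31, 30, 22, 11, 8, 2, 1),
  B = c(39, 28, 25, 23, 16, 15, 7, 6, 5, 4),
  C = c(40, 36, 29, 26, 21, 19, 18, 14, 10, 9),
  D = c(37, 34, 33, 27, 24, 20, 17, 13, 12, 3)
)
cs <- lapply(sss, cumsum)

# Candidate sets: how many elements to take from each group.
sets <- list(
  "3A-3B-2C-2D" = c(A = 3, B = 3, C = 2, D = 2),
  "2A-5B-0C-3D" = c(A = 2, B = 5, C = 0, D = 3)
)

# Hypothetical helper: total of the k largest values per group.
# A count of 0 contributes 0 to the total.
set_total <- function(n, cs) {
  sum(mapply(function(g, k) if (k > 0) cs[[g]][k] else 0, names(n), n))
}

# ASSUMPTION: the preferred set is the one with the largest total.
totals <- sapply(sets, set_total, cs = cs)
totals                    # 344 for 3A-3B-2C-2D, 308 for 2A-5B-0C-3D
names(which.max(totals))  # "3A-3B-2C-2D"
```

If the rule for choosing a set is something other than the largest total, only the last two lines would need to change.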

With
> set.seed(666)
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
> data = data.frame(Var.1, Var.2)
> data <- data[order(data$Var.2, decreasing=TRUE), ]
> sss <- split(data$Var.2, data$Var.1)
> lapply(sss, cumsum)
$A
 [1]  36  70 102 133 163 182 200 207 212 213

$B
 [1]  35  57  78  95 108 120 131 140 148 150

$C
 [1]  40  73 102 130 156 180 196 211 221 225

$D
 [1]  39  77 114 141 166 189 209 223 229 232

The highest value is in D, so either 3A - 3B - 2C - 2D or 3A - 3B - 2C - 2D should be appropriate. And here I am lost again, as both sets are the same. Maybe you need to reconsider your statements.

Cheers
Petr

From: Silvano Cesar da Costa <silvano using uel.br> 
Sent: Friday, August 20, 2021 9:28 PM
To: PIKAL Petr <petr.pikal using precheza.cz>
Cc: r-help using r-project.org
Subject: Re: [R] Selecting elements

Hi, thank you for the answer.
Sorry, English is not my native language.

But you got it right. 
> As C is first and fourth biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?

I must select the 10 (not 15) highest values, but they must follow one of these configurations:
3A - 3B - 2C - 2D     or 
2A - 5B - 0C - 3D     or
3A - 3B - 2C - 2D
I'll put the example in Excel for a better understanding (with 20 elements only). 
I must select 10 elements (the highest values of variable Var.2), which fit one of the 3 options above. 

All 20 elements:

Number  Position  Var.1  Var.2
     1        27  C         40
     2        30  B         39
     3         5  A         38
     4        16  D         37
     5        23  C         36
     6        13  A         35
     7        20  D         34
     8        12  D         33
     9         9  A         32
    10         1  A         31
    11        21  A         30
    12        35  C         29
    13        14  B         28
    14         8  D         27
    15         7  C         26
    16         6  B         25
    17        40  D         24
    18        26  B         23
    19        29  A         22
    20        31  C         21

Configurations:
3A - 3B - 2C - 2D
3A - 3B - 1C - 3D
2A - 5B - 0C - 3D

Selected:

Number  Position  Var.1  Var.2
     1        27  C         40
     2        30  B         39
     3         5  A         38
     4        16  D         37
     5        23  C         36
     6        13  A         35
     7        20  D         34
    10         9  A         32
    13        14  B         28
    17         6  B         25

Second option (other data set):

All 20 elements:

Number  Position  Var.1  Var.2
     1        36  D         20
     2        11  B         19
     3        39  A         18
     4        24  D         17
     5        34  B         16
     6         2  B         15
     7         3  A         14
     8        32  D         13
     9        28  D         12
    10        25  A         11
    11        19  B         10
    12        15  B          9
    13        17  A          8
    14        18  C          7
    15        38  B          6
    16        10  B          5
    17        22  B          4
    18         4  D          3
    19        33  A          2
    20        37  A          1

Configurations:
3A - 3B - 2C - 2D
3A - 3B - 1C - 3D
2A - 5B - 0C - 3D

Selected:

Number  Position  Var.1  Var.2
     1        36  D         20
     2        11  B         19
     3        39  A         18
     4        24  D         17
     5        34  B         16
     6         2  B         15
     7         3  A         14
     8        32  D         13
     9        25  A         11
    10        18  C          7

How to make the selection of these 10 elements that fit one of the 3 options using R?
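One reading that appears to reproduce both "Selected" tables above is a greedy pass: walk the values in descending order and keep an element only if the running group counts still fit within at least one configuration. This is a sketch under that assumption, not a confirmed statement of the rule. The function name select_greedy is made up for illustration, and the three configurations are the ones written beside the tables (3A - 3B - 2C - 2D, 3A - 3B - 1C - 3D, 2A - 5B - 0C - 3D), which differ from the list in the message text.

```r
# Greedy sketch (ASSUMED rule): take values from the top, skipping any
# element whose group would exceed its quota in every configuration.
select_greedy <- function(data, configs) {
  data <- data[order(data$Var.2, decreasing = TRUE), ]
  target <- sum(configs[1, ])   # assumes all configurations have the same size
  counts <- setNames(integer(ncol(configs)), colnames(configs))
  keep <- logical(nrow(data))
  for (i in seq_len(nrow(data))) {
    g <- as.character(data$Var.1[i])
    trial <- counts
    trial[g] <- trial[g] + 1L
    # a configuration still fits if no group count exceeds its quota
    fits <- apply(configs, 1, function(cfg) all(trial <= cfg))
    if (any(fits)) {
      counts <- trial
      keep[i] <- TRUE
    }
    if (sum(counts) == target) break
  }
  data[keep, ]
}

# Configurations as written beside the tables (one row per option).
configs <- rbind(c(A = 3, B = 3, C = 2, D = 2),
                 c(A = 3, B = 3, C = 1, D = 3),
                 c(A = 2, B = 5, C = 0, D = 3))

# Second data set from the table above, already sorted by Var.2.
dat <- data.frame(
  Var.1 = c("D", "B", "A", "D", "B", "B", "A", "D", "D", "A",
            "B", "B", "A", "C", "B", "B", "B", "D", "A", "A"),
  Var.2 = 20:1
)

sel <- select_greedy(dat, configs)
sel$Var.2   # 20 19 18 17 16 15 14 13 11 7
```

The pass never backtracks, so it can stop with fewer elements than required when an early choice leaves only a configuration that the remaining data cannot fill.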

Thanks,

Prof. Dr. Silvano Cesar da Costa
Universidade Estadual de Londrina
Centro de Ciências Exatas
Departamento de Estatística

Fone: (43) 3371-4346

On Fri, Aug 20, 2021 at 03:28, PIKAL Petr <mailto:petr.pikal using precheza.cz> wrote:
Hallo

I am confused. Maybe others know what you want, but could you be more specific?

Let's say you have data like this:
set.seed(123)
Var.1 = rep(LETTERS[1:4], 10)
Var.2 = sample(1:40, replace=FALSE)
data = data.frame(Var.1, Var.2)

What should be the desired outcome?

You can sort
data <- data[order(data$Var.2, decreasing=TRUE), ]
and split the data
> split(data$Var.2, data$Var.1)
$A
 [1] 38 35 32 31 30 22 11  8  2  1

$B
 [1] 39 28 25 23 16 15  7  6  5  4

$C
 [1] 40 36 29 26 21 19 18 14 10  9

$D
 [1] 37 34 33 27 24 20 17 13 12  3

to inspect the highest values. But here I am lost. As C is first and fourth biggest value, you follow third option and select 3 highest A, 3B 2C and 2D?

Or I do not understand at all what you really want to achieve.

Cheers
Petr

> -----Original Message-----
> From: R-help <mailto:r-help-bounces using r-project.org> On Behalf Of Silvano Cesar da
> Costa
> Sent: Thursday, August 19, 2021 10:40 PM
> To: mailto:r-help using r-project.org
> Subject: [R] Selecting elements
> 
> Hi,
> 
> I need to select 15 elements, always considering the highest values
> (descending order) but obeying the following configuration:
> 
> 3A - 4B - 0C - 3D or
> 2A - 5B - 0C - 3D or
> 3A - 3B - 2C - 2D
> 
> If I have, for example, 5 A elements as the highest values, I can only choose
> 3 (first and third choice) or 2 (second choice) elements.
> 
> how to make this selection?
> 
> 
> library(dplyr)
> 
> Var.1 = rep(LETTERS[1:4], 10)
> Var.2 = sample(1:40, replace=FALSE)
> 
> data = data.frame(Var.1, Var.2)
> (data = data[order(data$Var.2, decreasing=TRUE), ])
> 
> Elements = data %>%
>   arrange(desc(Var.2))
> 
> Thanks,
> 
> Prof. Dr. Silvano Cesar da Costa
> Universidade Estadual de Londrina
> Centro de Ciências Exatas
> Departamento de Estatística
> 
> Fone: (43) 3371-4346
> 