[R] find unique and summarize

Val valkremk at gmail.com
Sat Feb 3 04:00:30 CET 2018


Hi all,

I have a data set that needs to be summarized by unique ID (count and sum of
a variable). A unique individual ID (a country-name abbreviation followed by
an integer) may have observations in several countries. To record this, a new
ID was constructed by adding the country code as a prefix to the original
unique ID. For example, if the original ID "CAN1540164" has an observation in
Canada, the ID was changed to "1CAN1540164". From this new ID I want to
extract the country code and the original unique ID, and then summarize the
data by unique ID and country code.

The data set looks like this:
mydata <- read.table(textConnection("GR ID iflag Y
A 1CAN1540164 1 20
A 1CAN1540164 1 12
A 1CAN1540164 1 15
A 44CAN1540164 1 30
A 44CAN1540164 1 24
A 44CAN1540164 1 25
A 44CAN1540164 1 11
A 33CAN1540164 1 12
A 33CAN1540164 1 23
A 33CAN1540164 1 65
A 33CAN1540164 1 41
A 358CAN1540164 1 28
A 358CAN1540164 1 32
A 358CAN1540164 1 41
A 358CAN1540164 1 54
A 358CAN1540164 1 29
A 358CAN1540164 1 64
B 1USA1540165 1 125
B 1USA1540165 1 165
B 44USA1540165 1 171
B 33USA1540165 1 254
B 33USA1540165 1 241
B 33USA1540165 1 262
B 358USA1540165 1 321
C 358FIN1540166 1 225"), header = TRUE, stringsAsFactors = FALSE)

From the above data, there are three unique IDs and four country codes (1,
44, 33 and 358).

I want the following two tables:

Table 1. Count of observations by unique ID and country code

                1    44    33   358   TOT
CAN1540164      3     4     4     6    17
USA1540165      2     1     3     1     7
FIN1540166      -     -     -     1     1
TOT             5     5     7     8    25


Table 2. Sum of the Y variable by unique ID and country code

                1    44    33   358   TOT
CAN1540164     47    90   141   248   526
USA1540165    290   171   757   321  1539
FIN1540166      -     -     -   225   225
TOT           337   261   898   794  2290
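
One possible way to get both tables in base R is sketched below. It is only a sketch under assumptions: the example data are rebuilt in compact form so the block runs on its own, the country code is assumed to be the leading run of digits in ID, and the column names "code" and "uid" are made up for illustration. Empty cells come out as 0 rather than "-", and the margins are labelled "Sum" rather than "TOT".

```r
# Rebuild the example data in compact form (same 25 rows as in the post).
mydata <- data.frame(
  GR = rep(c("A", "B", "C"), times = c(17, 7, 1)),
  ID = c(rep(c("1CAN1540164", "44CAN1540164", "33CAN1540164", "358CAN1540164"),
             times = c(3, 4, 4, 6)),
         rep(c("1USA1540165", "44USA1540165", "33USA1540165", "358USA1540165"),
             times = c(2, 1, 3, 1)),
         "358FIN1540166"),
  iflag = 1,
  Y = c(20, 12, 15, 30, 24, 25, 11, 12, 23, 65, 41, 28, 32, 41, 54, 29, 64,
        125, 165, 171, 254, 241, 262, 321, 225),
  stringsAsFactors = FALSE
)

# Assumption: the country code is the leading digits, the rest is the original ID.
mydata$code <- sub("^([0-9]+).*$", "\\1", mydata$ID)
mydata$uid  <- sub("^[0-9]+", "", mydata$ID)

# Keep rows and columns in order of first appearance (1, 44, 33, 358).
mydata$code <- factor(mydata$code, levels = unique(mydata$code))
mydata$uid  <- factor(mydata$uid,  levels = unique(mydata$uid))

# Table 1: number of observations per unique ID and country code, with totals.
counts <- addmargins(table(mydata$uid, mydata$code))

# Table 2: sum of Y per unique ID and country code, with totals.
sums <- addmargins(xtabs(Y ~ uid + code, data = mydata))

print(counts)
print(sums)
```

Here table() and xtabs() build the cross-tabulations and addmargins() appends the row and column totals.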


How do I do it in R?

The first step is to get the country code and the original unique ID by
splitting the new ID.
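
A minimal sketch of that split, assuming every new ID has the form "digits, three capital letters, digits" as in the example data. The names "ids", "split_ids", "code" and "uid" are hypothetical, and IDs that do not match the pattern would need separate handling.

```r
# A few IDs taken from the example data.
ids <- c("1CAN1540164", "44USA1540165", "358FIN1540166")

# Group 1: leading digits (country code).
# Group 2: three capital letters followed by digits (original unique ID).
pattern <- "^([0-9]+)([A-Z]{3}[0-9]+)$"

# regexec() records the positions of the full match and of each group;
# regmatches() extracts them as one character vector per ID.
m <- regmatches(ids, regexec(pattern, ids))

# Bind the per-ID vectors into a matrix: full match, code, original ID.
parts <- do.call(rbind, m)

split_ids <- data.frame(ID   = parts[, 1],
                        code = parts[, 2],
                        uid  = parts[, 3],
                        stringsAsFactors = FALSE)
print(split_ids)
```

The country code is kept as character here; as.integer(split_ids$code) would convert it if a numeric code is preferred.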

Thank you in advance



