[R] find unique and summerize

Rui Barradas ruipbarradas at sapo.pt
Sat Feb 3 06:26:16 CET 2018


Hello,

Thanks for the reproducible example.
See if the following does what you want.

IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
Country <- sub("^\\d+(.*)", "\\1", mydata$ID)

tbl1 <- table(Country, IDNum)
addmargins(tbl1)

tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
addmargins(tbl2)


Hope this helps,

Rui Barradas

On 2/3/2018 3:00 AM, Val wrote:
> Hi all,
> 
> I have a data set  need to be summarized by unique ID (count and sum of a
> variable)
> A unique individual ID (country name  Abbreviation  followed by an integer
> numbers)  may  have observation in several countries. Then the  ID was
> changed by adding the country code as a prefix  and  new ID was constructed
> or recorded like (country code, + the original unique ID  Example
> original ID   "CAN1540164" , if this ID has an observation in CANADA then
> the ID was changed to    "1CAN1540164".   From this new ID I want get out
> the country code  get the  original unique ID  and   summarize the data by
> unique ID and country code
> 
> The data set look like
> mydata <- read.table(textConnection("GR ID iflag Y
> A 1CAN1540164 1 20
> A 1CAN1540164 1 12
> A 1CAN1540164 1 15
> A 44CAN1540164 1 30
> A 44CAN1540164 1 24
> A 44CAN1540164 1 25
> A 44CAN1540164 1 11
> A 33CAN1540164 1 12
> A 33CAN1540164 1 23
> A 33CAN1540164 1 65
> A 33CAN1540164 1 41
> A 358CAN1540164 1 28
> A 358CAN1540164 1 32
> A 358CAN1540164 1 41
> A 358CAN1540164 1 54
> A 358CAN1540164 1 29
> A 358CAN1540164 1 64
> B 1USA1540165 1 125
> B 1USA1540165 1 165
> B 44USA1540165 1 171
> B 33USA1540165 1 254
> B 33USA1540165 1 241
> B 33USA1540165 1 262
> B 358USA1540165 1 321
> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
> 
>  From the above data there are three unique IDs and  four country codes (1,
> 44, 33 and 358)
> 
> I want the following two tables
> 
> Table 1. count  the  unique ID by country code
>                            1   44   33   358     TOT
> CAN1540164     3    4     4      6        17
> USA1540165      2   1      3     1          7
> FIN1540166       -     -       -      1         1
>             TOT         5    5      7      8       25
> 
> 
> Table 2  Sum of Y variable by unique ID and country. code
> 
>                            1       44       33      358      TOT
> CAN1540164    47     90      141      248       526
> USA1540165   290   171      757      321     1539
> FIN1540166        -        -         -         225       225
>              TOT      337     261      898    794     2290
> 
> 
> How do I do it in R?
> 
>   The first step is to get the unique country codes unique ID by splitting
> the new ID
> 
> Thank you in advance
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list