[R] find unique and summerize

Rui Barradas ruipbarradas at sapo.pt
Sat Feb 3 20:05:43 CET 2018


Hello,

As for the first question, instead of writing a xlsx file, maybe it is 
easier to write a csv file and then open it with Excel.

tbl2 <- addmargins(tbl1)
write.csv(tbl2, "tt1.csv")

As for the second question, the following does it.

inx <- apply(tbl1, 1, function(x) all(x != 0))
tbl1b <- addmargins(tbl1[inx, ])
tbl1b


Hope this helps,

Rui Barradas

On 2/3/2018 4:42 PM, Val wrote:
> Thank you so much Rui.
> 
> 1. How do I export this table to excel file?
> I used this
>        tbl1 <- table(Country, IDNum)
>        tbl2=addmargins(tbl1)
>        write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", row.names=FALSE)
> The above did not give me that table.
> 
> 
> 2. I want select those unique Ids that do have records in all countries.
>   From the above data set, this ID  "FIN1540166"  should be excluded 
> from the summary table and the table looks like as follow
> 
> IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 USA1540165 
> 290 757 321 171 1539 Sum 337 898 569 261 2065
> 
> Thank you again
> 
> 
> On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas <ruipbarradas at sapo.pt 
> <mailto:ruipbarradas at sapo.pt>> wrote:
> 
>     Hello,
> 
>     Thanks for the reproducible example.
>     See if the following does what you want.
> 
>     IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
>     Country <- sub("^\\d+(.*)", "\\1", mydata$ID)
> 
>     tbl1 <- table(Country, IDNum)
>     addmargins(tbl1)
> 
>     tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
>     addmargins(tbl2)
> 
> 
>     Hope this helps,
> 
>     Rui Barradas
> 
> 
>     On 2/3/2018 3:00 AM, Val wrote:
> 
>         Hi all,
> 
>         I have a data set  need to be summarized by unique ID (count and
>         sum of a
>         variable)
>         A unique individual ID (country name  Abbreviation  followed by
>         an integer
>         numbers)  may  have observation in several countries. Then the 
>         ID was
>         changed by adding the country code as a prefix  and  new ID was
>         constructed
>         or recorded like (country code, + the original unique ID  Example
>         original ID   "CAN1540164" , if this ID has an observation in
>         CANADA then
>         the ID was changed to    "1CAN1540164".   From this new ID I
>         want get out
>         the country code  get the  original unique ID  and   summarize
>         the data by
>         unique ID and country code
> 
>         The data set look like
>         mydata <- read.table(textConnection("GR ID iflag Y
>         A 1CAN1540164 1 20
>         A 1CAN1540164 1 12
>         A 1CAN1540164 1 15
>         A 44CAN1540164 1 30
>         A 44CAN1540164 1 24
>         A 44CAN1540164 1 25
>         A 44CAN1540164 1 11
>         A 33CAN1540164 1 12
>         A 33CAN1540164 1 23
>         A 33CAN1540164 1 65
>         A 33CAN1540164 1 41
>         A 358CAN1540164 1 28
>         A 358CAN1540164 1 32
>         A 358CAN1540164 1 41
>         A 358CAN1540164 1 54
>         A 358CAN1540164 1 29
>         A 358CAN1540164 1 64
>         B 1USA1540165 1 125
>         B 1USA1540165 1 165
>         B 44USA1540165 1 171
>         B 33USA1540165 1 254
>         B 33USA1540165 1 241
>         B 33USA1540165 1 262
>         B 358USA1540165 1 321
>         C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
> 
>           From the above data there are three unique IDs and  four
>         country codes (1,
>         44, 33 and 358)
> 
>         I want the following two tables
> 
>         Table 1. count  the  unique ID by country code
>                                     1   44   33   358     TOT
>         CAN1540164     3    4     4      6        17
>         USA1540165      2   1      3     1          7
>         FIN1540166       -     -       -      1         1
>                      TOT         5    5      7      8       25
> 
> 
>         Table 2  Sum of Y variable by unique ID and country. code
> 
>                                     1       44       33      358      TOT
>         CAN1540164    47     90      141      248       526
>         USA1540165   290   171      757      321     1539
>         FIN1540166        -        -         -         225       225
>                       TOT      337     261      898    794     2290
> 
> 
>         How do I do it in R?
> 
>            The first step is to get the unique country codes unique ID
>         by splitting
>         the new ID
> 
>         Thank you in advance
> 
>                  [[alternative HTML version deleted]]
> 
>         ______________________________________________
>         R-help at r-project.org <mailto:R-help at r-project.org> mailing list
>         -- To UNSUBSCRIBE and more, see
>         https://stat.ethz.ch/mailman/listinfo/r-help
>         <https://stat.ethz.ch/mailman/listinfo/r-help>
>         PLEASE do read the posting guide
>         http://www.R-project.org/posting-guide.html
>         <http://www.R-project.org/posting-guide.html>
>         and provide commented, minimal, self-contained, reproducible code.
> 
>



More information about the R-help mailing list