[R] create unique ID for each group

arun smartpink111 at yahoo.com
Tue May 7 22:25:57 CEST 2013



Hi,
Try this:

dat1<- read.table(text="
ObsNumber     ID          Weight
     1                 0001         12
     2                 0001          13
     3                 0001           14
     4                  0002         16
      5                 0002         17
     6                   N/A          18   
     7                   0003          19
     8                   N/A           20
     9                   0003          21
",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
dat2<- read.table(text="
ID               Height
0001            3.2
0001             2.6
0001             3.2
0002             2.2
0002              2.6
",sep="",header=TRUE,colClass=c("character","numeric")) 

dat1[!is.na(dat1$ID),"UniqueID"]<-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE) 

dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
library(plyr)
join(dat1,dat2,by="UniqueID",type="left")
 # ObsNumber   ID Weight UniqueID   ID Height
#1         1 0001     12   0001_1 0001    3.2
#2         2 0001     13   0001_2 0001    2.6
#3         3 0001     14   0001_3 0001    3.2
#4         4 0002     16   0002_1 0002    2.2
#5         5 0002     17   0002_2 0002    2.6
#6         6 <NA>     18     <NA> <NA>     NA
#7         7 0003     19   0003_1 <NA>     NA
#8         8 <NA>     20     <NA> <NA>     NA
#9         9 0003     21   0003_2 <NA>     NA
A.K.


________________________________
From: Ye Lin <yelin at lbl.gov>
To: arun <smartpink111 at yahoo.com> 
Sent: Tuesday, May 7, 2013 4:05 PM
Subject: Re: [R] create unique ID for each group



Yes, I need to keep the N/A records.



On Tue, May 7, 2013 at 1:00 PM, arun <smartpink111 at yahoo.com> wrote:


>
>Hi
>Do you need an output like this?
> merge(dat1,dat2,by="UniqueID",all.x=TRUE)
>  UniqueID ObsNumber ID.x Weight ID.y Height
>1   0001_1         1 0001     12 0001    3.2
>2   0001_2         2 0001     13 0001    2.6
>3   0001_3         3 0001     14 0001    3.2
>4   0002_1         4 0002     16 0002    2.2
>5   0002_2         5 0002     17 0002    2.6
>6     <NA>         6 <NA>     18 <NA>     NA
>
>when you use:
>
>
> dat1
>  ObsNumber   ID Weight UniqueID
>1         1 0001     12   0001_1
>2         2 0001     13   0001_2
>3         3 0001     14   0001_3
>4         4 0002     16   0002_1
>5         5 0002     17   0002_2
>6         6 <NA>     18     <NA>
>
> dat2
>    ID Height UniqueID
>1 0001    3.2   0001_1
>2 0001    2.6   0001_2
>3 0001    3.2   0001_3
>4 0002    2.2   0002_1
>5 0002    2.6   0002_2
>
>
>
>
>________________________________
>From: Ye Lin <yelin at lbl.gov>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, May 7, 2013 3:41 PM
>
>Subject: Re: [R] create unique ID for each group
>
>
>
>If the ID="N/A" then when merge, there would be any match and can return N/A
>
>I use merge(dat1, dat2, by="UniqueID", all.x=TRUE),then an extra row will be added to the output for each case in dat1 that has no matching cases in dat2
>
>I just have to leave the records in dat1 even ID=N/A
>
>
>
>
>On Tue, May 7, 2013 at 12:38 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Also, another problem might be where do you assign those rows with missing ID.  It could be the missing value for any ID.
>>For example in this case:
>>
>>dat1<- read.table(text="
>> ObsNumber     ID          Weight
>>      1                 0001         12
>>      2                 0001          13
>>      3                 0001           14
>>      4                  0002         16
>>       5                 0002         17
>>      6                   N/A          18  
>>     7                   0003         19
>>     8                   0003          20
>>
>> ",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
>> dat1
>>  ObsNumber   ID Weight
>>1         1 0001     12
>>2         2 0001     13
>>3         3 0001     14
>>4         4 0002     16
>>5         5 0002     17
>>6         6 <NA>     18
>>7         7 0003     19
>>8         8 0003     20
>>
>>
>> The missing ID could be either "0002" or "0003". 
>>
>>
>>
>>
>>
>>
>>----- Original Message -----
>>From: arun <smartpink111 at yahoo.com>
>>To: Ye Lin <yelin at lbl.gov>
>>Cc:
>>
>>Sent: Tuesday, May 7, 2013 3:32 PM
>>Subject: Re: [R] create unique ID for each group
>>
>>If you modify with na.strings="N/A", IDs with missing values will be read correctly.  Otherwise, it is just a character string.  BTW, if you need rows with NAs, then what will be your UniqueIDs you expect for those rows?
>>
>>
>>
>>
>>
>>
>>
>>________________________________
>>From: Ye Lin <yelin at lbl.gov>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Tuesday, May 7, 2013 3:25 PM
>>Subject: Re: [R] create unique ID for each group
>>
>>
>>
>>I do need rows with "NA". I already read the data in R, so do I assume I need to modify dat1 first with na.strings="N/A" ?
>>
>>
>>
>>On Tue, May 7, 2013 at 12:16 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi,
>>>Do you need that row with "N/A".  The code I sent will remove that row.  If you don't use "na.strings="N/A", then it is not read NA, but some other character.  That is the reason you got results like:
>>>
>>>  unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>#[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2" "N/A_1"
>>>
>>>
>>>
>>>----- Original Message -----
>>>From: arun <smartpink111 at yahoo.com>
>>>To: Ye Lin <yelin at lbl.gov>
>>>Cc: R help <r-help at r-project.org>
>>>
>>>Sent: Tuesday, May 7, 2013 3:13 PM
>>>Subject: Re: [R] create unique ID for each group
>>>
>>>HI Ye,
>>>
>>>For the NA in ID column,
>>>
>>>
>>>
>>>Hi
>>>dat1<- read.table(text="
>>>ObsNumber     ID          Weight
>>>     1                 0001         12
>>>     2                 0001          13
>>>     3                 0001           14
>>>     4                  0002         16
>>>      5                 0002         17
>>>     6                   N/A          18  
>>>",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
>>> unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>#[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
>>>A.K.
>>>________________________________
>>>From: Ye Lin <yelin at lbl.gov>
>>>To: arun <smartpink111 at yahoo.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Tuesday, May 7, 2013 2:54 PM
>>>Subject: Re: [R] create unique ID for each group
>>>
>>>
>>>
>>>Thanks A.K. But I have "NA" in ID column, so when I apply the code, it gives me error saying the replacement as less rows than the data has. Anyway for ID=N/A, return sth like "N/A_1" in order as well?
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Tue, May 7, 2013 at 11:17 AM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>H,
>>>>Sorry, a mistake:
>>>>dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>>dat1
>>>> # ObsNumber   ID Weight UniqueID
>>>>#1         1 0001     12   0001_1
>>>>#2         2 0001     13   0001_2
>>>>#3         3 0001     14   0001_3
>>>>#4         4 0002     16   0002_1
>>>>#5         5 0002     17   0002_2
>>>>
>>>>dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>>
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>
>>>>From: arun <smartpink111 at yahoo.com>
>>>>To: Ye Lin <yelin at lbl.gov>
>>>>Cc: R help <r-help at r-project.org>
>>>>Sent: Tuesday, May 7, 2013 2:10 PM
>>>>Subject: Re: [R] create unique ID for each group
>>>>
>>>>
>>>>
>>>>Hi,
>>>>
>>>>Try this:
>>>>dat1<- read.table(text="
>>>>ObsNumber     ID          Weight
>>>>     1                 0001         12
>>>>     2                 0001          13
>>>>     3                 0001           14
>>>>     4                  0002         16
>>>>      5                 0002         17
>>>>",sep="",header=TRUE,colClass=c("numeric","character","numeric"))
>>>>dat2<- read.table(text="
>>>>ID               Height
>>>>0001            3.2
>>>>0001             2.6
>>>>0001             3.2
>>>>0002             2.2
>>>>0002              2.6
>>>>",sep="",header=TRUE,colClass=c("character","numeric"))
>>>>dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
>>>> dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
>>>> dat2
>>>>#    ID Height UniqueID
>>>>#1 0001    3.2   0001_1
>>>>#2 0001    2.6   0001_2
>>>>#3 0001    3.2   0001_3
>>>>#4 0002    2.2   0002_4
>>>>#5 0002    2.6   0002_5
>>>>A.K.
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: Ye Lin <yelin at lbl.gov>
>>>>To: R help <r-help at r-project.org>
>>>>Cc:
>>>>Sent: Tuesday, May 7, 2013 1:54 PM
>>>>Subject: [R] create unique ID for each group
>>>>
>>>>Hey All,
>>>>
>>>>I have a dataset(dat1) like this:
>>>>
>>>>ObsNumber     ID          Weight
>>>>     1                 0001         12
>>>>     2                 0001          13
>>>>     3                 0001           14
>>>>     4                  0002         16
>>>>      5                 0002         17
>>>>
>>>>And another dataset(dat2) like this:
>>>>
>>>>ID               Height
>>>>0001            3.2
>>>>0001             2.6
>>>>0001             3.2
>>>>0002             2.2
>>>>0002              2.6
>>>>
>>>>I want to merge dat1 and dat2 based on "ID" in order, I know "match" only
>>>>returns the first match it finds. So I am thinking create unique ID col in
>>>>dat2 and dat2, then merge. But I dont know how to do that so it can be like
>>>>this:
>>>>
>>>>dat1:
>>>>
>>>>ObsNumber     ID          Weight  UniqueID
>>>>     1                 0001         12         0001_1
>>>>     2                 0001          13        0001_2
>>>>     3                 0001           14       0001_3
>>>>     4                  0002         16         0002_1
>>>>      5                 0002         17         0002_1
>>>>
>>>>dat2:
>>>>
>>>>ID               Height   UniqueID
>>>>0001            3.2          0001_1
>>>>0001             2.6         0001_2
>>>>0001             3.2         0001_3
>>>>0002             2.2         0002_1
>>>>0002              2.6        0002_2
>>>>
>>>>Or if it is possible to merge dat1 and dat2 by matching "ID" but return the
>>>>match in order that would be great!
>>>>
>>>>Thanks for your help!
>>>>
>>>>    [[alternative HTML version deleted]]
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>



More information about the R-help mailing list