[R] Create sequential vector for values in another column

arun smartpink111 at yahoo.com
Tue Apr 1 21:03:11 CEST 2014


Hi,
Maybe this helps:
set.seed(14)
# Example data: 20 shell_IDs drawn at random from four values
dat1 <- data.frame(shell_ID = sample(c("0208A_47_33", "0208A_47_34", "0912C_13_3", "1400C_2_48"), 20, replace = TRUE), stringsAsFactors = FALSE)
dat2 <- dat1
# Order by the leading number, then by the number after the last underscore
ord1 <- order(as.numeric(gsub("[[:alpha:]]+.*", "", dat1$shell_ID)), as.numeric(gsub(".*\\_", "", dat1$shell_ID)))

dat1 <- dat1[ord1, , drop = FALSE]
row.names(dat1) <- 1:nrow(dat1)
#or
library(gtools)
dat2$shell_ID <- mixedsort(dat2$shell_ID) 

identical(dat1,dat2)
#[1] TRUE 
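
On checking whether the IDs are already in order: base R's is.unsorted() may be enough. A minimal sketch, assuming plain alphabetical order is acceptable; the vectors `ids` and `ids_sorted` are made up for illustration and are not taken from the real data:

```r
# Hypothetical example vector (assumption, not the poster's data)
ids <- c("0912C_13_3", "0208A_47_33", "1400C_2_48", "0208A_47_34")

# is.unsorted() returns TRUE when the vector is NOT in increasing order
is.unsorted(ids)

# After sorting, the same check returns FALSE
ids_sorted <- sort(ids)
is.unsorted(ids_sorted)
```

Alphabetical order can differ from the numeric-aware order used above (for example "_10" sorts before "_3" alphabetically). For that order, comparing the column against mixedsort() of itself with identical() should serve the same purpose.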

dat1$x <- as.numeric(factor(dat1$shell_ID))
dat1 
#or

dat2$x <- match(dat2$shell_ID, unique(dat2$shell_ID))

all.equal(dat1,dat2)
#[1] TRUE 
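
If only the running number is needed, the sort step may not be required: both approaches can be applied to an unsorted column. A small sketch with made-up data (`ids`, `x_factor` and `x_match` are hypothetical names). Note that factor() numbers the IDs by the sorted order of its levels, while match() with unique() numbers them by order of first appearance, so the two agree only when the column is already sorted:

```r
# Hypothetical unsorted IDs (assumption, not the poster's data)
ids <- c("0912C_13_3", "0208A_47_33", "0912C_13_3", "1400C_2_48", "0208A_47_33")

# Numbering follows the sorted factor levels
x_factor <- as.integer(factor(ids))

# Numbering follows the order of first appearance
x_match <- match(ids, unique(ids))

data.frame(shell_ID = ids, x_factor, x_match)
```

For the numbering asked for in the question (1 for the smallest ID, 2 for the next, and so on), the factor() version is probably the closer fit on unsorted data, with the caveat that its levels are sorted alphabetically.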

A.K. 


Hi all,

I am trying to do a similar thing; however, I would like the second vector to read as follows:

shell_ID       X
0208A_47_33    1
0208A_47_33    1
0208A_47_33    1
0208A_47_34    2
0208A_47_34    2
0208A_47_34    2
0208A_47_34    2
0208A_47_34    2
0208A_47_34    2
0208A_47_34    2
0912C_13_3     3
0912C_13_3     3
0912C_13_3     3
1400C_2_48     4
1400C_2_48     4
1400C_2_48     4
1400C_2_48     4
1400C_2_48     4
1400C_2_48     4
1400C_2_48     4
1400C_2_48     4
1400C_2_48     4

However, the shell_IDs may not be in any particular order, as I am already using a subset of the data based on another variable. In R, I am not familiar with how to check that the shell_IDs are sorted. The subset contains 21,005 unique shell_IDs.

Thanks
Helen 




