[R] Locating the starting position of the first number in a string

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Mon Nov 2 22:33:56 CET 2015


Also not answering your question directly, but this may provide some useful
ideas or results:

> library( gsubfn )
>
> DF <- setNames( data.frame( t( strapply( ID
+                                        , "^[^_]+_([A-Z]+)_([A-Z]+)([0-9]+)$"
+                                        , c
+                                        , simplify=TRUE
+                                        )
+                              )
+                           , stringsAsFactors = FALSE
+                           )
+               , c( "Type", "Group", "Number" )
+               )
> str( DF )
'data.frame':   100 obs. of  3 variables:
  $ Type  : chr  "MSM" "MSM" "MSM" "MSM" ...
  $ Group : chr  "HN" "HN" "HN" "HN" ...
  $ Number: chr  "01209" "01210" "01211" "10212" ...
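For readers without gsubfn installed, here is a minimal base-R sketch of the same capture-group split using regexec() and regmatches(). It assumes every ID matches the pattern (a non-matching ID yields a zero-length result, which rbind() silently drops), and it uses only three IDs from the original post for brevity.

```r
# Three IDs taken from the original post (subset for illustration)
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44309")

# Same pattern as the gsubfn example: three capture groups
pat <- "^[^_]+_([A-Z]+)_([A-Z]+)([0-9]+)$"

# regexec() records the positions of the full match and of each group;
# regmatches() turns those positions into substrings (one vector per ID)
parts <- regmatches(ID, regexec(pat, ID))

# Drop the full-match element, keep the three groups, stack into rows
DF <- as.data.frame(do.call(rbind, lapply(parts, `[`, -1)),
                    stringsAsFactors = FALSE)
names(DF) <- c("Type", "Group", "Number")
str(DF)
```

Like the gsubfn version, Number stays character, so leading zeros such as "01209" are preserved.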

On Tue, 3 Nov 2015, Peter Alspach wrote:

> Tena koe (hello) Jen
>
> Not answering your question: if you are after these locations in order to split the IDs into columns, then you might like to consider strsplit; e.g.,
>
> t(sapply(strsplit(ID, '_'), rbind))
>
> You could then split the last column. You state that there is a 5-digit number at the end. If this is correct, then use this feature (i.e., nchar(ID)-4), since you'd want "IBBS3_MSM_HN104213" (the fifth element in ID) to split into IBBS3, MSM, HN1 and 04213. However, if it isn't always 5 digits, then split at the first digit (i.e., into HN and 104213).
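A minimal sketch of both options applied to the last field, assuming the three IDs below (taken from the post) are representative. The fixed-width option applies the same "last 5 characters" idea as nchar(ID)-4, but to the last field rather than the whole ID; the variable names are illustrative only.

```r
# Three IDs taken from the original post (subset for illustration)
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44309")

# Split on "_" into a character matrix, one row per ID
m <- t(sapply(strsplit(ID, "_"), rbind))
last <- m[, 3]
n <- nchar(last)

# Option 1: assume the number is always the last 5 characters
fixed_group  <- substr(last, 1, n - 5)
fixed_number <- substr(last, n - 4, n)

# Option 2: split at the first digit of the last field
first_digit  <- as.integer(regexpr("[0-9]", last))
digit_group  <- substr(last, 1, first_digit - 1)
digit_number <- substr(last, first_digit, n)

cbind(fixed_group, fixed_number, digit_group, digit_number)
```

The two options only disagree on "IBBS3_MSM_HN104213": option 1 gives HN1 and 04213, option 2 gives HN and 104213.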
>
> HTH .....
>
> Peter Alspach
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jennifer Sabatier
> Sent: Tuesday, 3 November 2015 7:39 a.m.
> To: r-help at r-project.org
> Subject: [R] Locating the starting position of the first number in a string
>
> Hi,
>
>
> So, I've got a vector of strings that look like this:
> ID <- c("IBBS3_MSM_HN01209","IBBS3_MSM_HN01210","IBBS3_MSM_HN01211",
> "IBBS3_MSM_HN10212","IBBS3_MSM_HN104213","IBBS3_MSM_HN10214",
> "IBBS3_MSM_HN44215","IBBS3_MSM_HN44216","IBBS3_MSM_HN44217",
> "IBBS3_MSM_HN44218","IBBS3_MSM_HN44219","IBBS3_MSM_HN44220",
> "IBBS3_MSM_HN44221","IBBS3_MSM_HN44222","IBBS3_MSM_HN44223",
> "IBBS3_MSM_HN44224","IBBS3_MSM_HN44225","IBBS3_MSM_HN44226",
> "IBBS3_MSM_HN44227","IBBS3_MSM_HN12228","IBBS3_MSM_HN12229",
> "IBBS3_MSM_HN12230","IBBS3_MSM_HN12231","IBBS3_MSM_HN12232",
> "IBBS3_MSM_HN12233","IBBS3_MSM_HN12234","IBBS3_MSM_HN12235",
> "IBBS3_MSM_HN12236","IBBS3_MSM_HN12237","IBBS3_MSM_HN12238",
> "IBBS3_MSM_HN12239","IBBS3_MSM_HN12240","IBBS3_MSM_HN12241",
> "IBBS3_MSM_HN12242","IBBS3_MSM_HN12243","IBBS3_MSM_HN12244",
> "IBBS3_MSM_HN12245","IBBS3_MSM_HN12246","IBBS3_MSM_HN12247",
> "IBBS3_MSM_HN12248","IBBS3_MSM_HN12249","IBBS3_MSM_HN12250",
> "IBBS3_MSM_HN12251","IBBS3_MSM_HN12252","IBBS3_MSM_HN12253",
> "IBBS3_MSM_HN12254","IBBS3_MSM_HN12255","IBBS3_MSM_HN25256",
> "IBBS3_MSM_HN25257","IBBS3_MSM_HN25258","IBBS3_MSM_HN25259",
> "IBBS3_MSM_HN25260","IBBS3_MSM_HN25261","IBBS3_MSM_HN25262",
> "IBBS3_MSM_HN25263","IBBS3_MSM_HN25264","IBBS3_MSM_HN25265",
> "IBBS3_MSM_HN25266","IBBS3_MSM_HN25267","IBBS3_MSM_HN25268",
> "IBBS3_MSM_HN25269","IBBS3_MSM_HN25270","IBBS3_MSM_HN25271",
> "IBBS3_MSM_HN25272","IBBS3_MSM_HN25273","IBBS3_MSM_HN25274",
> "IBBS3_MSM_HN25275","IBBS3_MSM_HN25276", "IBBS3_MSM_HN25277", "IBBS3_MSM_HN25278","IBBS3_MSM_HN25279","IBBS3_MSM_HN25280",
> "IBBS3_MSM_HN25281","IBBS3_MSM_HN25282","IBBS3_MSM_HN25283",
> "IBBS3_MSM_HN25284","IBBS3_MSM_HMC44285",  "IBBS3_MSM_HMC44286", "IBBS3_MSM_HMC44287","IBBS3_MSM_HMC44288","IBBS3_MSM_HMC44289",
> "IBBS3_MSM_HMC44290","IBBS3_MSM_HMC44291","IBBS3_MSM_HMC44292",
> "IBBS3_MSM_HMC44293","IBBS3_MSM_HMC44294","IBBS3_MSM_HMC44295",
> "IBBS3_MSM_HMC44296","IBBS3_MSM_HMC44297","IBBS3_MSM_HMC44298",
> "IBBS3_MSM_HMC44299","IBBS3_MSM_HMC44300","IBBS3_MSM_HMC44301",
> "IBBS3_MSM_HMC44302","IBBS3_MSM_HMC44303","IBBS3_MSM_HMC44304",
> "IBBS3_MSM_HMC44305","IBBS3_MSM_HMC44306","IBBS3_MSM_HMC44307",
> "IBBS3_MSM_HMC44309")
>
> This is an ID that is in the following format:  IBBS3_Type_Group#####
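One quick way to check whether every ID actually follows this format is a fully anchored pattern. This sketch assumes the Type and Group alternatives are exactly the ones named in the post (MSM or PWID, HN or HMC). Applied to the full ID vector above, it flags only the fifth element, "IBBS3_MSM_HN104213", which has six digits.

```r
# Three IDs taken from the original post (subset for illustration)
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44309")

# IBBS3_Type_Group followed by exactly five digits, anchored at both ends
ok <- grepl("^IBBS3_(MSM|PWID)_(HN|HMC)[0-9]{5}$", ID)

# IDs that do not match the stated format
ID[!ok]
```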
>
>
> What I want to do is locate the starting position of Type, which is 3 or 4 letters long (in this example it's either MSM or PWID), the starting position of Group, which is 2 or 3 letters long (either HN or HMC), and finally the starting position of the 5-digit number.
>
>
> I'm able to get Type and Group using the following:
>
>
> TYPE_s <- sapply(c("MSM", "PWID"), regexpr, ID, ignore.case=TRUE)
>
> GROUP_s <- sapply(c("HN", "HMC"), regexpr, ID, ignore.case=TRUE)
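Because sapply() over the alternatives returns one column per pattern (with -1 wherever that alternative is absent), a single regexpr() call using regex alternation may be simpler. A sketch, assuming the Type and Group values never also appear earlier in the ID (an unanchored search returns the leftmost occurrence):

```r
# Three IDs taken from the original post (subset for illustration)
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44309")

# One call per field; each result is a single integer vector (-1 = no match)
TYPE_s  <- regexpr("MSM|PWID", ID, ignore.case = TRUE)  # 7 for every ID here
GROUP_s <- regexpr("HN|HMC", ID, ignore.case = TRUE)    # 11 for every ID here

# The length of each match is stored as an attribute (2 for HN, 3 for HMC)
attr(GROUP_s, "match.length")
```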
>
>
> What I am having trouble with is getting the starting position of the 5-digit number.
>
>
> I am trying:
>
>
> DIGITS_s <- sapply("([0:9])", regexpr, ID, ignore.case=T)
>
>
> But that just seems to look for the position of the first 0:
>
>
>> DIGITS_s
>        ([0:9])
>   [1,]      13
>   [2,]      13
>   [3,]      13
>   [4,]      14
>   [5,]      14
>   [6,]      14
>   [7,]      -1
>   [8,]      -1
>   [9,]      -1
>  [10,]      -1
>  [11,]      17
>  [12,]      17
>  [13,]      -1
>  [14,]      -1
>  [15,]      -1
>  [16,]      -1
>  [17,]      -1
>  [18,]      -1
>  [19,]      -1
>  [20,]      -1
>  [21,]      17
>  [22,]      17
>  [23,]      -1
>  [24,]      -1
>  [25,]      -1
>  [26,]      -1
>  [27,]      -1
>  [28,]      -1
>  [29,]      -1
>  [30,]      -1
>  [31,]      17
>  [32,]      17
>  [33,]      -1
>  [34,]      -1
>  [35,]      -1
>  [36,]      -1
>  [37,]      -1
>  [38,]      -1
>  [39,]      -1
>  [40,]      -1
>  [41,]      17
>  [42,]      17
>  [43,]      -1
>  [44,]      -1
>  [45,]      -1
>  [46,]      -1
>  [47,]      -1
>  [48,]      -1
>  [49,]      -1
>  [50,]      -1
>  [51,]      17
>  [52,]      17
>  [53,]      -1
>  [54,]      -1
>  [55,]      -1
>  [56,]      -1
>  [57,]      -1
>  [58,]      -1
>  [59,]      -1
>  [60,]      -1
>  [61,]      17
>  [62,]      17
>  [63,]      -1
>  [64,]      -1
>  [65,]      -1
>  [66,]      -1
>  [67,]      -1
>  [68,]      -1
>  [69,]      -1
>  [70,]      -1
>  [71,]      17
>  [72,]      17
>  [73,]      -1
>  [74,]      -1
>  [75,]      -1
>  [76,]      -1
>  [77,]      -1
>  [78,]      -1
>  [79,]      -1
>  [80,]      -1
>  [81,]      18
>  [82,]      17
>  [83,]      17
>  [84,]      17
>  [85,]      17
>  [86,]      17
>  [87,]      17
>  [88,]      17
>  [89,]      17
>  [90,]      17
>  [91,]      17
>  [92,]      17
>  [93,]      17
>  [94,]      17
>  [95,]      17
>  [96,]      17
>  [97,]      17
>  [98,]      17
>  [99,]      17
> [100,]      17
>
>
> So, clearly, this is wrong. I would just like to find the starting position of the first digit, no matter what it is.
>
> It's probably easy, isn't it?
>
> Best,
>
> Jen
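In a regular expression, [0:9] is a bracket expression that matches any single one of the characters 0, : or 9; it is not a range. That is why the output above gives the position of the first 0 or 9, and -1 for IDs containing neither. The digit range is written [0-9] (or [[:digit:]]), but on its own it returns 5 for every ID, because "IBBS3" already contains a digit. A sketch of two ways around this, using three IDs from the post:

```r
# Three IDs taken from the original post (subset for illustration)
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44309")

# "[0:9]" matches '0', ':' or '9': position of the first '0' or '9', else -1
regexpr("[0:9]", ID)

# "[0-9]" is the digit range, but it finds the '3' in "IBBS3" first
first_any <- as.integer(regexpr("[0-9]", ID))    # 5 5 5

# Anchor to the run of digits at the end of the string instead
num_start <- as.integer(regexpr("[0-9]+$", ID))  # 13 13 14

# Alternatively, if the number is always exactly 5 digits
num_start5 <- nchar(ID) - 4L                     # 13 14 14
```

The anchored pattern finds where the trailing run of digits begins (giving 104213 for the fifth ID), while the fixed-width version follows the nchar(ID)-4 suggestion (giving 04213).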
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


