[R] sub/grep question: extract year

Marc Girondot marc.girondot at u-psud.fr
Thu Aug 9 11:17:28 CEST 2018


To answer my own third question, this pattern works better:

# A year is a whole word (\\b ... \\b) starting with 18, 19 or 20.
pattern.year <- ".*\\b(18|19|20)([0-9][0-9])\\b.*"

subtext <- "bla 1880 bla"
sub(pattern.year, "\\1\\2", subtext) # returns "1880"
subtext <- "bla 1980 bla"
sub(pattern.year, "\\1\\2", subtext) # returns "1980"
subtext <- "bla 2010 bla"
sub(pattern.year, "\\1\\2", subtext) # returns "2010"
subtext <- "bla 1010 bla"
sub(pattern.year, "\\1\\2", subtext) # no match: returns "bla 1010 bla"
subtext <- "bla 3010 bla"
sub(pattern.year, "\\1\\2", subtext) # no match: returns "bla 3010 bla"
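
Note that the pattern above accepts any year from 1800 to 2099, while the original question asked for 1800 to 2019, and that sub() hands back the whole input when nothing matches. A minimal sketch of a stricter variant follows; extract_year is a hypothetical helper name (not a base R function), and returning NA for "no year found" is my own assumption about what is wanted:

```r
# Hypothetical helper: return the first year between 1800 and 2019 found
# in each element of x, or NA when there is none.
extract_year <- function(x) {
  # 18xx, 19xx, 200x or 201x, delimited by word boundaries
  pattern <- "\\b(18[0-9]{2}|19[0-9]{2}|20[01][0-9])\\b"
  m <- regexpr(pattern, x)            # position of the first match, -1 if none
  hit <- !is.na(m) & m > -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches() keeps only matched elements
  out
}

extract_year(c("bla 1880 bla", "bla 2010 bla", "bla 2025 bla", "bla 1010 bla"))
# "1880" "2010" NA NA
```

Because regexpr() only locates the match, no leading or trailing ".*" is needed, and the unmatched case is explicit rather than hidden in an unchanged string.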

Marc

On 09/08/2018 at 09:57, Marc Girondot via R-help wrote:
> Hi everybody,
>
> I have some questions about the way sub() works. I hope that
> someone has the answers:
>
> 1/ Why does the second example not return an empty string? There is
> no match.
>
> subtext <- "-1980-"
> sub(".*(1980).*", "\\1", subtext) # return 1980
> sub(".*(1981).*", "\\1", subtext) # return -1980-
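
On point 1: sub() only rewrites the part of the string that matched, so when there is no match the input comes back unchanged. If an empty string is wanted instead, one possible sketch is to test for a match first. extract_or_empty is a hypothetical helper name, not part of base R:

```r
subtext <- "-1980-"

# Hypothetical helper: apply the substitution only where the pattern
# matches, and return "" elsewhere.
extract_or_empty <- function(pattern, x) {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), "")
}

extract_or_empty(".*(1980).*", subtext) # "1980"
extract_or_empty(".*(1981).*", subtext) # ""
```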
>
> 2/ According to the sub documentation, it replaces the first occurrence
> of a pattern, so why does it not return 1980?
>
> subtext <- " 1980 1981 "
> sub(".*(198[01]).*", "\\1", subtext) # return 1981
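
On point 2: the pattern matches the whole string once, so there is only one occurrence to replace. Within that single match the leading ".*" is greedy and takes as much as it can, which leaves the last year for the capture group. Two possible ways to get the first year instead, sketched with base R functions only:

```r
subtext <- " 1980 1981 "

# A lazy quantifier (.*?) takes as little as possible; perl = TRUE
# selects the PCRE engine, where this syntax is standard.
sub(".*?(198[01]).*", "\\1", subtext, perl = TRUE) # "1980"

# regexpr() locates the first match only, so no surrounding .* is needed.
regmatches(subtext, regexpr("198[01]", subtext))   # "1980"

# gregexpr() locates every match.
regmatches(subtext, gregexpr("198[01]", subtext))[[1]] # "1980" "1981"
```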
>
> 3/ I want to extract a year from text; I use:
>
> subtext <- "bla 1980 bla"
> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) 
> # return 1980
> subtext <- "bla 2010 bla"
> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) 
> # return 2010
>
> but
>
> subtext <- "bla 1010 bla"
> sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext) 
> # return 1010
>
> I would like to exclude 1010 and similar cases.
>
> The solution would be:
>
> 18[0-9][0-9] or 19[0-9][0-9] or 200[0-9] or 201[0-9]
>
> Is there a way to write such a pattern in grep?
>
> Thanks a lot
>
> Marc

-- 
__________________________________________________________
Marc Girondot, Pr

Laboratoire Ecologie, Systématique et Evolution
Equipe de Conservation des Populations et des Communautés
CNRS, AgroParisTech et Université Paris-Sud 11 , UMR 8079
Bâtiment 362
91405 Orsay Cedex, France

Tel:  33 1 (0)1.69.15.72.30   Fax: 33 1 (0)1.69.15.73.53
e-mail: marc.girondot at u-psud.fr
Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
Skype: girondot



