[R] regex

Wed Sep 18 13:13:39 CEST 2019

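A little note on quoting in regular expressions.
I find writing \\. when I want a literal . somewhat confusing,
so I would use the pattern "_w_.*[.]csv$" instead.
(Regular expressions have no "&" operator; the ".*" between the
two parts is what joins the two conditions.)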
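
As a small sketch of the two spellings (the file names below are shortened,
made-up stand-ins, not the real ones), both quoted forms should select the
same names, while an unquoted dot also lets through a name that merely ends
in "csv":

```r
# Hypothetical, shortened file names used only for illustration
f <- c("x_local_a_0.2.csv", "x_local_w_0.2.csv",
       "x_local_w_0.2.xls", "x_local_w_0.2_csv")

# Backslash-quoted dot: "\\." in R source is the regex \.
a <- grep("_w_.*\\.csv$", f, value = TRUE)

# Character-class dot: [.] is a literal "." with no backslashes needed
b <- grep("_w_.*[.]csv$", f, value = TRUE)

# Unquoted dot: "." matches any character, so "_csv" is accepted too
unquoted <- grep("_w_.*.csv$", f, value = TRUE)

print(identical(a, b))
print(unquoted)
```

Which spelling to prefer is a matter of taste; the character class simply
avoids having to think about R's own string escaping.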

Better still, if you want to match file names,
there is a function glob2rx that converts shell ("glob")
patterns into regular expression patterns.  Thus
> grep(glob2rx("*_w_*.csv"), myfiles, value=TRUE)
[1] "BU-072_1_E1_RE_SEC-01_local_w_0.2_0.2.csv"
[2] "BU-072_1_E1_RE_SEC-01_local_w_0.2_0.6.csv"
[3] "BU-072_1_E1_RE_SEC-01_local_w_0.4_1.0.csv"
[4] "BU-072_1_E1_RE_SEC-01_local_w_1.0_0.2.csv"
[5] "BU-072_1_E1_RE_SEC-01_local_w_1.0_0.6.csv"
[6] "BU-072_1_E1_RE_SEC-01_local_w_1.0_1.0.csv"

So the simplest way to get what you want is
CSVs <- list.files(path=..., pattern=glob2rx("*_w_*.csv"))

In fact ?list.files mentions glob2rx.
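
To see what glob2rx does, one can print the regular expression it builds
and try it on a few names. A minimal sketch, again with made-up file names
rather than the real ones:

```r
# glob2rx() lives in the utils package, which is attached by default
rx <- utils::glob2rx("*_w_*.csv")
print(rx)  # the anchored regular expression built from the glob

# Hypothetical file names for illustration only
f <- c("run_a_0.2.csv", "run_w_0.2.csv",
       "run_w_0.2.xls", "run_w_0.2.csv.bak")

# Only names that contain "_w_" and end in ".csv" should survive
hits <- grep(rx, f, value = TRUE)
print(hits)
```

Because the generated pattern is anchored at both ends, a name such as
"run_w_0.2.csv.bak" is not picked up.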

On Tue, 17 Sep 2019 at 18:49, Ivan Calandra <calandra using rgzm.de> wrote:

> Dear useRs,
>
> I still have problems using regular expressions. I have two problems for
> which I have found workarounds, but I'm sure there are better ways of
> doing it.
>
> 1) list CSV files with "_w_" in the name
>
> Here is a sample of the files in the folder:
> myfiles <- c("BU-072_1_E1_RE_SEC-01_local_a_0.2_0.2.csv",
> "BU-072_1_E1_RE_SEC-01_local_a_0.2_0.6.csv","BU-072_1_E1_RE_SEC-01_local_a_0.4_1.0.csv",
> "BU-072_1_E1_RE_SEC-01_local_a_1.0_0.2.csv","BU-072_1_E1_RE_SEC-01_local_a_1.0_0.6.csv",
> "BU-072_1_E1_RE_SEC-01_local_w_0.2_0.2.csv","BU-072_1_E1_RE_SEC-01_local_w_0.2_0.6.csv",
> "BU-072_1_E1_RE_SEC-01_local_w_0.4_1.0.csv","BU-072_1_E1_RE_SEC-01_local_w_1.0_0.2.csv",
> "BU-072_1_E1_RE_SEC-01_local_w_1.0_0.6.csv","BU-072_1_E1_RE_SEC-01_local_w_1.0_1.0.csv",
> "BU-072_1_E1_RE_SEC-01_local_a_0.2_0.2.xls","BU-072_1_E1_RE_SEC-01_local_a_0.2_0.6.xls",
> "BU-072_1_E1_RE_SEC-01_local_a_0.4_1.0.xls","BU-072_1_E1_RE_SEC-01_local_a_1.0_0.2.xls",
> "BU-072_1_E1_RE_SEC-01_local_a_1.0_0.6.xls","BU-072_1_E1_RE_SEC-01_local_w_0.2_0.2.xls",
> "BU-072_1_E1_RE_SEC-01_local_w_0.2_0.6.xls","BU-072_1_E1_RE_SEC-01_local_w_0.4_1.0.xls",
> "BU-072_1_E1_RE_SEC-01_local_w_1.0_0.2.xls","BU-072_1_E1_RE_SEC-01_local_w_1.0_0.6.xls",
> "BU-072_1_E1_RE_SEC-01_local_w_1.0_1.0.xls")
>
> Here is what I did:
> CSVs <- list.files(path=..., pattern="\\.csv$")
> w.files <- CSVs[grep(pattern="_w_", CSVs)]
>
> Of course, what I would like to do is list only the interesting files
> from the beginning, rather than subsetting the whole list of files. In
> other words, having a pattern that includes both "\\.csv$" and "_w_" in
> the list.files() call. I tried "_w_&\\.csv$" but it returns an empty
> vector.
>
> 2) The units of the variables are given in the original headers. I would
> like to extract the units. This is what I did:
> headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
> "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")
> units.var <- gsub(pattern="^.*\\[|\\]$", "", headers)
>
> It seems to me overly complicated to use gsub(). Isn't there a way to
> extract what is interesting rather than deleting what is not?
>
> Thank you for your help! Best, Ivan
>
> --
> Dr. Ivan Calandra
> TraCEr, laboratory for Traceology and Controlled Experiments
> MONREPOS Archaeological Research Centre and
> Museum for Human Behavioural Evolution
> Schloss Monrepos
> 56567 Neuwied, Germany
> +49 (0) 2631 9772-243
> https://www.researchgate.net/profile/Ivan_Calandra
>
