[R] Checking for similar file names in two different directories

Bert Gunter
Fri Dec 27 05:22:55 CET 2019


AHA! -- I think I now see what you mean.

My previous suggestion was almost useless as it assumes you already know
what the "common" parts are ... but you don't.

However, if the filename parts at the end are separated by a space
from the preceding part of the filename, i.e. like "stuff xxxxxxx.xls",
then something like the following example would work, I think:

## Read in *all* the filenames from both directories, as I previously
## suggested.

Gfiles <- list.files("G:")
Qfiles <- list.files("Q:")

Suppose this gave you (a simplified example):

> Gfiles
 [1] "kjqdx 157.xls" "aorgz 287.xls" "ioldc 380.xls" "fpnxr 509.xls"
 [5] "wytcg 853.xls" "xujos 964.xls" "xdeto 217.xls" "nqriu 574.xls"
 [9] "jclir 480.xls" "fndyu 769.xls"
> Qfiles
 [1] "vexrb 509.xls" "jxeio 770.xls" "zhmwf 920.xls" "cajdq 287.xls"
 [5] "nwdic 259.xls" "sqjkb 889.xls" "brhfu 157.xls" "uyirq 574.xls"
 [9] "ijfqm 480.xls" "nedhj 982.xls"

## all that's important is the " xxx.xls" at the end
## extract the filename part, omitting the ".xls" using regex's
> Gnm <- sub("^.+ (.+)\\.xls$","\\1",Gfiles)
> Qnm <- sub("^.+ (.+)\\.xls$","\\1",Qfiles)

> Gnm
 [1] "157" "287" "380" "509" "853" "964" "217" "574" "480" "769"
> Qnm
 [1] "509" "770" "920" "287" "259" "889" "157" "574" "480" "982"
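
One caveat worth checking at this step: sub() returns any element that does
not match the pattern unchanged, so a stray file that does not look like
"stuff xxx.xls" would come through as its full name rather than a key. A
minimal sketch of a guard, using a made-up listing (the "notes.txt" entry and
the names `files`, `pat`, `ok` and `keys` are only for illustration):

```r
# Hypothetical listing: the last file does not follow the "stuff xxx.xls" form.
files <- c("kjqdx 157.xls", "aorgz 287.xls", "notes.txt")
pat <- "^.+ (.+)\\.xls$"

# sub() leaves the non-matching element untouched
unguarded <- sub(pat, "\\1", files)

# so keep only the names that match the pattern before extracting the key
ok <- grepl(pat, files)
keys <- sub(pat, "\\1", files[ok])
keys
```

Whether to drop such files silently or stop with an error depends on how
tidy the directories are expected to be.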

> ## The 'common' parts are:
> intersect(Gnm,Qnm)
[1] "157" "287" "509" "574" "480"

You can now use these as I described previously to extract your common
files.
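
The earlier message is not quoted here, so the following is only one possible
way to go from the common keys back to the actual filenames. It uses a
shortened version of the example listing above and assumes each key occurs at
most once per directory; the names `common` and `pairs` are made up for the
sketch.

```r
# Shortened example data taken from the listing above
Gfiles <- c("kjqdx 157.xls", "aorgz 287.xls", "ioldc 380.xls", "fpnxr 509.xls")
Qfiles <- c("vexrb 509.xls", "jxeio 770.xls", "cajdq 287.xls", "brhfu 157.xls")

Gnm <- sub("^.+ (.+)\\.xls$", "\\1", Gfiles)
Qnm <- sub("^.+ (.+)\\.xls$", "\\1", Qfiles)
common <- intersect(Gnm, Qnm)

# match() gives the position of each common key in each listing,
# so the G and Q columns line up row by row
pairs <- data.frame(key = common,
                    G = Gfiles[match(common, Gnm)],
                    Q = Qfiles[match(common, Qnm)],
                    stringsAsFactors = FALSE)
pairs
```

If a key can occur more than once in a directory, Gfiles[Gnm %in% common]
would return all of the matching names instead of just the first.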

A similar strategy can be used for any other definition of "common" you
wish to use, *provided* you can define "common" uniquely and specifically
enough to match it in the filenames.
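
As a sketch of what a different definition might look like, suppose "common"
means the run of digits immediately before ".xls", whatever precedes it. The
helper `common_keys()` below is hypothetical, and the second filename in each
vector is made up; only the first ones are modelled on the original question.

```r
# Hypothetical helper: the first capture group of 'pattern' isolates the
# part of the filename that is treated as "common".
common_keys <- function(x, y, pattern) {
  key <- function(f) {
    hit <- grepl(pattern, f, perl = TRUE)   # ignore names that do not match
    sub(pattern, "\\1", f[hit], perl = TRUE)
  }
  intersect(key(x), key(y))
}

g <- c("0020-49785 10806.xls", "0020-11111 20001.xls")
q <- c("301864 4519 10806.xls", "301864 4520 30002.xls")

# "common" here = the digits just before ".xls"
res <- common_keys(g, q, "^.*?([0-9]+)\\.xls$")
res
```

Swapping in another pattern changes the definition of "common" without
touching the rest of the code.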


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Dec 26, 2019 at 9:54 AM Thomas Subia <tsubia using imgprecision.com>
wrote:

> Colleagues,
>
> I have two locations where my data resides.
> One folder is for data taken under treatment A
> One folder is for data taken under treatment B
>
> "G:\ 0020-49785 10806.xls"
> "Q:\ 301864 4519 10806.xls"
>
> Here the 10806 is the part which is common to both directories.
>
> Is there a way to have R extract parts common to both directories?
>
> Thomas Subia
