[R] Regular expressions on filenames

David Winsemius dwinsemius at comcast.net
Thu Jan 16 02:56:59 CET 2014


On Jan 15, 2014, at 4:37 PM, Fisher Dennis wrote:

> R 3.0.2
> OS X
> 
> Colleagues
> 
> I am writing code to read a large number of files in a particular folder.  In some situations, there may be two versions of the file with different extensions, e.g.:
> 	FILE.csv
> 	FILE.xls
> I extracted the portion before the extension with:
> 	sub("\\..*$", "", basename(FILELIST))
> then used 
> 	duplicated
> to find duplicates.  All was well until I encountered files named:
> 	FILE.XXX.csv
> 	FILE.YYY.xls
> 
> My regular expression extracted only the “FILE” portion of the text and claimed that the filenames (without the extensions) matched.  Can someone provide me with the appropriate regular expression to deal with this?  Thanks.
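
The original pattern over-strips because the match begins at the first dot in the name and `.*$` then runs to the end of the string. A minimal sketch of that behaviour follows; the file names and the `dir/` prefix are made up for illustration and are not from the original post.

```r
# Hypothetical file names, for illustration only
FILELIST <- c("dir/FILE.XXX.csv", "dir/FILE.YYY.xls")

# "\\..*$" matches from the FIRST dot to the end of the string,
# so everything after "FILE" is removed
stems <- sub("\\..*$", "", basename(FILELIST))
stems              # "FILE" "FILE"

# Both stems are now identical, so the second one is flagged
duplicated(stems)  # FALSE TRUE
```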

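Why not remove only the final dot and the three characters after it (this assumes three-character extensions):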

sub("\\..{3}$", "", basename(FILELIST))

See ?regex
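
A short sketch of how the three-character pattern behaves, next to two alternatives that do not depend on the extension length. The file names are hypothetical, and the alternatives (`"\\.[^.]*$"` and `tools::file_path_sans_ext()`) are additions for comparison, not part of the suggestion above.

```r
# Hypothetical file names, for illustration only
FILELIST <- c("dir/FILE.XXX.csv", "dir/FILE.YYY.xls",
              "dir/FILE.XXX.xls", "dir/OTHER.xlsx")
base <- basename(FILELIST)

# Suggested pattern: a dot plus exactly three characters at the end.
# A four-character extension such as ".xlsx" is left untouched.
stems3 <- sub("\\..{3}$", "", base)
stems3   # "FILE.XXX" "FILE.YYY" "FILE.XXX" "OTHER.xlsx"

# Alternative: strip from the LAST dot onward, whatever the extension length
stems_any <- sub("\\.[^.]*$", "", base)
stems_any   # "FILE.XXX" "FILE.YYY" "FILE.XXX" "OTHER"

# Alternative: helper from the 'tools' package shipped with R
stems_tools <- tools::file_path_sans_ext(base)

# Only the genuine pair FILE.XXX.csv / FILE.XXX.xls is flagged
duplicated(stems_any)   # FALSE FALSE TRUE FALSE
```

If every extension in the folder is known to be three characters long, the first pattern is enough; the other two are only worth the change when extensions such as `.xlsx` may appear.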

-- 

David Winsemius
Alameda, CA, USA



