[R] Regular expression \ String Extraction help

(Ted Harding) Ted.Harding at manchester.ac.uk
Wed Jun 3 15:55:31 CEST 2009


On 03-Jun-09 11:34:16, Tony Breyal wrote:
> Dear all,
> Is there a good way of doing the following conversion:
> 
> [YYYY]-[MM]-[DD] [Time] [Day] [Name][Integer].[Extension]
> 
> to become
> 
> C:\test\[Name]\[YYYY]-[MM]-[DD] [Time] [Day]\[YYYY]-[MM]-[DD] [Time]
> [Day] [Name][Integer].[Extension]
> 
> i.e. these
> 
> 2009-04-10 1400 Fri Foo1.txt
> 2009-04-10 1400 Fri Universities2.txt
> 2009-04-10 1400 Fri Hitchhikers Guide To The Galaxy42.txt
> 
> will become
> 
> C:\test\Foo\2009-04-10 1400 Fri\2009-04-10 1400 Fri Foo1.txt
> C:\test\Universities\2009-04-10 1400 Fri\2009-04-10 1400 Fri
> Universities2.txt
> C:\test\Hitchhikers Guide To The Galaxy\2009-04-10 1400 Fri\2009-04-10
> 1400 Fri Hitchhikers Guide To The Galaxy42.txt
> 
> My main issue is the conversion for 'Hitchhikers Guide To The
> Galaxy42' because of the spaces in the Name. So far this is what I
> have:
> 
>> txt <- '2009-04-10 1400 Fri Universities1.txt'
>> step1 <- unlist(strsplit(txt, '\\.'))
>> step2 <- unlist(strsplit(step1[1], ' '))
>> Name <- gsub('[0-9]',replacement='', step2[4])
>> step3 <- paste(step2[1], step2[2], step2[3], sep=' ')
>> paste('C:\\test\\', Name, '\\', step3, '\\', txt, sep='' )
> [1] "C:\\test\\Universities\\2009-04-10 1400 Fri\\2009-04-10 1400 Fri
> Universities1.txt"
> 
> Cheers,
> Tony Breyal

I can get as far as the following (using the "Hitchhikers" one):

   txt <- '2009-04-10 1400 Fri Hitchhikers Guide To The Galaxy42.txt'
   step1 <- unlist(strsplit(txt, '\\.'))
   step2 <- unlist(strsplit(step1[1], ' '))
   step2
   # [1] "2009-04-10"  "1400"        "Fri"         "Hitchhikers" "Guide"
   # [6] "To"          "The"         "Galaxy42"

What is now needed is to join the name elements of step2 (the fourth
onwards) into a single character string. paste() with its default
arguments won't do it, because it returns a separate character string
for each element of step2. cat() won't do it because it only prints
and returns NULL (so its result cannot usefully be assigned).
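
A quick check of that behaviour, as a sketch only: the vector below
simply retypes the step2 value shown above, and the names pieces and
printed are made up for illustration.

```r
step2 <- c("2009-04-10", "1400", "Fri",
           "Hitchhikers", "Guide", "To", "The", "Galaxy42")

# Called on a single vector with default arguments, paste() returns one
# string per element, so the name pieces stay separate.
pieces <- paste(step2[4:8])
length(pieces)

# cat() prints the joined text but returns NULL, so nothing useful is stored.
printed <- cat(step2[4:8], "\n")
is.null(printed)
```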

You could loop over step2:

   Name1 <- step2[4]
   # seq_along(step2)[-(1:4)] is empty for a one-word name, so the loop is skipped
   for (i in seq_along(step2)[-(1:4)]) Name1 <- paste(Name1, step2[i])
   Name1
   # [1] "Hitchhikers Guide To The Galaxy42"

Then do your gsub:

   Name <- gsub('[0-9]',replacement='', Name1)
   step3 <- paste(step2[1], step2[2], step2[3], sep=' ')
   paste('C:\\test\\', Name, '\\', step3, '\\', txt, sep='' )

   # [1] "C:\\test\\Hitchhikers Guide To The Galaxy\\2009-04-10 1400
   # Fri\\2009-04-10 1400 Fri Hitchhikers Guide To The Galaxy42.txt"

So that works; but it would be nice to be able to avoid the loop!
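
For what it's worth, here is one possible loop-free sketch, not
necessarily the tidiest. It assumes every file name has exactly three
space-separated fields before the name and ends in digits plus an
extension, as in the three examples above; the names pattern, stamp,
name and paths are invented here. The first option uses paste() with
its collapse argument, the second uses sub() with capture groups and
works on all the file names at once.

```r
txt <- c("2009-04-10 1400 Fri Foo1.txt",
         "2009-04-10 1400 Fri Universities2.txt",
         "2009-04-10 1400 Fri Hitchhikers Guide To The Galaxy42.txt")

# Option 1: drop the extension, split on spaces, then collapse the
# name pieces (everything after the first three fields) into one string.
step2 <- unlist(strsplit(sub("\\.[^.]*$", "", txt[3]), " "))
Name1 <- paste(step2[-(1:3)], collapse = " ")
Name1

# Option 2: capture groups, vectorised over every file name.
# Group 1 = date, time and day; group 2 = name without its trailing integer.
pattern <- "^([^ ]+ [^ ]+ [^ ]+) (.*[^0-9])[0-9]+\\.[^.]+$"
stamp <- sub(pattern, "\\1", txt)
name  <- sub(pattern, "\\2", txt)
paths <- paste("C:\\test\\", name, "\\", stamp, "\\", txt, sep = "")
paths
```

A file name that does not fit the assumed layout is returned unchanged
by sub(), so it would be worth checking with grepl(pattern, txt) before
trusting the result.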

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Jun-09                                       Time: 14:55:28
------------------------------ XFMail ------------------------------



