[R] Text Input from a Non Delimited File

David Winsemius dwinsemius at comcast.net
Mon Feb 10 00:58:13 CET 2014


On Feb 9, 2014, at 2:48 PM, Burhan ul haq wrote:

> Hi,
> 
> I am trying to read in a file, which is not delimited by any specific
> characters.
> 
> Something as follows:
> ##  -------------------------------------------------------------------
Lines <- readLines(textConnection("GunPos RaceNo Name Gender Cat Club GunTime ChipPos ChipTime
1,10038, Carl Allwood M Sutton & Ashfield Harriers 02:38:40 1 02:38:40
2,10098, Adam Holland M Votwo/USN 02:41:25 2 02:41:25
3,13007, Pumlani Bangani M 02:43:23 3 02:43:23
4,10028, Anthony Jackson M Sittingbourne Striders 02:44:39 4 02:44:39
5,10187, Peter Stockdale M 02:45:26 5 02:45:25
6,10064, Jared Bethell M Harlow RC 02:46:43 6 02:46:40
7,13003, Sarah Harris F 35 Long Eaton RC 02:47:47 7 02:47:44
8,13009, Rod Harris M 02:47:47 8 02:47:45
9,10033, Carl Sommer M Huncote Harriers 02:47:59 9 02:47:58
10,10037, Peter Swaine M Charnwood AC 02:49:28 10 02:49:27
11,10048, Pavel Toropov M 02:50:41 11 02:50:41
12,10008, Derek Dunne M 45 Treasury Running Club 02:51:42 12 02:51:40
13,10044, Matthew Nutt M Scunthorpe 02:52:20 13 02:52:15
14,10380, Ludovic Renou M 02:53:37 14 02:53:34
15,10056, Alex Keenan M 02:53:48 15 02:53:47"))

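# Put commas around the gender marker (the first " M " or " F " on each line).
Lines1 <- sub("( M | F )", ",\\1,", Lines)
# Put commas around the first space-delimited integer, which is ChipPos in this sample.
Lines2 <- sub("( \\d+ )", ",\\1,", Lines1)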

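The header line has no commas, so you need to edit it to use commas as
separators, with one name per field. (After these two substitutions Cat, Club
and GunTime still share a single field, so that is seven names.)
You can then just use:

read.table(text = Lines2, sep = ",", header = TRUE)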
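
If all nine columns are wanted, one possible extension is to capture every field with a single regular expression and hand tab-separated text to read.table. This is only a sketch under stated assumptions: Gender is a standalone M or F, a number directly after Gender is the Cat field, times are always HH:MM:SS, and the names col_names, pattern, tabbed and results are illustrative rather than anything from the original post.

```r
# Sample lines in the same layout as the data above (header plus three rows).
Lines <- c(
  "GunPos RaceNo Name Gender Cat Club GunTime ChipPos ChipTime",
  "1,10038, Carl Allwood M Sutton & Ashfield Harriers 02:38:40 1 02:38:40",
  "3,13007, Pumlani Bangani M 02:43:23 3 02:43:23",
  "7,13003, Sarah Harris F 35 Long Eaton RC 02:47:47 7 02:47:44"
)

# Column names come from the header line, which is space-separated.
col_names <- strsplit(Lines[1], " ", fixed = TRUE)[[1]]
body <- Lines[-1]

# One capture group per column; Cat is optional and Club may be empty.
# Assumption: a number directly after Gender is Cat, not part of the club name.
time <- "\\d{2}:\\d{2}:\\d{2}"
pattern <- paste0(
  "^(\\d+),(\\d+),\\s*",      # GunPos, RaceNo
  "(.*?)\\s+([MF])\\s+",      # Name (lazy), Gender
  "(?:(\\d+)\\s+)?",          # Cat, only present on some rows
  "(.*?)\\s*",                # Club, possibly empty
  "(", time, ")\\s+(\\d+)\\s+(", time, ")\\s*$"  # GunTime, ChipPos, ChipTime
)

# Fail loudly on rows the pattern does not cover instead of misparsing them.
ok <- grepl(pattern, body, perl = TRUE)
if (!all(ok)) stop("unparsed line(s): ", paste(which(!ok), collapse = ", "))

# Rewrite each row as tab-separated fields, then let read.table build the frame.
tabbed <- sub(pattern, "\\1\t\\2\t\\3\t\\4\t\\5\t\\6\t\\7\t\\8\t\\9",
              body, perl = TRUE)
results <- read.table(
  text = tabbed, sep = "\t", quote = "", comment.char = "",
  col.names = col_names,
  colClasses = c("integer", "integer", "character", "character", "integer",
                 "character", "character", "integer", "character")
)
str(results)
```

Rows without a category get NA in Cat, and rows without a club get an empty string in Club. A name that itself contains a standalone " M " or " F " would be split too early, so it is worth checking the result against the source.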

> 
> 
> As I failed to read it in via R or Excel, I used a text editor with
> regular expressions, Sublime Text to be exact. I was trying to convert it
> to CSV format, and managed to insert commas for the first two
> entries, as follows:
> 
> ##  -------------------------------------------------------------------
> GunPos RaceNo Name Gender Cat Club GunTime ChipPos ChipTime
> 1,10038, Carl Allwood ,M ,Sutton & Ashfield Harriers 02:38:40 1 02:38:40
> 2,10098, Adam Holland ,M ,Votwo/USN 02:41:25 2 02:41:25
> 3,13007, Pumlani Bangani ,M ,02:43:23 3 02:43:23
> 4,10028, Anthony Jackson ,M ,Sittingbourne Striders 02:44:39 4 02:44:39
> 5,10187, Peter Stockdale ,M ,02:45:26 5 02:45:25
> 6,10064, Jared Bethell ,M ,Harlow RC 02:46:43 6 02:46:40
> 7,13003, Sarah Harris ,F ,35 Long Eaton RC 02:47:47 7 02:47:44
> 8,13009, Rod Harris ,M ,02:47:47 8 02:47:45
> 9,10033, Carl Sommer ,M ,Huncote Harriers 02:47:59 9 02:47:58
> 10,10037, Peter Swaine ,M ,Charnwood AC 02:49:28 10 02:49:27
> 11,10048, Pavel Toropov ,M ,02:50:41 11 02:50:41
> 12,10008, Derek Dunne ,M ,45 Treasury Running Club 02:51:42 12 02:51:40
> 13,10044, Matthew Nutt ,M ,Scunthorpe 02:52:20 13 02:52:15
> 14,10380, Ludovic Renou ,M ,02:53:37 14 02:53:34
> 15,10056, Alex Keenan ,M ,02:53:48 15 02:53:47
> ##  -------------------------------------------------------------------
> 
> I am stuck after that. I tried searching for the expression:
> (.)*(\d{2}:\d{2}:\d{2})( )
> and replacing it with \1,\2,\3 which gave this result:
> 
> ##  -------------------------------------------------------------------
> GunPos RaceNo Name Gender Cat Club GunTime ChipPos ChipTime
> ,02:38:40, 1 02:38:40
> ,02:41:25, 2 02:41:25
> ##  -------------------------------------------------------------------
> 
> How do I fix the regular expression here? If you examine the later
> entries, some names contain a hyphen or have three parts, so other
> approaches do not work well.
> 
> Secondly, is there a better way to handle this problem? The original
> input file is in PDF format. I copied the text and made a txt file out
> of it.
> 
> The input txt file is attached.
> 
> Thanks in advance for any suggestions.
> 
> 
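
On the regular expression quoted above: a likely cause of the lost text is that (.)* repeats a one-character group, so \1 holds only the last character it matched rather than the whole prefix. Moving the star inside the group, (.*), keeps everything before the time. A small sketch in R, using one of the edited lines from the question (perl = TRUE is a choice made here, not something from the original post):

```r
x <- "1,10038, Carl Allwood ,M ,Sutton & Ashfield Harriers 02:38:40 1 02:38:40"

# (.)* repeats a one-character group, so \1 keeps only its last iteration
# (here the space before the time) and the rest of the prefix is discarded.
lossy <- sub("(.)*(\\d{2}:\\d{2}:\\d{2})( )", "\\1,\\2,\\3", x, perl = TRUE)

# (.*) puts the repetition inside the group, so \1 keeps the whole prefix.
kept <- sub("(.*)(\\d{2}:\\d{2}:\\d{2})( )", "\\1,\\2,\\3", x, perl = TRUE)

print(lossy)
print(kept)
```

Only GunTime is followed by a space, so both patterns match up to and including GunTime; the difference is how much of the preceding text survives in \1.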


David Winsemius
Alameda, CA, USA



