[R] how to separate string from numbers in a large txt file

Michael Boulineau m|ch@e|@p@bou||ne@u @end|ng |rom gm@||@com
Thu May 16 01:07:47 CEST 2019


I have a wild and crazy text file, the head of which looks like this:

2016-07-01 02:50:35 <john> hey
2016-07-01 02:51:26 <jane> waiting for plane to Edinburgh
2016-07-01 02:51:45 <john> thinking about my boo
2016-07-01 02:52:07 <jane> nothing crappy has happened, not really
2016-07-01 02:52:20 <john> plane went by pretty fast, didn't sleep
2016-07-01 02:54:08 <jane> no idea what time it is or where I am really
2016-07-01 02:54:17 <john> just know it's london
2016-07-01 02:56:44 <jane> you are probably asleep
2016-07-01 02:58:45 <jane> I hope fish was fishy in a good eay
2016-07-01 02:58:56 <jone> 💘
2016-07-01 02:59:34 <jane> 🍑🍑🍑
2016-07-01 03:02:48 <john> British security is a little more rigorous...

It goes on for a while. It's a big file. But I feel like it's going to
be difficult to annotate with the coreNLP library or package. I'm
doing natural language processing. In other words, I'm curious as to
how I would shave off the dates, that is, to make it look like:

<john> hey
<jane> waiting for plane to Edinburgh
 <john> thinking about my boo
<jane> nothing crappy has happened, not really
<john> plane went by pretty fast, didn't sleep
<jane> no idea what time it is or where I am really
<john> just know it's london
<jane> you are probably asleep
<jane> I hope fish was fishy in a good eay
 <jone> 💘
<jane> 🍑🍑🍑
<john> British security is a little more rigorous...

To be clear, then, I'm trying to clean a large text file by writing a
regular expression? such that I create a new object with no numbers or
dates.

Michael



More information about the R-help mailing list