[R] how to separate string from numbers in a large txt file

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Thu May 16 05:47:47 CEST 2019


On 5/15/19 4:07 PM, Michael Boulineau wrote:
> I have a wild and crazy text file, the head of which looks like this:
>
> 2016-07-01 02:50:35 <john> hey
> 2016-07-01 02:51:26 <jane> waiting for plane to Edinburgh
> 2016-07-01 02:51:45 <john> thinking about my boo
> 2016-07-01 02:52:07 <jane> nothing crappy has happened, not really
> 2016-07-01 02:52:20 <john> plane went by pretty fast, didn't sleep
> 2016-07-01 02:54:08 <jane> no idea what time it is or where I am really
> 2016-07-01 02:54:17 <john> just know it's london
> 2016-07-01 02:56:44 <jane> you are probably asleep
> 2016-07-01 02:58:45 <jane> I hope fish was fishy in a good eay
> 2016-07-01 02:58:56 <jone> 💘
> 2016-07-01 02:59:34 <jane> 🍑🍑🍑
> 2016-07-01 03:02:48 <john> British security is a little more rigorous...

Looks entirely not-"crazy". Typical log file format.

Two possibilities: 1) Use `read.fwf` (in pkg utils, which is attached by 
default); 2) Use regex (i.e. the `sub` function) to strip everything 
before the "<". Read `?regex`. Since "<" is not a metacharacter you 
could use the pattern "^[^<]*" and replace with "".

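A minimal sketch of both routes, using three lines copied from the quoted 
log. The object names (`x`, `stripped`, `fixed`) are only illustrative, and 
the width of 20 assumes every line starts with a 19-character timestamp 
followed by one space, as in the sample.

```r
# Three sample lines copied from the quoted log.
x <- c("2016-07-01 02:50:35 <john> hey",
       "2016-07-01 02:51:26 <jane> waiting for plane to Edinburgh",
       "2016-07-01 02:51:45 <john> thinking about my boo")

# Regex route: delete everything before the first "<".
# "^[^<]*" stops at the first "<", so a "<" inside a message is left alone.
stripped <- sub("^[^<]*", "", x)

# Fixed-width route: a negative width tells read.fwf to skip that field.
# Assumption: 20 characters of timestamp, then up to 1000 of message.
fixed <- utils::read.fwf(textConnection(x), widths = c(-20, 1000),
                         comment.char = "", as.is = TRUE)

print(stripped)
print(fixed$V1)
```

For the real file, `x` would come from `readLines()` on the log's path. The 
regex route is probably the safer of the two if any line deviates from the 
fixed timestamp width.
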
And do read the Posting Guide. Cross-posting to StackOverflow and Rhelp, 
at least within hours of each other, is considered poor manners.


-- 

David.

>
> It goes on for a while. It's a big file. But I feel like it's going to
> be difficult to annotate with the coreNLP library or package. I'm
> doing natural language processing. In other words, I'm curious as to
> how I would shave off the dates, that is, to make it look like:
>
> <john> hey
> <jane> waiting for plane to Edinburgh
>   <john> thinking about my boo
> <jane> nothing crappy has happened, not really
> <john> plane went by pretty fast, didn't sleep
> <jane> no idea what time it is or where I am really
> <john> just know it's london
> <jane> you are probably asleep
> <jane> I hope fish was fishy in a good eay
>   <jone> 💘
> <jane> 🍑🍑🍑
> <john> British security is a little more rigorous...
>
> To be clear, then, I'm trying to clean a large text file by writing a
> regular expression? such that I create a new object with no numbers or
> dates.
>
> Michael
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


