[R] import file formatted RFC-822

Sebastian Kruk residuo.solow at gmail.com
Tue Apr 13 19:26:40 CEST 2010


Dear R-list users:

I would like to import a database of web robots,
http://www.robotstxt.org/db/all.txt, which is formatted according to
RFC 822. How can I do it?

The RFC 822 specification defines a standard format for electronic
messages, which consists of a set of header fields and an optional
body. The headers contain information about the message, such as to
whom it is being sent, from whom it is being sent, when it was sent,
the subject, and so on. The body, if present, is separated from the
header fields by an empty line (\r\n). The following is an example of
a simple message in this format:

From: example at example.com
To: example2 at example.com
Subject: As basic as it gets

This is the plain text body of the message.  Note the blank line
between the header information and the body of the message.
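
To make the header/body split concrete, here is a minimal sketch in
Python (not R) that uses the standard library's email module to parse the
message above. The variable names are only illustrative, and this shows
the format itself rather than a way to import the robots database.

```python
from email import message_from_string

# The example message from above; a blank line separates headers from body.
raw = (
    "From: example at example.com\n"
    "To: example2 at example.com\n"
    "Subject: As basic as it gets\n"
    "\n"
    "This is the plain text body of the message.\n"
)

msg = message_from_string(raw)

# Header fields as a name -> value mapping (assumes no repeated field names).
headers = dict(msg.items())

# Everything after the first blank line is the body.
body = msg.get_payload()

print(headers["Subject"])
print(body)
```

A file that holds many such records one after another would first have to
be split into individual records before each one is parsed this way.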

Both the headers and the optional body must consist only of US-ASCII
characters. Each header consists of a name and a value that are
separated by a colon character. The header name must consist of
printable US-ASCII characters other than space and colon. The header
value can span one or more lines (folded): a continuation line begins
with a space or tab, and the value ends at a carriage return and line
feed sequence that is followed by a non-whitespace character.
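
The name/value and folding rules can be sketched by hand as well. The
function below, parse_headers, is a hypothetical helper written for this
illustration in Python; it assumes well-formed input and, as a
simplification, joins folded lines with a single space.

```python
def parse_headers(text):
    """Parse RFC 822-style header lines into a list of (name, value) pairs."""
    fields = []
    for line in text.splitlines():  # splitlines() also handles \r\n endings
        if line == "":
            break  # the first blank line ends the header section
        if line[0] in " \t" and fields:
            # Folded continuation line: append it to the previous value.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a header line: {line!r}")
            fields.append((name.strip(), value.strip()))
    return fields


sample = (
    "Subject: As basic\r\n"
    " as it gets\r\n"
    "From: example at example.com\r\n"
    "\r\n"
    "body text\r\n"
)
parsed = parse_headers(sample)
print(parsed)
```

Here the Subject value is folded across two lines and comes back as one
string; the body after the blank line is ignored.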

The RFC 822 format is the basis for many other more specific message
formats. The following example is a message in USENET article (or
message) format specified by RFC 850, which is based on RFC 822:

From: example at example.com
Subject: As basic as it gets
Newsgroups: comp.microsoft.test

This is the plain text body of the message.  Note the blank line
between the header information and the body of the message.

Thanks,

Sebastián.


