[R] Problems with time formats when importing data using readHTMLTable

Cristina Silva csilva at ipma.pt
Tue Jul 11 15:25:26 CEST 2017


Dear Jeff,

I am sorry, I didn't notice that it was not plain text. I hope that it 
is now in the correct format. I will explain the problem again, now with 
more details.
I am collecting the track positions of our research vessel from 
www.marinetraffic.com.  In the page, the data appear in a table:

Timestamp               Source    Speed (kn)  Latitude (°)  Longitude (°)  Course (°)  Show on Map
2017-07-11 10:57 (UTC)  Terr-AIS  8.6         37.36228      -9.176811      200
2017-07-11 10:54 (UTC)  Terr-AIS  8.4         37.36875      -9.174048      200
2017-07-11 09:50 (UTC)  Terr-AIS  8           37.51499      -9.184502      149
2017-07-11 09:08 (UTC)  Terr-AIS  7.7         37.60513      -9.228263      169
2017-07-11 09:05 (UTC)  Terr-AIS  8           37.61168      -9.229627      167
2017-07-11 09:03 (UTC)  Terr-AIS  6.9         37.61626      -9.230456      132
2017-07-11 08:59 (UTC)  Terr-AIS  3.4         37.61758      -9.237928      101
2017-07-11 08:57 (UTC)  Terr-AIS  3.6         37.61808      -9.240235      116
2017-07-11 08:47 (UTC)  Terr-AIS  3.3         37.62267      -9.249762      112
2017-07-11 08:44 (UTC)  Terr-AIS  3.2         37.6241       -9.25348       115
2017-07-11 08:40 (UTC)  Terr-AIS  3.3         37.62557      -9.257295      112
2017-07-11 08:32 (UTC)  Terr-AIS  3.4         37.62903      -9.266028      111
2017-07-11 08:28 (UTC)  Terr-AIS  3.3         37.63051      -9.269725      114
2017-07-11 08:25 (UTC)  Terr-AIS  3.3         37.63207      -9.273642      120
2017-07-11 08:20 (UTC)  Terr-AIS  3.6         37.63395      -9.278254      118
2017-07-11 08:16 (UTC)  Terr-AIS  3.3         37.63586      -9.282853      116
2017-07-11 08:08 (UTC)  Terr-AIS  3.2         37.6392       -9.291157      118
2017-07-11 07:38 (UTC)  Sat-AIS
2017-07-11 06:58 (UTC)  Terr-AIS  3           37.68459      -9.321658      159
2017-07-11 06:53 (UTC)  Terr-AIS  3.3         37.68881      -9.324172      150

To get this table I had to copy it manually from the webpage, paste it 
into a text file, delete the information on figures, links, etc., and 
then import it into R. This works well: I read the csv file and have no 
problems with date formats. But instead of 20 records, I need to import 
e.g. 500 records, i.e. 10 pages of 50 records each, or more. I want to 
import these data with a faster, more automated procedure. I have done 
this before with no problems, but the webpage has probably changed some 
permissions.
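
For the manually copied table, the timestamps keep the browser format, so they can be parsed with an explicit format string. A minimal sketch, using three values from the table above and the format string suggested in the quoted reply further down (the variable names are only illustrative):

```r
# Three timestamps exactly as they appear in the browser table
browser_ts <- c("2017-07-11 10:57 (UTC)",
                "2017-07-11 09:50 (UTC)",
                "2017-07-11 06:53 (UTC)")

# The literal "(UTC)" is matched by the format string; tz fixes the time zone
stamps <- as.POSIXct(browser_ts, format = "%Y-%m-%d %H:%M (UTC)", tz = "UTC")

print(stamps)
class(stamps)   # "POSIXct" "POSIXt"
```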

Here is the script used to get the data directly from the webpage into 
R, for a sample of 20 records (10 per page):

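library(XML)

# Base URL of the vessel's position history; the page number is appended
base_url <- 'http://www.marinetraffic.com/en/ais/index/positions/all/shipid:318358/mmsi:263601000/shipname:NORUEGA/per_page:10/page:'

x <- list()
for (i in 1:2)
{
     # readHTMLTable() returns a list of tables; x[i] stores the first table of page i
     x[i] <- readHTMLTable(paste(base_url, i, sep=''))
}

# Stack the pages and drop the 7th column ("Show on Map")
ais <- do.call('rbind', x)
ais <- ais[,-7]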
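
For the larger import mentioned above (e.g. 10 pages), the page URLs could be generated up front. This is only a sketch: `page_urls` is a hypothetical helper, the URL pattern is copied from the script, and the commented-out download step assumes that the positions table is the first table on each page.

```r
# URL prefix taken from the script above
base_url <- paste0("http://www.marinetraffic.com/en/ais/index/positions/all/",
                   "shipid:318358/mmsi:263601000/shipname:NORUEGA")

# Hypothetical helper: one URL per results page
page_urls <- function(n_pages, per_page = 10) {
  paste0(base_url, "/per_page:", per_page, "/page:", seq_len(n_pages))
}

urls <- page_urls(2)
print(urls)

# Not run here (needs the XML package and network access).
# which = 1 is an assumption about the page layout.
# pages <- lapply(urls, function(u)
#   XML::readHTMLTable(u, which = 1, stringsAsFactors = FALSE))
# ais <- do.call(rbind, pages)
```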

and I got the following table:

> ais
                                     Timestamp   Source                          Speed (kn) Latitude (°) Longitude (°) Course (°)
1            149977066014997706604 minutes ago Terr-AIS                                 8.6     37.36228     -9.176811        200
2            149977048714997704876 minutes ago Terr-AIS                                 8.4     37.36875     -9.174048        200
3   149976661414997666141 hour, 11 minutes ago Terr-AIS                                 8.0     37.51499     -9.184502        149
4   149976410714997641071 hour, 53 minutes ago Terr-AIS                                 7.7     37.60513     -9.228263        169
5   149976392714997639271 hour, 56 minutes ago Terr-AIS                                 8.0     37.61168     -9.229627        167
6   149976378014997637801 hour, 58 minutes ago Terr-AIS                                 6.9     37.61626     -9.230456        132
7   149976354014997635402 hours, 2 minutes ago Terr-AIS                                 3.4     37.61758     -9.237928        101
8   149976342014997634202 hours, 4 minutes ago Terr-AIS                                 3.6     37.61808     -9.240235        116
9  149976286114997628612 hours, 14 minutes ago Terr-AIS                                 3.3     37.62267     -9.249762        112
10 149976264714997626472 hours, 17 minutes ago Terr-AIS                                 3.2      37.6241      -9.25348        115
11 149976243014997624302 hours, 21 minutes ago Terr-AIS                                 3.3     37.62557     -9.257295        112
12 149976193714997619372 hours, 29 minutes ago Terr-AIS                                 3.4     37.62903     -9.266028        111
13 149976172814997617282 hours, 32 minutes ago Terr-AIS                                 3.3     37.63051     -9.269725        114
14 149976150714997615072 hours, 36 minutes ago Terr-AIS                                 3.3     37.63207     -9.273642        120
15 149976124714997612472 hours, 40 minutes ago Terr-AIS                                 3.6     37.63395     -9.278254        118
16 149976098714997609872 hours, 45 minutes ago Terr-AIS                                 3.3     37.63586     -9.282853        116
17 149976051014997605102 hours, 53 minutes ago Terr-AIS                                 3.2      37.6392     -9.291157        118
18 149975870414997587043 hours, 23 minutes ago  Sat-AIS Add to SAT Fleet for undelayed data         <NA>          <NA>       <NA>
19  149975631414997563144 hours, 3 minutes ago Terr-AIS                                 3.0     37.68459     -9.321658        159
20  149975598814997559884 hours, 8 minutes ago Terr-AIS                                 3.3     37.68881     -9.324172        150

The records correspond exactly to those in the first table. I have tried 
to change the colClasses to character, but the results are always the 
same. My question is: how can I convert the Timestamp column to a 
POSIXlt or POSIXct format, either in the importing phase or within R?
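
One possible reading of the imported values, offered as a sketch rather than a confirmed description of the page: each Timestamp string seems to start with a 10-digit number, repeated twice, followed by the relative time. If the first 10 digits are taken as seconds since 1970-01-01 UTC (an assumption, although 1499770660 does correspond to 2017-07-11 10:57:40 UTC, which agrees with the first row of the browser table), the column can be converted within R:

```r
# Sample values copied from the Timestamp column of the output above
raw <- c("149977066014997706604 minutes ago",
         "149976661414997666141 hour, 11 minutes ago",
         "149975598814997559884 hours, 8 minutes ago")

# Assumption: the first 10 characters are Unix epoch seconds (UTC)
epoch <- as.numeric(substr(raw, 1, 10))
stamp <- as.POSIXct(epoch, origin = "1970-01-01", tz = "UTC")

format(stamp, "%Y-%m-%d %H:%M", tz = "UTC")
# [1] "2017-07-11 10:57" "2017-07-11 09:50" "2017-07-11 06:53"
```

On the imported data frame the same two lines would be applied to as.character(ais$Timestamp); the as.character() call guards against the column having been read as a factor.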

I hope that the question is clear now.

Cristina


On Mon, 10 Jul 2017 09:18:05 -0700, Jeff Newmiller wrote:
> Not reproducible. [1][2][3] If our answers don't seem to apply to
> your situation, it will likely be because you did not explain your
> question clearly.
>
> Not plain text. This is a plain text mailing list,  and the best-case
> scenario when you let your email program send HTML is that what you
> saw is not what we see (worst case is your email is scrambled on our
> end).
>
> Have you read the documentation for the function you are using? In
> particular, what about the colClasses argument? If you don't let
> readHTMLTable guess what the format is (have it read in as character
> data) then you have a fighting chance to get it right yourself, e.g.
>
> as.POSIXct( "2017-07-10 14:04 (UTC)", format="%Y-%m-%d %H:%M (UTC)",
> tz="UTC" )
>
> -----
>
> [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> [2] http://adv-r.had.co.nz/Reproducibility.html
>
> [3] https://cran.r-project.org/web/packages/reprex/index.html
> --
> Sent from my phone. Please excuse my brevity.
>
> On July 10, 2017 8:31:30 AM PDT, Cristina Silva <csilva at ipma.pt> wrote:
>>Hi,
>>
>>I am extracting positions data from the marine traffic website. The
>>table has a "Timestamp" column which, in the browser, appears with the
>>format yyyy-mm-dd HH:MM (UTC), e.g. 2017-07-10 14:04 (UTC).
>>
>>When I import the table, the same date "2017-07-10 14:04 (UTC)" appears
>>as "1499696500149969650021 minutes ago". This is the most recent date
>>and time. Older records, e.g. "2017-07-09 17:02 (UTC)", appear as
>>e.g. "1499619726149961972621 hours, 59 minutes ago".
>>
>>I don't know how to convert these data to the time formats used in R
>>(POSIXct).
>>
>>The script is very simple and worked before:
>>
>>library(XML)
>>x <- readHTMLTable('url')
>>
>>where the 'url' is the link to the website with the specification of
>>the
>>vessel.
>>
>>I appreciate any help.
>>
>>Cristina
>>
>>--
>>Cristina Silva
>>Divisão de Modelação e Gestão de Recursos Pesqueiros
>>Av. Dr. Alfredo Magalhães Ramalho
>>1495-165 Lisboa
>>@: csilva at ipma.pt
>>#: +351 213027096

-- 
Cristina Silva
Divisão de Modelação e Gestão de Recursos Pesqueiros
Avenida de Brasília
1449-006 Lisboa
@: csilva at ipma.pt
#: +351 213027096


