[R] COVID-19 datasets...

Thomas Petzoldt thpe sending from simecol.de
Thu May 7 13:53:17 CEST 2020


On 07.05.2020 at 13:12 Deepayan Sarkar wrote:
> On Thu, May 7, 2020 at 4:16 PM Thomas Petzoldt <thpe using simecol.de> wrote:
>> On 07.05.2020 at 11:19 Deepayan Sarkar wrote:
>>> On Thu, May 7, 2020 at 12:58 AM Thomas Petzoldt <thpe using simecol.de> wrote:
>>>> Sorry if I'm joining a little bit late.
>>>>
>>>> I put some related links and scripts together a few weeks ago, but then
>>>> stopped, because there is so much.
>>>>
>>>> The data format employed by Johns Hopkins CSSE was something of a
>>>> surprise to me.
>>> Why? I find it quite convenient to drop the first few columns and
>>> extract the data as a matrix (using data.matrix()).
>>>
>>> -Deepayan
>> Many thanks for the hint to use data.matrix().
>>
>> My aim was not to say that it is difficult, especially as R has all the
>> tools for data wrangling.
>>
>> My surprise was that "wide tables" and non-ISO dates as column names are
>> not the "database way" that we generally teach our students.
> Well, I am all for long format data when it makes sense, but I would
> disagree that that is always the "right approach". In the case of
> regular multiple time series, as in this context, a matrix-like
> structure seems much more natural (and nicely handled by ts() in R),
> and I wouldn't even bother reshaping the data in the first place.
>
> See, for example,
>
> https://github.com/deepayan/deepayan.github.io/blob/master/covid-19/deaths.rmd
>
> and
>
> https://deepayan.github.io/covid-19/deaths.html
>
> -Deepayan
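A minimal sketch of the matrix approach described above, assuming a small hypothetical data frame `wide` laid out like the CSSE time series (four metadata columns, then one column per day named m/d/yy; all values are made up). It drops the metadata with data.matrix(), sums provinces per country with rowsum(), and transposes so that time runs down the rows before calling ts(). ts() only stores a numeric time index, so the decoded dates are kept separately in `days`.

```r
## Hypothetical toy data in the CSSE wide layout (all values are made up):
## four metadata columns, then one column per day named "m/d/yy"
wide <- data.frame(
  "Province/State" = c(NA, "Province1", "Province2"),
  "Country/Region" = c("CountryA", "CountryB", "CountryB"),
  Lat = 0, Long = 0,
  "1/22/20" = c(1, 3, 4),
  "1/23/20" = c(2, 5, 6),
  "1/24/20" = c(4, 8, 9),
  check.names = FALSE
)

## Drop the four metadata columns and keep the counts as a numeric matrix
m <- data.matrix(wide[, -(1:4)])

## Sum rows that belong to the same country (rows = countries, cols = days)
m <- rowsum(m, group = wide[["Country/Region"]])

## Decode the m/d/yy column names into proper dates
days <- as.Date(colnames(m), format = "%m/%d/%y")

## Transpose so that time runs down the rows, one series per country
x <- ts(t(m))

## Matrix-like operations act on all series at once, e.g. daily new cases
new_cases <- diff(x)
```

With the real file, `wide` would come from something like `read.csv(file, check.names = FALSE)`; the rest of the sketch should carry over unchanged as long as the first four columns are metadata.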

Great, thank you for the link with the comprehensive lattice graphs and
the explanations. I like your lattice package very much and have used it
often since it appeared on CRAN (3 of my CRAN packages depend on it). As
a "dynamic modeller", I always consider time to be the first column, but
I agree that long tables are often, but not always, the right approach;
think, for example, of gridded multi-dimensional netCDF data.

Many thanks for sharing your analysis publicly; I'll add your repo to my
list of links.

Thomas

>> With reshape2::melt, or tidyr::gather and its successor pivot_longer,
>> conversion is quite easy, regardless of whether one uses the tidyverse
>> or not; see the example below.
>>
>> Again, thanks, Thomas
>>
>>
>> library("dplyr")
>> library("readr")
>> library("tidyr")
>>
>> file <-
>> "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
>>
>> dat <- read_csv(file)
>> names(dat)[1:2] <- c("Province_State", "Country_Region")
>> dat2 <-
>>     dat %>%
>>     ## sum up rows sharing a Country_Region (e.g. provinces), dropping
>>     ## the metadata columns by name rather than by position
>>     group_by(Country_Region) %>%
>>     summarise(across(-c(Province_State, Lat, Long), sum)) %>%
>>     ## make it a long table: one row per country and day
>>     pivot_longer(cols = -Country_Region, names_to = "time",
>>                  values_to = "confirmed") %>%
>>     ## convert the m/d/yy column names to Date (ISO 8601 when printed)
>>     mutate(time = as.Date(time, format = "%m/%d/%y"))
>>
>>
>>
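For readers who prefer to avoid the tidyverse, a base-R sketch of the same wide-to-long conversion may help. It assumes a small hypothetical data frame `wide` in the CSSE layout with made-up values; rowsum() sums the province rows per country, and as.data.frame(as.table()) unrolls the resulting named matrix into a long table. This is one possible base-R route, not the only one (stats::reshape() or reshape2::melt would also work).

```r
## Hypothetical toy data in the CSSE wide layout (all values are made up)
wide <- data.frame(
  "Province/State" = c(NA, "Province1", "Province2"),
  "Country/Region" = c("CountryA", "CountryB", "CountryB"),
  Lat = 0, Long = 0,
  "1/22/20" = c(1, 3, 4),
  "1/23/20" = c(2, 5, 6),
  check.names = FALSE
)

## Sum duplicated Country/Region rows; result has one row per country
counts <- rowsum(data.matrix(wide[, -(1:4)]),
                 group = wide[["Country/Region"]])

## Name the dimensions so they become the column names of the long table
names(dimnames(counts)) <- c("Country_Region", "time")

## Unroll the matrix: one row per (country, day) combination
long <- as.data.frame(as.table(counts), responseName = "confirmed",
                      stringsAsFactors = FALSE)

## Convert the m/d/yy labels to Date
long$time <- as.Date(long$time, format = "%m/%d/%y")
```

The named dimnames are what give the long table its `Country_Region` and `time` columns; without them as.data.frame() would fall back to generic names such as `Var1`.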
>>>> An opposite approach was taken in Germany, where the data were
>>>> organized as big JSON trees.
>>>>
>>>> Fortunately, both can be "tidied" with R, and represent good didactic
>>>> examples for our students.
>>>>
>>>> Here is yet another repo linking to the data:
>>>>
>>>> https://github.com/tpetzoldt/covid
>>>>
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On 04.05.2020 at 20:48 James Spottiswoode wrote:
>>>>> Sure. COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University is available here:
>>>>>
>>>>> https://github.com/CSSEGISandData/COVID-19
>>>>>
>>>>> All in CSV format.
>>>>>
>>>>>
>>>>>> On May 4, 2020, at 11:31 AM, Bernard McGarvey <mcgarvey.bernard using comcast.net> wrote:
>>>>>>
>>>>>> Just curious: does anyone know of a website that has data available in a format that R can download and analyze?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> Bernard McGarvey
>>>>>>
>>>>>>
>>>>>> Director, Fort Myers Beach Lions Foundation, Inc.
>>>>>>
>>>>>>
>>>>>> Retired (Lilly Engineering Fellow).
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>> James Spottiswoode
>>>>> Applied Mathematics & Statistics
>>>>> (310) 270 6220
>>>>> jamesspottiswoode Skype
>>>>> james using jsasoc.com
>>>>>
>> --
>> Dr. Thomas Petzoldt
>> senior scientist
>>
>> Technische Universitaet Dresden
>> Faculty of Environmental Sciences
>> Institute of Hydrobiology
>> 01062 Dresden, Germany
>>
>> https://tu-dresden.de/Members/thomas.petzoldt


