Develop a Shiny app that visualizes the progression of the 2019-20 Global Coronavirus Outbreak.
If you are ambitious, bring in visualization of other types of data, e.g., stock market, tweets, contrast with previous outbreak (e.g., 2003 SARS, 2009 H1N1, 2012 MERS, 2014 Ebola), prediction from your statistical/epidemiological model, and so on.
Publish your Shiny app to https://www.shinyapps.io and share the link.
(Optinal) Submit your app to the 2020 Shiny Contest (deadline 20 March 2020 at 5pm ET).
Below is some data exploration, which may help you get started. You do not have to use it. Note below code may change frequently.
Since Feb 11, JHU CSSE stops posting data on Google Sheets.
Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) kindly compiles, updates, and shares the current data on 2019-nCoV outbreak at Google Sheets: https://docs.google.com/spreadsheets/d/1UF2pSkFTURko2OvfHWWlFpDFAr1UxCBA4JLwlSP6KFo.
We can use the tidyverse package googlesheets4
to import data from Google Sheets. Because the JHU sheets are public, we don’t need authentication by tokens.
library(googlesheets4)
# no authentication
sheets_deauth()
Import the confirmed
sheet:
(confirmed <-
read_sheet("1UF2pSkFTURko2OvfHWWlFpDFAr1UxCBA4JLwlSP6KFo", sheet = 2))
Import the recovered
sheet:
(recovered <-
read_sheet("1UF2pSkFTURko2OvfHWWlFpDFAr1UxCBA4JLwlSP6KFo", sheet = 3))
Import the death
sheet:
(death <-
read_sheet("1UF2pSkFTURko2OvfHWWlFpDFAr1UxCBA4JLwlSP6KFo", sheet = 4))
Since Feb 11, JHU SSE is hosting 2019 nCoV data on GitHub repo: https://github.com/CSSEGISandData/2019-nCoV.
Let’s import the time series data directly from the csv file on GitHub:
(confirmed <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"))
## Parsed with column specification:
## cols(
## .default = col_double(),
## `Province/State` = col_character(),
## `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 76 x 33
## `Province/State` `Country/Region` Lat Long `1/22/20` `1/23/20` `1/24/20`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Anhui Mainland China 31.8 117. 1 9 15
## 2 Beijing Mainland China 40.2 116. 14 22 36
## 3 Chongqing Mainland China 30.1 108. 6 9 27
## 4 Fujian Mainland China 26.1 118. 1 5 10
## 5 Gansu Mainland China 36.1 104. 0 2 2
## 6 Guangdong Mainland China 23.3 113. 26 32 53
## 7 Guangxi Mainland China 23.8 109. 2 5 23
## 8 Guizhou Mainland China 26.8 107. 1 3 3
## 9 Hainan Mainland China 19.2 110. 4 5 8
## 10 Hebei Mainland China 38.0 115. 1 1 2
## # … with 66 more rows, and 26 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>,
## # `1/27/20` <dbl>, `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>,
## # `1/31/20` <dbl>, `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>,
## # `2/4/20` <dbl>, `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>,
## # `2/8/20` <dbl>, `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>,
## # `2/12/20` <dbl>, `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>,
## # `2/16/20` <dbl>, `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>
(recovered <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv"))
## Parsed with column specification:
## cols(
## .default = col_double(),
## `Province/State` = col_character(),
## `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 76 x 33
## `Province/State` `Country/Region` Lat Long `1/22/20` `1/23/20` `1/24/20`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Anhui Mainland China 31.8 117. 0 0 0
## 2 Beijing Mainland China 40.2 116. 0 0 1
## 3 Chongqing Mainland China 30.1 108. 0 0 0
## 4 Fujian Mainland China 26.1 118. 0 0 0
## 5 Gansu Mainland China 36.1 104. 0 0 0
## 6 Guangdong Mainland China 23.3 113. 0 2 2
## 7 Guangxi Mainland China 23.8 109. 0 0 0
## 8 Guizhou Mainland China 26.8 107. 0 0 0
## 9 Hainan Mainland China 19.2 110. 0 0 0
## 10 Hebei Mainland China 38.0 115. 0 0 0
## # … with 66 more rows, and 26 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>,
## # `1/27/20` <dbl>, `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>,
## # `1/31/20` <dbl>, `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>,
## # `2/4/20` <dbl>, `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>,
## # `2/8/20` <dbl>, `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>,
## # `2/12/20` <dbl>, `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>,
## # `2/16/20` <dbl>, `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>
(death <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"))
## Parsed with column specification:
## cols(
## .default = col_double(),
## `Province/State` = col_character(),
## `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 76 x 33
## `Province/State` `Country/Region` Lat Long `1/22/20` `1/23/20` `1/24/20`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Anhui Mainland China 31.8 117. 0 0 0
## 2 Beijing Mainland China 40.2 116. 0 0 0
## 3 Chongqing Mainland China 30.1 108. 0 0 0
## 4 Fujian Mainland China 26.1 118. 0 0 0
## 5 Gansu Mainland China 36.1 104. 0 0 0
## 6 Guangdong Mainland China 23.3 113. 0 0 0
## 7 Guangxi Mainland China 23.8 109. 0 0 0
## 8 Guizhou Mainland China 26.8 107. 0 0 0
## 9 Hainan Mainland China 19.2 110. 0 0 0
## 10 Hebei Mainland China 38.0 115. 0 1 1
## # … with 66 more rows, and 26 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>,
## # `1/27/20` <dbl>, `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>,
## # `1/31/20` <dbl>, `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>,
## # `2/4/20` <dbl>, `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>,
## # `2/8/20` <dbl>, `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>,
## # `2/12/20` <dbl>, `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>,
## # `2/16/20` <dbl>, `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>
I want to tidy data into the long format. pivot_longer
is the modern version of gather
function in dplyr.
confirmed_long <- confirmed %>%
pivot_longer(-(`Province/State`:Long),
names_to = "Date",
values_to = "confirmed") %>%
mutate(Date = (mdy(Date))) # convert string to date-time
confirmed_long
## # A tibble: 2,204 x 6
## `Province/State` `Country/Region` Lat Long Date confirmed
## <chr> <chr> <dbl> <dbl> <date> <dbl>
## 1 Anhui Mainland China 31.8 117. 2020-01-22 1
## 2 Anhui Mainland China 31.8 117. 2020-01-23 9
## 3 Anhui Mainland China 31.8 117. 2020-01-24 15
## 4 Anhui Mainland China 31.8 117. 2020-01-25 39
## 5 Anhui Mainland China 31.8 117. 2020-01-26 60
## 6 Anhui Mainland China 31.8 117. 2020-01-27 70
## 7 Anhui Mainland China 31.8 117. 2020-01-28 106
## 8 Anhui Mainland China 31.8 117. 2020-01-29 152
## 9 Anhui Mainland China 31.8 117. 2020-01-30 200
## 10 Anhui Mainland China 31.8 117. 2020-01-31 237
## # … with 2,194 more rows
recovered_long <- recovered %>%
pivot_longer(-(`Province/State`:Long),
names_to = "Date",
values_to = "recovered") %>%
mutate(Date = mdy(Date))
recovered_long
## # A tibble: 2,204 x 6
## `Province/State` `Country/Region` Lat Long Date recovered
## <chr> <chr> <dbl> <dbl> <date> <dbl>
## 1 Anhui Mainland China 31.8 117. 2020-01-22 0
## 2 Anhui Mainland China 31.8 117. 2020-01-23 0
## 3 Anhui Mainland China 31.8 117. 2020-01-24 0
## 4 Anhui Mainland China 31.8 117. 2020-01-25 0
## 5 Anhui Mainland China 31.8 117. 2020-01-26 0
## 6 Anhui Mainland China 31.8 117. 2020-01-27 0
## 7 Anhui Mainland China 31.8 117. 2020-01-28 0
## 8 Anhui Mainland China 31.8 117. 2020-01-29 2
## 9 Anhui Mainland China 31.8 117. 2020-01-30 2
## 10 Anhui Mainland China 31.8 117. 2020-01-31 3
## # … with 2,194 more rows
death_long <- death %>%
pivot_longer(-(`Province/State`:Long),
names_to = "Date",
values_to = "death") %>%
mutate(Date = mdy(Date))
death_long
## # A tibble: 2,204 x 6
## `Province/State` `Country/Region` Lat Long Date death
## <chr> <chr> <dbl> <dbl> <date> <dbl>
## 1 Anhui Mainland China 31.8 117. 2020-01-22 0
## 2 Anhui Mainland China 31.8 117. 2020-01-23 0
## 3 Anhui Mainland China 31.8 117. 2020-01-24 0
## 4 Anhui Mainland China 31.8 117. 2020-01-25 0
## 5 Anhui Mainland China 31.8 117. 2020-01-26 0
## 6 Anhui Mainland China 31.8 117. 2020-01-27 0
## 7 Anhui Mainland China 31.8 117. 2020-01-28 0
## 8 Anhui Mainland China 31.8 117. 2020-01-29 0
## 9 Anhui Mainland China 31.8 117. 2020-01-30 0
## 10 Anhui Mainland China 31.8 117. 2020-01-31 0
## # … with 2,194 more rows
ncov_tbl <- confirmed_long %>%
left_join(recovered_long) %>%
left_join(death_long) %>%
pivot_longer(confirmed:death,
names_to = "Case",
values_to = "Count")
## Joining, by = c("Province/State", "Country/Region", "Lat", "Long", "Date")
## Joining, by = c("Province/State", "Country/Region", "Lat", "Long", "Date")
ncov_tbl %>% print(width = Inf)
## # A tibble: 6,612 x 7
## `Province/State` `Country/Region` Lat Long Date Case Count
## <chr> <chr> <dbl> <dbl> <date> <chr> <dbl>
## 1 Anhui Mainland China 31.8 117. 2020-01-22 confirmed 1
## 2 Anhui Mainland China 31.8 117. 2020-01-22 recovered 0
## 3 Anhui Mainland China 31.8 117. 2020-01-22 death 0
## 4 Anhui Mainland China 31.8 117. 2020-01-23 confirmed 9
## 5 Anhui Mainland China 31.8 117. 2020-01-23 recovered 0
## 6 Anhui Mainland China 31.8 117. 2020-01-23 death 0
## 7 Anhui Mainland China 31.8 117. 2020-01-24 confirmed 15
## 8 Anhui Mainland China 31.8 117. 2020-01-24 recovered 0
## 9 Anhui Mainland China 31.8 117. 2020-01-24 death 0
## 10 Anhui Mainland China 31.8 117. 2020-01-25 confirmed 39
## # … with 6,602 more rows
Other sources of data:
- Tencent (in Chinese): https://news.qq.com/zt2020/page/feiyan.htm
- R package nCov2019
; install by remotes::install_github("GuangchuangYu/nCov2019")
- Shiny app coronavirus
Some useful resources for mapping:
- The book Geocomputation with R. Especially Chapter 8 Making Maps with R.
- The book An Introduction to Spatial Analysis and Mapping in R.
The package mapdata
contains a data china
. But the map seems out-dated. For example, Chongqing is sill a part of Sichuan province in this map.
suppressMessages(library(mapdata))
map("china", col = "gray40", ylim = c(18,54))
Download GADM data for China at https://www.gadm.org/download_country_v3.html. There are 4 levels of details to plot. We opt for the level 1 which is at the province level.
library(sf)
library(tidyverse)
gadm36_CHN_1_sf <- readRDS("gadm36_CHN_0_sf.rds")
ggplot(data = gadm36_CHN_1_sf) +
geom_sf()
library("GADMTools")
map <- gadm_sp_loadCountries("CHN", level = 1, basefile = "./")
gadm_plot(map) + theme_light()
library(maptools)
map <- rgdal::readOGR("gadm36_CHN_shp/gadm36_CHN_1.shp")
plot(map)
A self note about installing sf
package on CentOS 7: https://gist.github.com/Hua-Zhou/6c11babe35437ce1ea8e4893a14d07c8.
Donwload China GIS data from here, unzip, and put in the working directory.
ls -l bou2_4p.*
## -rwxrwxrwx@ 1 huazhou staff 86283 Apr 10 1999 bou2_4p.dbf
## -rwxrwxrwx@ 1 huazhou staff 1508752 Apr 10 1999 bou2_4p.shp
## -rwxrwxrwx@ 1 huazhou staff 7500 Apr 10 1999 bou2_4p.shx
Read in the shape file into simple feature (SF) format. Replace the NA
in NAME
by Macau.
library(sf)
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
chn_map <- st_read("./bou2_4p.shp", as_tibble = TRUE) %>%
mutate(NAME = iconv(NAME, from = "GBK"),
BOU2_4M_ = as.integer(BOU2_4M_),
BOU2_4M_ID = as.integer(BOU2_4M_ID)) %>%
mutate(NAME = str_replace_na(NAME, replacement = "澳门特别行政区")) %>%
print()
## Reading layer `bou2_4p' from data source `/Users/huazhou/Documents/github.com/ucla-biostat203b-2020winter.github.io/hw/hw3/bou2_4p.shp' using driver `ESRI Shapefile'
## Simple feature collection with 925 features and 7 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID): NA
## proj4string: NA
## Simple feature collection with 925 features and 7 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID): NA
## proj4string: NA
## # A tibble: 925 x 8
## AREA PERIMETER BOU2_4M_ BOU2_4M_ID ADCODE93 ADCODE99 NAME
## * <dbl> <dbl> <int> <int> <int> <int> <chr>
## 1 54.4 68.5 2 23 230000 230000 黑龙江省…
## 2 129. 130. 3 15 150000 150000 内蒙古自…
## 3 176. 84.9 4 65 650000 650000 新疆维吾…
## 4 21.3 41.2 5 22 220000 220000 吉林省…
## 5 15.6 38.4 6 21 210000 210000 辽宁省…
## 6 41.5 76.8 7 62 620000 620000 甘肃省…
## 7 19.5 44.9 8 13 130000 130000 河北省…
## 8 1.73 8.50 9 11 110000 110000 北京市…
## 9 0 0.088 10 292 210000 210000 辽宁省…
## 10 0 0.047 11 292 210000 210000 辽宁省…
## # … with 915 more rows, and 1 more variable: geometry <POLYGON>
There are about 34 provinces in China, why there are 925 areas?
chn_map %>%
count(NAME) %>%
print(n = Inf)
## Simple feature collection with 34 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID): NA
## proj4string: NA
## # A tibble: 34 x 3
## NAME n geometry
## * <chr> <int> <GEOMETRY>
## 1 上海市 12 MULTIPOLYGON (((121.414 30.6952, 121.415 30.69756, 121.41…
## 2 云南省 1 POLYGON ((98.65588 28.83075, 98.65788 28.8333, 98.66165 2…
## 3 内蒙古自治区 1 POLYGON ((121.4884 53.33265, 121.4974 53.32104, 121.5004 …
## 4 北京市 1 POLYGON ((117.496 40.66534, 117.4992 40.64039, 117.4938 4…
## 5 台湾省 57 MULTIPOLYGON (((120.8155 21.75681, 120.8174 21.75765, 120…
## 6 吉林省 1 POLYGON ((123.171 46.24668, 123.2186 46.26945, 123.2394 4…
## 7 四川省 1 POLYGON ((102.447 33.56613, 102.4386 33.57738, 102.4011 3…
## 8 天津市 1 POLYGON ((117.4638 40.24003, 117.4962 40.22717, 117.5416 …
## 9 宁夏回族自治区… 1 POLYGON ((106.8099 39.30777, 106.7962 39.30038, 106.8025 …
## 10 安徽省 1 POLYGON ((116.3686 34.64087, 116.4237 34.65374, 116.432 3…
## 11 山东省 86 MULTIPOLYGON (((120.7183 36.28086, 120.7147 36.28513, 120…
## 12 山西省 1 POLYGON ((114.1232 40.74159, 114.2001 40.65104, 114.1953 …
## 13 广东省 154 MULTIPOLYGON (((109.9309 20.22808, 109.9309 20.22664, 109…
## 14 广西壮族自治区… 6 MULTIPOLYGON (((109.2133 20.89708, 109.2109 20.89759, 109…
## 15 新疆维吾尔自治区… 1 POLYGON ((96.38329 42.72696, 96.35991 42.70969, 96.09664 …
## 16 江苏省 5 MULTIPOLYGON (((119.4393 34.77124, 119.4419 34.77665, 119…
## 17 江西省 1 POLYGON ((116.1283 29.82727, 116.1726 29.83686, 116.2121 …
## 18 河北省 9 MULTIPOLYGON (((117.194 40.08278, 117.1983 40.0796, 117.2…
## 19 河南省 1 POLYGON ((113.7235 36.35595, 113.7516 36.37056, 113.8081 …
## 20 浙江省 179 MULTIPOLYGON (((120.4666 27.16607, 120.468 27.16719, 120.…
## 21 海南省 79 MULTIPOLYGON (((113.2364 6.323439, 113.2353 6.32, 113.233…
## 22 湖北省 1 POLYGON ((109.757 32.91204, 109.7823 33.00224, 109.7863 3…
## 23 湖南省 1 POLYGON ((111.2695 30.01506, 111.2882 30.01272, 111.2936 …
## 24 澳门特别行政区… 1 POLYGON ((114.2259 22.54496, 114.226 22.54444, 114.2263 2…
## 25 甘肃省 1 POLYGON ((96.38329 42.72696, 96.38308 42.72533, 96.39072 …
## 26 福建省 168 MULTIPOLYGON (((119.9835 25.96752, 119.986 25.96805, 119.…
## 27 西藏自治区 1 POLYGON ((79.03262 34.33298, 79.07736 34.41547, 79.15108 …
## 28 贵州省 2 MULTIPOLYGON (((109.5375 26.75003, 109.571 26.76731, 109.…
## 29 辽宁省 94 MULTIPOLYGON (((121.7979 40.99618, 121.7931 40.98311, 121…
## 30 重庆市 1 POLYGON ((108.5145 32.21204, 108.5308 32.195, 108.5808 32…
## 31 陕西省 1 POLYGON ((111.1508 39.54331, 111.1104 39.50884, 111.1047 …
## 32 青海省 1 POLYGON ((96.98931 38.82063, 96.98743 38.8617, 96.95775 3…
## 33 香港特别行政区… 53 MULTIPOLYGON (((113.9175 22.15853, 113.9029 22.15769, 113…
## 34 黑龙江省 1 POLYGON ((121.4884 53.33265, 121.4995 53.33601, 121.5184 …
ggplot2
can plot SF data using geom_sf
function:
chn_map %>%
ggplot() +
geom_sf(mapping = aes(geometry = geometry), color = "black", fill = "white") +
#geom_sf_label(mapping = aes(label = NAME)) +
theme_bw() # better for maps
These are the provinces available from the JHU sheets.
ncov_tbl %>%
filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
distinct(`Province/State`, `Country/Region`) %>%
print(n = Inf)
## # A tibble: 34 x 2
## `Province/State` `Country/Region`
## <chr> <chr>
## 1 Anhui Mainland China
## 2 Beijing Mainland China
## 3 Chongqing Mainland China
## 4 Fujian Mainland China
## 5 Gansu Mainland China
## 6 Guangdong Mainland China
## 7 Guangxi Mainland China
## 8 Guizhou Mainland China
## 9 Hainan Mainland China
## 10 Hebei Mainland China
## 11 Heilongjiang Mainland China
## 12 Henan Mainland China
## 13 Hubei Mainland China
## 14 Hunan Mainland China
## 15 Inner Mongolia Mainland China
## 16 Jiangsu Mainland China
## 17 Jiangxi Mainland China
## 18 Jilin Mainland China
## 19 Liaoning Mainland China
## 20 Ningxia Mainland China
## 21 Qinghai Mainland China
## 22 Shaanxi Mainland China
## 23 Shandong Mainland China
## 24 Shanghai Mainland China
## 25 Shanxi Mainland China
## 26 Sichuan Mainland China
## 27 Tianjin Mainland China
## 28 Tibet Mainland China
## 29 Xinjiang Mainland China
## 30 Yunnan Mainland China
## 31 Zhejiang Mainland China
## 32 Taiwan Taiwan
## 33 Macau Macau
## 34 Hong Kong Hong Kong
In order to join the tibbles ncov_tbl
and chn_map
, we need to use the province name as key. Let’s create a function to translate Chinese province name to English.
translate <- function(x) {
sapply(x, function(chn_name) {
if (str_detect(chn_name, "澳门")) {
eng_name <- "Macau"
} else if (str_detect(chn_name, "台湾")) {
eng_name <- "Taiwan"
} else if (str_detect(chn_name, "上海")) {
eng_name <- "Shanghai"
} else if (str_detect(chn_name, "云南")) {
eng_name <- "Yunnan"
} else if (str_detect(chn_name, "内蒙古")) {
eng_name <- "Inner Mongolia"
} else if (str_detect(chn_name, "北京")) {
eng_name <- "Beijing"
} else if (str_detect(chn_name, "台湾")) {
eng_name <- "Taiwan"
} else if (str_detect(chn_name, "吉林")) {
eng_name <- "Jilin"
} else if (str_detect(chn_name, "四川")) {
eng_name <- "Sichuan"
} else if (str_detect(chn_name, "天津")) {
eng_name <- "Tianjin"
} else if (str_detect(chn_name, "宁夏")) {
eng_name <- "Ningxia"
} else if (str_detect(chn_name, "安徽")) {
eng_name <- "Anhui"
} else if (str_detect(chn_name, "山东")) {
eng_name <- "Shandong"
} else if (str_detect(chn_name, "山西")) {
eng_name <- "Shanxi"
} else if (str_detect(chn_name, "广东")) {
eng_name <- "Guangdong"
} else if (str_detect(chn_name, "广西")) {
eng_name <- "Guangxi"
} else if (str_detect(chn_name, "新疆")) {
eng_name <- "Xinjiang"
} else if (str_detect(chn_name, "江苏")) {
eng_name <- "Jiangsu"
} else if (str_detect(chn_name, "江西")) {
eng_name <- "Jiangxi"
} else if (str_detect(chn_name, "河北")) {
eng_name <- "Hebei"
} else if (str_detect(chn_name, "河南")) {
eng_name <- "Henan"
} else if (str_detect(chn_name, "浙江")) {
eng_name <- "Zhejiang"
} else if (str_detect(chn_name, "海南")) {
eng_name <- "Hainan"
} else if (str_detect(chn_name, "湖北")) {
eng_name <- "Hubei"
} else if (str_detect(chn_name, "湖南")) {
eng_name <- "Hunan"
} else if (str_detect(chn_name, "甘肃")) {
eng_name <- "Gansu"
} else if (str_detect(chn_name, "福建")) {
eng_name <- "Fujian"
} else if (str_detect(chn_name, "西藏")) {
eng_name <- "Tibet"
} else if (str_detect(chn_name, "贵州")) {
eng_name <- "Guizhou"
} else if (str_detect(chn_name, "辽宁")) {
eng_name <- "Liaoning"
} else if (str_detect(chn_name, "重庆")) {
eng_name <- "Chongqing"
} else if (str_detect(chn_name, "陕西")) {
eng_name <- "Shanxi"
} else if (str_detect(chn_name, "青海")) {
eng_name <- "Qinghai"
} else if (str_detect(chn_name, "香港")) {
eng_name <- "Hong Kong"
} else if (str_detect(chn_name, "黑龙江")) {
eng_name <- "Heilongjiang"
} else {
eng_name <- chn_name # don't translate if no correspondence
}
return(eng_name)
})
}
Create a new variable NAME_ENG
:
chn_prov <- chn_map %>%
count(NAME) %>%
mutate(NAME_ENG = translate(NAME)) # translate function is vectorized
chn_prov %>% print(n = Inf)
## Simple feature collection with 34 features and 3 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID): NA
## proj4string: NA
## # A tibble: 34 x 4
## NAME n geometry NAME_ENG
## * <chr> <int> <GEOMETRY> <chr>
## 1 上海市 12 MULTIPOLYGON (((121.414 30.6952, 121.415 30.6975… Shanghai
## 2 云南省 1 POLYGON ((98.65588 28.83075, 98.65788 28.8333, 9… Yunnan
## 3 内蒙古自治区… 1 POLYGON ((121.4884 53.33265, 121.4974 53.32104, … Inner Mon…
## 4 北京市 1 POLYGON ((117.496 40.66534, 117.4992 40.64039, 1… Beijing
## 5 台湾省 57 MULTIPOLYGON (((120.8155 21.75681, 120.8174 21.7… Taiwan
## 6 吉林省 1 POLYGON ((123.171 46.24668, 123.2186 46.26945, 1… Jilin
## 7 四川省 1 POLYGON ((102.447 33.56613, 102.4386 33.57738, 1… Sichuan
## 8 天津市 1 POLYGON ((117.4638 40.24003, 117.4962 40.22717, … Tianjin
## 9 宁夏回族自治区… 1 POLYGON ((106.8099 39.30777, 106.7962 39.30038, … Ningxia
## 10 安徽省 1 POLYGON ((116.3686 34.64087, 116.4237 34.65374, … Anhui
## 11 山东省 86 MULTIPOLYGON (((120.7183 36.28086, 120.7147 36.2… Shandong
## 12 山西省 1 POLYGON ((114.1232 40.74159, 114.2001 40.65104, … Shanxi
## 13 广东省 154 MULTIPOLYGON (((109.9309 20.22808, 109.9309 20.2… Guangdong
## 14 广西壮族自治区… 6 MULTIPOLYGON (((109.2133 20.89708, 109.2109 20.8… Guangxi
## 15 新疆维吾尔自治区… 1 POLYGON ((96.38329 42.72696, 96.35991 42.70969, … Xinjiang
## 16 江苏省 5 MULTIPOLYGON (((119.4393 34.77124, 119.4419 34.7… Jiangsu
## 17 江西省 1 POLYGON ((116.1283 29.82727, 116.1726 29.83686, … Jiangxi
## 18 河北省 9 MULTIPOLYGON (((117.194 40.08278, 117.1983 40.07… Hebei
## 19 河南省 1 POLYGON ((113.7235 36.35595, 113.7516 36.37056, … Henan
## 20 浙江省 179 MULTIPOLYGON (((120.4666 27.16607, 120.468 27.16… Zhejiang
## 21 海南省 79 MULTIPOLYGON (((113.2364 6.323439, 113.2353 6.32… Hainan
## 22 湖北省 1 POLYGON ((109.757 32.91204, 109.7823 33.00224, 1… Hubei
## 23 湖南省 1 POLYGON ((111.2695 30.01506, 111.2882 30.01272, … Hunan
## 24 澳门特别行政区… 1 POLYGON ((114.2259 22.54496, 114.226 22.54444, 1… Macau
## 25 甘肃省 1 POLYGON ((96.38329 42.72696, 96.38308 42.72533, … Gansu
## 26 福建省 168 MULTIPOLYGON (((119.9835 25.96752, 119.986 25.96… Fujian
## 27 西藏自治区 1 POLYGON ((79.03262 34.33298, 79.07736 34.41547, … Tibet
## 28 贵州省 2 MULTIPOLYGON (((109.5375 26.75003, 109.571 26.76… Guizhou
## 29 辽宁省 94 MULTIPOLYGON (((121.7979 40.99618, 121.7931 40.9… Liaoning
## 30 重庆市 1 POLYGON ((108.5145 32.21204, 108.5308 32.195, 10… Chongqing
## 31 陕西省 1 POLYGON ((111.1508 39.54331, 111.1104 39.50884, … Shanxi
## 32 青海省 1 POLYGON ((96.98931 38.82063, 96.98743 38.8617, 9… Qinghai
## 33 香港特别行政区… 53 MULTIPOLYGON (((113.9175 22.15853, 113.9029 22.1… Hong Kong
## 34 黑龙江省 1 POLYGON ((121.4884 53.33265, 121.4995 53.33601, … Heilongji…
Try to join the virus data ncov_tbl
and map data chn_prov
.
# for exploration
plotdate <- "2020-02-14"
case <- "confirmed"
ncov_tbl %>%
filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
filter(Date == plotdate, Case == case) %>%
group_by(`Province/State`) %>%
top_n(1, Date) %>%
right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) # join map and virus data
## Warning: Column `Province/State`/`NAME_ENG` has different attributes on LHS and
## RHS of join
## # A tibble: 34 x 10
## # Groups: Province/State [33]
## `Province/State` `Country/Region` Lat Long Date Case Count NAME
## <chr> <chr> <dbl> <dbl> <date> <chr> <dbl> <chr>
## 1 Shanghai Mainland China 31.2 121. 2020-02-14 conf… 318 上海市…
## 2 Yunnan Mainland China 25.0 101. 2020-02-14 conf… 162 云南省…
## 3 Inner Mongolia Mainland China 44.1 114. 2020-02-14 conf… 65 内蒙古自…
## 4 Beijing Mainland China 40.2 116. 2020-02-14 conf… 372 北京市…
## 5 Taiwan Taiwan 23.7 121. 2020-02-14 conf… 18 台湾省…
## 6 Jilin Mainland China 43.7 126. 2020-02-14 conf… 86 吉林省…
## 7 Sichuan Mainland China 30.6 103. 2020-02-14 conf… 463 四川省…
## 8 Tianjin Mainland China 39.3 117. 2020-02-14 conf… 120 天津市…
## 9 Ningxia Mainland China 37.3 106. 2020-02-14 conf… 67 宁夏回族…
## 10 Anhui Mainland China 31.8 117. 2020-02-14 conf… 934 安徽省…
## # … with 24 more rows, and 2 more variables: n <int>, geometry <GEOMETRY>
Plot confirmed cases on a specific date:
library(wesanderson)
plotdate <- "2020-02-14"
case <- "confirmed"
ncov_tbl %>%
filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
filter(Date == plotdate, Case == case) %>%
group_by(`Province/State`) %>%
top_n(1, Date) %>% # take the latest count on that date
right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) %>%
ggplot() +
geom_sf(mapping = aes(fill = Count, geometry = geometry)) +
# scale_fill_gradient(low = "white",
# high = "red",
# trans = "log10",
# limits = c(1, 50000),
# breaks = c(1, 10, 100, 1000, 10000),
# name = "") +
scale_fill_gradientn(colors = wes_palette("Zissou1", 100, type = "continuous"),
trans = "log10") + # can we find a better palette?
# #scale_fill_brewer(palette = "Dark2") +
theme_bw() +
labs(title = str_c(case, " cases"), subtitle = plotdate)
## Warning: Column `Province/State`/`NAME_ENG` has different attributes on LHS and
## RHS of join
To plot the line graph of confirmed cases over time. You can do an animization of this over time.
ncov_tbl %>%
filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
group_by(Date, Case) %>%
summarise(total_count = sum(Count)) %>%
# print()
ggplot() +
geom_line(mapping = aes(x = Date, y = total_count, color = Case), size = 2) +
scale_color_manual(values = c("blue", "black", "green")) +
scale_y_log10() +
labs(y = "Count") +
theme_bw()
Counts by province on a specific date. Do an animation on date.
date <- "2020-02-18"
ncov_tbl %>%
filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan"),
`Date` == date) %>%
group_by(`Province/State`) %>%
ggplot() +
geom_col(mapping = aes(x = `Province/State`, y = `Count`, fill = `Case`)) +
scale_y_log10() +
labs(title = date) +
theme(axis.text.x = element_text(angle = 90))
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 11 rows containing missing values (geom_col).
Resources about making animations in R:
- gganimate package.
- Section 8.3 of Geomcomputation with R.
ncov_tbl %>%
filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
filter(Case == case) %>%
right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) %>%
print()
## Warning: Column `Province/State`/`NAME_ENG` has different attributes on LHS and
## RHS of join
## # A tibble: 986 x 10
## `Province/State` `Country/Region` Lat Long Date Case Count NAME
## <chr> <chr> <dbl> <dbl> <date> <chr> <dbl> <chr>
## 1 Shanghai Mainland China 31.2 121. 2020-01-22 conf… 9 上海市…
## 2 Shanghai Mainland China 31.2 121. 2020-01-23 conf… 16 上海市…
## 3 Shanghai Mainland China 31.2 121. 2020-01-24 conf… 20 上海市…
## 4 Shanghai Mainland China 31.2 121. 2020-01-25 conf… 33 上海市…
## 5 Shanghai Mainland China 31.2 121. 2020-01-26 conf… 40 上海市…
## 6 Shanghai Mainland China 31.2 121. 2020-01-27 conf… 53 上海市…
## 7 Shanghai Mainland China 31.2 121. 2020-01-28 conf… 66 上海市…
## 8 Shanghai Mainland China 31.2 121. 2020-01-29 conf… 96 上海市…
## 9 Shanghai Mainland China 31.2 121. 2020-01-30 conf… 112 上海市…
## 10 Shanghai Mainland China 31.2 121. 2020-01-31 conf… 135 上海市…
## # … with 976 more rows, and 2 more variables: n <int>, geometry <MULTIPOLYGON>
Plot the date at all time points (this takes long, a couple minutes):
library(gganimate)
library(transformr)
case = "confirmed"
(p <- ncov_tbl %>%
filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
filter(Case == case) %>%
right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) %>%
ggplot() +
geom_sf(mapping = aes(fill = Count, geometry = geometry)) +
# scale_fill_gradient(low = "white",
# high = "red",
# trans = "log10",
# limits = c(1, 100000),
# breaks = c(1, 10, 100, 1000, 10000),
# name = "") +
scale_fill_gradientn(colours = wes_palette("Zissou1", 100, type = "continuous"),
trans = "log10") +
theme_bw() +
labs(title = str_c(case, " cases")))
Make animation and save as gif (this takes long, a couple minutes)
(anim <- p +
transition_time(Date) +
labs(title = "Confirmed Cases", subtitle = "Date: {frame_time}"))
animate(anim, renderer = gifski_renderer())
anim_save("confirmed_anim.gif")
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Version 0.4-0 included new data defaults. See ?getSymbols.
stock <- getSymbols("^HSI", # S&P 500 (^GSPC), Dow Jones (^DJI), NASDAQ (^IXIC), Russell 2000 (^RUT), FTSE 100 (^FTSE), Nikkei 225 (^N225), HANG SENG INDEX (^HSI)
src = "yahoo",
auto.assign = FALSE,
from = min(ncov_tbl$Date),
to = max(ncov_tbl$Date)) %>%
as_tibble(rownames = "Date") %>%
mutate(Date = date(Date)) %>%
ggplot() +
geom_line(mapping = aes(x = Date, y = HSI.Adjusted)) +
theme_bw()
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## Warning: 'indexClass<-' is deprecated.
## Use 'tclass<-' instead.
## See help("Deprecated") and help("xts-deprecated").
## Warning: Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `tibble::enframe(name = NULL)` instead.
## This warning is displayed once per session.
stock
# chartSeries(stock, theme = chartTheme("white"),
# type = "line", log.scale = FALSE, TA = NULL)
TODO