Loading the tidyverse loads the following packages:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.1 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.1 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
So you never need to load dplyr, ggplot2, readr, etc. separately if you’ve already loaded the tidyverse package.
Setup
# load packageslibrary(countdown)# library(tidyverse) - already loadedlibrary(mapproj)library(sf)library(geofacet)# set theme for ggplot2ggplot2::theme_set(ggplot2::theme_minimal(base_size =14))# set width of code outputoptions(width =65)# set figure parameters for knitrknitr::opts_chunk$set(fig.width =7, # 7" widthfig.asp =0.618, # the golden ratiofig.retina =3, # dpi multiplier for displaying HTML output on retinafig.align ="center", # center align figuresdpi =300# higher dpi, sharper image)
Projections
Visualizing geographic areas
Without any projection, on the cartesian coordinate system
world_map <-ggplot(map_data("world"), aes(x = long, y = lat, group = group)) +geom_polygon(fill ="white", color ="#3c3b6e", linewidth =0.3) +labs(x =NULL, y =NULL)world_map
Mercator projection
Meridians are equally spaced and vertical, parallels are horizontal lines whose spacing increases the further we move away from the equator
world_map +geom_point(data = cities, aes(x = long, y = lat, group =NULL),size =2, color ="red" ) +geom_line(data = cities, aes(x = long, y = lat, group =NULL),linewidth =1, color ="red" ) +geom_line(data = gc, aes(x = lon, y = lat, group =NULL),linewidth =1, color ="red", linetype ="dashed" ) +coord_map(projection ="mercator", xlim =c(-180, 180))
Plotting both distances
Another distance between two points
How long does it take to fly from the Western most point in the US to the Eastern most point? Guess.
Dateline
Geospatial data in the real world: Freedom index
Freedom index
Since 1973, Freedom House has assessed the condition of political rights and civil liberties around the world.
It is used on a regular basis by policymakers, journalists, academics, activists, and many others.
Bias warning
“Freedom Index” from any source have potential bias and is prone to miscalculations. While the index appears to cover many social issues including freedom of religion, expression, etc. this data (like any data) should be approached with skepticism. Quantifying complex issues like these is difficult and the process can oversimplify difficult to record/measure political nuances.
Data
freedom <-read_csv("data/freedom-2022.csv", na =c("", "-")) |>drop_na()
freedom
# A tibble: 195 × 4
country pr cl status
<chr> <dbl> <dbl> <chr>
1 Afghanistan 7 7 NF
2 Albania 3 3 PF
3 Algeria 6 5 NF
4 Andorra 1 1 F
5 Angola 6 5 NF
6 Antigua and Barbuda 2 2 F
7 Argentina 2 2 F
8 Armenia 4 4 PF
9 Australia 1 1 F
10 Austria 1 1 F
# ℹ 185 more rows
pr: Political rights rating
cl: Civil liberties rating
status: The average of each pair of ratings on political rights and civil liberties determines the overall status of F (Free, 1.0 - 2.5), PF (Partly Free, 3.0 - 5.0), or NF (Not Free, 5.5 - 7.0)
Improve
The following visualization shows the distribution civil liberties ratings (1 - greatest degree of freedom to 7 - smallest degree of freedom). This is, undoubtedly, not the best visualization we can make of these data. How can we improve it?
Mapping the freedom data
Obtain country boundaries and store as a data frame
Join the freedom and country boundaries data frames
Plot the country boundaries, and fill by freedom scores
map_data()
The map_data() function easily turns data from the maps package in to a data frame suitable for plotting with ggplot2:
ggplot(world_map, aes(x = long, y = lat, group = group)) +geom_polygon(fill ="gray") +coord_quickmap()
Freedom and world map
freedom |>select(country)
# A tibble: 195 × 1
country
<chr>
1 Afghanistan
2 Albania
3 Algeria
4 Andorra
5 Angola
6 Antigua and Barbuda
7 Argentina
8 Armenia
9 Australia
10 Austria
# ℹ 185 more rows
What is missing/misleading about the following map?
ggplot(freedom_map, mapping =aes(x = long, y = lat, group = group)) +geom_polygon(aes(fill = cl)) +coord_map(projection ="mercator", xlim =c(-180, 180))
Missing countries
freedom |>anti_join(world_map, by =c("country"="region")) |>select(country) |>print(n =14)
# A tibble: 14 × 1
country
<chr>
1 Antigua and Barbuda
2 Cabo Verde
3 Congo (Brazzaville)
4 Congo (Kinshasa)
5 Cote d'Ivoire
6 Eswatini
7 St. Kitts and Nevis
8 St. Lucia
9 St. Vincent and the Grenadines
10 The Gambia
11 Trinidad and Tobago
12 Tuvalu
13 United Kingdom
14 United States
Data cleanup - freedom
freedom_updated <- freedom |>mutate(country =case_when( country =="Cabo Verde"~"Cape Verde", country =="Congo (Brazzaville)"~"Republic of Congo", country =="Congo (Kinshasa)"~"Democratic Republic of the Congo", country =="Cote d'Ivoire"~"Ivory Coast", country =="St. Lucia"~"Saint Lucia", country =="The Gambia"~"Gambia", country =="United Kingdom"~"UK", country =="United States"~"USA",TRUE~ country ) )
Data cleanup - world_map
world_map_updated <- world_map |>mutate(region =case_when( region =="Antigua"~"Antigua and Barbuda", region =="Barbuda"~"Antigua and Barbuda", region =="Saint Kitts"~"St. Kitts and Nevis", region =="Nevis"~"St. Kitts and Nevis", region =="Saint Vincent"~"St. Vincent and the Grenadines", region =="Grenadines"~"St. Vincent and the Grenadines", region =="Trinidad"~"Trinidad and Tobago", region =="Tobago"~"Trinidad and Tobago", region =="Swaziland"~"Eswatini",TRUE~ region ) )
Check again
freedom_updated |>anti_join(world_map_updated, by =c("country"="region")) |>select(country)
# A tibble: 1 × 1
country
<chr>
1 Tuvalu
Tuvalu, formerly known as the Ellice Islands, is an island country and microstate in the Polynesian subregion of Oceania in the Pacific Ocean. Its islands are situated about midway between Hawaii and Australia. Tuvalu is composed of three reef islands and six atolls.
Let’s map!
Highlights from livecoding
When working through non-matching unique identifiers in a join, you might need to clean the data in both data frames being merged, depending on the context
Two ways to surface polygons with NAs:
left_join() map to data, layering with map at the bottom, data on top
left_join() data to map, set na.value in scale_fill_*() to desired color
Use na.value = "red" (or some other color that will stand out) to easily spot polygons with NAs
us_state_vaccinations <- us_state_vaccinations |>mutate(location =if_else(location =="New York State", "New York", location)) |>filter(location %in%c(state.name, "District of Columbia"))
Geofacet by state
Using geofacet::facet_geo():
ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred)) +geom_area() +facet_geo(~ location) +labs(x =NULL, y =NULL,title ="Covid-19 vaccination rate in the US",subtitle ="Daily number of people fully vaccinated, per hundred",caption ="Source: Our World in Data" )
Geofacet by state
Geofacet by state, with improvements
ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred, group = location)) +geom_area() +facet_geo(~location) +scale_y_continuous(limits =c(0, 100),breaks =c(0, 50, 100),minor_breaks =c(25, 75) ) +scale_x_date(breaks =c(ymd("2021-01-01", "2021-05-01", "2021-09-01")), date_labels ="%b") +labs(x =NULL, y =NULL,title ="Covid-19 vaccination rate in the US",subtitle ="Daily number of people fully vaccinated, per hundred",caption ="Source: Our World in Data" ) +theme(strip.text.x =element_text(size =7),axis.text =element_text(size =8),plot.title.position ="plot" )ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred, group = location)) +geom_area() +facet_geo(~location) +scale_y_continuous(limits =c(0, 100),breaks =c(0, 50, 100),minor_breaks =c(25, 75) ) +scale_x_date(breaks =c(ymd("2021-01-01", "2021-05-01", "2021-09-01")), date_labels ="%b") +labs(x =NULL, y =NULL,title ="Covid-19 vaccination rate in the US",subtitle ="Daily number of people fully vaccinated, per hundred",caption ="Source: Our World in Data" ) +theme(strip.text.x =element_text(size =7),axis.text =element_text(size =8),plot.title.position ="plot" )ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred, group = location)) +geom_area() +facet_geo(~location) +scale_y_continuous(limits =c(0, 100),breaks =c(0, 50, 100),minor_breaks =c(25, 75) ) +scale_x_date(breaks =c(ymd("2021-01-01", "2021-05-01", "2021-09-01")), date_labels ="%b") +labs(x =NULL, y =NULL,title ="Covid-19 vaccination rate in the US",subtitle ="Daily number of people fully vaccinated, per hundred",caption ="Source: Our World in Data" ) +theme(strip.text.x =element_text(size =7),axis.text =element_text(size =8),plot.title.position ="plot" )