Don’t forget to watch the videos for Thursday’s lecture
Project next steps I
Proposal feedback is available.
Required: Address all issues and close them with keywords.
Optional: Submit for a regrade by replying to the “Proposal feedback” issue and tagging me in your reply. Due 5pm tomorrow (Wednesday). The final proposal score will be the average of the original and updated scores.
Within-team peer evaluations are due today at 5pm, so that we can share them with you before lab tomorrow. Please take 5 minutes to complete the survey (from TEAMMATES).
Project next steps II
If you haven’t yet done so, schedule meetings with your project mentor TA:
| Team | TA |
|---|---|
| the_tibbles | Lorenzo |
| stats_fm | Lorenzo |
| phans_of_statistics | Lorenzo |
| skaz | Lorenzo |
| messi | Lorenzo |
| team_six | Lorenzo |
| cia | Jackie |
| blue_team | Jackie |
| pipe_it_up | Jackie |
| co_medians | Jackie |
| viz_villians | Jackie |
| o_ggs | Jackie |
| visualization_warriors | Sam |
| ggplot_lessthan_3 | Sam |
| marvel_cinematic_tidyverse | Sam |
| rgodz | Sam |
Project next steps III
Create a slide deck in presentation.qmd, replacing the placeholder text.
Add your writeup to index.qmd, replacing the placeholder text.
In both outputs, hide the code with echo: false prior to turning them in.
Due: Wednesday, February 22 at the beginning of your lab.
Setup
```r
# load packages
library(countdown)
library(tidyverse)
library(lubridate)
library(janitor)
library(colorspace)
library(broom)
library(fs)

# set theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 14))

# set width of code output
options(width = 65)

# set figure parameters for knitr
knitr::opts_chunk$set(
  fig.width = 7,        # 7" width
  fig.asp = 0.618,      # the golden ratio
  fig.retina = 3,       # dpi multiplier for displaying HTML output on retina
  fig.align = "center", # center align figures
  dpi = 300             # higher dpi, sharper image
)
```
Working with dates
AQI levels
The previous graphic in tibble form, to be used later…
The lubridate package is useful for converting to dates from character strings in a given format, e.g. mdy(), ymd(), etc.
The colorspace package is useful for programmatically darkening / lightening colors
scale_x_date(): Set date_labels to "%b %y" for abbreviated month and 2-digit year, "%D" for the %m/%d/%y date format, etc. See the help for strptime() for more.
scale_color_identity() or scale_fill_identity() can be useful when your data already represents aesthetic values that ggplot2 can handle directly. By default, these scales don’t produce a legend.
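The four points above can be combined in one small sketch. The data below are made up for illustration (the column names `date_chr`, `aqi`, `color`, and `color_dark` are assumptions, not the lecture's data), and the hex codes are arbitrary example colors.

```r
library(tidyverse)
library(lubridate)
library(colorspace)

# made-up data: dates stored as month/day/year strings, plus a hex color per row
toy <- tibble(
  date_chr = c("01/15/2022", "02/15/2022", "03/15/2022", "04/15/2022"),
  aqi      = c(42, 61, 38, 55),
  color    = c("#00E400", "#FFFF00", "#00E400", "#FFFF00")
) |>
  mutate(
    date       = mdy(date_chr),               # character -> Date
    color_dark = darken(color, amount = 0.3)  # darker version of each color
  )

p <- ggplot(toy, aes(x = date, y = aqi, color = color_dark)) +
  geom_point(size = 4) +
  scale_color_identity() +               # use the hex codes in the data as-is
  scale_x_date(date_labels = "%b %y") +  # abbreviated month, 2-digit year
  labs(x = NULL, y = "AQI")

p
```

Because the color column already holds values ggplot2 understands, the identity scale maps them directly and, by default, draws no legend.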
Calculating cumulatives
Cumulatives over time
When visualizing time series data, a somewhat common task is to calculate cumulative totals over time and plot them.
In our example we’ll calculate the number of days with “good” AQI (\(\le\) 50) and plot that value on the y-axis and the date on the x-axis
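One way the cumulative count could be computed is sketched below on made-up data. The column name `aqi_value` and the intermediate `good_aqi` indicator are assumptions for illustration; only `cumsum_good_aqi` and `date` appear in the lecture's own code.

```r
library(tidyverse)

# made-up daily AQI readings; `aqi_value` is a hypothetical column name
toy <- tibble(
  date      = as.Date("2022-01-01") + 0:5,
  aqi_value = c(35, 50, 72, 48, 51, 20)
)

toy_cum <- toy |>
  arrange(date) |>  # cumulative sums only make sense in date order
  mutate(
    good_aqi        = if_else(aqi_value <= 50, 1, 0),  # 1 on "good" days
    cumsum_good_aqi = cumsum(good_aqi)                 # running total
  )

toy_cum
```

Sorting by date before calling `cumsum()` matters, since the running total depends on row order.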
```r
dch |>
  ggplot(aes(x = date, y = cumsum_good_aqi, group = 1)) +
  geom_smooth(method = "lm", color = "pink") +
  geom_line() +
  scale_x_date(
    expand = expansion(mult = 0.07),
    date_labels = "%Y"
  ) +
  labs(
    x = NULL, y = "Number of days",
    title = "Cumulative number of good AQI days (AQI < 50)",
    subtitle = "Durham-Chapel Hill, NC",
    caption = "\nSource: EPA Daily Air Quality Tracker"
  ) +
  theme(plot.title.position = "plot")
```
`geom_smooth()` using formula = 'y ~ x'
Detrend
Step 1. Fit a simple linear regression
```r
m <- lm(cumsum_good_aqi ~ date, data = dch)
m
```

```
Call:
lm(formula = cumsum_good_aqi ~ date, data = dch)

Coefficients:
(Intercept)         date
 -1.341e+04    7.954e-01
```
Detrend
Step 2. Augment the data with model results (using broom::augment())
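A minimal sketch of what the augment step produces, using made-up data in place of `dch`. The names `toy`, `m_toy`, and `toy_aug` are illustrative only.

```r
library(broom)

# made-up data standing in for the real series
toy <- data.frame(
  date = as.Date("2022-01-01") + 0:9,
  cumsum_good_aqi = c(1, 2, 2, 3, 4, 5, 5, 6, 7, 7)
)

m_toy <- lm(cumsum_good_aqi ~ date, data = toy)

# augment() returns the data used in the model plus per-row columns,
# including .fitted (the trend value) and .resid (observed minus fitted)
toy_aug <- augment(m_toy)

names(toy_aug)
```

The `.fitted` column is the long-term linear trend evaluated at each date, which is the piece needed for detrending.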
```r
dch_aug |>
  ggplot(aes(x = date, y = ratio, group = 1)) +
  geom_hline(yintercept = 1, color = "gray") +
  geom_line() +
  scale_x_date(
    expand = expansion(mult = 0.07),
    date_labels = "%Y"
  ) +
  labs(
    x = NULL, y = "Number of days\n(detrended)",
    title = "Cumulative number of good AQI days (AQI < 50)",
    subtitle = "Durham-Chapel Hill, NC",
    caption = "\nSource: EPA Daily Air Quality Tracker"
  ) +
  theme(plot.title.position = "plot")
```
Air Quality in Durham
Barely anything interesting happening!
Let’s look at data from somewhere with a bit more “interesting” air quality…
```r
sf |>
  ggplot(aes(x = date, y = cumsum_good_aqi, group = 1)) +
  geom_smooth(method = "lm", color = "pink") +
  geom_line() +
  scale_x_date(
    expand = expansion(mult = 0.07),
    date_labels = "%Y"
  ) +
  labs(
    x = NULL, y = "Number of days",
    title = "Cumulative number of good AQI days (AQI < 50)",
    subtitle = "San Francisco-Oakland-Hayward, CA",
    caption = "\nSource: EPA Daily Air Quality Tracker"
  ) +
  theme(plot.title.position = "plot")
```
`geom_smooth()` using formula = 'y ~ x'
Detrend
Fit a simple linear regression
```r
m_sf <- lm(cumsum_good_aqi ~ date, data = sf)
```
Augment the data with model results
```r
sf_aug <- augment(m_sf)
```
Divide the observed value of cumsum_good_aqi by the respective value in the long-term trend (i.e., .fitted)
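The division step might look like the following sketch, shown on made-up data rather than `sf`. The `ratio` column name matches the plotting code, but the `mutate()` call itself is an assumption about how it was created.

```r
library(dplyr)
library(broom)

# made-up cumulative series standing in for the real data
toy <- data.frame(
  date = as.Date("2022-01-01") + 0:9,
  cumsum_good_aqi = c(10, 12, 13, 13, 15, 18, 19, 19, 21, 22)
)

toy_aug <- lm(cumsum_good_aqi ~ date, data = toy) |>
  augment() |>
  # observed value relative to the fitted linear trend
  mutate(ratio = cumsum_good_aqi / .fitted)

toy_aug |>
  select(date, cumsum_good_aqi, .fitted, ratio)
```

A ratio above 1 means the observed cumulative count sits above the long-term trend on that date, and a ratio below 1 means it sits below, which is why the detrended plots draw a reference line at 1.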
```r
sf_aug |>
  ggplot(aes(x = date, y = ratio, group = 1)) +
  geom_hline(yintercept = 1, color = "gray") +
  geom_line() +
  scale_x_date(
    expand = expansion(mult = 0.07),
    date_labels = "%Y"
  ) +
  labs(
    x = NULL, y = "Number of days\n(detrended)",
    title = "Cumulative number of good AQI days (AQI < 50)",
    subtitle = "San Francisco-Oakland-Hayward, CA",
    caption = "\nSource: EPA Daily Air Quality Tracker"
  ) +
  theme(plot.title.position = "plot")
```
Detrending
In step 1 we fit a very simple model: a linear regression of the cumulative count on date.
Depending on the complexity you’re trying to capture, you might choose to fit a much more complex model.
You can also decompose the trend into multiple trends, e.g. monthly, long-term, seasonal, etc.
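As one illustration of splitting a series into several components, the sketch below uses base R's `decompose()` on a made-up monthly series. This is a swapped-in classical decomposition, not the method used in the lecture, and every value in it is simulated.

```r
# made-up monthly series: linear trend + yearly cycle + noise
set.seed(1)
n <- 72                 # six years of monthly values
t_idx <- seq_len(n)
y <- 50 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 2)
y_ts <- ts(y, start = c(2015, 1), frequency = 12)

# classical additive decomposition: observed = trend + seasonal + random
dec <- decompose(y_ts)

# one panel each for the observed, trend, seasonal, and random components
plot(dec)
```

The trend component is estimated with a moving average, so it is `NA` for the first and last few months of the series.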