Lecture 0
Duke University
STA 313 - Spring 2023
Form a small group (2-4 people) with people sitting around you
First, introduce yourselves to each other – name (and proper pronunciation of name), year, major, where are you from, etc.
Start with the bad graphs – Share your examples of “bad” graphs and why you think they’re bad.
Then, share your good graphs – Same deal, share your examples of “good” graphs and why you think they’re good.
Finally, choose the one plot from your group that you think is most striking, either because it’s bad or because it’s good, and have one team member share the graph on #general in Slack.
aka “the one link to rule them all”
In person
Attendance is required (as long as you’re healthy!)
A little bit of everything:
Recordings will be posted after class – to be used for review + make-up if you can’t make it to class due to health reasons, they’re not an alternative to class attendance
Attendance is required (as long as you’re healthy!)
Opportunity to work on course assignments with TA support
Opportunity to work with teammates on projects
Posted on Sakai (Announcements tool) and sent via email, be sure to check both regularly
I’ll assume that you’ve read an announcement by the next “business” day
It is my intent that students from all diverse backgrounds and perspectives be well-served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit.
If you have a name that differs from those that appear in your official Duke records, please let me know!
Please let me know your preferred pronouns.
If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. I want to be a resource for you. If you prefer to speak with someone outside of the course, your advisers and deans are excellent resources.
I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it.
The Student Disability Access Office (SDAO) is available to ensure that students are able to engage with their courses and related assignments.
I am committed to making all course materials accessible and I’m always learning how to do this better. If any course component is not accessible to you in any way, please don’t hesitate to let me know.
Required throughout the semester in lecture and lab
Students who attend at least 80% of the lectures and participate regularly in lecture and/or other course venues (lab + Slack) will receive full credit for this portion of their grade
Participation in labs as well as on Slack will also count towards this component
Same/similar data, different results
Presentation and write-up
Wrapped up before midterms grades are due
The world is your oyster! (and more details TBA)
New team
Presentation and write-up
Wrapped up on the final exam date
This course is assessed 100% on your coursework (there is no exam). We will be assessing you based on the following assignments,
Assignment | Type | Value | n | Due |
---|---|---|---|---|
Attendance + participation | Individual | 5% | ||
Reading quizzes | Individual | 10% | 7 | ~ Every other week |
Homeworks | Individual | 45% | 6 | ~ Every other week |
Project 1 | Team | 15% | 1 | ~ Week 6 + earlier interim deadlines |
Project 2 | Team | 25% | 1 | ~ Finals week + earlier interim deadlines |
This is not required but highly recommended!
Wear a mask if the university requires
Stay home if you’re sick and follow guidance
Read and follow university guidance
Reading quizzes: Late submissions not accepted
Homework assignments:
Project presentations: Late submissions not accepted
Project write-ups:
Peer evaluation:
Only work that is clearly assigned as team work should be completed collaboratively (Projects)
Reading quizzes must be completed individually, you may not discuss answers with teammates, clarification questions should only be asked to myself and the TAs
Homework assignments must be completed individually. You may not directly share answers / code with others, however you are welcome to discuss the problems in general and ask for advice
We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted
Unless explicitly stated otherwise, this course’s policy is that you may make use of any online resources (e.g. RStudio Community, StackOverflow, etc.) but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solution(s).
Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source
To uphold the Duke Community Standard:
most importantly:
ask if you’re not sure if something violates a policy!
Mine:
Wednesdays 8 - 9 pm - Zoom (E)
Except Wed, Feb 1
Any other exceptions will be announced in class / by email
By appointment - Zoom or Old Chem 213
TAs:
Jackie: Tuesdays 4 - 6 pm - Edge Project Room 2
Lorenzo: Mondays 5 - 7 pm - Old Chemistry 203b
Sam: Tuesdays 1:30 - 3:30 pm - Location TBA
Evan: Mondays 10 am - 12 pm - Zoom (Link on Sakai > Zoom meetings)
Holly: Mondays 1-3 pm - Zoom (Link on Sakai > Zoom meetings)
+ lots more resources listed on the syllabus!
I want to make sure that you learn everything you were hoping to learn from this class. If this requires flexibility, please don’t hesitate to ask.
You never owe me personal information about your health (mental or physical) but you’re always welcome to talk to me. If I can’t help, I likely know someone who can.
I want you to learn lots of things from this class, but I primarily want you to stay healthy, balanced, and grounded.
Browser based RStudio instance(s) provided by Duke OIT
Requires internet connection to access
Provides consistency in hardware and software environments
Local R installations are fine but we will not guarantee support
GitHub organization for the course
All of your work and your membership (enrollment) in the organization is private
Each assignment is a private repo on GitHub, I distribute the assignments on GitHub and you submit them there
Feedback on assignments is given as GitHub issues, scores recorded on Sakai Gradebook
Fill out the Getting to know you survey for collection of your account names, later this week you will be invited to the course organization.
in case you don’t yet have a GitHub account…
Some brief advice about selecting your account names (particularly for GitHub),
Incorporate your actual name! People like to know who they’re dealing with and makes your username easier for people to guess or remember
Reuse your username from other contexts, e.g., Twitter or Slack
Pick a username you will be comfortable revealing to your future boss
Shorter is better than longer, but be as unique as possible
Make it timeless. Avoid highlighting your current university, employer, or place of residence
Online forum for asking and answering questions
Private repo in the course organization
You will need to join the course organization for access
Ask and answer questions related to course logistics, assignment, etc. here
Personal questions (e.g., extensions, illnesses, etc.) should be via email to me
Once you join, browse the channels to make sure you’re posting questions in the right channel, update your profile with your name, photo/avatar of you that matches your GitHub profile, and your pronouns
Unfortunately Slack is not the best place to ask coding questions, but it’s a great place for real-time connection and collaboration
Demo on Tuesday for asking good questions with proper code formatting.
Create a GitHub account if you don’t have one
Read the syllabus
Make sure you can login in to RStudio and reserve a container for STA 313: cmgr.oit.duke.edu/containers
Complete the Getting to know you survey on Sakai – to be posted
Complete the readings and Reading Quiz 1 – to be posted
“The simple graph has brought more information to the data analyst’s mind than any other device.” --- John Tukey
Data visualization is the creation and study of the visual representation of data
Many tools for visualizing data -- R is one of them
Many approaches/systems within R for making data visualizations -- ggplot2 is one of them, and that’s what we’re going to use
ggplot2 is tidyverse’s data visualization package
gg
in “ggplot2” stands for Grammar of Graphics
Inspired by the book Grammar of Graphics by Leland Wilkinson
A grammar of graphics is a tool that enables us to concisely describe the components of a graphic
ggplot()
is the main function in ggplot2Measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
ggplot(
penguins,
aes(x = bill_depth_mm,y = bill_length_mm,colour = species)) +
geom_point() +
labs(
title = "Bill depth and length",
subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
x = "Bill depth (mm)", y = "Bill length (mm)",
colour = "Species",
caption = "Source: Palmer Station LTER / palmerpenguins package"
) +
theme_minimal() +
ggthemes::scale_color_colorblind()
Start with the penguins
data frame, map bill depth to the x-axis and map bill length to the y-axis.
Represent each observation with a point and map species to the colour of each point.
Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source.
Finally, use a discrete colour scale that is designed to be perceived by viewers with common forms of colour blindness.