31 Vis: Lines

Purpose: Line plots are a key tool for EDA. In contrast with a scatterplot, a line plot assumes the data have a functional relationship: one y value for each x value. This can cause problems if we plot data that do not satisfy that assumption. In this exercise, we’ll work through some best practices for constructing line plots.

Reading: Line plots
Topics: Welcome, Line graphs, Similar geoms (skip Maps)
Reading Time: ~30 minutes

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(gapminder)

31.0.1 q1 The following graph doesn’t work as its author intended. Based on what we learned in the reading, fix the following code.

gapminder %>%
  filter(continent == "Asia") %>%
  ggplot(aes(year, lifeExp, color = country)) +
  geom_line()
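
The code above maps color = country, which also tells geom_line() to draw one line per country. As a minimal sketch of why that grouping matters, the block below builds the same plot with and without a grouping aesthetic. It assumes the unintended version simply had no grouping at all, which the exercise does not show; the names asia, p_ungrouped, p_grouped, and n_lines are illustrative only.

```r
library(tidyverse)
library(gapminder)

# Illustrative helper object: the Asian subset used in q1
asia <- gapminder %>%
  filter(continent == "Asia")

# Assumed "broken" version: with no grouping aesthetic, geom_line() joins
# every observation in order of year, so all countries form one jagged path.
p_ungrouped <- asia %>%
  ggplot(aes(year, lifeExp)) +
  geom_line()

# group = country draws one line per country without adding a color legend.
p_grouped <- asia %>%
  ggplot(aes(year, lifeExp, group = country)) +
  geom_line()

# Number of countries, i.e. the number of separate lines in p_grouped
n_lines <- n_distinct(asia$country)
```

Printing p_ungrouped and p_grouped side by side shows the difference. Using group rather than color can be a reasonable choice when there are too many countries for a legend to be readable.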

31.0.2 q2 A line plot makes a certain assumption about the underlying data. What assumption is this? How does that assumption relate to the following graph? Put differently, why is the use of geom_line a bad idea for the following dataset?

## TODO: No need to edit; just answer the questions
mpg %>%
  ggplot(aes(displ, hwy)) +
  geom_line()

Observations:
- A line plot assumes the underlying data have a functional relationship; that is, there is exactly one y value for each x value.
- The mpg dataset does not have a functional relationship between displ and hwy; there are cars with identical values of displ but different values of hwy, so geom_line connects them with misleading vertical segments.
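
One way to check the assumption before reaching for geom_line is to count the distinct y values at each x value. The following is a rough sketch of that check for mpg, not part of the original exercise; the names hwy_per_displ, violations, and p_scatter are illustrative.

```r
library(tidyverse)

# Count the distinct hwy values at each displ; a functional relationship
# would require exactly one hwy value per displ.
hwy_per_displ <- mpg %>%
  group_by(displ) %>%
  summarize(n_hwy = n_distinct(hwy), .groups = "drop")

# Engine sizes where the assumption is violated
violations <- hwy_per_displ %>%
  filter(n_hwy > 1)

# A scatterplot makes no functional assumption, so it suits these data.
p_scatter <- mpg %>%
  ggplot(aes(displ, hwy)) +
  geom_point()
```

If violations has any rows, the data contain repeated x values with differing y values, and a scatterplot such as p_scatter is likely the safer choice.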