20 Vis: Histograms
Purpose: Histograms are a key tool for EDA. In this exercise we’ll get a little more practice constructing and interpreting histograms and densities.
Reading: Histograms Topics: (All topics) Reading Time: ~20 minutes
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
20.0.1 q1 Using the graphs generated in the chunks q1-vis1 and q1-vis2 below, answer:
- Which class has the most vehicles?
- Which class has the broadest distribution of cty values?
- Which graph, vis1 or vis2, best helps you answer each question?
- From this graph, it’s easy to see that suv is the most numerous class.
- From this graph, it’s easy to see that subcompact has the broadest distribution.
In my opinion, the density plot q1-vis2 makes the breadth of the subcompact distribution easier to see.
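The two answers above can also be cross-checked numerically. The following is a minimal sketch, assuming the graphs are drawn from ggplot2's mpg dataset (which has class and cty columns); the name class_summary and the choice of standard deviation and range as measures of "broadness" are illustrative assumptions, not part of the exercise.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Count vehicles per class and summarize the spread of cty within each class
class_summary <- mpg %>%
  group_by(class) %>%
  summarize(
    n = n(),                           # number of vehicles in the class
    cty_sd = sd(cty),                  # one measure of spread
    cty_range = max(cty) - min(cty)    # another, cruder measure of spread
  ) %>%
  arrange(desc(n))

class_summary
```

Sorting by n puts the most numerous class in the first row; sorting by cty_sd or cty_range instead would rank the classes by spread.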
In the previous exercise, we learned how to facet a graph. Let’s use that part of the grammar of graphics to clean up the graph above.
20.0.2 q2 Modify q1-vis2 to use a facet_wrap() on the class. “Free” the vertical axis with the scales keyword to allow for a different y scale in each facet.
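One way this could look is sketched below. The q1-vis2 chunk is not reproduced here, so the base plot (a density of cty colored by class, drawn from ggplot2's mpg dataset) is an assumption, and p_facet is a name introduced only for illustration.

```r
library(ggplot2)

# Assumed stand-in for q1-vis2: density of cty, colored by class
p_facet <- ggplot(mpg, aes(cty, color = class)) +
  geom_density() +
  # One panel per class; "free_y" lets each panel pick its own vertical scale
  facet_wrap(~class, scales = "free_y")

p_facet
```

With scales = "free_y" the horizontal axis stays shared, so the location and width of each distribution remain comparable across panels, while the heights can no longer be compared directly.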
In the reading, we learned that the “most important thing” to keep in mind with geom_histogram() and geom_freqpoly() is to explore different binwidths. We’ll explore this idea in the next question.
20.0.3 q3 Analyze the following graph; make sure to test different binwidths. What patterns do you see? Which patterns remain as you change the binwidth?
## TODO: Run this chunk; play with different bin widths
diamonds %>%
  filter(carat < 1.1) %>%
  ggplot(aes(carat)) +
  geom_histogram(binwidth = 0.01, boundary = 0.005) +
  scale_x_continuous(
    breaks = seq(0, 1, by = 0.1)
  )
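Rather than editing the binwidth by hand each time, the same chunk can be wrapped in a small function and run over several values. This is only a sketch: plot_carat is a hypothetical helper, the candidate binwidths are arbitrary, and setting boundary to half the binwidth simply generalizes the 0.01 / 0.005 pairing used above.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Hypothetical helper: redraw the carat histogram for a given binwidth
plot_carat <- function(bw) {
  diamonds %>%
    filter(carat < 1.1) %>%
    ggplot(aes(carat)) +
    # boundary = bw / 2 centers the bins on multiples of bw (an assumption
    # that mirrors binwidth = 0.01, boundary = 0.005 in the original chunk)
    geom_histogram(binwidth = bw, boundary = bw / 2) +
    scale_x_continuous(breaks = seq(0, 1, by = 0.1)) +
    labs(title = paste("binwidth =", bw))
}

# Arbitrary binwidths chosen for illustration
binwidths <- c(0.01, 0.02, 0.05, 0.1)
plots <- lapply(binwidths, plot_carat)

# Draw each histogram in turn
for (p in plots) print(p)
```

Patterns that survive across several of these plots are more likely to reflect the data than the binning.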
Observations
- Diamond counts tend to peak at or above even tenths of a carat.
- The peak near 0.5 is very broad compared to the others.
- The peak at 0.3 contains the most diamonds.
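The observations above can be checked against the raw counts as well as by eye. A minimal sketch, assuming carat is recorded to two decimal places so that counting distinct values roughly matches a binwidth of 0.01; carat_counts is a name introduced for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Count diamonds at each distinct carat value, most common first
carat_counts <- diamonds %>%
  filter(carat < 1.1) %>%
  count(carat, sort = TRUE)

# The first few rows show where the tallest peaks sit
head(carat_counts, 5)
```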