50 Stats: Introduction to Hypothesis Testing
Purpose: Part of the payoff of statistics is to support making decisions under uncertainty. To frame these decisions we will use the framework of hypothesis testing. In this exercise you’ll learn how to set up competing hypotheses and potential actions, based on different scenarios.
Reading: Statistical Inference in One Sentence (9 min)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
50.1 A Full Example
You are considering buying a set of diamonds in bulk. The prospective vendor is willing to sell you 100 diamonds at $1700 per diamond. You will not get to see the specific diamonds before buying, though. To convince you, the vendor gives you a detailed list of a prior package of bulk diamonds they sold recently—they tell you this is representative of the packages they sell.
This is a weird contract, but it’s intriguing. Let’s use statistics to help determine whether or not to take the deal.
50.2 Pick your population
For the sake of this exercise, let’s assume that df_population is the entire set of diamonds the vendor has in stock.
Important Note: No peeking! While I’ve defined df_population here, you should not look at its values until the end of the exercise.
While we do have access to the entirety of the population, in most real problems we’ll only have a sample. The function slice_sample() allows us to choose a random sample from a dataframe.
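As a minimal sketch of how slice_sample() behaves, the chunk below draws 100 rows at random from a stand-in population. The names df_pop_demo and df_sample_demo, the use of the full ggplot2 diamonds dataset as the population, the seed, and the sample size of 100 are all assumptions for illustration; they are not the definitions used in this exercise.

```r
library(dplyr)
library(ggplot2)  # provides the `diamonds` dataset

# ASSUMPTION: stand-in population. The exercise's actual df_population
# is defined elsewhere and is not reproduced here.
df_pop_demo <- diamonds

set.seed(101)  # fix the random draw so the sample is reproducible

# Draw 100 rows uniformly at random, without replacement
df_sample_demo <- df_pop_demo %>%
  slice_sample(n = 100)

nrow(df_sample_demo)
```

Changing the seed gives a different sample, which is exactly the sampling variability that a confidence interval is meant to account for.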
50.3 Set up your hypotheses and actions
Based on the contract above, our decision threshold should be related to the sale price the vendor quotes.
## NOTE: This is for exercise-design purposes: What are the true parameters?
df_population %>%
  group_by(cut) %>%
  summarize(price = mean(price)) %>%
  bind_rows(
    df_population %>%
      summarize(price = mean(price)) %>%
      mutate(cut = "(All)")
  )
## # A tibble: 6 × 2
## cut price
## <chr> <dbl>
## 1 Fair 2092.
## 2 Good 1793.
## 3 Very Good 1732.
## 4 Premium 1598.
## 5 Ideal 1546.
## 6 (All) 1633.
In order to do hypothesis testing, we need to define null and alternative hypotheses. These two hypotheses are competing theories for the state of the world.
Furthermore, we are aiming to use hypothesis testing to support making a decision. To that end, we’ll also define a default action (if we fail to reject the null), and an alternative action (if we find our evidence sufficiently convincing so as to change our minds).
For this buying scenario, we feel that the contract is pretty weird: We’ll set up our null hypothesis to assume the vendor is trying to rip us off. In order to make this hypothesis testable, we’ll need to make it quantitative.
One way to make our hypothesis quantitative is to think about the mean price of diamonds in the population: If the diamonds are, on average, less expensive than the price_threshold (here, the quoted sale price of $1700 per diamond), then on average we’ll tend to get a set of diamonds that are worth less than what we paid. This will be our null hypothesis.
Consequently, our default action will be to buy no diamonds from this vendor. In standard statistics notation, this is how we denote our null and alternative hypotheses:
H_0 (Null hypothesis) The mean price of all diamonds in the population is less than the threshold price_threshold.
- Default action: Buy no diamonds
H_A (Alternative hypothesis) The mean price of all diamonds in the population is equal to or greater than the threshold price_threshold.
- Alternative action: Buy diamonds in bulk
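The same pair of hypotheses can be written compactly in symbols. The symbols here are my own shorthand, not notation used elsewhere in this exercise: \mu stands for the mean price of all diamonds in the population, and p_thresh stands for price_threshold.

```latex
\begin{aligned}
H_0 &: \mu < p_{\text{thresh}} && \text{(default action: buy no diamonds)} \\
H_A &: \mu \geq p_{\text{thresh}} && \text{(alternative action: buy diamonds in bulk)}
\end{aligned}
```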
50.4 Compute
50.4.1 q1 Based on your results, can you reject the null hypothesis H_0 for the population with a 95-percent confidence interval?
## TASK: Compute a confidence interval on the mean, use to answer the question
df_sample %>%
  summarize(
    price_mean = mean(price),
    price_sd = sd(price),
    price_lo = price_mean - 1.96 * price_sd / sqrt(n()),
    price_hi = price_mean + 1.96 * price_sd / sqrt(n())
  ) %>%
  select(price_lo, price_hi)
## # A tibble: 1 × 2
## price_lo price_hi
## <dbl> <dbl>
## 1 1418. 1856.
## [1] 1700
Observations:
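- Based on the CI above, we cannot reject the null hypothesis H_0: the interval runs from about 1418 to 1856, so it includes mean prices below the threshold of 1700.
- Since we do not reject H_0, we take our default action of buying no diamonds from the vendor.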
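The reasoning in these observations can be sketched as a small decision rule. This is one way to encode it, not the only one: the helper names ci_mean() and reject_null() are hypothetical, and the rule itself (reject H_0 only when the whole interval sits at or above the threshold) is my reading of how the CI is being used here.

```r
# Normal-approximation CI for a mean, mirroring the formula used in q1
ci_mean <- function(x, confidence = 0.95) {
  z <- qnorm(1 - (1 - confidence) / 2)  # about 1.96 when confidence = 0.95
  se <- sd(x) / sqrt(length(x))         # standard error of the mean
  c(lo = mean(x) - z * se, hi = mean(x) + z * se)
}

# ASSUMPTION: reject H_0 (mean < threshold) only when the entire
# interval lies at or above the threshold.
reject_null <- function(ci, threshold) {
  unname(ci["lo"] >= threshold)
}

# Apply the rule to the (rounded) interval reported above
ci_reported <- c(lo = 1418, hi = 1856)
reject_null(ci_reported, threshold = 1700)
```

With the reported interval, the lower bound of 1418 is below 1700, so the rule returns FALSE and we stay with the default action.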
50.6 Proportion Ideal
Let’s imagine a different scenario: We have a lead on a buyer of engagement rings who is obsessed with well-cut diamonds. If we could buy at least 50 diamonds with cut Premium or Ideal (what we’ll call “high-cut”), we could easily recoup the cost of the bulk purchase.
If the proportion of high-cut diamonds in the vendor’s population is greater than 50 percent, we stand a good chance of making a lot of money.
Unfortunately, I haven’t taught you any techniques for estimating a CI for a proportion. However, in e-stat09-bootstrap we learned a general approximation technique: the bootstrap. Let’s put that to work to estimate a confidence interval for the proportion of high-cut diamonds in the population.
50.7 Hypotheses and Actions
Let’s redefine our hypotheses to match the new scenario.
H_0 (Null hypothesis) The proportion of high-cut diamonds in the population is less than 50 percent.
- Default action: Buy no diamonds
H_A (Alternative hypothesis) The proportion of high-cut diamonds in the population is equal to or greater than 50 percent.
- Alternative action: Buy diamonds in bulk
Furthermore, let’s change our decision threshold from 95-percent confidence to a higher 99-percent confidence.
50.7.1 q2 Use the techniques you learned in e-stat09-bootstrap to estimate a 99-percent confidence interval for the population proportion of high-cut diamonds. Can you reject the null hypothesis? What decision do you take?
Hint 1: Remember that you can use mean(X == "value") to compute the proportion of cases in a sample with variable X equal to "value". You’ll need to figure out how to combine the cases of Premium and Ideal.
Hint 2: int_pctl() takes an alpha keyword argument; this is simply alpha = 1 - confidence.
## TASK: Estimate a confidence interval for the proportion of high-cut diamonds
## in the population. Look to `e-stat09-bootstrap` for starter code.
set.seed(101)
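library(rsample)  # provides bootstraps(), analysis(), and int_pctl()

# Statistic to bootstrap: proportion of Premium or Ideal diamonds in one resample
fit_fun <- function(split) {
  analysis(split) %>%
    summarize(estimate = mean((cut == "Premium") | (cut == "Ideal"))) %>%
    mutate(term = "proportion_high")
}

df_resample_prop_high <-
  bootstraps(df_sample, times = 1000) %>%
  mutate(estimates = map(splits, fit_fun))

# alpha = 1 - 0.99 for a 99-percent interval
int_pctl(df_resample_prop_high, estimates, alpha = 0.01)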
## # A tibble: 1 × 6
## term .lower .estimate .upper .alpha .method
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 proportion_high 0.530 0.640 0.750 0.01 percentile
Observations:
- Based on the CI above, we can reject the null hypothesis H_0: the entire interval, from 0.530 to 0.750, lies above 0.50.
- Since we reject H_0, we take our alternative action and buy the diamonds!
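To make the percentile bootstrap less of a black box, here is a rough base-R sketch of the idea: resample with replacement, recompute the proportion each time, and read the interval off the quantiles. It is not a re-implementation of int_pctl(), and it will not reproduce the numbers above. The vector cut_demo is synthetic stand-in data with made-up cut probabilities, and the names prop_high, boot_props, and ci_boot are hypothetical.

```r
# ASSUMPTION: synthetic stand-in sample; the real df_sample is not used here
set.seed(101)
cut_demo <- sample(
  c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  size = 100, replace = TRUE,
  prob = c(0.05, 0.10, 0.20, 0.30, 0.35)
)

# Statistic of interest: proportion of "high-cut" diamonds
prop_high <- function(x) mean(x %in% c("Premium", "Ideal"))

# Resample the data with replacement and recompute the statistic each time
n_boot <- 1000
boot_props <- replicate(
  n_boot,
  prop_high(sample(cut_demo, size = length(cut_demo), replace = TRUE))
)

# Percentile interval: cut alpha / 2 from each tail of the bootstrap distribution
alpha <- 0.01
ci_boot <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2))
ci_boot
```

The decision step is the same as before: we would reject H_0 only if the lower end of ci_boot sits at or above 0.50.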
50.9 The big reveal
To close this exercise, let’s reveal whether our chosen hypotheses matched the underlying population.
50.9.1 q3 Compute the population mean price for the diamonds. Did you reject the null hypothesis?
## TASK: Compute the population mean of diamond price
df_population %>%
  summarize(price = mean(price))
## # A tibble: 1 × 1
## price
## <dbl>
## 1 1633.
## [1] 1700
Observations:
When I did q1, I did not reject the null. Note the weird wording there: did not reject the null, rather than “accepted the null”. In this hypothesis testing framework we never actually accept the null hypothesis; we can only fail to reject it. What this means is that we still maintain the possibility that the null is false, and all we can say for sure is that our data are not sufficient to reject the null hypothesis. (Here the population mean of about 1633 is below the threshold of 1700, so the null happens to be true, but the sample alone could not tell us that.)
In other words, when we fail to reject the null hypothesis “we’ve learned nothing.”
Learning nothing isn’t a bad thing, though! It’s an important part of statistics to recognize when we’ve learned nothing.
50.9.2 q4 Compute the proportion of high-cut diamonds in the population. Did you reject the null hypothesis?
## TASK: Compute the population proportion of high-cut diamonds
df_population %>%
  summarize(proportion = mean((cut == "Premium") | (cut == "Ideal")))
## # A tibble: 1 × 1
## proportion
## <dbl>
## 1 0.667
Observations:
When I did q2, I did reject the null hypothesis. It happens that this was the correct choice; the true proportion of high-cut diamonds (about 0.667) is greater than 50 percent.
50.10 End notes
Note that the underlying population is identical in the two settings above, but the “correct” decision is different. This helps illustrate that math alone cannot help you frame a reasonable hypothesis. Ultimately, you must understand the situation you are in, and the decisions you are considering.
If you’ve taken a statistics course, you might be wondering why I’m talking about hypothesis testing without introducing p-values. I feel that confidence intervals more obviously communicate the uncertainty in results, in line with Andrew Gelman’s suggestion that we embrace uncertainty. The penalty we pay working with (two-sided) confidence intervals is a reduction in statistical power.
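To see where that loss of power comes from, we can compare the two-sided lower bound from q1 with a one-sided lower bound at the same 95-percent confidence. This is only a rough sketch: it back-calculates the sample mean and standard error from the rounded interval printed in q1, under the same normal approximation, and the variable names are hypothetical.

```r
# Interval reported in q1 (rounded values as printed above)
price_lo <- 1418
price_hi <- 1856

# Recover the approximate sample mean and standard error from the interval
z_two <- qnorm(0.975)  # two-sided 95% multiplier, about 1.96
price_mean <- (price_lo + price_hi) / 2
price_se <- (price_hi - price_lo) / (2 * z_two)

# One-sided 95% lower bound: all 5% of the error goes in the lower tail
z_one <- qnorm(0.95)  # about 1.645
lo_one_sided <- price_mean - z_one * price_se

c(two_sided = price_lo, one_sided = lo_one_sided)
```

The one-sided bound sits higher than the two-sided one, so it clears a given threshold more easily; that is the extra power we give up by using a two-sided interval. With these inputs it still falls well below 1700, so the q1 decision would not change.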