Vis: Small Multiples

Purpose: A powerful idea in visualization is the small multiple. In this exercise you’ll learn how to design and create small multiple graphs.

“At the heart of quantitative reasoning is a single question: Compared to what?”

Edward Tufte on visual comparison.

Setup

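import grama as gr
DF = gr.Intention()
%matplotlib inline
## Fuel economy data used below; `class` is renamed since it's a Python keyword
from plotnine.data import mpg
df_mpg = mpg.rename(columns={"class": "carclass"})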

Fundamentals of small multiples

Facets in ggplot let us implement small multiples. As an example, consider the following graph, which introduces a new ggplot utility: facet_wrap(). This visual depicts five economic variables over several decades.

## NOTE: No need to edit
from plotnine.data import economics as df_economics

(
    df_economics
    >> gr.tf_pivot_longer(
        columns=["pce", "pop", "psavert", "uempmed", "unemploy"],
        names_to="variable",
        values_to="value",
    )
    >> gr.ggplot(gr.aes("date", "value"))
    + gr.geom_line()
    
    ## Faceting allows us to implement small multiples
    + gr.facet_wrap("variable", scales="free_y")
    
    + gr.theme(axis_text_x=gr.element_text(angle=270))

)

[Figure: value vs. date, one panel per variable (pce, pop, psavert, uempmed, unemploy), free y-scales]

The “multiples” are the different panels; above, we’ve separated the variables into their own panels and plotted each one against the date. This lets us compare trends simply by looking across panels. For instance, pce and pop grow smoothly over time, while the other variables appear more cyclical.

The faceting above works particularly well for comparing trends: It’s clear by inspection whether the various trends are increasing or decreasing, and we can easily see how each trend compares with others by looking at a different panel.

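The facet_wrap(var) utility takes the name of a column var to use as a grouping variable; each unique value in that column gets its own small multiple (panel). The variable can also be written in formula style, as in facet_wrap("~carclass") below; both forms facet the data the same way. You’ll practice using this utility in the next task.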
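To make the grouping concrete, here is a minimal plain-Python sketch of the two steps in the economics example: tf_pivot_longer() stacks the wide columns into long (variable, value) rows, and facet_wrap("variable") then splits those rows into one panel per unique value. The helpers pivot_longer and split_by_facet and the tiny table are hypothetical illustrations, not grama or plotnine functions and not real economics data.

```python
from collections import defaultdict

# Tiny made-up "wide" table: one row per date, one column per variable
wide = [
    {"date": 1, "pce": 10.0, "pop": 200.0},
    {"date": 2, "pce": 12.0, "pop": 201.0},
]


def pivot_longer(rows, id_col, columns, names_to="variable", values_to="value"):
    """Hypothetical helper: emit one long row per (input row, column) pair."""
    long_rows = []
    for row in rows:
        for col in columns:
            long_rows.append({id_col: row[id_col], names_to: col, values_to: row[col]})
    return long_rows


def split_by_facet(rows, facet_var):
    """Hypothetical helper: group rows into one panel per unique facet value."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[facet_var]].append(row)
    return dict(panels)


long_rows = pivot_longer(wide, "date", ["pce", "pop"])
panels = split_by_facet(long_rows, "variable")
for name, rows in panels.items():
    # Each panel holds the (date, value) series for a single variable
    print(name, [(r["date"], r["value"]) for r in rows])
# pce [(1, 10.0), (2, 12.0)]
# pop [(1, 200.0), (2, 201.0)]
```

Faceting on "variable" only works because the data is in long form; in the wide table there is no single column whose values name the panels.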

## NOTE: Run this cell
from grama.data import df_stang

q2 To free the scales? Or not?

Run the following code as-is and inspect the results. Then answer the questions under Observations below, re-running the code as instructed there.

## TASK: Run this code, then try disabling the `scales` argument

(
    df_mpg
    >> gr.ggplot(gr.aes("displ", "hwy"))
    + gr.geom_point()
    + gr.facet_wrap(
        "~carclass",
        scales="free",
    )
)

[Figure: hwy vs. displ, one panel per carclass]

Observations

- Based on the plot above, how much visual variation is there among 2seater vehicles, with respect to their hwy fuel economy and engine displacement (displ)?
  - In this version of the plot (scales="free"), the hwy and displ values fill the entire panel, which gives the impression of large variability among the values.
- Now comment out the scales="free" argument, re-run the code, and inspect the new plot.
- With the new plot, how much visual variation is there among 2seater vehicles, with respect to their hwy fuel economy and engine displacement (displ)?
  - In this version of the plot (scales="fixed", the default), the hwy and displ values are tightly clustered within their panel, which gives the impression of very small variability among the values.

If the different groups have similar values, or if you are trying to encourage numerical comparisons rather than just trend comparisons, it may be a good idea to keep the scales fixed.
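A rough way to see why free scales make a group look more variable: with free scales, each panel's axis is fit to that group's own range, while fixed scales share the range of all the data. The sketch below computes the fraction of the axis a group's values span under each choice. The data and the axis_fraction helper are made up for illustration, and real plots also pad their axis limits slightly, which this sketch ignores.

```python
# Made-up hwy values for two groups (illustrative only)
groups = {
    "a": [24, 25, 26],
    "b": [12, 20, 35, 44],
}


def axis_fraction(values, lo, hi):
    """Fraction of an axis running from lo to hi that the values' range covers.

    Assumes hi > lo (the axis range is not degenerate).
    """
    return (max(values) - min(values)) / (hi - lo)


# Fixed scales: one shared range computed from all groups together
all_values = [v for vals in groups.values() for v in vals]
fixed_lo, fixed_hi = min(all_values), max(all_values)

for name, vals in groups.items():
    free = axis_fraction(vals, min(vals), max(vals))  # axis fit to this group only
    fixed = axis_fraction(vals, fixed_lo, fixed_hi)  # axis shared by all groups
    print(f"{name}: free={free:.2f}, fixed={fixed:.2f}")
```

Under free scales every group spans its whole axis, so group a looks as spread out as group b; under fixed scales group a occupies only a small sliver, mirroring the 2seater observations above.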

Finer Points

With the basics of facets under our belt, now we can move on to some finer points about constructing small multiple plots.

“Ghost points”

This version of the df_mpg plot is not as effective as it could be:

## NOTE: No need to edit
(
    df_mpg
    >> gr.ggplot(gr.aes("displ", "hwy"))
    + gr.geom_point()
    + gr.facet_wrap("~carclass") 
)

[Figure: hwy vs. displ, one panel per carclass, shared scales]

With these scatterplots it’s difficult to “keep in our heads” the absolute positions of the other points as we look across the multiples. Instead we could add some “ghost points”:

## NOTE: No need to edit
(
    df_mpg
    >> gr.ggplot(gr.aes("displ", "hwy"))
    ## A bit of a trick; remove the facet variable to prevent faceting
    + gr.geom_point(
        data=df_mpg >> gr.tf_drop("carclass"),
        color="grey",
    )
    + gr.geom_point()
    + gr.facet_wrap("carclass")
)

[Figure: hwy vs. displ, one panel per carclass; grey ghost points show all vehicles in every panel]

Here we’re using visual weight to call attention to the black points within each panel, while the lower-weight grey points de-emphasize the bulk of the data. From this version of the plot we can clearly see that the 2seater vehicles are tightly clustered and tend to have higher hwy than other vehicles with similar displ.

The trick behind this figure: removing the facet variable from the data passed to one layer prevents that layer from being faceted, so its points appear in every panel. Combining this grey layer with a second, faceted point layer gives the “ghost” point effect.

The presence of these “ghost” points provides more context; they facilitate the “Compared to what?” question that Tufte puts at the center of quantitative reasoning.
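Here is a minimal plain-Python sketch of the layering rule behind the trick, using made-up rows: the ghost layer (facet field dropped) is copied into every panel, while the highlight layer is split by group. The drop and build_panels helpers are hypothetical illustrations, not part of grama or plotnine.

```python
# Made-up rows: (carclass, displ, hwy); values are illustrative only
rows = [
    ("2seater", 6.0, 25),
    ("compact", 2.0, 30),
    ("compact", 2.2, 28),
    ("suv", 5.0, 18),
]


def drop(row, index):
    """Return the row without the field at `index` (like dropping a column)."""
    return row[:index] + row[index + 1:]


def build_panels(rows, facet_index=0):
    """Hypothetical helper mimicking the 'ghost point' layering.

    Ghost layer: facet field dropped, so every panel receives all points.
    Highlight layer: facet field kept, so each panel receives only its group.
    """
    levels = sorted({row[facet_index] for row in rows})
    ghost = [drop(row, facet_index) for row in rows]
    return {
        level: {
            "ghost": ghost,
            "highlight": [drop(row, facet_index) for row in rows if row[facet_index] == level],
        }
        for level in levels
    }


panels = build_panels(rows)
for level, layers in panels.items():
    print(level, len(layers["ghost"]), "ghost,", len(layers["highlight"]), "highlighted")
```

Every panel receives the same full ghost layer, which is what supplies the “Compared to what?” context.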

from grama.data import df_diamonds

q3 Use the “ghost point” trick

Edit the following figure to use the “ghost” point trick demonstrated above while faceting along the variable "cut".

## TASK: Add "ghost points" to the following plot in order to show
## every observation within each panel
(
    df_diamonds
    >> gr.ggplot(gr.aes("carat", "price"))
    + gr.geom_point(
        data=df_diamonds >> gr.tf_drop("cut"),
        color="grey",
    )
    + gr.geom_point()
    + gr.facet_wrap("cut")
)

[Figure: price vs. carat, one panel per cut; grey ghost points show all diamonds in every panel]

Aside: Reordering factors

The utility function gr.fct_reorder() allows us to “reorder” factor levels according to another variable. This is useful because it enables us to control the order in which factor levels are displayed on a plot.
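As a rough illustration, the sketch below orders factor levels by the median of another variable within each level, the ordering described in the q4 observations. The helper reorder_levels and the data are hypothetical; consult the gr.fct_reorder() documentation for its actual arguments and default summary function.

```python
from statistics import median

# Made-up (carclass, hwy) pairs; values are illustrative only
data = [
    ("suv", 18), ("suv", 20), ("suv", 17),
    ("compact", 29), ("compact", 27),
    ("pickup", 16), ("pickup", 19),
]


def reorder_levels(pairs, summary=median):
    """Hypothetical helper: order factor levels by a summary of their values."""
    by_level = {}
    for level, value in pairs:
        by_level.setdefault(level, []).append(value)
    # Sort the levels by each level's summary value (median by default)
    return sorted(by_level, key=lambda level: summary(by_level[level]))


print(sorted({level for level, _ in data}))  # alphabetical: ['compact', 'pickup', 'suv']
print(reorder_levels(data))  # by median hwy: ['pickup', 'suv', 'compact']
```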

q4 Reorder the "carclass"

Use gr.fct_reorder() to reorder the "carclass" variable according to "hwy". Answer the questions under observations below.

Hint: Remember to check the documentation for a new function to learn how to use it!

## TASK: Reorder `carclass` by `hwy`
(
    df_mpg
    >> gr.tf_mutate(carclass=gr.fct_reorder(DF.carclass, DF.hwy))
    
    >> gr.ggplot(gr.aes("carclass", "hwy"))
    + gr.geom_boxplot()
)

[Figure: boxplots of hwy for each carclass, reordered by hwy]

Observations

- When you do not reorder carclass, in what order are the classes listed along the horizontal axis?
  - The carclass values are listed in alphabetical order.
- When you do reorder carclass, what changes about the plot? What do you notice about the boxplots?
  - Now the boxplots tend to “rise” across the plot; the carclass values are ordered by their median hwy value.

Controlling the facet axis

Sometimes you’ll want to place facets along only the horizontal or only the vertical axis; this helps when you want to make direct comparisons along a shared axis. The utility facet_grid() lets you specify whether to facet along the vertical or the horizontal axis of the plot.

For example, consider the following figure:

## NOTE: No need to edit
(
    df_mpg
    ## Find highest fuel economy models within each manufacturer
    >> gr.tf_group_by(DF.manufacturer)
    >> gr.tf_filter(DF.hwy == gr.max(DF.hwy))
    >> gr.tf_ungroup()
    ## Reorder manufacturers based on their fuel economy
    >> gr.tf_mutate(manufacturer=gr.fct_reorder(DF.manufacturer, DF.hwy))
    
    ## Visualize
    >> gr.ggplot(gr.aes("hwy", "model"))
    + gr.geom_point()
    ## Use facet_grid to control which axis gets the faceting
    + gr.facet_grid("manufacturer~.", scales="free_y")
    + gr.theme(strip_text_y=gr.element_text(angle=0, hjust=0))
)

[Figure: hwy vs. model for each manufacturer's most efficient models, one row of panels per manufacturer over a shared hwy axis]

For this visual to work I need all facets to share a common horizontal axis. Note what happens when I simply wrap the facets instead:

## NOTE: No need to edit
(
    df_mpg
    ## Find highest fuel economy models within each manufacturer
    >> gr.tf_group_by(DF.manufacturer)
    >> gr.tf_filter(DF.hwy == gr.max(DF.hwy))
    >> gr.tf_ungroup()
    ## Reorder manufacturers based on their fuel economy
    >> gr.tf_mutate(manufacturer=gr.fct_reorder(DF.manufacturer, DF.hwy))
    
    ## Visualize
    >> gr.ggplot(gr.aes("hwy", "model"))
    + gr.geom_point()
    ## Wrap the facets instead, for comparison
    + gr.facet_wrap("manufacturer", scales="free_y")
    + gr.theme(strip_text_y=gr.element_text(angle=0, hjust=0))
)

[Figure: the same data faceted with facet_wrap(), panels arranged in a grid]

This figure is essentially useless; without the panels lined up over a common horizontal axis, it is almost impossible to compare values across panels.
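In the facet_grid() call above, the specification follows the formula convention "rows~cols", where "." means no faceting in that direction: "manufacturer~." gives one row of panels per manufacturer, stacked over a shared horizontal axis, while ".~manufacturer" would lay them out side by side in columns. The toy parser below only spells out that convention for one variable per side; parse_grid_spec is a hypothetical helper, not a plotnine function.

```python
def parse_grid_spec(spec):
    """Hypothetical helper: split a 'rows~cols' facet_grid spec.

    Returns (row_var, col_var), where None means no faceting in that
    direction. Handles only one variable per side, unlike real formulas.
    """
    lhs, rhs = (part.strip() for part in spec.split("~"))
    row_var = None if lhs in ("", ".") else lhs
    col_var = None if rhs in ("", ".") else rhs
    return row_var, col_var


# One row of panels per manufacturer, stacked vertically
print(parse_grid_spec("manufacturer~."))  # ('manufacturer', None)
# One column of panels per manufacturer, side by side
print(parse_grid_spec(".~manufacturer"))  # (None, 'manufacturer')
```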
