Demo Notebook 2022-07-11#

Demos from the live sessions on 2022-07-11.

Setup#

import grama as gr
import pandas as pd
import numpy as np
DF = gr.Intention()
%matplotlib inline
# NOTE: No need to edit; load the archival data
filename_archival = "../challenges/data/doe-wide.csv"
df_archival = pd.read_csv(filename_archival)
df_archival.head()
d n f_com H W M_max mass dMdtheta_avs dMdtheta_0 M_min BM angle int_M_stable GM
0 0.414694 0.969522 0.384112 1.931260 1.390651 5.004726e-16 0.019592 NaN -1.459991 -4.952630 0.108234 NaN -8.019885 0.193949
1 0.242507 1.195436 0.656740 2.713299 3.580954 7.277457e+00 0.046730 NaN 9.916728 -5.402567 0.425216 NaN 3.675911 -0.557773
2 0.441223 0.936345 0.672138 1.446188 3.744617 6.550768e+00 0.040910 8.237037 -11.668754 -5.879390 1.073773 1.617128 -5.681558 0.768485
3 0.689483 0.882652 0.648739 1.210221 2.530339 1.149756e+00 0.034526 2.380926 -8.409095 -2.228434 0.764303 1.509121 -2.070254 0.652657
4 0.497305 0.780804 0.235978 2.502149 1.505777 7.886234e-16 0.029039 NaN -8.644337 -14.019720 0.103794 NaN -25.533024 0.796536

Correlation tile plot#

Since there are NaN values in our dataset df_archival, we need to handle them somehow. We can use the nan_drop=True option in gr.tf_iocorr() to simply drop the NaN’s when computing correlations.

var = ["d", "n", "f_com", "H", "W"]
out = [
    "M_max", 
    "M_min", 
    "mass", 
    "dMdtheta_avs", 
    "dMdtheta_0", 
    "BM", 
    "GM",
    "angle",
    "int_M_stable",
]

(
    df_archival
    >> gr.tf_iocorr(
        var=var, 
        out=out, 
        nan_drop=True,
    )
    >> gr.pt_auto()
)
Calling plot_corrtile....
../_images/demo2022-07-11_6_1.png
<ggplot: (8760894671336)>

From this plot, we can see that the correlation between H and dMdtheta_0 is small, as is the correlation between n and dMdtheta_0. This suggests that the linear relationship between the two is quite small. But keep in mind that a nonlinear relationship could also exist.

We can also trim the out argument to produce a more focused plot:

(
    df_archival
    >> gr.tf_iocorr(
        var=var, 
        out=["dMdtheta_0"],
        nan_drop=True,
    )
    >> gr.pt_auto()
)
Calling plot_corrtile....
../_images/demo2022-07-11_8_1.png
<ggplot: (8760914488950)>

A plot like this looks somewhat strange. Really, we only have five numbers, so we might as well just look at the table of values:

(
    df_archival
    >> gr.tf_iocorr(
        var=var, 
        out=["dMdtheta_0"],
        nan_drop=True,
    )
    >> gr.tf_arrange(DF.rho)
)
rho var out
0 -0.424045 W dMdtheta_0
1 -0.336092 d dMdtheta_0
2 0.074710 n dMdtheta_0
3 0.089236 H dMdtheta_0
4 0.611931 f_com dMdtheta_0

Detailed Scatterplot#

Here’s how I incorporate multiple scatterplots into a single figure. Pivoting the data allows me to use faceting.

(
    df_archival
    >> gr.tf_select(
        "dMdtheta_0",
        "W",
        "d",
        "f_com"
    )
    >> gr.tf_pivot_longer(
        columns=["W", "d", "f_com"],
        names_to="var",
        values_to="value",
    )
    
    >> gr.ggplot(gr.aes("value", "dMdtheta_0"))
    + gr.geom_hline(yintercept=0, color="red")
    + gr.geom_point()
    + gr.geom_smooth(color="blue")
    + gr.facet_grid("~var", scales="free_x")
    + gr.theme_minimal()
)
../_images/demo2022-07-11_12_0.png
<ggplot: (8760835732209)>

Plotting quantiles#

factor = 1/2
(
    df_archival
    
    >> gr.ggplot(gr.aes("W", "dMdtheta_0"))
    + gr.geom_ribbon(
        data=df_archival
        >> gr.tf_mutate(W_bin=gr.round(DF.W*factor, decimals=1)/factor)
        >> gr.tf_group_by(DF.W_bin)
        >> gr.tf_summarize(
            ymin=gr.quant(DF.dMdtheta_0, 0.25),
            dMdtheta_0=gr.mean(DF.dMdtheta_0),
            ymax=gr.quant(DF.dMdtheta_0, 0.75),
        ),
        mapping=gr.aes(x="W_bin", ymin="ymin", ymax="ymax"),
        alpha=2/4,
    )
    + gr.geom_point(size=0.2)
    + gr.theme_minimal()
)
../_images/demo2022-07-11_14_0.png
<ggplot: (8760881143866)>

Targeted Sinew Plot#

Note that gr.pt_auto() calls other plotting functions with default arguments. We can call the same function with different arguments to tweak a visual. For instance, we can plot a smaller number of outputs for a more focused plot:

## Targeting a single output
(
    df_sinews
    >> gr.pt_sinew_outputs(
        var=["W", "H", "d", "f_com", "n"],
        out=["dMdtheta_0"],
    )
)

## Targeting a couple inputs
(
    df_sinews
    >> gr.pt_sinew_outputs(
        var=["d", "f_com"],
        out=["dMdtheta_0"],
    )
)

## Manually constructing a single sweep (over W)
(
    df_sinews
    >> gr.tf_filter(DF.sweep_var == "W")
    
    >> gr.ggplot(gr.aes("W", "dMdtheta_0"))
    + gr.geom_line(gr.aes(color="factor(sweep_ind)"))
)