Demo Notebook 2022-07-11


Demos from the live sessions on 2022-07-11.

Setup

import grama as gr
import pandas as pd
import numpy as np
DF = gr.Intention()
%matplotlib inline

# NOTE: No need to edit; load the archival data
filename_archival = "../challenges/data/doe-wide.csv"
df_archival = pd.read_csv(filename_archival)
df_archival.head()

	d	n	f_com	H	W	M_max	mass	dMdtheta_avs	dMdtheta_0	M_min	BM	angle	int_M_stable	GM
0	0.414694	0.969522	0.384112	1.931260	1.390651	5.004726e-16	0.019592	NaN	-1.459991	-4.952630	0.108234	NaN	-8.019885	0.193949
1	0.242507	1.195436	0.656740	2.713299	3.580954	7.277457e+00	0.046730	NaN	9.916728	-5.402567	0.425216	NaN	3.675911	-0.557773
2	0.441223	0.936345	0.672138	1.446188	3.744617	6.550768e+00	0.040910	8.237037	-11.668754	-5.879390	1.073773	1.617128	-5.681558	0.768485
3	0.689483	0.882652	0.648739	1.210221	2.530339	1.149756e+00	0.034526	2.380926	-8.409095	-2.228434	0.764303	1.509121	-2.070254	0.652657
4	0.497305	0.780804	0.235978	2.502149	1.505777	7.886234e-16	0.029039	NaN	-8.644337	-14.019720	0.103794	NaN	-25.533024	0.796536

Correlation tile plot

Since df_archival contains NaN values (for example in dMdtheta_avs and angle), we need to handle them before computing correlations. The nan_drop=True option in gr.tf_iocorr() simply drops the NaNs when computing correlations.
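To make the NaN handling concrete, here is a minimal pandas/NumPy sketch of one way to drop NaNs before computing a Pearson correlation. It assumes pairwise deletion (a row is dropped only if one of the two columns in that particular pair is missing) and Pearson correlation; I have not checked that this matches grama's internals exactly. The helper corr_drop_nan and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def corr_drop_nan(df, x, y):
    """Pearson correlation of columns x and y after dropping any row where
    either value is NaN (pairwise deletion). Hypothetical helper, not grama API."""
    pair = df[[x, y]].dropna()
    return np.corrcoef(pair[x], pair[y])[0, 1]


# Toy data (hypothetical): "angle" has gaps, like the angle column in df_archival
df_toy = pd.DataFrame({
    "W": [1.0, 2.0, 3.0, 4.0, 5.0],
    "angle": [2.1, np.nan, 6.2, 7.9, np.nan],
})

# Without dropping, a single NaN makes the whole correlation NaN
naive = np.corrcoef(df_toy["W"], df_toy["angle"])[0, 1]

# Dropping incomplete rows leaves three (W, angle) pairs to correlate
rho = corr_drop_nan(df_toy, "W", "angle")
print(naive, rho)
```

pandas' own DataFrame.corr() also excludes NaNs pairwise by default, so under the pairwise assumption df_archival[var + out].corr() is a quick cross-check on the tile plot.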

var = ["d", "n", "f_com", "H", "W"]
out = [
    "M_max", 
    "M_min", 
    "mass", 
    "dMdtheta_avs", 
    "dMdtheta_0", 
    "BM", 
    "GM",
    "angle",
    "int_M_stable",
]

(
    df_archival
    >> gr.tf_iocorr(
        var=var, 
        out=out, 
        nan_drop=True,
    )
    >> gr.pt_auto()
)

Calling plot_corrtile....

[Figure: correlation tile plot of the inputs in var against the outputs in out]

From this plot, we can see that the correlations of dMdtheta_0 with H and with n are both small. This suggests that neither H nor n has a strong linear relationship with dMdtheta_0. Keep in mind, though, that a nonlinear relationship could still exist.
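To see why a small correlation does not rule out a strong relationship, here is a quick synthetic example (not from the archival data): the output is an exact function of the input, yet the Pearson correlation is essentially zero because the relationship is symmetric rather than linear.

```python
import numpy as np

# Inputs spread symmetrically over [-1, 1]
x = np.linspace(-1, 1, 101)
# Output depends on x exactly, but nonlinearly (a parabola)
y = x**2

# Pearson correlation only measures the linear part of the relationship
rho = np.corrcoef(x, y)[0, 1]
print(f"rho = {rho:.3f}")
```

A scatterplot of y against x would reveal the parabola immediately, which is why it is worth following up a correlation screen with scatterplots.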

We can also trim the out argument to produce a more focused plot:

(
    df_archival
    >> gr.tf_iocorr(
        var=var, 
        out=["dMdtheta_0"],
        nan_drop=True,
    )
    >> gr.pt_auto()
)

Calling plot_corrtile....

[Figure: correlation tile plot of the inputs against dMdtheta_0]

A tile plot with a single output looks somewhat strange: it really only shows five numbers, so we might as well look at the table of values directly:

(
    df_archival
    >> gr.tf_iocorr(
        var=var, 
        out=["dMdtheta_0"],
        nan_drop=True,
    )
    >> gr.tf_arrange(DF.rho)
)

	rho	var	out
0	-0.424045	W	dMdtheta_0
1	-0.336092	d	dMdtheta_0
2	0.074710	n	dMdtheta_0
3	0.089236	H	dMdtheta_0
4	0.611931	f_com	dMdtheta_0
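Note that tf_arrange(DF.rho) sorts by the signed correlation, so the strongest negative and strongest positive associations end up at opposite ends of the table. To rank inputs by strength regardless of sign, one option is to sort on the absolute value. The sketch below does this in plain pandas on the values copied from the table above; the names df_corr and df_ranked are just for illustration.

```python
import pandas as pd

# Correlations with dMdtheta_0, copied from the table above
df_corr = pd.DataFrame({
    "rho": [-0.424045, -0.336092, 0.074710, 0.089236, 0.611931],
    "var": ["W", "d", "n", "H", "f_com"],
})

# Rank inputs by strength of linear association, ignoring sign
df_ranked = (
    df_corr
    .assign(abs_rho=df_corr["rho"].abs())
    .sort_values("abs_rho", ascending=False)
    .reset_index(drop=True)
)
print(df_ranked)
```

Ranked this way, f_com, W, and d stand out as the inputs most linearly associated with dMdtheta_0, which is why they are singled out in the scatterplots.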

Detailed Scatterplot

Here’s how I combine several scatterplots into a single figure. Pivoting the data to long format (one row per input-value pair) lets me facet by input, giving one panel per input.
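For readers more used to plain pandas, the pivot here appears to be the same wide-to-long reshape that pandas.DataFrame.melt performs: each input column becomes a (var, value) pair while dMdtheta_0 is repeated alongside it. The sketch below applies melt to the first three rows of the df_archival.head() output from the Setup section; treating it as equivalent to tf_pivot_longer is my reading of the two APIs, not something taken from the grama docs.

```python
import pandas as pd

# First three rows of df_archival.head(), restricted to the plotted columns
df = pd.DataFrame({
    "dMdtheta_0": [-1.459991, 9.916728, -11.668754],
    "W": [1.390651, 3.580954, 3.744617],
    "d": [0.414694, 0.242507, 0.441223],
    "f_com": [0.384112, 0.656740, 0.672138],
})

# Wide to long: one row per (observation, input) pair
df_long = df.melt(
    id_vars=["dMdtheta_0"],          # kept as-is and repeated
    value_vars=["W", "d", "f_com"],  # stacked into var/value
    var_name="var",
    value_name="value",
)
print(df_long)
```

Faceting on var then draws one scatterplot of dMdtheta_0 against value per input.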

(
    df_archival
    >> gr.tf_select(
        "dMdtheta_0",
        "W",
        "d",
        "f_com"
    )
    >> gr.tf_pivot_longer(
        columns=["W", "d", "f_com"],
        names_to="var",
        values_to="value",
    )
    
    >> gr.ggplot(gr.aes("value", "dMdtheta_0"))
    + gr.geom_hline(yintercept=0, color="red")
    + gr.geom_point()
    + gr.geom_smooth(color="blue")
    + gr.facet_grid("~var", scales="free_x")
    + gr.theme_minimal()
)

[Figure: scatterplots of dMdtheta_0 against W, d, and f_com, faceted by input, with smooth fits]

Plotting quantiles

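# Bin W: round(W * factor, 1) / factor puts bin centers 0.1 / factor apart (0.2 here)
factor = 1/2
(
    df_archival
    
    >> gr.ggplot(gr.aes("W", "dMdtheta_0"))
    # Shaded band between the 25% and 75% quantiles of dMdtheta_0 within each W bin
    + gr.geom_ribbon(
        data=df_archival
        >> gr.tf_mutate(W_bin=gr.round(DF.W*factor, decimals=1)/factor)
        >> gr.tf_group_by(DF.W_bin)
        >> gr.tf_summarize(
            ymin=gr.quant(DF.dMdtheta_0, 0.25),
            # Also provides dMdtheta_0, which the y mapping inherited from the top-level aes expects
            dMdtheta_0=gr.mean(DF.dMdtheta_0),
            ymax=gr.quant(DF.dMdtheta_0, 0.75),
        ),
        mapping=gr.aes(x="W_bin", ymin="ymin", ymax="ymax"),
        alpha=0.5,
    )
    # Raw observations drawn on top of the band
    + gr.geom_point(size=0.2)
    + gr.theme_minimal()
)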

[Figure: dMdtheta_0 against W, with a ribbon spanning the binned 25% to 75% quantiles]
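The ribbon layer bins W and then summarizes dMdtheta_0 within each bin. Here is a plain-pandas sketch of that same summary, handy for inspecting the numbers behind the band. The helper name binned_quartiles and the toy data are hypothetical, and the binning assumes gr.round rounds like np.round (which rounds exact halves to even).

```python
import numpy as np
import pandas as pd


def q25(s):
    """Lower quartile of a Series (linear interpolation)."""
    return s.quantile(0.25)


def q75(s):
    """Upper quartile of a Series (linear interpolation)."""
    return s.quantile(0.75)


def binned_quartiles(df, x, y, factor=0.5):
    """Summarize y within bins of x, mirroring the notebook's rule
    round(x * factor, 1) / factor (bin centers 0.1 / factor apart)."""
    x_bin = np.round(df[x] * factor, decimals=1) / factor
    return (
        df.assign(x_bin=x_bin)
        .groupby("x_bin")[y]
        .agg(ymin=q25, mean="mean", ymax=q75)
        .reset_index()
    )


# Toy data (hypothetical): three points near W = 1.0 and three near W = 1.4
df_toy = pd.DataFrame({
    "W": [1.0, 1.02, 1.04, 1.38, 1.40, 1.42],
    "dMdtheta_0": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
out = binned_quartiles(df_toy, "W", "dMdtheta_0")
print(out)
```

Wider bins (a smaller factor) give a smoother band at the cost of blurring local structure; narrower bins leave fewer points per quantile estimate.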

Targeted Sinew Plot

Note that gr.pt_auto() dispatches to a specific plotting function (plot_corrtile in the examples above) and calls it with default arguments. We can call the underlying plotting function directly with our own arguments to tweak a visual. For instance, we can plot fewer inputs or outputs for a more focused plot:

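## NOTE: df_sinews is not defined in this notebook; it is assumed to be
## sinew data generated earlier, e.g. with gr.eval_sinews() on the model
## Targeting a single output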
(
    df_sinews
    >> gr.pt_sinew_outputs(
        var=["W", "H", "d", "f_com", "n"],
        out=["dMdtheta_0"],
    )
)

## Targeting a couple of inputs
(
    df_sinews
    >> gr.pt_sinew_outputs(
        var=["d", "f_com"],
        out=["dMdtheta_0"],
    )
)

## Manually plotting the sweeps over a single input (W)
(
    df_sinews
    >> gr.tf_filter(DF.sweep_var == "W")
    
    >> gr.ggplot(gr.aes("W", "dMdtheta_0"))
    + gr.geom_line(gr.aes(color="factor(sweep_ind)"))
)
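The manual plot relies on the long format of the sinew data: each row is one evaluated point, sweep_var names the input being swept, and sweep_ind identifies which sweep the point belongs to. Wrapping sweep_ind in factor() makes it discrete, which (as I understand plotnine's grouping) is what splits the points into one line per sweep instead of a single line connecting every point. The pandas sketch below shows the same filter-and-group logic on a made-up miniature table; only the column names sweep_var and sweep_ind come from the notebook, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature sinew table: two sweeps over W, one sweep over d
df_sinews_toy = pd.DataFrame({
    "sweep_var": ["W"] * 6 + ["d"] * 3,
    "sweep_ind": [0, 0, 0, 1, 1, 1, 0, 0, 0],
    "W": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 2.5, 2.5, 2.5],
    "dMdtheta_0": [-3.0, -1.0, 0.5, -2.0, 0.0, 1.5, -1.0, -0.5, 0.2],
})

# Same filter as the manual plot: keep only rows where W is the swept input
df_w = df_sinews_toy[df_sinews_toy["sweep_var"] == "W"]

# Each sweep_ind value is one curve (one colored line in the plot)
for ind, sweep in df_w.groupby("sweep_ind"):
    print(ind, sweep["W"].tolist(), sweep["dMdtheta_0"].tolist())
```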