{ "cells": [ { "cell_type": "markdown", "id": "e83d474c", "metadata": { "tags": [] }, "source": [ "# Data: Pivoting Data\n", "\n", "*Purpose*: Data is easiest to use when it is *tidy*. In fact, grama is specifically designed to use tidy data. But not all data we'll encounter is tidy! To that end, in this exercise we'll learn how to tidy our data by *pivoting*.\n", "\n", "As a result of learning how to quickly *tidy* data, you'll vastly expand the set of datasets you can analyze. Rather than fighting with data, you'll be able to quickly wrangle and extract insights.\n" ] }, { "cell_type": "markdown", "id": "4f895c9a", "metadata": {}, "source": [ "## Setup\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "a2de1b4e", "metadata": { "collapsed": false }, "outputs": [], "source": [ "import grama as gr\n", "DF = gr.Intention()\n", "%matplotlib inline\n", "\n", "# For assertion\n", "from pandas.api.types import is_integer_dtype" ] }, { "cell_type": "markdown", "id": "6e3e55d6", "metadata": {}, "source": [ "# Tidy Data\n", "\n", "*Tidy Data* is a very simple---but *very powerful*---concept for structuring a dataset. " ] }, { "cell_type": "markdown", "id": "327c425e", "metadata": {}, "source": [ "![Stylized text providing an overview of Tidy Data. The top reads “Tidy data is a standard way of mapping the meaning of a dataset to its structure. 
- Hadley Wickham.” On the left reads “In tidy data: each variable forms a column; each observation forms a row; each cell is a single measurement.” There is an example table on the lower right with columns ‘id’, ‘name’ and ‘color’ with observations for different cats, illustrating tidy data structure.](./images/tidydata_1.jpg)\n", "Artwork by @allison_horst\n" ] }, { "cell_type": "markdown", "id": "1e9dcb2f", "metadata": {}, "source": [ "## Tidy Data: Definition\n", "\n", "A tidy dataset has three properties:\n", "\n", "- each variable forms a column\n", "- each observation forms a row\n", "- each cell is a single measurement\n" ] }, { "cell_type": "markdown", "id": "2096d57f", "metadata": {}, "source": [ "As an example, the following dataset is tidy:\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "083e991a", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
thickalloyEmuang
00.022al_24st106000.3210
10.022al_24st106000.3230
20.032al_24st104000.3290
30.032al_24st103000.3190
40.064al_24st105000.3230
\n", "
" ], "text/plain": [ " thick alloy E mu ang\n", "0 0.022 al_24st 10600 0.321 0\n", "1 0.022 al_24st 10600 0.323 0\n", "2 0.032 al_24st 10400 0.329 0\n", "3 0.032 al_24st 10300 0.319 0\n", "4 0.064 al_24st 10500 0.323 0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from grama.data import df_stang\n", "df_stang.head()\n" ] }, { "cell_type": "markdown", "id": "ad920c9e", "metadata": {}, "source": [ "The observations are all measured material properties taken at a particular angle (with respect to the direction in which the specimens were rolled). Each column reports values for just one variable, each row corresponds to a single observation, and every cell reports just one measurement.\n", "\n", "However, the following form of the same dataset is *not* tidy:\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "8d9430ea", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
thickE_00mu_00E_45mu_45E_90mu_90alloy
00.022106000.321107000.329105000.310al_24st
10.022106000.323105000.331107000.323al_24st
20.032104000.329104000.318103000.322al_24st
30.032103000.319105000.326104000.330al_24st
40.064105000.323104000.331104000.327al_24st
50.064107000.328105000.328105000.320al_24st
60.081100000.315100000.32099000.314al_24st
70.081101000.31299000.312100000.316al_24st
80.081100000.311-1-1.00099000.314al_24st
\n", "
" ], "text/plain": [ " thick E_00 mu_00 E_45 mu_45 E_90 mu_90 alloy\n", "0 0.022 10600 0.321 10700 0.329 10500 0.310 al_24st\n", "1 0.022 10600 0.323 10500 0.331 10700 0.323 al_24st\n", "2 0.032 10400 0.329 10400 0.318 10300 0.322 al_24st\n", "3 0.032 10300 0.319 10500 0.326 10400 0.330 al_24st\n", "4 0.064 10500 0.323 10400 0.331 10400 0.327 al_24st\n", "5 0.064 10700 0.328 10500 0.328 10500 0.320 al_24st\n", "6 0.081 10000 0.315 10000 0.320 9900 0.314 al_24st\n", "7 0.081 10100 0.312 9900 0.312 10000 0.316 al_24st\n", "8 0.081 10000 0.311 -1 -1.000 9900 0.314 al_24st" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from grama.data import df_stang_wide\n", "df_stang_wide" ] }, { "cell_type": "markdown", "id": "a827b07d", "metadata": {}, "source": [ "This dataset is *not* tidy: The angle of each measurement `00, 45, 90` is a variable, but these numerical values are expressed as column names. Put differently, some of the values are not in cells, but rather in the column names.\n" ] }, { "cell_type": "markdown", "id": "6224b58e", "metadata": {}, "source": [ "## Why tidy data?\n", "\n", "Tidy data makes analysis *easier*. Putting our data in tidy form means we can use a *consistent* set of tools to work with *any* dataset.\n" ] }, { "cell_type": "markdown", "id": "db8edfa0", "metadata": {}, "source": [ "![On the left is a happy cute fuzzy monster holding a rectangular data frame with a tool that fits the data frame shape. On the workbench behind the monster are other data frames of similar rectangular shape, and neatly arranged tools that also look like they would fit those data frames. The workbench looks uncluttered and tidy. The text above the tidy workbench reads “When working with tidy data, we can use the same tools in similar ways for different datasets…” On the right is a cute monster looking very frustrated, using duct tape and other tools to haphazardly tie data tables together, each in a different way. 
The monster is in front of a messy, cluttered workbench. The text above the frustrated monster reads “...but working with untidy data often means reinventing the wheel with one-time approaches that are hard to iterate or reuse.”](./images/tidydata_3.jpg)\n", "Artwork by @allison_horst\n" ] }, { "cell_type": "markdown", "id": "75529e7f", "metadata": {}, "source": [ "Note that untidy data is not *bad* data; untidy data are simply harder to work with when doing data analysis. Data often come in untidy form when they are reported, say in a paper or a presentation. For instance, the wide form of the Stang et al. dataset can easily fit on one page:\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "d7954242", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
thickE_00mu_00E_45mu_45E_90mu_90alloy
00.022106000.321107000.329105000.310al_24st
10.022106000.323105000.331107000.323al_24st
20.032104000.329104000.318103000.322al_24st
30.032103000.319105000.326104000.330al_24st
40.064105000.323104000.331104000.327al_24st
50.064107000.328105000.328105000.320al_24st
60.081100000.315100000.32099000.314al_24st
70.081101000.31299000.312100000.316al_24st
80.081100000.311-1-1.00099000.314al_24st
\n", "
" ], "text/plain": [ " thick E_00 mu_00 E_45 mu_45 E_90 mu_90 alloy\n", "0 0.022 10600 0.321 10700 0.329 10500 0.310 al_24st\n", "1 0.022 10600 0.323 10500 0.331 10700 0.323 al_24st\n", "2 0.032 10400 0.329 10400 0.318 10300 0.322 al_24st\n", "3 0.032 10300 0.319 10500 0.326 10400 0.330 al_24st\n", "4 0.064 10500 0.323 10400 0.331 10400 0.327 al_24st\n", "5 0.064 10700 0.328 10500 0.328 10500 0.320 al_24st\n", "6 0.081 10000 0.315 10000 0.320 9900 0.314 al_24st\n", "7 0.081 10100 0.312 9900 0.312 10000 0.316 al_24st\n", "8 0.081 10000 0.311 -1 -1.000 9900 0.314 al_24st" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_stang_wide" ] }, { "cell_type": "markdown", "id": "a8b7ebd0", "metadata": {}, "source": [ "However, the tidy form of the same dataset is far less compact:\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "dc13f694", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
thickalloyEmuang
00.022al_24st106000.3210
10.022al_24st106000.3230
20.032al_24st104000.3290
30.032al_24st103000.3190
40.064al_24st105000.3230
..................
710.064al_24st104000.32790
720.064al_24st105000.32090
730.081al_24st99000.31490
740.081al_24st100000.31690
750.081al_24st99000.31490
\n", "

76 rows × 5 columns

\n", "
" ], "text/plain": [ " thick alloy E mu ang\n", "0 0.022 al_24st 10600 0.321 0\n", "1 0.022 al_24st 10600 0.323 0\n", "2 0.032 al_24st 10400 0.329 0\n", "3 0.032 al_24st 10300 0.319 0\n", "4 0.064 al_24st 10500 0.323 0\n", ".. ... ... ... ... ...\n", "71 0.064 al_24st 10400 0.327 90\n", "72 0.064 al_24st 10500 0.320 90\n", "73 0.081 al_24st 9900 0.314 90\n", "74 0.081 al_24st 10000 0.316 90\n", "75 0.081 al_24st 9900 0.314 90\n", "\n", "[76 rows x 5 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_stang" ] }, { "cell_type": "markdown", "id": "fac124db", "metadata": {}, "source": [ "## Exercises\n", "\n", "Let's practice identifying tidy and untidy data.\n" ] }, { "cell_type": "markdown", "id": "a6cc04c9", "metadata": {}, "source": [ "### __q1__ Identify\n", "\n", "Inspect the following dataset; answer the questions under *observations* below.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "f9a3fe5b", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyear2011year2012year2013
0FR700069007000
1DE580060006200
2US150001400013000
\n", "
" ], "text/plain": [ " country year2011 year2012 year2013\n", "0 FR 7000 6900 7000\n", "1 DE 5800 6000 6200\n", "2 US 15000 14000 13000" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## TASK: No need to edit; run and inspect\n", "df_cases = gr.df_make(\n", " country=[\"FR\", \"DE\", \"US\"],\n", " year2011=[7000, 5800, 15000],\n", " year2012=[6900, 6000, 14000],\n", " year2013=[7000, 6200, 13000],\n", ")\n", "df_cases\n" ] }, { "cell_type": "markdown", "id": "e83ee3f0", "metadata": {}, "source": [ "*Observations*\n", "\n", "- What are the *variables* in this dataset?\n", " - Country, year, and some unknown quantity (n, count, etc.)\n", "- Is this dataset *tidy*? Why or why not?\n", " - No; the year values are in the column names.\n", "" ] }, { "cell_type": "markdown", "id": "f75cd1f4", "metadata": {}, "source": [ "### __q2__ Identify\n", "\n", "Inspect the following dataset; answer the questions under *observations* below.\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "e8145fbc", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
thickE_00mu_00E_45mu_45rep
00.022106000.321107000.3291
10.022106000.323105000.3312
20.032104000.329104000.3181
30.032103000.319105000.3262
\n", "
" ], "text/plain": [ " thick E_00 mu_00 E_45 mu_45 rep\n", "0 0.022 10600 0.321 10700 0.329 1\n", "1 0.022 10600 0.323 10500 0.331 2\n", "2 0.032 10400 0.329 10400 0.318 1\n", "3 0.032 10300 0.319 10500 0.326 2" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## TASK: No need to edit; run and inspect\n", "df_alloys1 = gr.df_make(\n", " thick=[0.022, 0.022, 0.032, 0.032],\n", " E_00=[10600, 10600, 10400, 10300],\n", " mu_00=[0.321, 0.323, 0.329, 0.319],\n", " E_45=[10700, 10500, 10400, 10500],\n", " mu_45=[0.329, 0.331, 0.318, 0.326],\n", " rep=[1, 2, 1, 2],\n", ")\n", "df_alloys1" ] }, { "cell_type": "markdown", "id": "56013aad", "metadata": {}, "source": [ "*Observations*\n", "\n", "- What are the *variables* in this dataset?\n", " - Thickness `thick`, elasticity `E`, poisson's ratio `mu`, angle (in column names), replication `rep`\n", "- Is this dataset *tidy*? Why or why not?\n", " - No; the angle values are in the column names.\n", "" ] }, { "cell_type": "markdown", "id": "dc834b12", "metadata": {}, "source": [ "### __q3__ Identify\n", "\n", "Inspect the following dataset; answer the questions under *observations* below.\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "10700eb4", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
thickvarvaluerepangle
00.022E10700.00010
10.022mu0.32110
20.032E10500.00020
30.032mu0.32320
\n", "
" ], "text/plain": [ " thick var value rep angle\n", "0 0.022 E 10700.000 1 0\n", "1 0.022 mu 0.321 1 0\n", "2 0.032 E 10500.000 2 0\n", "3 0.032 mu 0.323 2 0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## TASK: No need to edit; run and inspect\n", "df_alloys2 = gr.df_make(\n", " thick=[0.022, 0.022, 0.032, 0.032],\n", " var=[\"E\", \"mu\", \"E\", \"mu\"],\n", " value=[10700, 0.321, 10500, 0.323],\n", " rep=[1, 1, 2, 2],\n", " angle=[0, 0, 0, 0],\n", ")\n", "df_alloys2" ] }, { "cell_type": "markdown", "id": "551295cc", "metadata": {}, "source": [ "*Observations*\n", "\n", "- What are the *variables* in this dataset?\n", " - Thickness `thick`, elasticity `E`, poisson's ratio `mu`, `angle`, replication `rep`\n", "- Is this dataset *tidy*? Why or why not?\n", " - No; the column `value` contains values of two *different* variables `E` and `mu`.\n", "" ] }, { "cell_type": "markdown", "id": "2be971b3", "metadata": {}, "source": [ "# Pivoting Data\n", "\n", "The untidy datasets above fail to be tidy because they have the wrong *shape*; we can tidy these datasets by *pivoting* the data. There are just two pivots we need to learn about: `tf_pivot_longer()` and `tf_pivot_wider()`.\n" ] }, { "cell_type": "markdown", "id": "09b93698", "metadata": {}, "source": [ "## Pivot Longer\n", "\n", "Let's take another look at the `df_cases` example dataset:\n" ] }, { "cell_type": "code", "execution_count": 9, "id": "12df95e8", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyear2011year2012year2013
0FR700069007000
1DE580060006200
2US150001400013000
\n", "
" ], "text/plain": [ " country year2011 year2012 year2013\n", "0 FR 7000 6900 7000\n", "1 DE 5800 6000 6200\n", "2 US 15000 14000 13000" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## NOTE: No need to edit\n", "df_cases" ] }, { "cell_type": "markdown", "id": "b8cd68e5", "metadata": {}, "source": [ "This dataset is *too wide*; the column names `year2011,year2012,year2013` should really be numbers inside a single `year` column. We can *pivot longer* to move from a wide structure to a longer one:\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "c0b87da0", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearcount
0FRyear20117000
1DEyear20115800
2USyear201115000
3FRyear20126900
4DEyear20126000
5USyear201214000
6FRyear20137000
7DEyear20136200
8USyear201313000
\n", "
" ], "text/plain": [ " country year count\n", "0 FR year2011 7000\n", "1 DE year2011 5800\n", "2 US year2011 15000\n", "3 FR year2012 6900\n", "4 DE year2012 6000\n", "5 US year2012 14000\n", "6 FR year2013 7000\n", "7 DE year2013 6200\n", "8 US year2013 13000" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## NOTE: No need to edit\n", "(\n", " df_cases\n", " >> gr.tf_pivot_longer(\n", " columns=[\"year2011\", \"year2012\", \"year2013\"],\n", " names_to=\"year\",\n", " values_to=\"count\",\n", " )\n", ")" ] }, { "cell_type": "markdown", "id": "9dcfffe3", "metadata": {}, "source": [ "Note what we had to provide as arguments to `gr.tf_pivot_longer()`:\n", "\n", "- `columns` specifies the columns to involve in the pivoting\n", "- `names_to` specifies what new column will contain the previous column names\n", "- `value_to` specifies what new column will contain the previous cell values\n", "\n", "Often, a pivot alone is not enough to fully clean a dataset; for instance, we would still need to remove the `year` string from each cell entry. However, we can do this much more easily once the data are pivoted into a longer format:\n" ] }, { "cell_type": "code", "execution_count": 11, "id": "08768fbb", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearcount
0FR20117000
1DE20115800
2US201115000
3FR20126900
4DE20126000
5US201214000
6FR20137000
7DE20136200
8US201313000
\n", "
" ], "text/plain": [ " country year count\n", "0 FR 2011 7000\n", "1 DE 2011 5800\n", "2 US 2011 15000\n", "3 FR 2012 6900\n", "4 DE 2012 6000\n", "5 US 2012 14000\n", "6 FR 2013 7000\n", "7 DE 2013 6200\n", "8 US 2013 13000" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## NOTE: No need to edit\n", "(\n", " df_cases\n", " >> gr.tf_pivot_longer(\n", " columns=[\"year2011\", \"year2012\", \"year2013\"],\n", " names_to=\"year\",\n", " values_to=\"count\",\n", " )\n", " >> gr.tf_mutate(year=gr.str_replace(DF.year, \"year\", \"\"))\n", ")" ] }, { "cell_type": "markdown", "id": "4e1c0d74", "metadata": {}, "source": [ "### __q4__ Pivot the data\n", "\n", "Pivot the dataset `df_alloys1` to place each of `E_00, mu_00, E_45, mu_45` as values in a new `variable` column, and move the original values to a new `value` column. Answer the questions under *observations* below.\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "8eeffa51", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " thick rep variable value\n", "0 0.022 1 E_00 10600.000\n", "1 0.022 2 E_00 10600.000\n", "2 0.032 1 E_00 10400.000\n", "3 0.032 2 E_00 10300.000\n", "4 0.022 1 mu_00 0.321\n", "5 0.022 2 mu_00 0.323\n", "6 0.032 1 mu_00 0.329\n", "7 0.032 2 mu_00 0.319\n", "8 0.022 1 E_45 10700.000\n", "9 0.022 2 E_45 10500.000\n", "10 0.032 1 E_45 10400.000\n", "11 0.032 2 E_45 10500.000\n", "12 0.022 1 mu_45 0.329\n", "13 0.022 2 mu_45 0.331\n", "14 0.032 1 mu_45 0.318\n", "15 0.032 2 mu_45 0.326\n" ] } ], "source": [ "## TASK: Tidy the data, make sure to provide `variable` and `value` columns\n", "df_res1 = (\n", " df_alloys1\n", "\n", " >> gr.tf_pivot_longer(\n", " columns=[\"E_00\", \"mu_00\", \"E_45\", \"mu_45\"],\n", " names_to=\"variable\",\n", " values_to=\"value\",\n", " )\n", ")\n", "\n", "## NOTE: Use this to check your work\n", "print(df_res1)\n", "\n", "assert \\\n", " {\"thick\", \"rep\", 
\"variable\", \"value\"} == set(df_res1.columns), \\\n", " 'columns of df_res1 are not [\"thick\", \"rep\", \"variable\", \"value\"]'\n", "\n", "assert \\\n", " {\"E_00\", \"mu_00\", \"E_45\", \"mu_45\"} == set(df_res1.variable), \\\n", " 'Entries in df_res1.variable are incorrect'\n" ] }, { "cell_type": "markdown", "id": "55fd504c", "metadata": {}, "source": [ "*Observations*\n", "\n", "- Is this dataset tidy? Why or why not?\n", " - This dataset is not tidy; the entries in the `value` column include values of `E` and `mu` (two different variables). Also, the `variable` column contains both variable names and angle values.\n", "" ] }, { "cell_type": "markdown", "id": "325d594c", "metadata": {}, "source": [ "### Using selection helpers\n", "\n", "One useful feature of `gr.tf_pivot_longer()` is that we can use [selection helpers](https://zdelrosario.github.io/evc-course/exercises_solution/d02-e-data01-isolate-solution.html#selection-helpers) to choose which columns to use in a pivot. This can save us a lot of typing, and allows us to write code that can react to multiple different datasets. Let's practice using selection helpers by re-doing the previous task.\n" ] }, { "cell_type": "markdown", "id": "4a61a47b", "metadata": {}, "source": [ "### __q5__ Simplify with selection helpers\n", "\n", "Use a single [selection helper](https://zdelrosario.github.io/evc-course/exercises_solution/d02-e-data01-isolate-solution.html#selection-helpers) to simplify the `columns` argument to `gr.tf_pivot_longer()`.\n", "\n", "*Hint*: There are multiple ways to do this: The string `\"E|mu\"` will `match` the strings `\"E\"` or `\"mu\"`. 
The string `\"\\\\d+\"` will `match` *any string of consecutive digits*.\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "598055cf", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " thick rep variable value\n", "0 0.022 1 E_00 10600.000\n", "1 0.022 2 E_00 10600.000\n", "2 0.032 1 E_00 10400.000\n", "3 0.032 2 E_00 10300.000\n", "4 0.022 1 mu_00 0.321\n", "5 0.022 2 mu_00 0.323\n", "6 0.032 1 mu_00 0.329\n", "7 0.032 2 mu_00 0.319\n", "8 0.022 1 E_45 10700.000\n", "9 0.022 2 E_45 10500.000\n", "10 0.032 1 E_45 10400.000\n", "11 0.032 2 E_45 10500.000\n", "12 0.022 1 mu_45 0.329\n", "13 0.022 2 mu_45 0.331\n", "14 0.032 1 mu_45 0.318\n", "15 0.032 2 mu_45 0.326\n" ] } ], "source": [ "## TASK: Replace the `columns` argument with a selection helper\n", "df_simplified = (\n", " df_alloys1\n", " >> gr.tf_pivot_longer(\n", "\n", " columns=gr.matches(\"E|mu\"), \n", " names_to=\"variable\",\n", " values_to=\"value\",\n", " )\n", ")\n", "\n", "## NOTE: Use this to check your work\n", "print(df_simplified)\n", "\n", "assert \\\n", " {\"thick\", \"rep\", \"variable\", \"value\"} == set(df_simplified.columns), \\\n", " 'columns of df_simplified are not [\"thick\", \"rep\", \"variable\", \"value\"]'\n", "\n", "assert \\\n", " {\"E_00\", \"mu_00\", \"E_45\", \"mu_45\"} == set(df_simplified.variable), \\\n", " 'Entries in df_simplified.variable are incorrect'\n" ] }, { "cell_type": "markdown", "id": "9e20c94e", "metadata": {}, "source": [ "## Pivot Wider\n", "\n", "Just as a dataset can be \"too wide\", it can also be \"too long.\" In this case, we can use `gr.tf_pivot_wider()`. Let's look at an illustrative example:\n" ] }, { "cell_type": "code", "execution_count": 14, "id": "3a9dc621", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
variablevalueobservation
0x00
1y00
2f10
3x01
4y11
5f01
\n", "
" ], "text/plain": [ " variable value observation\n", "0 x 0 0\n", "1 y 0 0\n", "2 f 1 0\n", "3 x 0 1\n", "4 y 1 1\n", "5 f 0 1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## NOTE: No need to edit\n", "df_long = gr.df_make(\n", " variable=[\"x\", \"y\", \"f\", \"x\", \"y\", \"f\"],\n", " value=[0, 0, 1, 0, 1, 0],\n", " observation=[0, 0, 0, 1, 1, 1],\n", ")\n", "df_long\n" ] }, { "cell_type": "markdown", "id": "55be108f", "metadata": {}, "source": [ "Here, the `value` column has mixed variables; values of `x`, `y`, and `f` comingle in the `value` column. Really we should have the *columns* `x`, `y`, and `f`: In this sense the data is *too long*. Let's pivot wider to tidy the data.\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "b919286c", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
observationfxy
00100
11001
\n", "
" ], "text/plain": [ " observation f x y\n", "0 0 1 0 0\n", "1 1 0 0 1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## NOTE: No need to edit\n", "(\n", " df_long\n", " >> gr.tf_pivot_wider(\n", " names_from=\"variable\",\n", " values_from=\"value\",\n", " )\n", ")" ] }, { "cell_type": "markdown", "id": "e747affc", "metadata": {}, "source": [ "Note that we had to provide just two inputs to `gr.tf_pivot_wider()`:\n", "\n", "- `names_from` specifies which column will provide the new column names\n", "- `value_from` specifies which column will provide the new column values\n", "\n", "Let's get some practice!\n" ] }, { "cell_type": "markdown", "id": "ad0074c9", "metadata": {}, "source": [ "### __q6__ Pivot the data\n", "\n", "Pivot the dataset `df_alloys2` to provide `E` and `mu` as columns in `df_res2`. Answer the questions under *observations* below.\n" ] }, { "cell_type": "code", "execution_count": 16, "id": "54359a2f", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " thick rep angle E mu\n", "0 0.022 1 0 10700.0 0.321\n", "1 0.032 2 0 10500.0 0.323\n" ] } ], "source": [ "## TASK: Pivot the dataset wider\n", "df_res2 = (\n", " df_alloys2\n", " >> gr.tf_pivot_wider(\n", " names_from=\"var\",\n", " values_from=\"value\",\n", " )\n", ")\n", "\n", "## NOTE: Use the following to check your work\n", "print(df_res2)\n", "\n", "assert \\\n", " {\"thick\", \"rep\", \"angle\", \"E\", \"mu\"} == set(df_res2.columns), \\\n", " 'df_res2 does not have the columns [\"thick\", \"rep\", \"angle\", \"E\", \"mu\"]'\n" ] }, { "cell_type": "markdown", "id": "30685938", "metadata": {}, "source": [ "*Observations*\n", "\n", "- Is this dataset tidy? Why or why not?\n", " - This dataset is tidy! 
Every column is one variable, every row is one observation, and every cell contains one value.\n", "" ] }, { "cell_type": "markdown", "id": "7edea2c5", "metadata": {}, "source": [ "### Warning: No observation identifier\n", "\n", "While the arguments to `gr.tf_pivot_wider()` only target two columns, we *do* need to think carefully about what other columns are in the dataset. It is important that the other columns help identify which values are associated with the same observation---the easiest way to do this is with some sort of observation identifier column.\n", "\n", "Let's take a look at what happens when we *do not* have a way to associate values with the same observation:\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "dc37c5f0", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
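The role of the observation identifier can be sketched in plain pandas as well. This is a pandas sketch, not grama's API: `DataFrame.pivot` plays the role of `tf_pivot_wider()`, and when no `index` is supplied it falls back on the row index, so each row carries only one of the variables.

```python
import pandas as pd

# Re-create the df_long data (hypothetical stand-in for gr.df_make)
df_long_pd = pd.DataFrame({
    "variable": ["x", "y", "f", "x", "y", "f"],
    "value": [0, 0, 1, 0, 1, 0],
    "observation": [0, 0, 0, 1, 1, 1],
})

# With an observation identifier, each (observation, variable) pair
# maps to exactly one cell, so the wide table is well-defined
df_wide = df_long_pd.pivot(index="observation", columns="variable", values="value")

# Without an identifier, pivot falls back on the row index (0..5);
# each row then holds just one of the three variables, leaving NaNs
df_sparse = df_long_pd.pivot(columns="variable", values="value")
```

Here `df_wide` has one row per observation, while `df_sparse` keeps six rows and fills the unmatched cells with `NaN`, mirroring the grama behavior shown below.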
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fxy
0NaN0.0NaN
1NaNNaN0.0
21.0NaNNaN
3NaN0.0NaN
4NaNNaN1.0
50.0NaNNaN
\n", "
" ], "text/plain": [ " f x y\n", "0 NaN 0.0 NaN\n", "1 NaN NaN 0.0\n", "2 1.0 NaN NaN\n", "3 NaN 0.0 NaN\n", "4 NaN NaN 1.0\n", "5 0.0 NaN NaN" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## NOTE: No need to edit\n", "(\n", " gr.df_make(\n", " variable=[\"x\", \"y\", \"f\", \"x\", \"y\", \"f\"],\n", " value=[0, 0, 1, 0, 1, 0],\n", " # observation=[0, 0, 0, 1, 1, 1], # Remove the observation identifier\n", " )\n", " >> gr.tf_pivot_wider(\n", " names_from=\"variable\",\n", " values_from=\"value\",\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "750cd93f", "metadata": {}, "source": [ "Note that `gr.tf_pivot_wider()` placed all of the values in the \"correct\" column, but filled all over column values with `NaN`s. If you get strange behavior with `gr.tf_pivot_wider()`, you should think about whether you are providing the information necessary to associate common values with the same observation. We'll learn more about this in the next exercise.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }