{
"cells": [
{
"cell_type": "markdown",
"id": "76d6d15d",
"metadata": {},
"source": [
"# Data: Isolating Data\n",
"\n",
"*Purpose*: One of the keys to a successful analysis is the ability to *focus* on particular topics. When analyzing a dataset, our ability to focus is tied to our facility at *isolating data*. In this exercise, you will practice isolating columns with `tf_select()`, picking specific rows with `tf_filter()`, and sorting your data with `tf_arrange()` to see what rises to the top.\n",
"\n",
"*Aside*: The data-management verbs in grama are heavily inspired by the [dplyr](https://dplyr.tidyverse.org/) package in the R programming langauge.\n"
]
},
{
"cell_type": "markdown",
"id": "81eff383",
"metadata": {},
"source": [
"## Setup\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f566ab97",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import grama as gr\n",
"DF = gr.Intention()\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"id": "e187ef54",
"metadata": {},
"source": [
"We'll use the `nycflights13` package in this exercise: This is a dataset of flights involving the New York City area during 2013.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c4a7b3e0",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 517.0 | \n",
" 515 | \n",
" 2.0 | \n",
" 830.0 | \n",
" 819 | \n",
" 11.0 | \n",
" UA | \n",
" 1545 | \n",
" N14228 | \n",
" EWR | \n",
" IAH | \n",
" 227.0 | \n",
" 1400 | \n",
" 5 | \n",
" 15 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 533.0 | \n",
" 529 | \n",
" 4.0 | \n",
" 850.0 | \n",
" 830 | \n",
" 20.0 | \n",
" UA | \n",
" 1714 | \n",
" N24211 | \n",
" LGA | \n",
" IAH | \n",
" 227.0 | \n",
" 1416 | \n",
" 5 | \n",
" 29 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 542.0 | \n",
" 540 | \n",
" 2.0 | \n",
" 923.0 | \n",
" 850 | \n",
" 33.0 | \n",
" AA | \n",
" 1141 | \n",
" N619AA | \n",
" JFK | \n",
" MIA | \n",
" 160.0 | \n",
" 1089 | \n",
" 5 | \n",
" 40 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" B6 | \n",
" 725 | \n",
" N804JB | \n",
" JFK | \n",
" BQN | \n",
" 183.0 | \n",
" 1576 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 600 | \n",
" -6.0 | \n",
" 812.0 | \n",
" 837 | \n",
" -25.0 | \n",
" DL | \n",
" 461 | \n",
" N668DN | \n",
" LGA | \n",
" ATL | \n",
" 116.0 | \n",
" 762 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1455 | \n",
" NaN | \n",
" NaN | \n",
" 1634 | \n",
" NaN | \n",
" 9E | \n",
" 3393 | \n",
" NaN | \n",
" JFK | \n",
" DCA | \n",
" NaN | \n",
" 213 | \n",
" 14 | \n",
" 55 | \n",
" 2013-09-30T18:00:00Z | \n",
"
\n",
" \n",
" 336772 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 2200 | \n",
" NaN | \n",
" NaN | \n",
" 2312 | \n",
" NaN | \n",
" 9E | \n",
" 3525 | \n",
" NaN | \n",
" LGA | \n",
" SYR | \n",
" NaN | \n",
" 198 | \n",
" 22 | \n",
" 0 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336773 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1210 | \n",
" NaN | \n",
" NaN | \n",
" 1330 | \n",
" NaN | \n",
" MQ | \n",
" 3461 | \n",
" N535MQ | \n",
" LGA | \n",
" BNA | \n",
" NaN | \n",
" 764 | \n",
" 12 | \n",
" 10 | \n",
" 2013-09-30T16:00:00Z | \n",
"
\n",
" \n",
" 336774 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1159 | \n",
" NaN | \n",
" NaN | \n",
" 1344 | \n",
" NaN | \n",
" MQ | \n",
" 3572 | \n",
" N511MQ | \n",
" LGA | \n",
" CLE | \n",
" NaN | \n",
" 419 | \n",
" 11 | \n",
" 59 | \n",
" 2013-09-30T15:00:00Z | \n",
"
\n",
" \n",
" 336775 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 840 | \n",
" NaN | \n",
" NaN | \n",
" 1020 | \n",
" NaN | \n",
" MQ | \n",
" 3531 | \n",
" N839MQ | \n",
" LGA | \n",
" RDU | \n",
" NaN | \n",
" 431 | \n",
" 8 | \n",
" 40 | \n",
" 2013-09-30T12:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 19 columns
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"0 2013 1 1 517.0 515 2.0 830.0 \n",
"1 2013 1 1 533.0 529 4.0 850.0 \n",
"2 2013 1 1 542.0 540 2.0 923.0 \n",
"3 2013 1 1 544.0 545 -1.0 1004.0 \n",
"4 2013 1 1 554.0 600 -6.0 812.0 \n",
"... ... ... ... ... ... ... ... \n",
"336771 2013 9 30 NaN 1455 NaN NaN \n",
"336772 2013 9 30 NaN 2200 NaN NaN \n",
"336773 2013 9 30 NaN 1210 NaN NaN \n",
"336774 2013 9 30 NaN 1159 NaN NaN \n",
"336775 2013 9 30 NaN 840 NaN NaN \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest \\\n",
"0 819 11.0 UA 1545 N14228 EWR IAH \n",
"1 830 20.0 UA 1714 N24211 LGA IAH \n",
"2 850 33.0 AA 1141 N619AA JFK MIA \n",
"3 1022 -18.0 B6 725 N804JB JFK BQN \n",
"4 837 -25.0 DL 461 N668DN LGA ATL \n",
"... ... ... ... ... ... ... ... \n",
"336771 1634 NaN 9E 3393 NaN JFK DCA \n",
"336772 2312 NaN 9E 3525 NaN LGA SYR \n",
"336773 1330 NaN MQ 3461 N535MQ LGA BNA \n",
"336774 1344 NaN MQ 3572 N511MQ LGA CLE \n",
"336775 1020 NaN MQ 3531 N839MQ LGA RDU \n",
"\n",
" air_time distance hour minute time_hour \n",
"0 227.0 1400 5 15 2013-01-01T10:00:00Z \n",
"1 227.0 1416 5 29 2013-01-01T10:00:00Z \n",
"2 160.0 1089 5 40 2013-01-01T10:00:00Z \n",
"3 183.0 1576 5 45 2013-01-01T10:00:00Z \n",
"4 116.0 762 6 0 2013-01-01T11:00:00Z \n",
"... ... ... ... ... ... \n",
"336771 NaN 213 14 55 2013-09-30T18:00:00Z \n",
"336772 NaN 198 22 0 2013-10-01T02:00:00Z \n",
"336773 NaN 764 12 10 2013-09-30T16:00:00Z \n",
"336774 NaN 419 11 59 2013-09-30T15:00:00Z \n",
"336775 NaN 431 8 40 2013-09-30T12:00:00Z \n",
"\n",
"[336776 rows x 19 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from nycflights13 import flights as df_flights\n",
"df_flights"
]
},
{
"cell_type": "markdown",
"id": "29a3c6d5",
"metadata": {},
"source": [
"# The DataFrame Object\n",
"\n",
"The variable `df_flights` above is a *DataFrame*; a way of storing data in Python.\n",
"\n",
"*Aside*: The DataFrame class is provided by the [Pandas](https://pandas.pydata.org/docs/index.html) package, but we will use [Grama](https://github.com/zdelrosario/py_grama) to work with DataFrames.\n"
]
},
{
"cell_type": "markdown",
"id": "58b68692",
"metadata": {},
"source": [
"## Head and Tail\n",
"\n",
"Looking at an entire DataFrame is usually overwhelming; it is useful to be able to focus on a small subset of the data. One of our most basic tools is to get the first or last few rows of a DataFrame, which we can do with `DataFrame.head(n)` and `DataFrame.tail(n)`. These functions are called with the following syntax:\n",
"\n",
"```python\n",
"df_flights.head(10)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "d1151e0b",
"metadata": {},
"source": [
"### __q1__ Get the tail of a DataFrame\n",
"\n",
"Get the last 10 rows of `df_flights`.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "172f5752",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 336766 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2240.0 | \n",
" 2250 | \n",
" -10.0 | \n",
" 2347.0 | \n",
" 7 | \n",
" -20.0 | \n",
" B6 | \n",
" 2002 | \n",
" N281JB | \n",
" JFK | \n",
" BUF | \n",
" 52.0 | \n",
" 301 | \n",
" 22 | \n",
" 50 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336767 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2241.0 | \n",
" 2246 | \n",
" -5.0 | \n",
" 2345.0 | \n",
" 1 | \n",
" -16.0 | \n",
" B6 | \n",
" 486 | \n",
" N346JB | \n",
" JFK | \n",
" ROC | \n",
" 47.0 | \n",
" 264 | \n",
" 22 | \n",
" 46 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336768 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2307.0 | \n",
" 2255 | \n",
" 12.0 | \n",
" 2359.0 | \n",
" 2358 | \n",
" 1.0 | \n",
" B6 | \n",
" 718 | \n",
" N565JB | \n",
" JFK | \n",
" BOS | \n",
" 33.0 | \n",
" 187 | \n",
" 22 | \n",
" 55 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336769 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2349.0 | \n",
" 2359 | \n",
" -10.0 | \n",
" 325.0 | \n",
" 350 | \n",
" -25.0 | \n",
" B6 | \n",
" 745 | \n",
" N516JB | \n",
" JFK | \n",
" PSE | \n",
" 196.0 | \n",
" 1617 | \n",
" 23 | \n",
" 59 | \n",
" 2013-10-01T03:00:00Z | \n",
"
\n",
" \n",
" 336770 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1842 | \n",
" NaN | \n",
" NaN | \n",
" 2019 | \n",
" NaN | \n",
" EV | \n",
" 5274 | \n",
" N740EV | \n",
" LGA | \n",
" BNA | \n",
" NaN | \n",
" 764 | \n",
" 18 | \n",
" 42 | \n",
" 2013-09-30T22:00:00Z | \n",
"
\n",
" \n",
" 336771 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1455 | \n",
" NaN | \n",
" NaN | \n",
" 1634 | \n",
" NaN | \n",
" 9E | \n",
" 3393 | \n",
" NaN | \n",
" JFK | \n",
" DCA | \n",
" NaN | \n",
" 213 | \n",
" 14 | \n",
" 55 | \n",
" 2013-09-30T18:00:00Z | \n",
"
\n",
" \n",
" 336772 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 2200 | \n",
" NaN | \n",
" NaN | \n",
" 2312 | \n",
" NaN | \n",
" 9E | \n",
" 3525 | \n",
" NaN | \n",
" LGA | \n",
" SYR | \n",
" NaN | \n",
" 198 | \n",
" 22 | \n",
" 0 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336773 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1210 | \n",
" NaN | \n",
" NaN | \n",
" 1330 | \n",
" NaN | \n",
" MQ | \n",
" 3461 | \n",
" N535MQ | \n",
" LGA | \n",
" BNA | \n",
" NaN | \n",
" 764 | \n",
" 12 | \n",
" 10 | \n",
" 2013-09-30T16:00:00Z | \n",
"
\n",
" \n",
" 336774 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1159 | \n",
" NaN | \n",
" NaN | \n",
" 1344 | \n",
" NaN | \n",
" MQ | \n",
" 3572 | \n",
" N511MQ | \n",
" LGA | \n",
" CLE | \n",
" NaN | \n",
" 419 | \n",
" 11 | \n",
" 59 | \n",
" 2013-09-30T15:00:00Z | \n",
"
\n",
" \n",
" 336775 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 840 | \n",
" NaN | \n",
" NaN | \n",
" 1020 | \n",
" NaN | \n",
" MQ | \n",
" 3531 | \n",
" N839MQ | \n",
" LGA | \n",
" RDU | \n",
" NaN | \n",
" 431 | \n",
" 8 | \n",
" 40 | \n",
" 2013-09-30T12:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"336766 2013 9 30 2240.0 2250 -10.0 2347.0 \n",
"336767 2013 9 30 2241.0 2246 -5.0 2345.0 \n",
"336768 2013 9 30 2307.0 2255 12.0 2359.0 \n",
"336769 2013 9 30 2349.0 2359 -10.0 325.0 \n",
"336770 2013 9 30 NaN 1842 NaN NaN \n",
"336771 2013 9 30 NaN 1455 NaN NaN \n",
"336772 2013 9 30 NaN 2200 NaN NaN \n",
"336773 2013 9 30 NaN 1210 NaN NaN \n",
"336774 2013 9 30 NaN 1159 NaN NaN \n",
"336775 2013 9 30 NaN 840 NaN NaN \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest \\\n",
"336766 7 -20.0 B6 2002 N281JB JFK BUF \n",
"336767 1 -16.0 B6 486 N346JB JFK ROC \n",
"336768 2358 1.0 B6 718 N565JB JFK BOS \n",
"336769 350 -25.0 B6 745 N516JB JFK PSE \n",
"336770 2019 NaN EV 5274 N740EV LGA BNA \n",
"336771 1634 NaN 9E 3393 NaN JFK DCA \n",
"336772 2312 NaN 9E 3525 NaN LGA SYR \n",
"336773 1330 NaN MQ 3461 N535MQ LGA BNA \n",
"336774 1344 NaN MQ 3572 N511MQ LGA CLE \n",
"336775 1020 NaN MQ 3531 N839MQ LGA RDU \n",
"\n",
" air_time distance hour minute time_hour \n",
"336766 52.0 301 22 50 2013-10-01T02:00:00Z \n",
"336767 47.0 264 22 46 2013-10-01T02:00:00Z \n",
"336768 33.0 187 22 55 2013-10-01T02:00:00Z \n",
"336769 196.0 1617 23 59 2013-10-01T03:00:00Z \n",
"336770 NaN 764 18 42 2013-09-30T22:00:00Z \n",
"336771 NaN 213 14 55 2013-09-30T18:00:00Z \n",
"336772 NaN 198 22 0 2013-10-01T02:00:00Z \n",
"336773 NaN 764 12 10 2013-09-30T16:00:00Z \n",
"336774 NaN 419 11 59 2013-09-30T15:00:00Z \n",
"336775 NaN 431 8 40 2013-09-30T12:00:00Z "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Get the last 10 rows of df_flights\n",
"\n",
"df_flights.tail(10)\n",
"# solution-end"
]
},
{
"cell_type": "markdown",
"id": "80d83804",
"metadata": {},
"source": [
"# Piping\n",
"\n",
"Often, when carrying out studies with data, we want to perform *multiple* operations on the same dataset. We could do this by assigning *intermediate variables*, such as a temporary DataFrame:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "712a954b",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 5 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 558 | \n",
" -4.0 | \n",
" 740.0 | \n",
" 728 | \n",
" 12.0 | \n",
" UA | \n",
" 1696 | \n",
" N39463 | \n",
" EWR | \n",
" ORD | \n",
" 150.0 | \n",
" 719 | \n",
" 5 | \n",
" 58 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 6 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 555.0 | \n",
" 600 | \n",
" -5.0 | \n",
" 913.0 | \n",
" 854 | \n",
" 19.0 | \n",
" B6 | \n",
" 507 | \n",
" N516JB | \n",
" EWR | \n",
" FLL | \n",
" 158.0 | \n",
" 1065 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 7 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 709.0 | \n",
" 723 | \n",
" -14.0 | \n",
" EV | \n",
" 5708 | \n",
" N829AS | \n",
" LGA | \n",
" IAD | \n",
" 53.0 | \n",
" 229 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 8 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 838.0 | \n",
" 846 | \n",
" -8.0 | \n",
" B6 | \n",
" 79 | \n",
" N593JB | \n",
" JFK | \n",
" MCO | \n",
" 140.0 | \n",
" 944 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 9 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 558.0 | \n",
" 600 | \n",
" -2.0 | \n",
" 753.0 | \n",
" 745 | \n",
" 8.0 | \n",
" AA | \n",
" 301 | \n",
" N3ALAA | \n",
" LGA | \n",
" ORD | \n",
" 138.0 | \n",
" 733 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"5 2013 1 1 554.0 558 -4.0 740.0 \n",
"6 2013 1 1 555.0 600 -5.0 913.0 \n",
"7 2013 1 1 557.0 600 -3.0 709.0 \n",
"8 2013 1 1 557.0 600 -3.0 838.0 \n",
"9 2013 1 1 558.0 600 -2.0 753.0 \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest air_time \\\n",
"5 728 12.0 UA 1696 N39463 EWR ORD 150.0 \n",
"6 854 19.0 B6 507 N516JB EWR FLL 158.0 \n",
"7 723 -14.0 EV 5708 N829AS LGA IAD 53.0 \n",
"8 846 -8.0 B6 79 N593JB JFK MCO 140.0 \n",
"9 745 8.0 AA 301 N3ALAA LGA ORD 138.0 \n",
"\n",
" distance hour minute time_hour \n",
"5 719 5 58 2013-01-01T10:00:00Z \n",
"6 1065 6 0 2013-01-01T11:00:00Z \n",
"7 229 6 0 2013-01-01T11:00:00Z \n",
"8 944 6 0 2013-01-01T11:00:00Z \n",
"9 733 6 0 2013-01-01T11:00:00Z "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"df_tmp = df_flights.head(10) # Temporary DataFrame\n",
"df_tmp.tail(5)"
]
},
{
"cell_type": "markdown",
"id": "3dc5d2b5",
"metadata": {},
"source": [
"Alternatively, we could *chain* together calls:\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "35adc3f6",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 5 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 558 | \n",
" -4.0 | \n",
" 740.0 | \n",
" 728 | \n",
" 12.0 | \n",
" UA | \n",
" 1696 | \n",
" N39463 | \n",
" EWR | \n",
" ORD | \n",
" 150.0 | \n",
" 719 | \n",
" 5 | \n",
" 58 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 6 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 555.0 | \n",
" 600 | \n",
" -5.0 | \n",
" 913.0 | \n",
" 854 | \n",
" 19.0 | \n",
" B6 | \n",
" 507 | \n",
" N516JB | \n",
" EWR | \n",
" FLL | \n",
" 158.0 | \n",
" 1065 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 7 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 709.0 | \n",
" 723 | \n",
" -14.0 | \n",
" EV | \n",
" 5708 | \n",
" N829AS | \n",
" LGA | \n",
" IAD | \n",
" 53.0 | \n",
" 229 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 8 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 838.0 | \n",
" 846 | \n",
" -8.0 | \n",
" B6 | \n",
" 79 | \n",
" N593JB | \n",
" JFK | \n",
" MCO | \n",
" 140.0 | \n",
" 944 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 9 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 558.0 | \n",
" 600 | \n",
" -2.0 | \n",
" 753.0 | \n",
" 745 | \n",
" 8.0 | \n",
" AA | \n",
" 301 | \n",
" N3ALAA | \n",
" LGA | \n",
" ORD | \n",
" 138.0 | \n",
" 733 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"5 2013 1 1 554.0 558 -4.0 740.0 \n",
"6 2013 1 1 555.0 600 -5.0 913.0 \n",
"7 2013 1 1 557.0 600 -3.0 709.0 \n",
"8 2013 1 1 557.0 600 -3.0 838.0 \n",
"9 2013 1 1 558.0 600 -2.0 753.0 \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest air_time \\\n",
"5 728 12.0 UA 1696 N39463 EWR ORD 150.0 \n",
"6 854 19.0 B6 507 N516JB EWR FLL 158.0 \n",
"7 723 -14.0 EV 5708 N829AS LGA IAD 53.0 \n",
"8 846 -8.0 B6 79 N593JB JFK MCO 140.0 \n",
"9 745 8.0 AA 301 N3ALAA LGA ORD 138.0 \n",
"\n",
" distance hour minute time_hour \n",
"5 719 5 58 2013-01-01T10:00:00Z \n",
"6 1065 6 0 2013-01-01T11:00:00Z \n",
"7 229 6 0 2013-01-01T11:00:00Z \n",
"8 944 6 0 2013-01-01T11:00:00Z \n",
"9 733 6 0 2013-01-01T11:00:00Z "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"df_flights.head(10).tail(5)"
]
},
{
"cell_type": "markdown",
"id": "83adabd2",
"metadata": {},
"source": [
"Rather than these two approaches, using [grama](https://github.com/zdelrosario/py_grama) we can form a *data pipeline* using the pipe operator `>>`. The following code demonstrates the use of the pipe operator:\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0e9039bd",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 5 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 558 | \n",
" -4.0 | \n",
" 740.0 | \n",
" 728 | \n",
" 12.0 | \n",
" UA | \n",
" 1696 | \n",
" N39463 | \n",
" EWR | \n",
" ORD | \n",
" 150.0 | \n",
" 719 | \n",
" 5 | \n",
" 58 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 6 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 555.0 | \n",
" 600 | \n",
" -5.0 | \n",
" 913.0 | \n",
" 854 | \n",
" 19.0 | \n",
" B6 | \n",
" 507 | \n",
" N516JB | \n",
" EWR | \n",
" FLL | \n",
" 158.0 | \n",
" 1065 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 7 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 709.0 | \n",
" 723 | \n",
" -14.0 | \n",
" EV | \n",
" 5708 | \n",
" N829AS | \n",
" LGA | \n",
" IAD | \n",
" 53.0 | \n",
" 229 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 8 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 838.0 | \n",
" 846 | \n",
" -8.0 | \n",
" B6 | \n",
" 79 | \n",
" N593JB | \n",
" JFK | \n",
" MCO | \n",
" 140.0 | \n",
" 944 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 9 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 558.0 | \n",
" 600 | \n",
" -2.0 | \n",
" 753.0 | \n",
" 745 | \n",
" 8.0 | \n",
" AA | \n",
" 301 | \n",
" N3ALAA | \n",
" LGA | \n",
" ORD | \n",
" 138.0 | \n",
" 733 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"5 2013 1 1 554.0 558 -4.0 740.0 \n",
"6 2013 1 1 555.0 600 -5.0 913.0 \n",
"7 2013 1 1 557.0 600 -3.0 709.0 \n",
"8 2013 1 1 557.0 600 -3.0 838.0 \n",
"9 2013 1 1 558.0 600 -2.0 753.0 \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest air_time \\\n",
"5 728 12.0 UA 1696 N39463 EWR ORD 150.0 \n",
"6 854 19.0 B6 507 N516JB EWR FLL 158.0 \n",
"7 723 -14.0 EV 5708 N829AS LGA IAD 53.0 \n",
"8 846 -8.0 B6 79 N593JB JFK MCO 140.0 \n",
"9 745 8.0 AA 301 N3ALAA LGA ORD 138.0 \n",
"\n",
" distance hour minute time_hour \n",
"5 719 5 58 2013-01-01T10:00:00Z \n",
"6 1065 6 0 2013-01-01T11:00:00Z \n",
"7 229 6 0 2013-01-01T11:00:00Z \n",
"8 944 6 0 2013-01-01T11:00:00Z \n",
"9 733 6 0 2013-01-01T11:00:00Z "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"(\n",
" df_flights\n",
" >> gr.tf_head(10)\n",
" >> gr.tf_tail(5)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "24db207d",
"metadata": {},
"source": [
"It's useful to think of the pipe operator as the English word \"then\". This allows us to translate code:\n",
"\n",
"```python\n",
"(\n",
" df_flights\n",
" >> gr.tf_head(10)\n",
" >> gr.tf_tail(5)\n",
")\n",
"```\n",
"\n",
"into something looking like a plain-language sentence\n",
"\n",
"```\n",
"(\n",
" Start with df_flights\n",
" \"then\" take the first 10\n",
" \"then\" take the last 5.\n",
")\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "dc64bd07",
"metadata": {},
"source": [
"*Aside*: Most of the grama functions we'll use in this exercise start with the `tf_` prefix; this means they take in a DataFrame and return a DataFrame (they `t`rans`f`orm data).\n"
]
},
{
"cell_type": "markdown",
"id": "cab9a711",
"metadata": {},
"source": [
"### __q2__ Convert to piped code\n",
"\n",
"Convert the following code to a pipe-enabled version.\n",
"\n",
"*Note*: The pipe-enabled version of `DataFrame.head()` is `tf_head()`.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "652009bc",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 517.0 | \n",
" 515 | \n",
" 2.0 | \n",
" 830.0 | \n",
" 819 | \n",
" 11.0 | \n",
" UA | \n",
" 1545 | \n",
" N14228 | \n",
" EWR | \n",
" IAH | \n",
" 227.0 | \n",
" 1400 | \n",
" 5 | \n",
" 15 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 533.0 | \n",
" 529 | \n",
" 4.0 | \n",
" 850.0 | \n",
" 830 | \n",
" 20.0 | \n",
" UA | \n",
" 1714 | \n",
" N24211 | \n",
" LGA | \n",
" IAH | \n",
" 227.0 | \n",
" 1416 | \n",
" 5 | \n",
" 29 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 542.0 | \n",
" 540 | \n",
" 2.0 | \n",
" 923.0 | \n",
" 850 | \n",
" 33.0 | \n",
" AA | \n",
" 1141 | \n",
" N619AA | \n",
" JFK | \n",
" MIA | \n",
" 160.0 | \n",
" 1089 | \n",
" 5 | \n",
" 40 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" B6 | \n",
" 725 | \n",
" N804JB | \n",
" JFK | \n",
" BQN | \n",
" 183.0 | \n",
" 1576 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 600 | \n",
" -6.0 | \n",
" 812.0 | \n",
" 837 | \n",
" -25.0 | \n",
" DL | \n",
" 461 | \n",
" N668DN | \n",
" LGA | \n",
" ATL | \n",
" 116.0 | \n",
" 762 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 5 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 558 | \n",
" -4.0 | \n",
" 740.0 | \n",
" 728 | \n",
" 12.0 | \n",
" UA | \n",
" 1696 | \n",
" N39463 | \n",
" EWR | \n",
" ORD | \n",
" 150.0 | \n",
" 719 | \n",
" 5 | \n",
" 58 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 6 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 555.0 | \n",
" 600 | \n",
" -5.0 | \n",
" 913.0 | \n",
" 854 | \n",
" 19.0 | \n",
" B6 | \n",
" 507 | \n",
" N516JB | \n",
" EWR | \n",
" FLL | \n",
" 158.0 | \n",
" 1065 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 7 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 709.0 | \n",
" 723 | \n",
" -14.0 | \n",
" EV | \n",
" 5708 | \n",
" N829AS | \n",
" LGA | \n",
" IAD | \n",
" 53.0 | \n",
" 229 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 8 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 838.0 | \n",
" 846 | \n",
" -8.0 | \n",
" B6 | \n",
" 79 | \n",
" N593JB | \n",
" JFK | \n",
" MCO | \n",
" 140.0 | \n",
" 944 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 9 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 558.0 | \n",
" 600 | \n",
" -2.0 | \n",
" 753.0 | \n",
" 745 | \n",
" 8.0 | \n",
" AA | \n",
" 301 | \n",
" N3ALAA | \n",
" LGA | \n",
" ORD | \n",
" 138.0 | \n",
" 733 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"0 2013 1 1 517.0 515 2.0 830.0 \n",
"1 2013 1 1 533.0 529 4.0 850.0 \n",
"2 2013 1 1 542.0 540 2.0 923.0 \n",
"3 2013 1 1 544.0 545 -1.0 1004.0 \n",
"4 2013 1 1 554.0 600 -6.0 812.0 \n",
"5 2013 1 1 554.0 558 -4.0 740.0 \n",
"6 2013 1 1 555.0 600 -5.0 913.0 \n",
"7 2013 1 1 557.0 600 -3.0 709.0 \n",
"8 2013 1 1 557.0 600 -3.0 838.0 \n",
"9 2013 1 1 558.0 600 -2.0 753.0 \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest air_time \\\n",
"0 819 11.0 UA 1545 N14228 EWR IAH 227.0 \n",
"1 830 20.0 UA 1714 N24211 LGA IAH 227.0 \n",
"2 850 33.0 AA 1141 N619AA JFK MIA 160.0 \n",
"3 1022 -18.0 B6 725 N804JB JFK BQN 183.0 \n",
"4 837 -25.0 DL 461 N668DN LGA ATL 116.0 \n",
"5 728 12.0 UA 1696 N39463 EWR ORD 150.0 \n",
"6 854 19.0 B6 507 N516JB EWR FLL 158.0 \n",
"7 723 -14.0 EV 5708 N829AS LGA IAD 53.0 \n",
"8 846 -8.0 B6 79 N593JB JFK MCO 140.0 \n",
"9 745 8.0 AA 301 N3ALAA LGA ORD 138.0 \n",
"\n",
" distance hour minute time_hour \n",
"0 1400 5 15 2013-01-01T10:00:00Z \n",
"1 1416 5 29 2013-01-01T10:00:00Z \n",
"2 1089 5 40 2013-01-01T10:00:00Z \n",
"3 1576 5 45 2013-01-01T10:00:00Z \n",
"4 762 6 0 2013-01-01T11:00:00Z \n",
"5 719 5 58 2013-01-01T10:00:00Z \n",
"6 1065 6 0 2013-01-01T11:00:00Z \n",
"7 229 6 0 2013-01-01T11:00:00Z \n",
"8 944 6 0 2013-01-01T11:00:00Z \n",
"9 733 6 0 2013-01-01T11:00:00Z "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Convert the following code to pipe-enabled form\n",
"df_flights.head(10)\n",
"\n",
"(\n",
" df_flights\n",
"\n",
" >> gr.tf_head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "be251865",
"metadata": {},
"source": [
"# Column Selection\n",
"\n",
"The grama function `gr.tf_select()` allows us to select particular columns, which is helpful for getting a more \"focused\" view of a dataset.\n"
]
},
{
"cell_type": "markdown",
"id": "cdde52ba",
"metadata": {},
"source": [
"### __q3__ Select particular columns\n",
"\n",
"Select the columns \"month\", \"day\", \"origin\", and \"dest\".\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4b013e4b",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" month | \n",
" day | \n",
" origin | \n",
" dest | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" EWR | \n",
" IAH | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" LGA | \n",
" IAH | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" JFK | \n",
" MIA | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 1 | \n",
" JFK | \n",
" BQN | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
" LGA | \n",
" ATL | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" 9 | \n",
" 30 | \n",
" JFK | \n",
" DCA | \n",
"
\n",
" \n",
" 336772 | \n",
" 9 | \n",
" 30 | \n",
" LGA | \n",
" SYR | \n",
"
\n",
" \n",
" 336773 | \n",
" 9 | \n",
" 30 | \n",
" LGA | \n",
" BNA | \n",
"
\n",
" \n",
" 336774 | \n",
" 9 | \n",
" 30 | \n",
" LGA | \n",
" CLE | \n",
"
\n",
" \n",
" 336775 | \n",
" 9 | \n",
" 30 | \n",
" LGA | \n",
" RDU | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 4 columns
\n",
"
"
],
"text/plain": [
" month day origin dest\n",
"0 1 1 EWR IAH\n",
"1 1 1 LGA IAH\n",
"2 1 1 JFK MIA\n",
"3 1 1 JFK BQN\n",
"4 1 1 LGA ATL\n",
"... ... ... ... ...\n",
"336771 9 30 JFK DCA\n",
"336772 9 30 LGA SYR\n",
"336773 9 30 LGA BNA\n",
"336774 9 30 LGA CLE\n",
"336775 9 30 LGA RDU\n",
"\n",
"[336776 rows x 4 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## TASK: Select the columns \"month\", \"day\", \"origin\", and \"dest\"\n",
"(\n",
" df_flights\n",
" >> gr.tf_select(\"month\", \"day\", \"origin\", \"dest\")\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f64091a9",
"metadata": {},
"source": [
"## Selection Helpers\n",
"\n",
"The `gr.tf_select()` function is helpful, but it is made *extremely powerful* with a few selection helpers. Rather than specify specific column names, we can use helpers to *match* names that satisfy different criteria.\n",
"\n",
"Here are a few of the most important selection helpers: \n",
"\n",
"| Helper | Selects |\n",
"|---|---|\n",
"| `gr.starts_with(s)` | Columns that start with string `s` |\n",
"| `gr.ends_with(s)` | Columns that end with string `s` |\n",
"| `gr.contains(s)` | Columns that contain the string `s` |\n",
"| `gr.everything()` | All columns *not already selected* |\n",
"\n",
"For instance, the following code will select all the columns whose name starts with `\"dep_\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "dabb0c81",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" dep_time | \n",
" dep_delay | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 517.0 | \n",
" 2.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 533.0 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 542.0 | \n",
" 2.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 544.0 | \n",
" -1.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 554.0 | \n",
" -6.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 336772 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 336773 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 336774 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 336775 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 2 columns
\n",
"
"
],
"text/plain": [
" dep_time dep_delay\n",
"0 517.0 2.0\n",
"1 533.0 4.0\n",
"2 542.0 2.0\n",
"3 544.0 -1.0\n",
"4 554.0 -6.0\n",
"... ... ...\n",
"336771 NaN NaN\n",
"336772 NaN NaN\n",
"336773 NaN NaN\n",
"336774 NaN NaN\n",
"336775 NaN NaN\n",
"\n",
"[336776 rows x 2 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"(\n",
" df_flights\n",
" >> gr.tf_select(gr.starts_with(\"dep_\"))\n",
")"
]
},
{
"cell_type": "markdown",
"id": "fa8e497c",
"metadata": {},
"source": [
"### __q4__ Match columns\n",
"\n",
"Select only those columns whose name ends with `\"_time\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ff0ca8f5",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" dep_time | \n",
" sched_dep_time | \n",
" arr_time | \n",
" sched_arr_time | \n",
" air_time | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 517.0 | \n",
" 515 | \n",
" 830.0 | \n",
" 819 | \n",
" 227.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 533.0 | \n",
" 529 | \n",
" 850.0 | \n",
" 830 | \n",
" 227.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 542.0 | \n",
" 540 | \n",
" 923.0 | \n",
" 850 | \n",
" 160.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 544.0 | \n",
" 545 | \n",
" 1004.0 | \n",
" 1022 | \n",
" 183.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 554.0 | \n",
" 600 | \n",
" 812.0 | \n",
" 837 | \n",
" 116.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" NaN | \n",
" 1455 | \n",
" NaN | \n",
" 1634 | \n",
" NaN | \n",
"
\n",
" \n",
" 336772 | \n",
" NaN | \n",
" 2200 | \n",
" NaN | \n",
" 2312 | \n",
" NaN | \n",
"
\n",
" \n",
" 336773 | \n",
" NaN | \n",
" 1210 | \n",
" NaN | \n",
" 1330 | \n",
" NaN | \n",
"
\n",
" \n",
" 336774 | \n",
" NaN | \n",
" 1159 | \n",
" NaN | \n",
" 1344 | \n",
" NaN | \n",
"
\n",
" \n",
" 336775 | \n",
" NaN | \n",
" 840 | \n",
" NaN | \n",
" 1020 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 5 columns
\n",
"
"
],
"text/plain": [
" dep_time sched_dep_time arr_time sched_arr_time air_time\n",
"0 517.0 515 830.0 819 227.0\n",
"1 533.0 529 850.0 830 227.0\n",
"2 542.0 540 923.0 850 160.0\n",
"3 544.0 545 1004.0 1022 183.0\n",
"4 554.0 600 812.0 837 116.0\n",
"... ... ... ... ... ...\n",
"336771 NaN 1455 NaN 1634 NaN\n",
"336772 NaN 2200 NaN 2312 NaN\n",
"336773 NaN 1210 NaN 1330 NaN\n",
"336774 NaN 1159 NaN 1344 NaN\n",
"336775 NaN 840 NaN 1020 NaN\n",
"\n",
"[336776 rows x 5 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## TASK: Select only those columns whose name ends with \"_time\"\n",
"(\n",
" df_flights\n",
" >> gr.tf_select(gr.ends_with(\"_time\"))\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e5e84899",
"metadata": {},
"source": [
"## Re-arranging columns with `gr.everything()`\n",
"\n",
"The `gr.everything()` helper may seem silly, but it is actually *extremely* useful; since the everything helper selects all columns *not already selected*, we can use it to re-arrange the columns in a DataFrame for a more convenient view. For instance, we can bring a few columns closer together to aid in column comparisons, as the following code demonstrates:\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7e629c16",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" origin | \n",
" dest | \n",
" distance | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" air_time | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" EWR | \n",
" IAH | \n",
" 1400 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 517.0 | \n",
" 515 | \n",
" 2.0 | \n",
" 830.0 | \n",
" 819 | \n",
" 11.0 | \n",
" UA | \n",
" 1545 | \n",
" N14228 | \n",
" 227.0 | \n",
" 5 | \n",
" 15 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" LGA | \n",
" IAH | \n",
" 1416 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 533.0 | \n",
" 529 | \n",
" 4.0 | \n",
" 850.0 | \n",
" 830 | \n",
" 20.0 | \n",
" UA | \n",
" 1714 | \n",
" N24211 | \n",
" 227.0 | \n",
" 5 | \n",
" 29 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" JFK | \n",
" MIA | \n",
" 1089 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 542.0 | \n",
" 540 | \n",
" 2.0 | \n",
" 923.0 | \n",
" 850 | \n",
" 33.0 | \n",
" AA | \n",
" 1141 | \n",
" N619AA | \n",
" 160.0 | \n",
" 5 | \n",
" 40 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" JFK | \n",
" BQN | \n",
" 1576 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" B6 | \n",
" 725 | \n",
" N804JB | \n",
" 183.0 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" LGA | \n",
" ATL | \n",
" 762 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 600 | \n",
" -6.0 | \n",
" 812.0 | \n",
" 837 | \n",
" -25.0 | \n",
" DL | \n",
" 461 | \n",
" N668DN | \n",
" 116.0 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" JFK | \n",
" DCA | \n",
" 213 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1455 | \n",
" NaN | \n",
" NaN | \n",
" 1634 | \n",
" NaN | \n",
" 9E | \n",
" 3393 | \n",
" NaN | \n",
" NaN | \n",
" 14 | \n",
" 55 | \n",
" 2013-09-30T18:00:00Z | \n",
"
\n",
" \n",
" 336772 | \n",
" LGA | \n",
" SYR | \n",
" 198 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 2200 | \n",
" NaN | \n",
" NaN | \n",
" 2312 | \n",
" NaN | \n",
" 9E | \n",
" 3525 | \n",
" NaN | \n",
" NaN | \n",
" 22 | \n",
" 0 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336773 | \n",
" LGA | \n",
" BNA | \n",
" 764 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1210 | \n",
" NaN | \n",
" NaN | \n",
" 1330 | \n",
" NaN | \n",
" MQ | \n",
" 3461 | \n",
" N535MQ | \n",
" NaN | \n",
" 12 | \n",
" 10 | \n",
" 2013-09-30T16:00:00Z | \n",
"
\n",
" \n",
" 336774 | \n",
" LGA | \n",
" CLE | \n",
" 419 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1159 | \n",
" NaN | \n",
" NaN | \n",
" 1344 | \n",
" NaN | \n",
" MQ | \n",
" 3572 | \n",
" N511MQ | \n",
" NaN | \n",
" 11 | \n",
" 59 | \n",
" 2013-09-30T15:00:00Z | \n",
"
\n",
" \n",
" 336775 | \n",
" LGA | \n",
" RDU | \n",
" 431 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 840 | \n",
" NaN | \n",
" NaN | \n",
" 1020 | \n",
" NaN | \n",
" MQ | \n",
" 3531 | \n",
" N839MQ | \n",
" NaN | \n",
" 8 | \n",
" 40 | \n",
" 2013-09-30T12:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 19 columns
\n",
"
"
],
"text/plain": [
" origin dest distance year month day dep_time sched_dep_time \\\n",
"0 EWR IAH 1400 2013 1 1 517.0 515 \n",
"1 LGA IAH 1416 2013 1 1 533.0 529 \n",
"2 JFK MIA 1089 2013 1 1 542.0 540 \n",
"3 JFK BQN 1576 2013 1 1 544.0 545 \n",
"4 LGA ATL 762 2013 1 1 554.0 600 \n",
"... ... ... ... ... ... ... ... ... \n",
"336771 JFK DCA 213 2013 9 30 NaN 1455 \n",
"336772 LGA SYR 198 2013 9 30 NaN 2200 \n",
"336773 LGA BNA 764 2013 9 30 NaN 1210 \n",
"336774 LGA CLE 419 2013 9 30 NaN 1159 \n",
"336775 LGA RDU 431 2013 9 30 NaN 840 \n",
"\n",
" dep_delay arr_time sched_arr_time arr_delay carrier flight \\\n",
"0 2.0 830.0 819 11.0 UA 1545 \n",
"1 4.0 850.0 830 20.0 UA 1714 \n",
"2 2.0 923.0 850 33.0 AA 1141 \n",
"3 -1.0 1004.0 1022 -18.0 B6 725 \n",
"4 -6.0 812.0 837 -25.0 DL 461 \n",
"... ... ... ... ... ... ... \n",
"336771 NaN NaN 1634 NaN 9E 3393 \n",
"336772 NaN NaN 2312 NaN 9E 3525 \n",
"336773 NaN NaN 1330 NaN MQ 3461 \n",
"336774 NaN NaN 1344 NaN MQ 3572 \n",
"336775 NaN NaN 1020 NaN MQ 3531 \n",
"\n",
" tailnum air_time hour minute time_hour \n",
"0 N14228 227.0 5 15 2013-01-01T10:00:00Z \n",
"1 N24211 227.0 5 29 2013-01-01T10:00:00Z \n",
"2 N619AA 160.0 5 40 2013-01-01T10:00:00Z \n",
"3 N804JB 183.0 5 45 2013-01-01T10:00:00Z \n",
"4 N668DN 116.0 6 0 2013-01-01T11:00:00Z \n",
"... ... ... ... ... ... \n",
"336771 NaN NaN 14 55 2013-09-30T18:00:00Z \n",
"336772 NaN NaN 22 0 2013-10-01T02:00:00Z \n",
"336773 N535MQ NaN 12 10 2013-09-30T16:00:00Z \n",
"336774 N511MQ NaN 11 59 2013-09-30T15:00:00Z \n",
"336775 N839MQ NaN 8 40 2013-09-30T12:00:00Z \n",
"\n",
"[336776 rows x 19 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## NOTE: No need to edit\n",
"(\n",
" df_flights\n",
" >> gr.tf_select(\"origin\", \"dest\", \"distance\", gr.everything())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8184a227",
"metadata": {},
"source": [
"### __q5__ Re-arrange columns\n",
"\n",
"Re-arrange the columns to place `dest, origin, carrier` at the left, but retain all other columns.\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "24e25d11",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" dest | \n",
" origin | \n",
" carrier | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" flight | \n",
" tailnum | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" IAH | \n",
" EWR | \n",
" UA | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 517.0 | \n",
" 515 | \n",
" 2.0 | \n",
" 830.0 | \n",
" 819 | \n",
" 11.0 | \n",
" 1545 | \n",
" N14228 | \n",
" 227.0 | \n",
" 1400 | \n",
" 5 | \n",
" 15 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" IAH | \n",
" LGA | \n",
" UA | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 533.0 | \n",
" 529 | \n",
" 4.0 | \n",
" 850.0 | \n",
" 830 | \n",
" 20.0 | \n",
" 1714 | \n",
" N24211 | \n",
" 227.0 | \n",
" 1416 | \n",
" 5 | \n",
" 29 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" MIA | \n",
" JFK | \n",
" AA | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 542.0 | \n",
" 540 | \n",
" 2.0 | \n",
" 923.0 | \n",
" 850 | \n",
" 33.0 | \n",
" 1141 | \n",
" N619AA | \n",
" 160.0 | \n",
" 1089 | \n",
" 5 | \n",
" 40 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" BQN | \n",
" JFK | \n",
" B6 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" 725 | \n",
" N804JB | \n",
" 183.0 | \n",
" 1576 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" ATL | \n",
" LGA | \n",
" DL | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 600 | \n",
" -6.0 | \n",
" 812.0 | \n",
" 837 | \n",
" -25.0 | \n",
" 461 | \n",
" N668DN | \n",
" 116.0 | \n",
" 762 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" DCA | \n",
" JFK | \n",
" 9E | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1455 | \n",
" NaN | \n",
" NaN | \n",
" 1634 | \n",
" NaN | \n",
" 3393 | \n",
" NaN | \n",
" NaN | \n",
" 213 | \n",
" 14 | \n",
" 55 | \n",
" 2013-09-30T18:00:00Z | \n",
"
\n",
" \n",
" 336772 | \n",
" SYR | \n",
" LGA | \n",
" 9E | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 2200 | \n",
" NaN | \n",
" NaN | \n",
" 2312 | \n",
" NaN | \n",
" 3525 | \n",
" NaN | \n",
" NaN | \n",
" 198 | \n",
" 22 | \n",
" 0 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336773 | \n",
" BNA | \n",
" LGA | \n",
" MQ | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1210 | \n",
" NaN | \n",
" NaN | \n",
" 1330 | \n",
" NaN | \n",
" 3461 | \n",
" N535MQ | \n",
" NaN | \n",
" 764 | \n",
" 12 | \n",
" 10 | \n",
" 2013-09-30T16:00:00Z | \n",
"
\n",
" \n",
" 336774 | \n",
" CLE | \n",
" LGA | \n",
" MQ | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1159 | \n",
" NaN | \n",
" NaN | \n",
" 1344 | \n",
" NaN | \n",
" 3572 | \n",
" N511MQ | \n",
" NaN | \n",
" 419 | \n",
" 11 | \n",
" 59 | \n",
" 2013-09-30T15:00:00Z | \n",
"
\n",
" \n",
" 336775 | \n",
" RDU | \n",
" LGA | \n",
" MQ | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 840 | \n",
" NaN | \n",
" NaN | \n",
" 1020 | \n",
" NaN | \n",
" 3531 | \n",
" N839MQ | \n",
" NaN | \n",
" 431 | \n",
" 8 | \n",
" 40 | \n",
" 2013-09-30T12:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 19 columns
\n",
"
"
],
"text/plain": [
" dest origin carrier year month day dep_time sched_dep_time \\\n",
"0 IAH EWR UA 2013 1 1 517.0 515 \n",
"1 IAH LGA UA 2013 1 1 533.0 529 \n",
"2 MIA JFK AA 2013 1 1 542.0 540 \n",
"3 BQN JFK B6 2013 1 1 544.0 545 \n",
"4 ATL LGA DL 2013 1 1 554.0 600 \n",
"... ... ... ... ... ... ... ... ... \n",
"336771 DCA JFK 9E 2013 9 30 NaN 1455 \n",
"336772 SYR LGA 9E 2013 9 30 NaN 2200 \n",
"336773 BNA LGA MQ 2013 9 30 NaN 1210 \n",
"336774 CLE LGA MQ 2013 9 30 NaN 1159 \n",
"336775 RDU LGA MQ 2013 9 30 NaN 840 \n",
"\n",
" dep_delay arr_time sched_arr_time arr_delay flight tailnum \\\n",
"0 2.0 830.0 819 11.0 1545 N14228 \n",
"1 4.0 850.0 830 20.0 1714 N24211 \n",
"2 2.0 923.0 850 33.0 1141 N619AA \n",
"3 -1.0 1004.0 1022 -18.0 725 N804JB \n",
"4 -6.0 812.0 837 -25.0 461 N668DN \n",
"... ... ... ... ... ... ... \n",
"336771 NaN NaN 1634 NaN 3393 NaN \n",
"336772 NaN NaN 2312 NaN 3525 NaN \n",
"336773 NaN NaN 1330 NaN 3461 N535MQ \n",
"336774 NaN NaN 1344 NaN 3572 N511MQ \n",
"336775 NaN NaN 1020 NaN 3531 N839MQ \n",
"\n",
" air_time distance hour minute time_hour \n",
"0 227.0 1400 5 15 2013-01-01T10:00:00Z \n",
"1 227.0 1416 5 29 2013-01-01T10:00:00Z \n",
"2 160.0 1089 5 40 2013-01-01T10:00:00Z \n",
"3 183.0 1576 5 45 2013-01-01T10:00:00Z \n",
"4 116.0 762 6 0 2013-01-01T11:00:00Z \n",
"... ... ... ... ... ... \n",
"336771 NaN 213 14 55 2013-09-30T18:00:00Z \n",
"336772 NaN 198 22 0 2013-10-01T02:00:00Z \n",
"336773 NaN 764 12 10 2013-09-30T16:00:00Z \n",
"336774 NaN 419 11 59 2013-09-30T15:00:00Z \n",
"336775 NaN 431 8 40 2013-09-30T12:00:00Z \n",
"\n",
"[336776 rows x 19 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Bring \"dest\", \"origin\", and \"carrier\" to the left,\n",
"# but keep all other columns\n",
"(\n",
" df_flights\n",
" >> gr.tf_select(\"dest\", \"origin\", \"carrier\", gr.everything())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "be2a51ae",
"metadata": {},
"source": [
"# Row Filtering\n",
"\n",
"Just as we can select particular columns, we can *filter* to obtain particular rows."
]
},
{
"cell_type": "markdown",
"id": "c91fbe54",
"metadata": {},
"source": [
"## Accessing column values\n",
"\n",
"To access a single column of a DataFrame, we can use bracket `[]` notation:\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "433ce4c2",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 EWR\n",
"1 LGA\n",
"2 JFK\n",
"3 JFK\n",
"4 LGA\n",
" ... \n",
"336771 JFK\n",
"336772 LGA\n",
"336773 LGA\n",
"336774 LGA\n",
"336775 LGA\n",
"Name: origin, Length: 336776, dtype: object"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"(\n",
" df_flights[\"origin\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "df783ae3",
"metadata": {},
"source": [
"Note that this returns a different datatype: a Pandas *series*.\n"
]
},
{
"cell_type": "markdown",
"id": "b4deabe6",
"metadata": {},
"source": [
"## Making a comparison\n",
"\n",
"Remember that we have the following comparison operators:\n",
"\n",
"| Symbol | Compares |\n",
"|---|---|\n",
"| `x < y` | `x` less than `y` |\n",
"| `x <= y` | `x` less than or equal to `y` |\n",
"| `x > y` | `x` greater than `y` |\n",
"| `x >= y` | `x` greater than or equal to `y` |\n",
"| `x == y` | `x` (exactly) equals `y` |\n",
"| | Note that `==` works for strings too! |\n",
"| `x = y` | Error! |\n",
"\n",
"With a single column, we can make a comparison against a desired value:\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "de295691",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 True\n",
"3 True\n",
"4 False\n",
" ... \n",
"336771 True\n",
"336772 False\n",
"336773 False\n",
"336774 False\n",
"336775 False\n",
"Name: origin, Length: 336776, dtype: bool"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"(\n",
" df_flights[\"origin\"] == \"JFK\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f89e29e3",
"metadata": {},
"source": [
"The code above gives us `True` when the `\"origin\"` is `\"JFK\"`, and `False` when it is not.\n",
"\n",
"## Filtering using comparisons\n",
"\n",
"These `True`/`False` values are useful, because we can use them to *filter* to only those rows where the comparison yields `True`:\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9015a1f2",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" origin | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" JFK | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 542.0 | \n",
" 540 | \n",
" 2.0 | \n",
" 923.0 | \n",
" 850 | \n",
" 33.0 | \n",
" AA | \n",
" 1141 | \n",
" N619AA | \n",
" MIA | \n",
" 160.0 | \n",
" 1089 | \n",
" 5 | \n",
" 40 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" JFK | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" B6 | \n",
" 725 | \n",
" N804JB | \n",
" BQN | \n",
" 183.0 | \n",
" 1576 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" JFK | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 838.0 | \n",
" 846 | \n",
" -8.0 | \n",
" B6 | \n",
" 79 | \n",
" N593JB | \n",
" MCO | \n",
" 140.0 | \n",
" 944 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" JFK | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 558.0 | \n",
" 600 | \n",
" -2.0 | \n",
" 849.0 | \n",
" 851 | \n",
" -2.0 | \n",
" B6 | \n",
" 49 | \n",
" N793JB | \n",
" PBI | \n",
" 149.0 | \n",
" 1028 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" JFK | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 558.0 | \n",
" 600 | \n",
" -2.0 | \n",
" 853.0 | \n",
" 856 | \n",
" -3.0 | \n",
" B6 | \n",
" 71 | \n",
" N657JB | \n",
" TPA | \n",
" 158.0 | \n",
" 1005 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 111274 | \n",
" JFK | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2240.0 | \n",
" 2250 | \n",
" -10.0 | \n",
" 2347.0 | \n",
" 7 | \n",
" -20.0 | \n",
" B6 | \n",
" 2002 | \n",
" N281JB | \n",
" BUF | \n",
" 52.0 | \n",
" 301 | \n",
" 22 | \n",
" 50 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 111275 | \n",
" JFK | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2241.0 | \n",
" 2246 | \n",
" -5.0 | \n",
" 2345.0 | \n",
" 1 | \n",
" -16.0 | \n",
" B6 | \n",
" 486 | \n",
" N346JB | \n",
" ROC | \n",
" 47.0 | \n",
" 264 | \n",
" 22 | \n",
" 46 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 111276 | \n",
" JFK | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2307.0 | \n",
" 2255 | \n",
" 12.0 | \n",
" 2359.0 | \n",
" 2358 | \n",
" 1.0 | \n",
" B6 | \n",
" 718 | \n",
" N565JB | \n",
" BOS | \n",
" 33.0 | \n",
" 187 | \n",
" 22 | \n",
" 55 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 111277 | \n",
" JFK | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2349.0 | \n",
" 2359 | \n",
" -10.0 | \n",
" 325.0 | \n",
" 350 | \n",
" -25.0 | \n",
" B6 | \n",
" 745 | \n",
" N516JB | \n",
" PSE | \n",
" 196.0 | \n",
" 1617 | \n",
" 23 | \n",
" 59 | \n",
" 2013-10-01T03:00:00Z | \n",
"
\n",
" \n",
" 111278 | \n",
" JFK | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1455 | \n",
" NaN | \n",
" NaN | \n",
" 1634 | \n",
" NaN | \n",
" 9E | \n",
" 3393 | \n",
" NaN | \n",
" DCA | \n",
" NaN | \n",
" 213 | \n",
" 14 | \n",
" 55 | \n",
" 2013-09-30T18:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
111279 rows × 19 columns
\n",
"
"
],
"text/plain": [
" origin year month day dep_time sched_dep_time dep_delay \\\n",
"0 JFK 2013 1 1 542.0 540 2.0 \n",
"1 JFK 2013 1 1 544.0 545 -1.0 \n",
"2 JFK 2013 1 1 557.0 600 -3.0 \n",
"3 JFK 2013 1 1 558.0 600 -2.0 \n",
"4 JFK 2013 1 1 558.0 600 -2.0 \n",
"... ... ... ... ... ... ... ... \n",
"111274 JFK 2013 9 30 2240.0 2250 -10.0 \n",
"111275 JFK 2013 9 30 2241.0 2246 -5.0 \n",
"111276 JFK 2013 9 30 2307.0 2255 12.0 \n",
"111277 JFK 2013 9 30 2349.0 2359 -10.0 \n",
"111278 JFK 2013 9 30 NaN 1455 NaN \n",
"\n",
" arr_time sched_arr_time arr_delay carrier flight tailnum dest \\\n",
"0 923.0 850 33.0 AA 1141 N619AA MIA \n",
"1 1004.0 1022 -18.0 B6 725 N804JB BQN \n",
"2 838.0 846 -8.0 B6 79 N593JB MCO \n",
"3 849.0 851 -2.0 B6 49 N793JB PBI \n",
"4 853.0 856 -3.0 B6 71 N657JB TPA \n",
"... ... ... ... ... ... ... ... \n",
"111274 2347.0 7 -20.0 B6 2002 N281JB BUF \n",
"111275 2345.0 1 -16.0 B6 486 N346JB ROC \n",
"111276 2359.0 2358 1.0 B6 718 N565JB BOS \n",
"111277 325.0 350 -25.0 B6 745 N516JB PSE \n",
"111278 NaN 1634 NaN 9E 3393 NaN DCA \n",
"\n",
" air_time distance hour minute time_hour \n",
"0 160.0 1089 5 40 2013-01-01T10:00:00Z \n",
"1 183.0 1576 5 45 2013-01-01T10:00:00Z \n",
"2 140.0 944 6 0 2013-01-01T11:00:00Z \n",
"3 149.0 1028 6 0 2013-01-01T11:00:00Z \n",
"4 158.0 1005 6 0 2013-01-01T11:00:00Z \n",
"... ... ... ... ... ... \n",
"111274 52.0 301 22 50 2013-10-01T02:00:00Z \n",
"111275 47.0 264 22 46 2013-10-01T02:00:00Z \n",
"111276 33.0 187 22 55 2013-10-01T02:00:00Z \n",
"111277 196.0 1617 23 59 2013-10-01T03:00:00Z \n",
"111278 NaN 213 14 55 2013-09-30T18:00:00Z \n",
"\n",
"[111279 rows x 19 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"(\n",
" df_flights\n",
" >> gr.tf_filter(df_flights[\"origin\"] == \"JFK\")\n",
" # Show that the \"origin\" is indeed \"JFK\"\n",
" >> gr.tf_select(\"origin\", gr.everything())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f0346c06",
"metadata": {},
"source": [
"### __q6__ Find the early departures\n",
"\n",
"Use `gr.tf_filter()` to find all the flights that left early.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "b0a602a0",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" B6 | \n",
" 725 | \n",
" N804JB | \n",
" JFK | \n",
" BQN | \n",
" 183.0 | \n",
" 1576 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 600 | \n",
" -6.0 | \n",
" 812.0 | \n",
" 837 | \n",
" -25.0 | \n",
" DL | \n",
" 461 | \n",
" N668DN | \n",
" LGA | \n",
" ATL | \n",
" 116.0 | \n",
" 762 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 558 | \n",
" -4.0 | \n",
" 740.0 | \n",
" 728 | \n",
" 12.0 | \n",
" UA | \n",
" 1696 | \n",
" N39463 | \n",
" EWR | \n",
" ORD | \n",
" 150.0 | \n",
" 719 | \n",
" 5 | \n",
" 58 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 555.0 | \n",
" 600 | \n",
" -5.0 | \n",
" 913.0 | \n",
" 854 | \n",
" 19.0 | \n",
" B6 | \n",
" 507 | \n",
" N516JB | \n",
" EWR | \n",
" FLL | \n",
" 158.0 | \n",
" 1065 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 709.0 | \n",
" 723 | \n",
" -14.0 | \n",
" EV | \n",
" 5708 | \n",
" N829AS | \n",
" LGA | \n",
" IAD | \n",
" 53.0 | \n",
" 229 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 183570 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2237.0 | \n",
" 2245 | \n",
" -8.0 | \n",
" 2345.0 | \n",
" 2353 | \n",
" -8.0 | \n",
" B6 | \n",
" 234 | \n",
" N318JB | \n",
" JFK | \n",
" BTV | \n",
" 43.0 | \n",
" 266 | \n",
" 22 | \n",
" 45 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 183571 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2240.0 | \n",
" 2245 | \n",
" -5.0 | \n",
" 2334.0 | \n",
" 2351 | \n",
" -17.0 | \n",
" B6 | \n",
" 1816 | \n",
" N354JB | \n",
" JFK | \n",
" SYR | \n",
" 41.0 | \n",
" 209 | \n",
" 22 | \n",
" 45 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 183572 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2240.0 | \n",
" 2250 | \n",
" -10.0 | \n",
" 2347.0 | \n",
" 7 | \n",
" -20.0 | \n",
" B6 | \n",
" 2002 | \n",
" N281JB | \n",
" JFK | \n",
" BUF | \n",
" 52.0 | \n",
" 301 | \n",
" 22 | \n",
" 50 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 183573 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2241.0 | \n",
" 2246 | \n",
" -5.0 | \n",
" 2345.0 | \n",
" 1 | \n",
" -16.0 | \n",
" B6 | \n",
" 486 | \n",
" N346JB | \n",
" JFK | \n",
" ROC | \n",
" 47.0 | \n",
" 264 | \n",
" 22 | \n",
" 46 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 183574 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2349.0 | \n",
" 2359 | \n",
" -10.0 | \n",
" 325.0 | \n",
" 350 | \n",
" -25.0 | \n",
" B6 | \n",
" 745 | \n",
" N516JB | \n",
" JFK | \n",
" PSE | \n",
" 196.0 | \n",
" 1617 | \n",
" 23 | \n",
" 59 | \n",
" 2013-10-01T03:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
183575 rows × 19 columns
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"0 2013 1 1 544.0 545 -1.0 1004.0 \n",
"1 2013 1 1 554.0 600 -6.0 812.0 \n",
"2 2013 1 1 554.0 558 -4.0 740.0 \n",
"3 2013 1 1 555.0 600 -5.0 913.0 \n",
"4 2013 1 1 557.0 600 -3.0 709.0 \n",
"... ... ... ... ... ... ... ... \n",
"183570 2013 9 30 2237.0 2245 -8.0 2345.0 \n",
"183571 2013 9 30 2240.0 2245 -5.0 2334.0 \n",
"183572 2013 9 30 2240.0 2250 -10.0 2347.0 \n",
"183573 2013 9 30 2241.0 2246 -5.0 2345.0 \n",
"183574 2013 9 30 2349.0 2359 -10.0 325.0 \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest \\\n",
"0 1022 -18.0 B6 725 N804JB JFK BQN \n",
"1 837 -25.0 DL 461 N668DN LGA ATL \n",
"2 728 12.0 UA 1696 N39463 EWR ORD \n",
"3 854 19.0 B6 507 N516JB EWR FLL \n",
"4 723 -14.0 EV 5708 N829AS LGA IAD \n",
"... ... ... ... ... ... ... ... \n",
"183570 2353 -8.0 B6 234 N318JB JFK BTV \n",
"183571 2351 -17.0 B6 1816 N354JB JFK SYR \n",
"183572 7 -20.0 B6 2002 N281JB JFK BUF \n",
"183573 1 -16.0 B6 486 N346JB JFK ROC \n",
"183574 350 -25.0 B6 745 N516JB JFK PSE \n",
"\n",
" air_time distance hour minute time_hour \n",
"0 183.0 1576 5 45 2013-01-01T10:00:00Z \n",
"1 116.0 762 6 0 2013-01-01T11:00:00Z \n",
"2 150.0 719 5 58 2013-01-01T10:00:00Z \n",
"3 158.0 1065 6 0 2013-01-01T11:00:00Z \n",
"4 53.0 229 6 0 2013-01-01T11:00:00Z \n",
"... ... ... ... ... ... \n",
"183570 43.0 266 22 45 2013-10-01T02:00:00Z \n",
"183571 41.0 209 22 45 2013-10-01T02:00:00Z \n",
"183572 52.0 301 22 50 2013-10-01T02:00:00Z \n",
"183573 47.0 264 22 46 2013-10-01T02:00:00Z \n",
"183574 196.0 1617 23 59 2013-10-01T03:00:00Z \n",
"\n",
"[183575 rows x 19 columns]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Filter for only those rows with a negative \"dep_delay\"\n",
"(\n",
" df_flights\n",
" >> gr.tf_filter(df_flights[\"dep_delay\"] < 0)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "399d0b33",
"metadata": {},
"source": [
"## The data pronoun `DF`\n",
"\n",
"Way back at the beginning of this notebook, you may have noticed this line of code:\n",
"\n",
"```python\n",
"DF = gr.intention()\n",
"```\n",
"\n",
"This assigned the \"data pronoun\" to the variable `DF`. This pronoun can be used inside grama functions to refer to the data *as it is* at any stage in the pipeline. Rather than:\n",
"\n",
"```python\n",
"(\n",
" df_flights\n",
" >> gr.tf_filter(df_flights[\"dep_delay\"] < 0)\n",
")\n",
"```\n",
"\n",
"we instead write\n",
"\n",
"```python\n",
"(\n",
" df_flights\n",
" >> gr.tf_filter(DF[\"dep_delay\"] < 0)\n",
")\n",
"```\n",
"\n",
"The data pronoun `DF` is really just an alias for the DataFrame, so we can still use bracket notation `[]`.\n"
]
},
{
"cell_type": "markdown",
"id": "f6b16bef",
"metadata": {},
"source": [
"The data pronoun `DF` is not just convenient; it is *necessary* to make some operations work! Imagine we wanted to find all cases that satisfy both `dep_delay < 0` and `arr_delay < 0`. Consider the following code:\n",
"\n",
"```python\n",
"(\n",
" df_flights\n",
" # This filter works properly\n",
" >> gr.tf_filter(df_flights[\"dep_delay\"] < 0)\n",
" # We now have fewer rows! The next filter will fail\n",
" >> gr.tf_filter(df_flights[\"arr_delay\"] < 0)\n",
")\n",
"```\n",
"\n",
"The data pronoun `DF` allows us to refer to the data *as it is* in the pipeline; this resolves the issue with having fewer rows in the second filter. You'll get a chance to fix this code in the next task.\n"
]
},
{
"cell_type": "markdown",
"id": "c53d0031",
"metadata": {},
"source": [
"### __q7__ Use `DF` to fix this code\n",
"\n",
"Use the data pronoun `DF` to fix the following code:\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "8028450e",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" B6 | \n",
" 725 | \n",
" N804JB | \n",
" JFK | \n",
" BQN | \n",
" 183.0 | \n",
" 1576 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 600 | \n",
" -6.0 | \n",
" 812.0 | \n",
" 837 | \n",
" -25.0 | \n",
" DL | \n",
" 461 | \n",
" N668DN | \n",
" LGA | \n",
" ATL | \n",
" 116.0 | \n",
" 762 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 709.0 | \n",
" 723 | \n",
" -14.0 | \n",
" EV | \n",
" 5708 | \n",
" N829AS | \n",
" LGA | \n",
" IAD | \n",
" 53.0 | \n",
" 229 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 557.0 | \n",
" 600 | \n",
" -3.0 | \n",
" 838.0 | \n",
" 846 | \n",
" -8.0 | \n",
" B6 | \n",
" 79 | \n",
" N593JB | \n",
" JFK | \n",
" MCO | \n",
" 140.0 | \n",
" 944 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 558.0 | \n",
" 600 | \n",
" -2.0 | \n",
" 849.0 | \n",
" 851 | \n",
" -2.0 | \n",
" B6 | \n",
" 49 | \n",
" N793JB | \n",
" JFK | \n",
" PBI | \n",
" 149.0 | \n",
" 1028 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 144341 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2237.0 | \n",
" 2245 | \n",
" -8.0 | \n",
" 2345.0 | \n",
" 2353 | \n",
" -8.0 | \n",
" B6 | \n",
" 234 | \n",
" N318JB | \n",
" JFK | \n",
" BTV | \n",
" 43.0 | \n",
" 266 | \n",
" 22 | \n",
" 45 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 144342 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2240.0 | \n",
" 2245 | \n",
" -5.0 | \n",
" 2334.0 | \n",
" 2351 | \n",
" -17.0 | \n",
" B6 | \n",
" 1816 | \n",
" N354JB | \n",
" JFK | \n",
" SYR | \n",
" 41.0 | \n",
" 209 | \n",
" 22 | \n",
" 45 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 144343 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2240.0 | \n",
" 2250 | \n",
" -10.0 | \n",
" 2347.0 | \n",
" 7 | \n",
" -20.0 | \n",
" B6 | \n",
" 2002 | \n",
" N281JB | \n",
" JFK | \n",
" BUF | \n",
" 52.0 | \n",
" 301 | \n",
" 22 | \n",
" 50 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 144344 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2241.0 | \n",
" 2246 | \n",
" -5.0 | \n",
" 2345.0 | \n",
" 1 | \n",
" -16.0 | \n",
" B6 | \n",
" 486 | \n",
" N346JB | \n",
" JFK | \n",
" ROC | \n",
" 47.0 | \n",
" 264 | \n",
" 22 | \n",
" 46 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 144345 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" 2349.0 | \n",
" 2359 | \n",
" -10.0 | \n",
" 325.0 | \n",
" 350 | \n",
" -25.0 | \n",
" B6 | \n",
" 745 | \n",
" N516JB | \n",
" JFK | \n",
" PSE | \n",
" 196.0 | \n",
" 1617 | \n",
" 23 | \n",
" 59 | \n",
" 2013-10-01T03:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
144346 rows × 19 columns
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"0 2013 1 1 544.0 545 -1.0 1004.0 \n",
"1 2013 1 1 554.0 600 -6.0 812.0 \n",
"2 2013 1 1 557.0 600 -3.0 709.0 \n",
"3 2013 1 1 557.0 600 -3.0 838.0 \n",
"4 2013 1 1 558.0 600 -2.0 849.0 \n",
"... ... ... ... ... ... ... ... \n",
"144341 2013 9 30 2237.0 2245 -8.0 2345.0 \n",
"144342 2013 9 30 2240.0 2245 -5.0 2334.0 \n",
"144343 2013 9 30 2240.0 2250 -10.0 2347.0 \n",
"144344 2013 9 30 2241.0 2246 -5.0 2345.0 \n",
"144345 2013 9 30 2349.0 2359 -10.0 325.0 \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest \\\n",
"0 1022 -18.0 B6 725 N804JB JFK BQN \n",
"1 837 -25.0 DL 461 N668DN LGA ATL \n",
"2 723 -14.0 EV 5708 N829AS LGA IAD \n",
"3 846 -8.0 B6 79 N593JB JFK MCO \n",
"4 851 -2.0 B6 49 N793JB JFK PBI \n",
"... ... ... ... ... ... ... ... \n",
"144341 2353 -8.0 B6 234 N318JB JFK BTV \n",
"144342 2351 -17.0 B6 1816 N354JB JFK SYR \n",
"144343 7 -20.0 B6 2002 N281JB JFK BUF \n",
"144344 1 -16.0 B6 486 N346JB JFK ROC \n",
"144345 350 -25.0 B6 745 N516JB JFK PSE \n",
"\n",
" air_time distance hour minute time_hour \n",
"0 183.0 1576 5 45 2013-01-01T10:00:00Z \n",
"1 116.0 762 6 0 2013-01-01T11:00:00Z \n",
"2 53.0 229 6 0 2013-01-01T11:00:00Z \n",
"3 140.0 944 6 0 2013-01-01T11:00:00Z \n",
"4 149.0 1028 6 0 2013-01-01T11:00:00Z \n",
"... ... ... ... ... ... \n",
"144341 43.0 266 22 45 2013-10-01T02:00:00Z \n",
"144342 41.0 209 22 45 2013-10-01T02:00:00Z \n",
"144343 52.0 301 22 50 2013-10-01T02:00:00Z \n",
"144344 47.0 264 22 46 2013-10-01T02:00:00Z \n",
"144345 196.0 1617 23 59 2013-10-01T03:00:00Z \n",
"\n",
"[144346 rows x 19 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Fix this code using the data pronoun DF\n",
"\n",
"(\n",
" df_flights\n",
" >> gr.tf_filter(DF[\"dep_delay\"] < 0)\n",
" >> gr.tf_filter(DF[\"arr_delay\"] < 0)\n",
")\n",
"# solution-end"
]
},
{
"cell_type": "markdown",
"id": "a2e25277",
"metadata": {},
"source": [
"# Arranging\n",
"\n",
"Filtering is particularly helpful when combined with *sorting*; we can sort on any column using the function `gr.tf_arrange()`. For instance, the following code sorts by the `\"distance\"`, pulling the shortest flights to the top of the DataFrame:\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "5e9268cf",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" distance | \n",
" origin | \n",
" dest | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" air_time | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 17 | \n",
" EWR | \n",
" LGA | \n",
" 2013 | \n",
" 7 | \n",
" 27 | \n",
" NaN | \n",
" 106 | \n",
" NaN | \n",
" NaN | \n",
" 245 | \n",
" NaN | \n",
" US | \n",
" 1632 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" 6 | \n",
" 2013-07-27T05:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 1 | \n",
" 4 | \n",
" 1240.0 | \n",
" 1200 | \n",
" 40.0 | \n",
" 1333.0 | \n",
" 1306 | \n",
" 27.0 | \n",
" EV | \n",
" 4193 | \n",
" N14972 | \n",
" 30.0 | \n",
" 12 | \n",
" 0 | \n",
" 2013-01-04T17:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 1 | \n",
" 19 | \n",
" 1617.0 | \n",
" 1617 | \n",
" 0.0 | \n",
" 1722.0 | \n",
" 1722 | \n",
" 0.0 | \n",
" EV | \n",
" 4616 | \n",
" N12540 | \n",
" 34.0 | \n",
" 16 | \n",
" 17 | \n",
" 2013-01-19T21:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 2 | \n",
" 1 | \n",
" 2128.0 | \n",
" 2129 | \n",
" -1.0 | \n",
" 2216.0 | \n",
" 2224 | \n",
" -8.0 | \n",
" EV | \n",
" 4619 | \n",
" N13969 | \n",
" 24.0 | \n",
" 21 | \n",
" 29 | \n",
" 2013-02-02T02:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 1 | \n",
" 23 | \n",
" 2128.0 | \n",
" 2129 | \n",
" -1.0 | \n",
" 2221.0 | \n",
" 2224 | \n",
" -3.0 | \n",
" EV | \n",
" 4619 | \n",
" N12135 | \n",
" 23.0 | \n",
" 21 | \n",
" 29 | \n",
" 2013-01-24T02:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 12 | \n",
" 18 | \n",
" 928.0 | \n",
" 930 | \n",
" -2.0 | \n",
" 1543.0 | \n",
" 1535 | \n",
" 8.0 | \n",
" HA | \n",
" 51 | \n",
" N395HA | \n",
" 641.0 | \n",
" 9 | \n",
" 30 | \n",
" 2013-12-18T14:00:00Z | \n",
"
\n",
" \n",
" 336772 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 6 | \n",
" 2 | \n",
" 956.0 | \n",
" 1000 | \n",
" -4.0 | \n",
" 1442.0 | \n",
" 1435 | \n",
" 7.0 | \n",
" HA | \n",
" 51 | \n",
" N383HA | \n",
" 617.0 | \n",
" 10 | \n",
" 0 | \n",
" 2013-06-02T14:00:00Z | \n",
"
\n",
" \n",
" 336773 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 3 | \n",
" 17 | \n",
" 1006.0 | \n",
" 1000 | \n",
" 6.0 | \n",
" 1607.0 | \n",
" 1530 | \n",
" 37.0 | \n",
" HA | \n",
" 51 | \n",
" N380HA | \n",
" 686.0 | \n",
" 10 | \n",
" 0 | \n",
" 2013-03-17T14:00:00Z | \n",
"
\n",
" \n",
" 336774 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 5 | \n",
" 28 | \n",
" 953.0 | \n",
" 1000 | \n",
" -7.0 | \n",
" 1447.0 | \n",
" 1500 | \n",
" -13.0 | \n",
" HA | \n",
" 51 | \n",
" N385HA | \n",
" 631.0 | \n",
" 10 | \n",
" 0 | \n",
" 2013-05-28T14:00:00Z | \n",
"
\n",
" \n",
" 336775 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 8 | \n",
" 11 | \n",
" 950.0 | \n",
" 1000 | \n",
" -10.0 | \n",
" 1438.0 | \n",
" 1440 | \n",
" -2.0 | \n",
" HA | \n",
" 51 | \n",
" N391HA | \n",
" 628.0 | \n",
" 10 | \n",
" 0 | \n",
" 2013-08-11T14:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 19 columns
\n",
"
"
],
"text/plain": [
" distance origin dest year month day dep_time sched_dep_time \\\n",
"0 17 EWR LGA 2013 7 27 NaN 106 \n",
"1 80 EWR PHL 2013 1 4 1240.0 1200 \n",
"2 80 EWR PHL 2013 1 19 1617.0 1617 \n",
"3 80 EWR PHL 2013 2 1 2128.0 2129 \n",
"4 80 EWR PHL 2013 1 23 2128.0 2129 \n",
"... ... ... ... ... ... ... ... ... \n",
"336771 4983 JFK HNL 2013 12 18 928.0 930 \n",
"336772 4983 JFK HNL 2013 6 2 956.0 1000 \n",
"336773 4983 JFK HNL 2013 3 17 1006.0 1000 \n",
"336774 4983 JFK HNL 2013 5 28 953.0 1000 \n",
"336775 4983 JFK HNL 2013 8 11 950.0 1000 \n",
"\n",
" dep_delay arr_time sched_arr_time arr_delay carrier flight \\\n",
"0 NaN NaN 245 NaN US 1632 \n",
"1 40.0 1333.0 1306 27.0 EV 4193 \n",
"2 0.0 1722.0 1722 0.0 EV 4616 \n",
"3 -1.0 2216.0 2224 -8.0 EV 4619 \n",
"4 -1.0 2221.0 2224 -3.0 EV 4619 \n",
"... ... ... ... ... ... ... \n",
"336771 -2.0 1543.0 1535 8.0 HA 51 \n",
"336772 -4.0 1442.0 1435 7.0 HA 51 \n",
"336773 6.0 1607.0 1530 37.0 HA 51 \n",
"336774 -7.0 1447.0 1500 -13.0 HA 51 \n",
"336775 -10.0 1438.0 1440 -2.0 HA 51 \n",
"\n",
" tailnum air_time hour minute time_hour \n",
"0 NaN NaN 1 6 2013-07-27T05:00:00Z \n",
"1 N14972 30.0 12 0 2013-01-04T17:00:00Z \n",
"2 N12540 34.0 16 17 2013-01-19T21:00:00Z \n",
"3 N13969 24.0 21 29 2013-02-02T02:00:00Z \n",
"4 N12135 23.0 21 29 2013-01-24T02:00:00Z \n",
"... ... ... ... ... ... \n",
"336771 N395HA 641.0 9 30 2013-12-18T14:00:00Z \n",
"336772 N383HA 617.0 10 0 2013-06-02T14:00:00Z \n",
"336773 N380HA 686.0 10 0 2013-03-17T14:00:00Z \n",
"336774 N385HA 631.0 10 0 2013-05-28T14:00:00Z \n",
"336775 N391HA 628.0 10 0 2013-08-11T14:00:00Z \n",
"\n",
"[336776 rows x 19 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"(\n",
" df_flights\n",
" # Sorts from smallest to largest\n",
" >> gr.tf_arrange(DF[\"distance\"])\n",
" # Inspect the route\n",
" >> gr.tf_select(\"distance\", \"origin\", \"dest\", gr.everything())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "89f2fa80",
"metadata": {},
"source": [
"We can also reverse the order of the sort with the `gr.desc()` helper, as shown below:\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "95f97e1b",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" distance | \n",
" origin | \n",
" dest | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" air_time | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 5 | \n",
" 14 | \n",
" 959.0 | \n",
" 1000 | \n",
" -1.0 | \n",
" 1433.0 | \n",
" 1500 | \n",
" -27.0 | \n",
" HA | \n",
" 51 | \n",
" N388HA | \n",
" 608.0 | \n",
" 10 | \n",
" 0 | \n",
" 2013-05-14T14:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 1 | \n",
" 11 | \n",
" 855.0 | \n",
" 900 | \n",
" -5.0 | \n",
" 1442.0 | \n",
" 1530 | \n",
" -48.0 | \n",
" HA | \n",
" 51 | \n",
" N383HA | \n",
" 613.0 | \n",
" 9 | \n",
" 0 | \n",
" 2013-01-11T14:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 4 | \n",
" 6 | \n",
" 957.0 | \n",
" 1000 | \n",
" -3.0 | \n",
" 1510.0 | \n",
" 1510 | \n",
" 0.0 | \n",
" HA | \n",
" 51 | \n",
" N382HA | \n",
" 638.0 | \n",
" 10 | \n",
" 0 | \n",
" 2013-04-06T14:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 2 | \n",
" 15 | \n",
" 858.0 | \n",
" 900 | \n",
" -2.0 | \n",
" 1436.0 | \n",
" 1540 | \n",
" -64.0 | \n",
" HA | \n",
" 51 | \n",
" N382HA | \n",
" 611.0 | \n",
" 9 | \n",
" 0 | \n",
" 2013-02-15T14:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" 4983 | \n",
" JFK | \n",
" HNL | \n",
" 2013 | \n",
" 7 | \n",
" 24 | \n",
" 958.0 | \n",
" 1000 | \n",
" -2.0 | \n",
" 1435.0 | \n",
" 1430 | \n",
" 5.0 | \n",
" HA | \n",
" 51 | \n",
" N384HA | \n",
" 609.0 | \n",
" 10 | \n",
" 0 | \n",
" 2013-07-24T14:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 3 | \n",
" 2 | \n",
" 1926.0 | \n",
" 1929 | \n",
" -3.0 | \n",
" 2011.0 | \n",
" 2023 | \n",
" -12.0 | \n",
" EV | \n",
" 4457 | \n",
" N21144 | \n",
" 30.0 | \n",
" 19 | \n",
" 29 | \n",
" 2013-03-03T00:00:00Z | \n",
"
\n",
" \n",
" 336772 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 1 | \n",
" 27 | \n",
" 2128.0 | \n",
" 2129 | \n",
" -1.0 | \n",
" 2213.0 | \n",
" 2224 | \n",
" -11.0 | \n",
" EV | \n",
" 4619 | \n",
" N13969 | \n",
" 29.0 | \n",
" 21 | \n",
" 29 | \n",
" 2013-01-28T02:00:00Z | \n",
"
\n",
" \n",
" 336773 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 2 | \n",
" 3 | \n",
" 2153.0 | \n",
" 2129 | \n",
" 24.0 | \n",
" 2247.0 | \n",
" 2224 | \n",
" 23.0 | \n",
" EV | \n",
" 4619 | \n",
" N13913 | \n",
" 21.0 | \n",
" 21 | \n",
" 29 | \n",
" 2013-02-04T02:00:00Z | \n",
"
\n",
" \n",
" 336774 | \n",
" 80 | \n",
" EWR | \n",
" PHL | \n",
" 2013 | \n",
" 1 | \n",
" 9 | \n",
" 2126.0 | \n",
" 2129 | \n",
" -3.0 | \n",
" 2217.0 | \n",
" 2224 | \n",
" -7.0 | \n",
" EV | \n",
" 4619 | \n",
" N17560 | \n",
" 27.0 | \n",
" 21 | \n",
" 29 | \n",
" 2013-01-10T02:00:00Z | \n",
"
\n",
" \n",
" 336775 | \n",
" 17 | \n",
" EWR | \n",
" LGA | \n",
" 2013 | \n",
" 7 | \n",
" 27 | \n",
" NaN | \n",
" 106 | \n",
" NaN | \n",
" NaN | \n",
" 245 | \n",
" NaN | \n",
" US | \n",
" 1632 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" 6 | \n",
" 2013-07-27T05:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 19 columns
\n",
"
"
],
"text/plain": [
" distance origin dest year month day dep_time sched_dep_time \\\n",
"0 4983 JFK HNL 2013 5 14 959.0 1000 \n",
"1 4983 JFK HNL 2013 1 11 855.0 900 \n",
"2 4983 JFK HNL 2013 4 6 957.0 1000 \n",
"3 4983 JFK HNL 2013 2 15 858.0 900 \n",
"4 4983 JFK HNL 2013 7 24 958.0 1000 \n",
"... ... ... ... ... ... ... ... ... \n",
"336771 80 EWR PHL 2013 3 2 1926.0 1929 \n",
"336772 80 EWR PHL 2013 1 27 2128.0 2129 \n",
"336773 80 EWR PHL 2013 2 3 2153.0 2129 \n",
"336774 80 EWR PHL 2013 1 9 2126.0 2129 \n",
"336775 17 EWR LGA 2013 7 27 NaN 106 \n",
"\n",
" dep_delay arr_time sched_arr_time arr_delay carrier flight \\\n",
"0 -1.0 1433.0 1500 -27.0 HA 51 \n",
"1 -5.0 1442.0 1530 -48.0 HA 51 \n",
"2 -3.0 1510.0 1510 0.0 HA 51 \n",
"3 -2.0 1436.0 1540 -64.0 HA 51 \n",
"4 -2.0 1435.0 1430 5.0 HA 51 \n",
"... ... ... ... ... ... ... \n",
"336771 -3.0 2011.0 2023 -12.0 EV 4457 \n",
"336772 -1.0 2213.0 2224 -11.0 EV 4619 \n",
"336773 24.0 2247.0 2224 23.0 EV 4619 \n",
"336774 -3.0 2217.0 2224 -7.0 EV 4619 \n",
"336775 NaN NaN 245 NaN US 1632 \n",
"\n",
" tailnum air_time hour minute time_hour \n",
"0 N388HA 608.0 10 0 2013-05-14T14:00:00Z \n",
"1 N383HA 613.0 9 0 2013-01-11T14:00:00Z \n",
"2 N382HA 638.0 10 0 2013-04-06T14:00:00Z \n",
"3 N382HA 611.0 9 0 2013-02-15T14:00:00Z \n",
"4 N384HA 609.0 10 0 2013-07-24T14:00:00Z \n",
"... ... ... ... ... ... \n",
"336771 N21144 30.0 19 29 2013-03-03T00:00:00Z \n",
"336772 N13969 29.0 21 29 2013-01-28T02:00:00Z \n",
"336773 N13913 21.0 21 29 2013-02-04T02:00:00Z \n",
"336774 N17560 27.0 21 29 2013-01-10T02:00:00Z \n",
"336775 NaN NaN 1 6 2013-07-27T05:00:00Z \n",
"\n",
"[336776 rows x 19 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NOTE: No need to edit\n",
"(\n",
" df_flights\n",
" # Sorts from largest to smallest (*desc*ending)\n",
" >> gr.tf_arrange(gr.desc(DF[\"distance\"]))\n",
" # Inspect the route\n",
" >> gr.tf_select(\"distance\", \"origin\", \"dest\", gr.everything())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "61863712",
"metadata": {},
"source": [
"### __q8__ Find the earliest departures\n",
"\n",
"Find the top 10 *earliest* departures. How early did these depart? Do these early departures have anything in common?\n",
"\n",
"*Hint*: You will need to combine functions to accomplish this.\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "cd6df828",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" dep_delay | \n",
" origin | \n",
" dest | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" -43.0 | \n",
" JFK | \n",
" DEN | \n",
" 2013 | \n",
" 12 | \n",
" 7 | \n",
" 2040.0 | \n",
" 2123 | \n",
" 40.0 | \n",
" 2352 | \n",
" 48.0 | \n",
" B6 | \n",
" 97 | \n",
" N592JB | \n",
" 265.0 | \n",
" 1626 | \n",
" 21 | \n",
" 23 | \n",
" 2013-12-08T02:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" -33.0 | \n",
" LGA | \n",
" MSY | \n",
" 2013 | \n",
" 2 | \n",
" 3 | \n",
" 2022.0 | \n",
" 2055 | \n",
" 2240.0 | \n",
" 2338 | \n",
" -58.0 | \n",
" DL | \n",
" 1715 | \n",
" N612DL | \n",
" 162.0 | \n",
" 1183 | \n",
" 20 | \n",
" 55 | \n",
" 2013-02-04T01:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" -32.0 | \n",
" LGA | \n",
" IAD | \n",
" 2013 | \n",
" 11 | \n",
" 10 | \n",
" 1408.0 | \n",
" 1440 | \n",
" 1549.0 | \n",
" 1559 | \n",
" -10.0 | \n",
" EV | \n",
" 5713 | \n",
" N825AS | \n",
" 52.0 | \n",
" 229 | \n",
" 14 | \n",
" 40 | \n",
" 2013-11-10T19:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" -30.0 | \n",
" LGA | \n",
" TPA | \n",
" 2013 | \n",
" 1 | \n",
" 11 | \n",
" 1900.0 | \n",
" 1930 | \n",
" 2233.0 | \n",
" 2243 | \n",
" -10.0 | \n",
" DL | \n",
" 1435 | \n",
" N934DL | \n",
" 139.0 | \n",
" 1010 | \n",
" 19 | \n",
" 30 | \n",
" 2013-01-12T00:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" -27.0 | \n",
" LGA | \n",
" DEN | \n",
" 2013 | \n",
" 1 | \n",
" 29 | \n",
" 1703.0 | \n",
" 1730 | \n",
" 1947.0 | \n",
" 1957 | \n",
" -10.0 | \n",
" F9 | \n",
" 837 | \n",
" N208FR | \n",
" 250.0 | \n",
" 1620 | \n",
" 17 | \n",
" 30 | \n",
" 2013-01-29T22:00:00Z | \n",
"
\n",
" \n",
" 5 | \n",
" -26.0 | \n",
" LGA | \n",
" DTW | \n",
" 2013 | \n",
" 8 | \n",
" 9 | \n",
" 729.0 | \n",
" 755 | \n",
" 1002.0 | \n",
" 955 | \n",
" 7.0 | \n",
" MQ | \n",
" 3478 | \n",
" N711MQ | \n",
" 88.0 | \n",
" 502 | \n",
" 7 | \n",
" 55 | \n",
" 2013-08-09T11:00:00Z | \n",
"
\n",
" \n",
" 6 | \n",
" -25.0 | \n",
" LGA | \n",
" DTW | \n",
" 2013 | \n",
" 3 | \n",
" 30 | \n",
" 2030.0 | \n",
" 2055 | \n",
" 2213.0 | \n",
" 2250 | \n",
" -37.0 | \n",
" MQ | \n",
" 4573 | \n",
" N725MQ | \n",
" 87.0 | \n",
" 502 | \n",
" 20 | \n",
" 55 | \n",
" 2013-03-31T00:00:00Z | \n",
"
\n",
" \n",
" 7 | \n",
" -25.0 | \n",
" EWR | \n",
" TYS | \n",
" 2013 | \n",
" 10 | \n",
" 23 | \n",
" 1907.0 | \n",
" 1932 | \n",
" 2143.0 | \n",
" 2143 | \n",
" 0.0 | \n",
" EV | \n",
" 4361 | \n",
" N13994 | \n",
" 111.0 | \n",
" 631 | \n",
" 19 | \n",
" 32 | \n",
" 2013-10-23T23:00:00Z | \n",
"
\n",
" \n",
" 8 | \n",
" -24.0 | \n",
" LGA | \n",
" STL | \n",
" 2013 | \n",
" 9 | \n",
" 18 | \n",
" 1631.0 | \n",
" 1655 | \n",
" 1812.0 | \n",
" 1845 | \n",
" -33.0 | \n",
" AA | \n",
" 2223 | \n",
" N4XXAA | \n",
" 125.0 | \n",
" 888 | \n",
" 16 | \n",
" 55 | \n",
" 2013-09-18T20:00:00Z | \n",
"
\n",
" \n",
" 9 | \n",
" -24.0 | \n",
" JFK | \n",
" BUF | \n",
" 2013 | \n",
" 3 | \n",
" 2 | \n",
" 1431.0 | \n",
" 1455 | \n",
" 1601.0 | \n",
" 1631 | \n",
" -30.0 | \n",
" 9E | \n",
" 3318 | \n",
" N929XJ | \n",
" 55.0 | \n",
" 301 | \n",
" 14 | \n",
" 55 | \n",
" 2013-03-02T19:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" dep_delay origin dest year month day dep_time sched_dep_time \\\n",
"0 -43.0 JFK DEN 2013 12 7 2040.0 2123 \n",
"1 -33.0 LGA MSY 2013 2 3 2022.0 2055 \n",
"2 -32.0 LGA IAD 2013 11 10 1408.0 1440 \n",
"3 -30.0 LGA TPA 2013 1 11 1900.0 1930 \n",
"4 -27.0 LGA DEN 2013 1 29 1703.0 1730 \n",
"5 -26.0 LGA DTW 2013 8 9 729.0 755 \n",
"6 -25.0 LGA DTW 2013 3 30 2030.0 2055 \n",
"7 -25.0 EWR TYS 2013 10 23 1907.0 1932 \n",
"8 -24.0 LGA STL 2013 9 18 1631.0 1655 \n",
"9 -24.0 JFK BUF 2013 3 2 1431.0 1455 \n",
"\n",
" arr_time sched_arr_time arr_delay carrier flight tailnum air_time \\\n",
"0 40.0 2352 48.0 B6 97 N592JB 265.0 \n",
"1 2240.0 2338 -58.0 DL 1715 N612DL 162.0 \n",
"2 1549.0 1559 -10.0 EV 5713 N825AS 52.0 \n",
"3 2233.0 2243 -10.0 DL 1435 N934DL 139.0 \n",
"4 1947.0 1957 -10.0 F9 837 N208FR 250.0 \n",
"5 1002.0 955 7.0 MQ 3478 N711MQ 88.0 \n",
"6 2213.0 2250 -37.0 MQ 4573 N725MQ 87.0 \n",
"7 2143.0 2143 0.0 EV 4361 N13994 111.0 \n",
"8 1812.0 1845 -33.0 AA 2223 N4XXAA 125.0 \n",
"9 1601.0 1631 -30.0 9E 3318 N929XJ 55.0 \n",
"\n",
" distance hour minute time_hour \n",
"0 1626 21 23 2013-12-08T02:00:00Z \n",
"1 1183 20 55 2013-02-04T01:00:00Z \n",
"2 229 14 40 2013-11-10T19:00:00Z \n",
"3 1010 19 30 2013-01-12T00:00:00Z \n",
"4 1620 17 30 2013-01-29T22:00:00Z \n",
"5 502 7 55 2013-08-09T11:00:00Z \n",
"6 502 20 55 2013-03-31T00:00:00Z \n",
"7 631 19 32 2013-10-23T23:00:00Z \n",
"8 888 16 55 2013-09-18T20:00:00Z \n",
"9 301 14 55 2013-03-02T19:00:00Z "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Find the top 10 earliest departures\n",
"(\n",
" df_flights\n",
" >> gr.tf_arrange(DF[\"dep_delay\"])\n",
" >> gr.tf_head(10)\n",
" >> gr.tf_select(\"dep_delay\", \"origin\", \"dest\", gr.everything())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e8d9c17e",
"metadata": {},
"source": [
"# Isolating to answer questions\n",
"\n",
"Before we close this exercise, let's use these data isolation tools to answer a real question about the dataset:\n"
]
},
{
"cell_type": "markdown",
"id": "70afe3c2",
"metadata": {},
"source": [
"### __q9__ What are these data for?\n",
"\n",
"What are these data for? In particular, in what way are they \"focused on the NYC area\"? Complete the following tasks, and answer the questions under *observations* below.\n",
"\n",
"*Hint*: You might find it helpful to use the `or` operator using the symbol `|`. The `or` keyword does not work inside a `gr.tf_filter()`.\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "9ca612cb",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013 | \n",
" 7 | \n",
" 27 | \n",
" NaN | \n",
" 106 | \n",
" NaN | \n",
" NaN | \n",
" 245 | \n",
" NaN | \n",
" US | \n",
" 1632 | \n",
" NaN | \n",
" EWR | \n",
" LGA | \n",
" NaN | \n",
" 17 | \n",
" 1 | \n",
" 6 | \n",
" 2013-07-27T05:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"0 2013 7 27 NaN 106 NaN NaN \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest air_time \\\n",
"0 245 NaN US 1632 NaN EWR LGA NaN \n",
"\n",
" distance hour minute time_hour \n",
"0 17 1 6 2013-07-27T05:00:00Z "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Filter to only those cases where the destination airport\n",
"# is one of \"JFK\", \"LGA\", or \"EWR\"\n",
"df_q9dest = (\n",
" df_flights\n",
" >> gr.tf_filter(\n",
" (DF[\"dest\"] == \"JFK\") |\n",
" (DF[\"dest\"] == \"LGA\") |\n",
" (DF[\"dest\"] == \"EWR\")\n",
" )\n",
")\n",
"\n",
"# NOTE: No need to edit; use this to check your work\n",
"assert \\\n",
" df_q9dest.shape[0] == 1, \\\n",
" \"Incorrect filter\"\n",
"\n",
"df_q9dest "
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "68097087",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" month | \n",
" day | \n",
" dep_time | \n",
" sched_dep_time | \n",
" dep_delay | \n",
" arr_time | \n",
" sched_arr_time | \n",
" arr_delay | \n",
" carrier | \n",
" flight | \n",
" tailnum | \n",
" origin | \n",
" dest | \n",
" air_time | \n",
" distance | \n",
" hour | \n",
" minute | \n",
" time_hour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 517.0 | \n",
" 515 | \n",
" 2.0 | \n",
" 830.0 | \n",
" 819 | \n",
" 11.0 | \n",
" UA | \n",
" 1545 | \n",
" N14228 | \n",
" EWR | \n",
" IAH | \n",
" 227.0 | \n",
" 1400 | \n",
" 5 | \n",
" 15 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 1 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 533.0 | \n",
" 529 | \n",
" 4.0 | \n",
" 850.0 | \n",
" 830 | \n",
" 20.0 | \n",
" UA | \n",
" 1714 | \n",
" N24211 | \n",
" LGA | \n",
" IAH | \n",
" 227.0 | \n",
" 1416 | \n",
" 5 | \n",
" 29 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 2 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 542.0 | \n",
" 540 | \n",
" 2.0 | \n",
" 923.0 | \n",
" 850 | \n",
" 33.0 | \n",
" AA | \n",
" 1141 | \n",
" N619AA | \n",
" JFK | \n",
" MIA | \n",
" 160.0 | \n",
" 1089 | \n",
" 5 | \n",
" 40 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 3 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 544.0 | \n",
" 545 | \n",
" -1.0 | \n",
" 1004.0 | \n",
" 1022 | \n",
" -18.0 | \n",
" B6 | \n",
" 725 | \n",
" N804JB | \n",
" JFK | \n",
" BQN | \n",
" 183.0 | \n",
" 1576 | \n",
" 5 | \n",
" 45 | \n",
" 2013-01-01T10:00:00Z | \n",
"
\n",
" \n",
" 4 | \n",
" 2013 | \n",
" 1 | \n",
" 1 | \n",
" 554.0 | \n",
" 600 | \n",
" -6.0 | \n",
" 812.0 | \n",
" 837 | \n",
" -25.0 | \n",
" DL | \n",
" 461 | \n",
" N668DN | \n",
" LGA | \n",
" ATL | \n",
" 116.0 | \n",
" 762 | \n",
" 6 | \n",
" 0 | \n",
" 2013-01-01T11:00:00Z | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 336771 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1455 | \n",
" NaN | \n",
" NaN | \n",
" 1634 | \n",
" NaN | \n",
" 9E | \n",
" 3393 | \n",
" NaN | \n",
" JFK | \n",
" DCA | \n",
" NaN | \n",
" 213 | \n",
" 14 | \n",
" 55 | \n",
" 2013-09-30T18:00:00Z | \n",
"
\n",
" \n",
" 336772 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 2200 | \n",
" NaN | \n",
" NaN | \n",
" 2312 | \n",
" NaN | \n",
" 9E | \n",
" 3525 | \n",
" NaN | \n",
" LGA | \n",
" SYR | \n",
" NaN | \n",
" 198 | \n",
" 22 | \n",
" 0 | \n",
" 2013-10-01T02:00:00Z | \n",
"
\n",
" \n",
" 336773 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1210 | \n",
" NaN | \n",
" NaN | \n",
" 1330 | \n",
" NaN | \n",
" MQ | \n",
" 3461 | \n",
" N535MQ | \n",
" LGA | \n",
" BNA | \n",
" NaN | \n",
" 764 | \n",
" 12 | \n",
" 10 | \n",
" 2013-09-30T16:00:00Z | \n",
"
\n",
" \n",
" 336774 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 1159 | \n",
" NaN | \n",
" NaN | \n",
" 1344 | \n",
" NaN | \n",
" MQ | \n",
" 3572 | \n",
" N511MQ | \n",
" LGA | \n",
" CLE | \n",
" NaN | \n",
" 419 | \n",
" 11 | \n",
" 59 | \n",
" 2013-09-30T15:00:00Z | \n",
"
\n",
" \n",
" 336775 | \n",
" 2013 | \n",
" 9 | \n",
" 30 | \n",
" NaN | \n",
" 840 | \n",
" NaN | \n",
" NaN | \n",
" 1020 | \n",
" NaN | \n",
" MQ | \n",
" 3531 | \n",
" N839MQ | \n",
" LGA | \n",
" RDU | \n",
" NaN | \n",
" 431 | \n",
" 8 | \n",
" 40 | \n",
" 2013-09-30T12:00:00Z | \n",
"
\n",
" \n",
"
\n",
"
336776 rows × 19 columns
\n",
"
"
],
"text/plain": [
" year month day dep_time sched_dep_time dep_delay arr_time \\\n",
"0 2013 1 1 517.0 515 2.0 830.0 \n",
"1 2013 1 1 533.0 529 4.0 850.0 \n",
"2 2013 1 1 542.0 540 2.0 923.0 \n",
"3 2013 1 1 544.0 545 -1.0 1004.0 \n",
"4 2013 1 1 554.0 600 -6.0 812.0 \n",
"... ... ... ... ... ... ... ... \n",
"336771 2013 9 30 NaN 1455 NaN NaN \n",
"336772 2013 9 30 NaN 2200 NaN NaN \n",
"336773 2013 9 30 NaN 1210 NaN NaN \n",
"336774 2013 9 30 NaN 1159 NaN NaN \n",
"336775 2013 9 30 NaN 840 NaN NaN \n",
"\n",
" sched_arr_time arr_delay carrier flight tailnum origin dest \\\n",
"0 819 11.0 UA 1545 N14228 EWR IAH \n",
"1 830 20.0 UA 1714 N24211 LGA IAH \n",
"2 850 33.0 AA 1141 N619AA JFK MIA \n",
"3 1022 -18.0 B6 725 N804JB JFK BQN \n",
"4 837 -25.0 DL 461 N668DN LGA ATL \n",
"... ... ... ... ... ... ... ... \n",
"336771 1634 NaN 9E 3393 NaN JFK DCA \n",
"336772 2312 NaN 9E 3525 NaN LGA SYR \n",
"336773 1330 NaN MQ 3461 N535MQ LGA BNA \n",
"336774 1344 NaN MQ 3572 N511MQ LGA CLE \n",
"336775 1020 NaN MQ 3531 N839MQ LGA RDU \n",
"\n",
" air_time distance hour minute time_hour \n",
"0 227.0 1400 5 15 2013-01-01T10:00:00Z \n",
"1 227.0 1416 5 29 2013-01-01T10:00:00Z \n",
"2 160.0 1089 5 40 2013-01-01T10:00:00Z \n",
"3 183.0 1576 5 45 2013-01-01T10:00:00Z \n",
"4 116.0 762 6 0 2013-01-01T11:00:00Z \n",
"... ... ... ... ... ... \n",
"336771 NaN 213 14 55 2013-09-30T18:00:00Z \n",
"336772 NaN 198 22 0 2013-10-01T02:00:00Z \n",
"336773 NaN 764 12 10 2013-09-30T16:00:00Z \n",
"336774 NaN 419 11 59 2013-09-30T15:00:00Z \n",
"336775 NaN 431 8 40 2013-09-30T12:00:00Z \n",
"\n",
"[336776 rows x 19 columns]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TASK: Filter to only those cases where the origin airport\n",
"# is one of \"JFK\", \"LGA\", or \"EWR\"\n",
"df_q9origin = (\n",
" df_flights\n",
" >> gr.tf_filter(\n",
" (DF[\"origin\"] == \"JFK\") |\n",
" (DF[\"origin\"] == \"LGA\") |\n",
" (DF[\"origin\"] == \"EWR\")\n",
" )\n",
")\n",
"\n",
"# NOTE: No need to edit; use this to check your work\n",
"assert \\\n",
" df_q9origin.shape[0] == df_flights.shape[0], \\\n",
" \"Incorrect filter\"\n",
"\n",
"df_q9origin "
]
},
{
"cell_type": "markdown",
"id": "cafc89b3",
"metadata": {},
"source": [
"*Observations*\n",
"\n",
"- How many rows have either \"JFK\", \"LGA\", or \"EWR\" as their *destination*?\n",
" - Just one!\n",
"- How many rows have either \"JFK\", \"LGA\", or \"EWR\" as their *origin*?\n",
" - **All** of the rows have \"JFK\", \"LGA\", or \"EWR\" as their origin\n",
"- Would we be answer questions related to flights *entering* the NYC area using this dataset?\n",
" - Nope! We have virtually no data on those flights.\n",
"- In what sense is this dataset \"focused on NYC\"?\n",
" - This dataset is focused on flights *departing* from NYC.\n"
]
},
{
"cell_type": "markdown",
"id": "850835f6",
"metadata": {},
"source": [
"*Aside*: Data are not just numbers. Data are *numbers with context*. Every dataset is put together for some reason. This reason will inform what observations (rows) and variables (columns) are *in the data*, and which are *not in the data*. Conversely, thinking carefully about what data a person or organization bothered to collect---and what they ignored---can tell you something about the *perspective* of those who collected the data. Thinking about these issues is partly what separates __data science__ from programming or machine learning. (`end-rant`)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}