7 Setup: Function Basics
Purpose: Functions are our primary tool for carrying out data analysis with the tidyverse. It is unreasonable to expect yourself to memorize every function and all of its details. To that end, we’ll learn some basic function literacy in R: how to inspect a function, look up its documentation, and find examples of a function’s use.
Reading: (None, this is the reading)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
7.1 Getting help
No programmer memorizes how every single function works. Instead, effective programmers get used to looking up documentation. In R this is easy; if there’s a function we want to learn about, we can run ?function in our console.
For instance, to get help on the lm() function, we could execute
> ?lm
Note: The > above is not part of the command; it automatically appears in our R console.
Hint: In RStudio, we can press CTRL + 2 to switch focus to the R console.
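A minimal sketch of the lookups described above. The interactive() guard and the use of args() and formals() are additions for illustration, not part of the original exercise; they give a quick reminder of a function's arguments without opening the full help page.

```r
# Opening a help page only makes sense in an interactive session
if (interactive()) {
  help("lm")   # equivalent to typing ?lm at the console
}

# args() prints just the argument list: a quick reminder without the full page
args(lm)

# formals() returns the same information as a list we can inspect
lm_arg_names <- names(formals(lm))
lm_arg_names[1:2]
```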
Some functions are found in multiple packages; in this case, we need to click a link in the help panel. For instance, the following will open up a help panel with a few links:
## Help on topic 'tibble' was found in the following packages:
##
## Package Library
## dplyr /home/runner/work/_temp/Library
## tidyr /home/runner/work/_temp/Library
## tibble /home/runner/work/_temp/Library
##
##
## Using the first match ...
At this point, we should just pick a link, and go back if it’s not relevant.
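If we already know which package we mean, we can skip the list of links by naming the package. The sketch below uses filter from the stats package as a stand-in (chosen here because it ships with base R and is also masked by dplyr, as the startup message shows); the commented lines show the same idea for tibble, assuming that package is installed.

```r
# help() accepts a package argument, so we can ask for one specific page
stats_filter_help <- help("filter", package = "stats")

# Printing the help object opens the page; only do that interactively
if (interactive()) {
  print(stats_filter_help)
}

# The same idea for the tibble topic, assuming tibble is installed:
# help("tibble", package = "tibble")
# ?tibble::tibble
```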
7.2 (Not) Executing functions
If we try to run a function without using parentheses, we get some odd behavior:
## function (..., deparse.level = 1)
## .Internal(rbind(deparse.level, ...))
## <bytecode: 0x558f4d038808>
## <environment: namespace:base>
Typing a function's name without parentheses does not run it; instead, R prints the function itself, including its source code. This can sometimes be helpful for understanding, but isn’t (usually) what we want out of our functions.
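The output above looks like what R prints for rbind. The original code chunk is not shown in this text, so the following is a reconstruction that contrasts the two forms; the small example matrix is made up for illustration.

```r
# Without parentheses: R returns (and prints) the function object itself
rbind

# With parentheses and arguments: R actually runs the function
stacked <- rbind(c(1, 2), c(3, 4))
stacked

# A function is an ordinary object that we can test for
is.function(rbind)
```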
7.2.1 q2 Show source code
Show the source code for lm.
## function (formula, data, subset, weights, na.action, method = "qr",
## model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
## contrasts = NULL, offset, ...)
## {
## ret.x <- x
## ret.y <- y
## cl <- match.call()
## mf <- match.call(expand.dots = FALSE)
## m <- match(c("formula", "data", "subset", "weights", "na.action",
## "offset"), names(mf), 0L)
## mf <- mf[c(1L, m)]
## mf$drop.unused.levels <- TRUE
## mf[[1L]] <- quote(stats::model.frame)
## mf <- eval(mf, parent.frame())
## if (method == "model.frame")
## return(mf)
## else if (method != "qr")
## warning(gettextf("method = '%s' is not supported. Using 'qr'",
## method), domain = NA)
## mt <- attr(mf, "terms")
## y <- model.response(mf, "numeric")
## w <- as.vector(model.weights(mf))
## if (!is.null(w) && !is.numeric(w))
## stop("'weights' must be a numeric vector")
## offset <- model.offset(mf)
## mlm <- is.matrix(y)
## ny <- if (mlm)
## nrow(y)
## else length(y)
## if (!is.null(offset)) {
## if (!mlm)
## offset <- as.vector(offset)
## if (NROW(offset) != ny)
## stop(gettextf("number of offsets is %d, should equal %d (number of observations)",
## NROW(offset), ny), domain = NA)
## }
## if (is.empty.model(mt)) {
## x <- NULL
## z <- list(coefficients = if (mlm) matrix(NA_real_, 0,
## ncol(y)) else numeric(), residuals = y, fitted.values = 0 *
## y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w !=
## 0) else ny)
## if (!is.null(offset)) {
## z$fitted.values <- offset
## z$residuals <- y - offset
## }
## }
## else {
## x <- model.matrix(mt, mf, contrasts)
## z <- if (is.null(w))
## lm.fit(x, y, offset = offset, singular.ok = singular.ok,
## ...)
## else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...)
## }
## class(z) <- c(if (mlm) "mlm", "lm")
## z$na.action <- attr(mf, "na.action")
## z$offset <- offset
## z$contrasts <- attr(x, "contrasts")
## z$xlevels <- .getXlevels(mt, mf)
## z$call <- cl
## z$terms <- mt
## if (model)
## z$model <- mf
## if (ret.x)
## z$x <- x
## if (ret.y)
## z$y <- y
## if (!qr)
## z$qr <- NULL
## z
## }
## <bytecode: 0x558f509a94b8>
## <environment: namespace:stats>
7.3 Executing functions (for real)
To actually run a function, we need to call it with parentheses () and provide all of its required arguments. Arguments are inputs to a function.
One simple but important function in R is c(): it takes multiple items and combines them into a vector.
## [1] 1 2 3
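A call along these lines would print the output above. The exact chunk is not shown in this text, so treat this as a reconstruction; the variable name v is only for illustration.

```r
# Combine three numbers into a single numeric vector
v <- c(1, 2, 3)
v
```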
Note that c() takes a variable number of arguments; we can pass as many values as we need to:
## [1] 1 2 3 4 5 6 7 8
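As a sketch of what "a variable number of arguments" means in practice, the snippet below passes eight values and then shows that c() also flattens vectors passed to it. The variable names are illustrative.

```r
# c() accepts any number of values
v_long <- c(1, 2, 3, 4, 5, 6, 7, 8)
v_long

# Vectors passed as arguments are flattened into one vector
v_joined <- c(c(1, 2, 3), c(4, 5), 6, 7, 8)
identical(v_long, v_joined)
```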
Other functions take a specific number of arguments, such as seq(), which builds a sequence of values:
## [1] 1 2 3 4 5 6 7 8 9 10
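The sequence above is what a two-argument call to seq() would produce; this is a reconstruction, since the original chunk is not shown here.

```r
# seq(from, to): count from 1 up to 10 in steps of 1
s <- seq(1, 10)
s
```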
Aside: If you’re familiar with other programming languages (like Python), R might offend your aesthetic sensibilities. In R, we can optionally specify the argument name for positional arguments; for instance, the following also works:
## [1] 1 2 3 4 5 6 7 8 9 10
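A short sketch of the named form, using from and to, which are the first two arguments listed in the documentation for seq(). The comparison with identical() is added here to show that both spellings give the same result.

```r
# Positional and named forms give the same result
s_positional <- seq(1, 10)
s_named <- seq(from = 1, to = 10)
identical(s_positional, s_named)

# Named arguments can be given in any order
s_reordered <- seq(to = 10, from = 1)
identical(s_named, s_reordered)
```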
Many functions have optional arguments: These arguments have reasonable default values, which we can override to get different behavior. For instance, the seq() function allows us to specify the “stride” of our sequence with a by argument:
## [1] 1 3 5 7 9
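The output above matches a call that overrides the default stride. Again, this is a reconstruction of a chunk that is not shown in this text.

```r
# With no by argument, seq() uses a stride of 1
seq(1, 10)

# Override the default to step by 2
s_odd <- seq(1, 10, by = 2)
s_odd
```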
The best way to figure out what arguments a function requires is to read its documentation.
Nerdy aside: Computer scientists draw a distinction between “parameters” and “arguments”—there’s a Wikipedia article about this.
7.4 Adapting examples
Practically, one of the best ways to use a function is to find an example that’s close to your intended use, and adapt that example. R documentation tends to be very good with many relevant examples. The examples are often at the bottom of the documentation, so sometimes it’s best to just scroll to the bottom and check the examples.
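As one possible workflow, example() runs the examples from a help page so that we can see their output before adapting them. The sketch below assumes the documentation for seq() still lists an example like seq(0, 1, length.out = 11); the adapted call and the variable names are made up for illustration.

```r
# Run the examples from a help page (interactive sessions only)
if (interactive()) {
  example("seq")
}

# An example in the style of those under ?seq: 11 evenly spaced points
unit_grid <- seq(0, 1, length.out = 11)

# Adapted to our own need: 5 evenly spaced points between 0 and 10
my_grid <- seq(0, 10, length.out = 5)
my_grid
```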