Principles behind functional programming in R
Defining your own functions
Anonymous functions
Making your functions purrrrr
Exercise: simulate data
formals()
- the list of arguments that control how you call the function.body()
- the code inside the function.environment()
- the data structure that determines how the function finds the values associated with the names.There is one exception to the rule that a function has three components. Primitive functions, like sum()
and [
, call C code directly.
These functions exist primarily in C, not R, so their formals()
, body()
, and environment()
return all NULL.
More or less anything you can do with other objects (dataframes, variables, …), you can also do with functions
You don’t always need to give your functions a name. If you just want to do something once, there’s no need to define it first and invoke it later. You can simply create it on the fly, and forget about it again.
This is called an anonymous function.
Maybe you’ve seen this error message before
This sometimes happens when we mess up R syntax, which leads to R trying to do something to a function that it cannot do (e.g., subset it).
The error message is also obscure because we rarely encounter the name “closure”. But in fact, almost all (except some core R functions) functions are of type closure.
The name closure is meant to signal that functions enclose their own environments.
R operates with a hierarchy of environments.
At the root sits the global environment. This is what you can see in the environment pane in RStudio.
Above that, R can invoke any number of nested environments. One important type of environments are package environments. Whenever you load a package, R creates an environment for that package, in which all its functions and other objects are defined.*
*R actually creates two environments, a package environment, and a namespace environment, see more here
Whenever you execute a function, R creates a temporary environment in which the function is evaluated. When the function is complete, the environment gets destroyed.
Only the output of the function is printed or assigned to an object, if you specify this.
Error in eval(expr, envir, enclos): object 'x' not found
Given what we know about environments, what is the outcome of the following code?
Scoping is the act of finding the value associated with a name.
For functions, the important thing to remember is that within a function, R will look first within the function execution environment, and then in its parent environments.
[1] 20
This behavior is great because it allows more computationally efficient functions. It also allows funky default arguments (but maybe not a great idea do this..)
[1] 8
When defining your own function, it’s important you keep an eye on whether you actually use the arguments you’ve defined in the function body. R won’t tell you…
This is often useful when a function calls another function (called “forwarding”), or has a function as one of its arguments. The ...
can be used to allow users to pass on arguments to the other function:
[1] 5
Define a function that calculates the standard error for a vector of numbers.
Bonus 1: make it so you can specify na.rm = TRUE
Bonus 2: make it so that it gives some informative errors when given wrong input (e.g. a character value). Tip: check out the stopifnot()
function, or use an if statement.
Non-standard evaluation (also called tidy evaluation) is a tricky concept, and to be honest, I haven’t understood it fully myself.
For most things you’re likely to do with R, you probably won’t need a deep understanding of it. However, if you want to program with tidyverse functions, you will need to understand a few tricks.
To learn more, Hadley Wickham’s book Advanced R gives a good introduction
How does select()
know what you are talking about?
Tidyverse functions blur the boundary between environment variables (variables that exist within the global or function environment) and data variables (variables that exist as named columns in a dataframe).
In base R, you would need to quote column names to select them.
Tidyverse code works by combining code expressions (capturing the intent of a piece of code without evaluating it), with a data mask (a special object that basically turn data columns into objects).
It does this to make working with these functions more convenient for you - so that you don’t need to use ""
all the time.
This however comes at a cost when you define your own functions that use tidyverse functions.
grouped_mean()
functionThis does not work because R will try to evaluate grouping_var
and values_var
as data variables, but won’t find them in the dataset.
grouped_mean()
functionWe need to tell R that we want use grouping_var
and values_var
as hybrid variables. It should first treat them as environment variables, and make them refer to whatever you assign them to. Then, they should be treated as data variables.
You can do that easily with the {{}}
operator (pronounce “curly curly”, or the embracing operator)
Using curly curly on the LHS of an assignment requires a special operator.
This code uses glue()
syntax to specify a new name for the new column. It also uses the walrus operator :=
for assignment. The walrus operator is necessary whenever you use {{}}
on the left hand side of an assignment.
create_scatter_plot <- function(data, variable1, variable2) {
data %>%
ggplot(aes(variable1, variable2)) +
geom_point() +
theme_light()
}
# hello darkness, my old friend
create_scatter_plot(mtcars, mpg, cyl)
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'cyl' not found
Bonus: make it so that bars are sorted by size (tip: use a function from the fct_ family)
Tip: this requires treating the name of the objects as a string. Perhaps a quick google search can help you.