Good Code

What is good code?

[10 minute discussion]

Automatic formatting

Use your editor's automatic formatting to clean up the layout of your code. In RStudio:

  • For Mac: Cmd + Shift + A
  • For Windows: Ctrl + Shift + A

Helps with many things, but is not a magic bullet.

# before automatic formatting
a_random_function <- function(something, something_else){result <- do_something(something) %>% 
  do_something_new(something_else)}

# after automatic formatting
a_random_function <-
  function(something, something_else) {
    result <- do_something(something) %>%
      do_something_new(something_else)
  }

The {styler} package

Tip

  • You can automatically style code and entire scripts using the {styler} package.

  • This is especially handy in .Rmd or .qmd files, where you can simply specify the tidier as a knitr chunk option.

knitr::opts_chunk$set(tidy = "styler")

Note that this doesn’t affect the source document (i.e. the script), but only affects the knitted document.

an_unstylized_example <- "asdf"
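Outside of knitted documents, {styler} can also be called directly. The following is a minimal sketch, assuming the {styler} package is installed; the messy input line and the file name in the last comment are made up for illustration.

```r
# Assumes the {styler} package is installed
library(styler)

# A deliberately messy, hypothetical line of code
messy_code <- "an_unstylized_example<-function(x){x+1}"

# style_text() returns the restyled code without touching any file
styled_code <- style_text(messy_code)
print(styled_code)

# To restyle a whole script, style_file() takes a file path, e.g.
# style_file("my_script.R")  # hypothetical file name
```

Unlike the knitr chunk option, style_file() rewrites the source file itself, so it is worth running it on code that is under version control.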

Naming conventions

Consistency is key!

Use one system and stick to it. This helps not only with readability, but also with writing code, for example to select key variables of interest.

Compare:

my_data %>%
  select(
    "item_1",
    "Item_2",
    "This_is_the_3_item",
    "Yet_another_item",
    "What.is.this.item?"
  )

my_data %>%
  select(starts_with("scale_name"))
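To make the comparison concrete, here is a small sketch with consistently named columns. It assumes the {dplyr} package is installed; my_data, its values and the scale_name prefix are hypothetical.

```r
# Assumes the {dplyr} package is installed
library(dplyr)

# Hypothetical survey data: an id column plus three items of one scale
my_data <- tibble(
  id = 1:3,
  scale_name_1 = c(4, 2, 5),
  scale_name_2 = c(3, 3, 4),
  scale_name_3 = c(5, 1, 4)
)

# A shared prefix lets one helper select every item of the scale
scale_items <- my_data %>%
  select(starts_with("scale_name"))

# The selected items can then be scored in one step
scale_mean <- rowMeans(scale_items)
```

If a fourth item is added later, the same code keeps working as long as the new column follows the naming scheme.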

My best practices for naming stuff

I like the following:

  • lowercase for all variables
  • snake_case
  • nouns for variables and datasets, verbs for functions
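As an illustration of these conventions, the sketch below uses only base R. All object names and values are invented for the example.

```r
# Noun, lowercase, snake_case: a data object (hypothetical values)
reaction_times <- c(512, 498, 634, 701)

# Verb, lowercase, snake_case: a function that does something to its input
standardize_scores <- function(scores) {
  (scores - mean(scores)) / sd(scores)
}

# The result is data again, so it gets a noun
standardized_times <- standardize_scores(reaction_times)
```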

Tip: the clean_names() function from the {janitor} package

data_with_bad_names <- tibble(
  BAD_NAME = 1,
  `really bad name` = 2,
  `1 - another bad (name)` = 3
)

data_with_bad_names %>%
  janitor::clean_names() %>%
  names()
[1] "bad_name"            "really_bad_name"     "x1_another_bad_name"

# also works with other cases, if you prefer those
# (see ?snakecase::to_any_case)
data_with_bad_names %>%
  janitor::clean_names(case = "upper_camel") %>%
  names()
[1] "BadName"          "ReallyBadName"    "X1AnotherBadName"

Commenting code

Code commenting practices

  • more is not always better
  • general advice: comments shouldn’t focus on the how, but the why

# load in data
df <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-17/artists.csv")

# load tidytuesday dataset on artists
# see documentation: https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-01-17/readme.md
df <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-17/artists.csv")
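The same contrast shows up in analysis steps. The sketch below uses base R only; the reaction times and the cut-offs of 200 ms and 3000 ms are hypothetical and only serve to show the difference between a "how" and a "why" comment.

```r
# Hypothetical reaction times in milliseconds
reaction_times <- c(180, 450, 520, 610, 3200)

# A "how" comment merely repeats the code:
#   keep values between 200 and 3000
# A "why" comment records the reasoning behind it:
#   responses under 200 ms are likely anticipations and responses over
#   3000 ms likely lapses of attention, so both are excluded
valid_times <- reaction_times[reaction_times >= 200 & reaction_times <= 3000]
```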

Code commenting exercise (~15 min?)

Go back to an old script (e.g. for data cleaning, …) of yours (preferably older than 6 months) and take a look at it

  • What have you commented, what haven’t you commented?
  • Which comments make sense to you? Which don’t?
  • Show the code to your neighbor without explaining it. What can they understand, what don’t they understand?

Script etiquette

Workflow

General logic: all data cleaning / manipulation first, then analyses

For more complex projects, keep data cleaning and analysis in separate scripts.
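One way to tie separate scripts together is a short master script that runs them in order with source(). The sketch below writes two tiny stand-in scripts to a temporary folder so that it runs on its own. The file names and their contents are hypothetical, and in a real project the scripts would already exist.

```r
# Create a temporary project folder with two stand-in scripts
project_dir <- tempfile("project_")
dir.create(project_dir)

writeLines(
  "cleaned_data <- subset(data.frame(score = c(1, NA, 3)), !is.na(score))",
  file.path(project_dir, "01_clean_data.R")
)
writeLines(
  "mean_score <- mean(cleaned_data$score)",
  file.path(project_dir, "02_analyze_data.R")
)

# The master script: cleaning always runs before the analysis
source(file.path(project_dir, "01_clean_data.R"))
source(file.path(project_dir, "02_analyze_data.R"))
```

Numbered file names are a design choice, not a requirement; they simply make the intended order visible in the file browser.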

Load all necessary packages at the start of the script

  • makes it easier to understand which packages are needed

  • if you only need a single function from a package once, you do not necessarily have to load the package; you can call the function with reference to its namespace using the :: notation

MASS::bcv(MASS::geyser$duration)
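Put together, the top of a script might look like the sketch below. It only uses packages that ship with R so that it runs anywhere; which packages you load is of course project-specific, and the file name is hypothetical.

```r
# Packages used throughout the script are loaded once, at the top
library(stats)
library(utils)

# A function needed only once is called through its namespace instead,
# so the tools package never has to be attached
file_extension <- tools::file_ext("artists.csv")
```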

The {conflicted} package

Dealing with name conflicts

The {conflicted} package helps navigate name conflicts, i.e. functions from different packages that share the same name, such as stats::filter() and dplyr::filter().

The conflicted::conflict_prefer() function lets you declare which function should be used by default when such a conflict occurs.

If you deal with packages that have naming conflicts, loading the {conflicted} package at the start of your R scripts is a good idea.
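A minimal sketch of this setup, assuming the {conflicted} and {dplyr} packages are installed. The built-in mtcars dataset is used purely as example data.

```r
# Assumes the {conflicted} and {dplyr} packages are installed
library(conflicted)
library(dplyr)

# Without a stated preference, calling filter() would stop with an error
# listing the candidates, so the choice is declared once, up front
conflict_prefer("filter", "dplyr")

# filter() now refers to dplyr::filter() for the rest of the script
four_cylinder_cars <- mtcars %>%
  filter(cyl == 4)
```

Declaring preferences right after the library() calls keeps all such decisions in one place at the top of the script.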