The Tidyverse - A Recap

Day 2

Questions before we begin?

Data cleaning & manipulation in the tidyverse

Basics - mutate, summarise, group_by, pivot_longer, pivot_wider, …
Advanced concepts - across, rowwise, tidyselect methods, …
Text data - Regular expressions, {Stringr}, {Glue}
Practice :)

Recap of most important tidyverse syntax and functions

Bread and butter functions

mutate()
group_by()
summarise()
filter()
select()

New as of dplyr 1.1.2

Since dplyr 1.1.2, you can use the “.by” argument to use group_by() within selected dplyr verbs

mtcars %>% 
  mutate(avg_mpg = mean(mpg), .by = cyl) %>% 
  head()

Also works with summarise(.by = ), reframe(.by = ), filter(.by = ), slice(.by = )

Reshaping

pivot_wider() (formerly: spread())
pivot_longer() (formerly: gather())

Joining

left_join()
right_join()
inner_join()
full_join()

The “Pipe”

%>%()
|>()

The_pipe %>% 
  allows_us() %>% 
  to_string_together() %>% 
  multiple_functions()

# this is the same as
multiple_functions(to_string_together(allows_us(The_pipe)))

Practice

Let’s play

Explore the chocolate dataset from tidytuesday, see here for documentation

chocolate <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv')

Use group_by and summarise functions to explore mean ratings for different countries, beans, years…
use the pipe!
Plot the data in different ways

Bonus:

Do some data cleaning: the ingredients and characteristics column encode several observations. Separate out these columns, and make sure data is in long format :)