01_tidyverse_recap_practice_solution

Solution: tidyverse recap practice

Load packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Get the data

chocolate <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-18/chocolate.csv')
Rows: 2530 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): company_manufacturer, company_location, country_of_bean_origin, spe...
dbl (3): ref, review_date, rating

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
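The column-specification message goes away when the column types are declared up front. A minimal sketch of that, assuming a tiny made-up CSV written to a temporary file; the rows, the values and the names tmp and choc_small are hypothetical, and only a subset of the chocolate column names is used:

```r
library(readr)

# Hypothetical two-row CSV using some of the chocolate column names
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ref,company_manufacturer,review_date,ingredients,most_memorable_characteristics,rating",
  '1,Maker A,2019,"3- B,S,C","rich cocoa, fatty",3.25',
  '2,Maker B,2020,"2- B,S","sweet, nutty",3.5'
), tmp)

# Declaring every column type skips type guessing and the spec message
choc_small <- read_csv(
  tmp,
  col_types = cols(
    ref = col_double(),
    company_manufacturer = col_character(),
    review_date = col_double(),
    ingredients = col_character(),
    most_memorable_characteristics = col_character(),
    rating = col_double()
  )
)

spec(choc_small)
```

Declaring col_types also guards against readr guessing a different type if the source file ever changes.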

Make data tidy

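First, make a wide dataset by splitting the comma-separated ingredients and most_memorable_characteristics columns into one column per item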

chocolate_wide <- chocolate %>% 
  # split ingredients and characteristics into one column per comma-separated item
  separate_wider_delim(
    cols = c(ingredients, most_memorable_characteristics),
    delim = ",",
    names_sep = "_",
    too_few = "align_start"
  ) %>% 
  
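  # the first ingredient piece still carries the count prefix (e.g. "3- B"),
  # so split it into the count and the first ingredient code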
  separate_wider_delim(
    cols = ingredients_1,
    delim = "-",
    names = c("num_ingredients", "ingredients_1")
  ) %>% 
  
  # make num_ingredients numeric
  mutate(num_ingredients = as.numeric(num_ingredients))
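To see what the two separate_wider_delim() calls do, here is the same pipeline on a two-row toy tibble. It assumes ingredients strings of the form "<count>- <codes>", e.g. "3- B,S,C", which is what the split on "-" relies on; the values and the names toy and toy_wide are made up for illustration:

```r
library(dplyr)
library(tidyr)

# Made-up rows in the "<count>- <codes>" format
toy <- tibble(
  ref = c(1, 2),
  ingredients = c("3- B,S,C", "2- B,S")
)

toy_wide <- toy %>%
  # one column per comma-separated piece; shorter rows are padded with NA
  separate_wider_delim(
    cols = ingredients,
    delim = ",",
    names_sep = "_",
    too_few = "align_start"
  ) %>%
  # "3- B" -> num_ingredients = "3", ingredients_1 = " B"
  separate_wider_delim(
    cols = ingredients_1,
    delim = "-",
    names = c("num_ingredients", "ingredients_1")
  ) %>%
  mutate(num_ingredients = as.numeric(num_ingredients))

toy_wide
```

With too_few = "align_start", the shorter second row gets NA in ingredients_3 instead of raising an error; the leading space left in ingredients_1 is cleaned up after pivoting.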

Then we can pivot_longer(), for instance the ingredient columns, to get one row per bar and ingredient

chocolate_ingredients_long <- chocolate_wide %>%
  pivot_longer(
    cols = starts_with("ingredients"),
    names_to = "ingredient_number",
    values_to = "ingredient",
    values_drop_na = TRUE
  ) %>% 
  
  # turn column names like "ingredients_2" into the number 2
  mutate(ingredient_number = parse_number(ingredient_number),
         # trim leading/trailing whitespace (str_squish() also collapses inner runs of spaces)
         ingredient = stringr::str_squish(ingredient))
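Running the pivot on a small hand-made wide tibble shows the resulting shape: one row per bar and ingredient, with the padding NAs dropped. The values and the names toy_wide and toy_long are hypothetical:

```r
library(dplyr)
library(tidyr)
library(readr)    # parse_number()
library(stringr)  # str_squish()

# Hypothetical wide data shaped like the output of the separate step
toy_wide <- tibble(
  ref = c(1, 2),
  rating = c(3.25, 3.5),
  num_ingredients = c(3, 2),
  ingredients_1 = c(" B", " B"),
  ingredients_2 = c("S", "S"),
  ingredients_3 = c("C", NA)
)

toy_long <- toy_wide %>%
  pivot_longer(
    cols = starts_with("ingredients"),  # does not match num_ingredients
    names_to = "ingredient_number",
    values_to = "ingredient",
    values_drop_na = TRUE               # drop the NA padding
  ) %>%
  mutate(
    ingredient_number = parse_number(ingredient_number),  # "ingredients_2" -> 2
    ingredient = str_squish(ingredient)                    # " B" -> "B"
  )

toy_long
```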

And then we can plot, for instance, the mean rating (± one standard error) for each ingredient :)

chocolate_ingredients_long %>% 
  ggplot(aes(ingredient, rating)) +
  # mean_se() is the default; naming it explicitly also silences the
  # "No summary function supplied" message
  stat_summary(fun.data = mean_se, geom = "pointrange")
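With mean_se(), stat_summary() draws each point at the group mean and the range at mean ± one standard error, where the standard error is sd / sqrt(n). A sketch reproducing those numbers with dplyr on made-up ratings (toy_long and toy_summary are hypothetical names), cross-checked against ggplot2::mean_se():

```r
library(dplyr)

# Made-up ratings: three bars list ingredient "B", two list "L"
toy_long <- tibble(
  ingredient = c("B", "B", "B", "L", "L"),
  rating = c(3, 3.5, 4, 2.5, 3.5)
)

# Point = mean, range = mean +/- one standard error
toy_summary <- toy_long %>%
  group_by(ingredient) %>%
  summarise(
    y = mean(rating),
    se = sd(rating) / sqrt(n()),
    ymin = y - se,
    ymax = y + se
  )

toy_summary

# Same numbers for group "L" straight from ggplot2's helper
ggplot2::mean_se(c(2.5, 3.5))
```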

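Or the mean rating by number of ingredients, this time from chocolate_wide so that each bar is counted once:

chocolate_wide %>% 
  # bars without an ingredients entry have no num_ingredients
  filter(!is.na(num_ingredients)) %>% 
  ggplot(aes(num_ingredients, rating)) +
  stat_summary(fun.data = mean_se, geom = "pointrange")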
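The data source matters for the num_ingredients plot: in the long format every bar is repeated once per ingredient, so a bar with k ingredients enters its group k times. The group mean is unchanged, but n is inflated and the standard error comes out too small. A small sketch on hypothetical rows (toy_long and toy_bars are illustrative names) comparing row counts with one-row-per-bar counts:

```r
library(dplyr)

# Hypothetical long data: bar 1 has 3 ingredients, bar 2 has 2
toy_long <- tibble(
  ref = c(1, 1, 1, 2, 2),
  num_ingredients = c(3, 3, 3, 2, 2),
  rating = c(3.25, 3.25, 3.25, 3.5, 3.5),
  ingredient = c("B", "S", "C", "B", "S")
)

# Collapse back to one row per bar, as in the wide data
toy_bars <- distinct(toy_long, ref, num_ingredients, rating)

# Rows per group in the long data: 2 rows for num_ingredients = 2, 3 for = 3
long_counts <- count(toy_long, num_ingredients)

# Bars per group: one bar each
bar_counts <- count(toy_bars, num_ingredients)

long_counts
bar_counts
```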