Exploring Coffee Flavour Notes in The Great American Taste Test

Categories: R, Table, Analysis, TidyTuesday

Author: Filip Reierson

Published: June 13, 2024

James Hoffmann, a coffee expert and YouTuber, ran a blind coffee taste test in October 2023 in which participants received coffee tasting kits. See James Hoffmann’s live stream or the analysis to learn more about the coffees themselves. Existing analyses focus on people’s preferences, behaviour, and the accuracy of their flavour notes. In this article, however, I share a table showing how flavour notes were used by participants. The table was made by Janith Wanniarachchi and me at the NUMBATs hackathon, and the code required to reproduce it is available at freierson/coffee-table. After presenting the table, I explain the steps involved in its creation.

In the survey, respondents were asked to record flavour notes and then rate each coffee on bitterness, acidity, and personal preference (shown in the table as Sentiment), each on a scale from 1 to 5. The “Sentiment”, “Bitterness”, and “Acidity” columns show how the presence of specific flavour notes relates to these ratings. Each number is the difference between the average rating among responses that mentioned the flavour note and the overall average rating for that coffee, averaged across the four coffees. Positive numbers indicate a higher rating when the flavour note was used, while negative numbers indicate a lower rating.
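
To make this concrete, here is a minimal sketch of the calculation with made-up numbers (these are not the survey data); it shows how one entry of the Sentiment column would arise for a single coffee.

Code
# Made-up ratings (1-5) for one coffee, plus whether each response
# mentioned the flavour note "Fruity" -- toy values, not the survey data.
rating          <- c(4, 5, 3, 2, 4, 5)
mentions_fruity <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

overall_mean <- mean(rating)                  # average over all responses
fruity_mean  <- mean(rating[mentions_fruity]) # average when "Fruity" was used

fruity_mean - overall_mean  # the kind of difference reported under Sentiment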

The “Complementary note” column lists words that are likely to appear when the specified flavour note is mentioned. These words are identified using a technique inspired by TF-IDF (Term Frequency-Inverse Document Frequency), which highlights words that frequently occur together, but controls for overall word frequency.

The “Best describes” column indicates which of the four coffees in the taste test was most commonly associated with each flavour note. The next four columns show the proportion of times each flavour note was mentioned for each of the four coffees. This helps identify which coffee is most strongly associated with specific flavour notes.

Print version of the table

How we made this table

To reproduce the table, clone the repository freierson/coffee-table and build the table by running quarto render from a terminal in the repository’s root directory.

Contributions

I performed the data preparation (data_prep.qmd) and wrote this article. Janith Wanniarachchi added interactivity and styled the table (table_submission.qmd). We discussed ideas and collaborated throughout the project.

An explanation of key steps

We begin by reading the data as shared in TidyTuesday. The only difference from the raw data is that the column names are cleaned up a bit, which makes it easier to work with in R.

Code
library(tidyverse)
coffee <- readr::read_csv(file = 'coffee_survey.csv', show_col_types = FALSE) |>
  arrange(submission_id) |>
  # add a stable row id so each survey response can be referenced later
  rowid_to_column('submission_number')

In our data preparation process, we began by sampling flavour notes for each coffee and used ChatGPT to generate ideas for notes to highlight. To ensure that the final table is reproducible, we developed a series of regular expressions, each matching a flavour note or essentially equivalent wording, which allows the survey responses to be categorised consistently. Others can easily modify the set of flavour notes by adding entries to this list. The following are the regular expressions used for our table.

Code
patterns_classifier <- list(
  Fruity = "(?i)fruit", 
  Honey = "(?i)honey", 
  Apple = "(?i)apple",
  Chocolate = "(?i)chocolat",
  Citrus = "(?i)citrus",
  Sour = "(?i)sour",
  Nutty = "(?i)nut",
  Smooth = "(?i)smooth",
  Bright = "(?i)bright",
  Smoky = '(?i)(smoky|smoke)',
  Balanced = '(?i)balanced',
  Caramel = '(?i)caramel',
  Earthy = '(?i)earth',
  Sweet = '(?i)sweet',
  Cherry = '(?i)(cherry|cherries)',
  Berry = '(?i)(berry|berries)',
  Floral = '(?i)(floral|flower)',
  Fermented = '(?i)ferment',
  Complex = '(?i)(complex|complicated)',
  Juicy = '(?i)juic(e|y)',
  Bitter = '(?i)bitter'
)
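
As a quick check that the patterns behave as intended, str_detect() (from stringr, already loaded via tidyverse) can be run on a few example strings; the notes below are invented for illustration, not taken from the survey.

Code
example_notes <- c(
  "Dark chocolate and caramel, very smooth",
  "Bright, citrusy, almost sour cherry",
  "nutty with an earthy finish"
)
str_detect(example_notes, patterns_classifier$Chocolate)  # TRUE FALSE FALSE
str_detect(example_notes, patterns_classifier$Cherry)     # FALSE TRUE FALSE
str_detect(example_notes, patterns_classifier$Earthy)     # FALSE FALSE TRUE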

Each survey responder provided flavour notes for four different coffees labelled A through D. Using the predefined regular expressions, we systematically checked each coffee’s notes for the presence of specific flavour descriptors.

Code
for(note in names(patterns_classifier)) {
  coffee[paste0('note_a_', note)] <- str_detect(coffee$coffee_a_notes, pattern = patterns_classifier[[note]])
  coffee[paste0('note_b_', note)] <- str_detect(coffee$coffee_b_notes, pattern = patterns_classifier[[note]])
  coffee[paste0('note_c_', note)] <- str_detect(coffee$coffee_c_notes, pattern = patterns_classifier[[note]])
  coffee[paste0('note_d_', note)] <- str_detect(coffee$coffee_d_notes, pattern = patterns_classifier[[note]])
}

Next, we pivot the columns we just created into a long format, which is particularly convenient when working with dplyr.

Code
notes <- coffee |>
  drop_na(dplyr::starts_with('note')) |>
  pivot_longer(dplyr::starts_with('note'), names_to = 'note', values_to = 'was_mentioned') |>
  mutate(coffee = substr(note,6,6),
         note = substr(note, 8, 1000)) |>
  relocate(coffee, note) |>
  filter(was_mentioned)
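
As a quick sanity check on the long format, we can count how many times each note was detected across all responses and coffees (output omitted here).

Code
notes |>
  count(note, sort = TRUE)  # total detections per flavour note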

We computed the distribution of flavour note mentions across the four coffees.

Code
prop_table <- notes |>
  count(note, coffee) |>
  group_by(note) |>
  mutate(prop = n / sum(n),
         N = sum(n)) |>
  ungroup() |>
  select(-n) |>
  pivot_wider(values_from = prop, names_from = coffee)

We then computed which coffee people were usually describing when each flavour note was used, and used a test of proportions to decide when the two most common coffees are close enough that both should be shown as most common.

Code
sample_size <- coffee |> drop_na(dplyr::starts_with('note')) |> nrow()
most_often <- expand_grid(
  note = notes |>
    distinct(note) |>
    pull(note),
  l = letters[1:4],
  r = letters[1:4]
) |>
  filter(l != r) |>
  left_join(notes |> count(note, coffee),
            by = join_by(note, l == coffee)) |>
  left_join(notes |> count(note, coffee),
            by = join_by(note, r == coffee)) |>
  group_by(note) |>
  slice_max(n.x) |>
  slice_max(n.y) |>
  mutate(stats::prop.test(c(n.x,n.y),c(sample_size,sample_size)) |> broom::tidy()) |>
  mutate(most_often = ifelse(p.value <= 0.05, l, paste0(l,', ',r))) |>
  mutate(p.value=round(p.value,4)) |>
  select(note, p.value, most_often) |>
  ungroup()
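
The mutate() call above splices the one-row tibble returned by broom::tidy() into new columns. Run on its own with made-up counts, that step looks like the following; only p.value is kept for the table.

Code
# Two made-up mention counts out of the same number of responses.
stats::prop.test(c(120, 95), c(1000, 1000)) |>
  broom::tidy()
# Returns a one-row tibble with columns including estimate1, estimate2,
# statistic, p.value, conf.low, and conf.high.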

To identify flavour notes that commonly appear together, we employed a method inspired by TF-IDF. First, we calculated the document frequency for each note within each coffee, representing the number of times each note appeared in a specific coffee. Then, we identified pairs of notes that were mentioned together by matching each note in a survey response with all other notes in the same survey response. We filtered out self-pairs (where the note was paired with itself) and counted the co-occurrences of different notes within the same survey response.

Next, we calculated the TF-IDF score for each note pair. The term frequency was computed as the number of times the pair appeared together for a coffee divided by the document frequency of the second note in the pair for that coffee. For example, if “Smooth” and “Chocolate” were mentioned together 30 times for coffee A and “Chocolate” appeared 120 times for coffee A in total, the pair would score 30 / 120 = 0.25 for that coffee. We averaged these scores across the four coffees to obtain a mean score for each note pair. By keeping the highest-scoring pair for each note, we identified the flavour notes in our table that were most commonly mentioned together, highlighting potentially interesting associations between flavour notes.

Code
document_freq <- notes |>
  count(note, coffee, name = 'doc_freq')
mentioned_with <- inner_join(notes |>
             select(submission_number, coffee, note),
           notes |>
             select(submission_number, coffee, note),
           by=join_by(submission_number, coffee), relationship = "many-to-many") |>
  filter(note.x != note.y) |>
  arrange(submission_number, coffee) |>
  count(note.x, note.y, coffee) |>
  left_join(document_freq, by = join_by(note.y==note, coffee)) |>
  group_by(note.x,note.y) |>
  summarise(tfidf = mean(n / doc_freq), .groups='drop') |>
  group_by(note.x) |>
  slice_max(tfidf) |>
  rename(note = note.x, mentioned_with = note.y) |>
  select(note, mentioned_with, tfidf) |>
  ungroup()

Next we analysed the relationship between flavour notes and coffee characteristics (bitterness, acidity, and personal preference), which involved a few steps.

We calculated the mean values for each variable across all survey responses for each coffee. As is often the case, this involved pivoting to a long format before pivoting back to a wide format.

Code
overall_means <- coffee |>
  select(submission_number, contains('bitterness'), contains('acidity'), contains('preference')) |>
  pivot_longer(-1) |>
  mutate(coffee = substr(name,8,8),
         variable = substr(name, 10, 1000)) |>
  select(-name) |>
  group_by(coffee,variable) |>
  summarise(mean = mean(value, na.rm=T), .groups='drop') |>
  pivot_wider(names_from = variable, values_from = mean) |>
  rename(sentiment = personal_preference)

We computed the average rating of each characteristic for each coffee when a specific note was mentioned, and then subtracted the overall mean rating for that coffee to isolate the part of the rating attributable to the use of that particular flavour note. Finally, to keep the table a bit simpler, we took the average across all coffees for each flavour note so that there is just one number per flavour note and characteristic. As part of this process we also renamed personal_preference to sentiment.

Code
note_associations <- notes |>
  mutate(
    bitterness = case_when(
      coffee == 'a' ~ coffee_a_bitterness,
      coffee == 'b' ~ coffee_b_bitterness,
      coffee == 'c' ~ coffee_c_bitterness,
      coffee == 'd' ~ coffee_d_bitterness
    ),
    acidity = case_when(
      coffee == 'a' ~ coffee_a_acidity,
      coffee == 'b' ~ coffee_b_acidity,
      coffee == 'c' ~ coffee_c_acidity,
      coffee == 'd' ~ coffee_d_acidity
    ),
    sentiment = case_when(
      coffee == 'a' ~ coffee_a_personal_preference,
      coffee == 'b' ~ coffee_b_personal_preference,
      coffee == 'c' ~ coffee_c_personal_preference,
      coffee == 'd' ~ coffee_d_personal_preference
    )
  ) |>
  select(submission_number, coffee, note, bitterness, acidity, sentiment) |>
  inner_join(overall_means, by=join_by(coffee), suffix = c('', '_mean')) |>
  group_by(note) |>
  summarise(
    bitterness_n = sum(!is.na(bitterness)),
    acidity_n = sum(!is.na(acidity)),
    score_n = sum(!is.na(sentiment)),
    bitterness = mean(bitterness-bitterness_mean, na.rm = T),
    acidity = mean(acidity-acidity_mean, na.rm = T),
    sentiment = mean(sentiment-sentiment_mean, na.rm = T),
    .groups='drop'
  )

Finally, we join the data frames created in previous steps to create a table that is ready for some styling.

Code
table_data <- prop_table |>
  left_join(most_often, by = join_by(note)) |>
  left_join(mentioned_with, by = join_by(note)) |>
  left_join(note_associations, by = join_by(note)) |>
  left_join(avg_expertise, by = join_by(note)) |>
  select(-ends_with('_n')) |>
  mutate(
    most_often1 = case_when(
      substr(most_often,1,1)=='a' ~ 'Kenyan coffee with a light roast',
      substr(most_often,1,1)=='b' ~ 'blend with a medium roast',
      substr(most_often,1,1)=='c' ~ 'blend with a dark roast',
      substr(most_often,1,1)=='d' ~ 'Colombian coffee',
    ),
    most_often2 = case_when(
      substr(most_often,4,4)=='a' ~ ' or the Kenyan coffee with a light roast',
      substr(most_often,4,4)=='b' ~ ' or the blend with a medium roast',
      substr(most_often,4,4)=='c' ~ ' or the blend with a dark roast',
      substr(most_often,4,4)=='d' ~ ' or the Colombian coffee',
      T ~ ''
    ),
    description = glue::glue('
    {note} was used as a flavour note {prettyNum(N,big.mark=",",scientific=FALSE)} times. This note was used when the coffee was experienced as {ifelse(bitterness<0,"less","more")} bitter, {ifelse(acidity<0,"less","more")} acidic, and {ifelse(sentiment<0,"less","more")} enjoyable. The flavour note was commonly mentioned along with "{mentioned_with}". Among the coffees in the kit the note was most commonly used to describe the {most_often1}{most_often2}.')
  )
table_data |>
  write_csv("table_data.csv")

Finally, the table is rendered using the reactable package in R. We used colDef to add tooltips to the column headers, and the details argument of reactable to attach a card to each row containing a description of that row’s data, along with an audio player that reads the text aloud if desired. For the symbols in the table we used a mix of emojis, Unicode characters, and SVG/PNG images.
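
The styling code lives in table_submission.qmd; the sketch below is not that code, but illustrates the general pattern, assuming a hypothetical with_tooltip() helper and the table_data.csv written above (icons and the audio player are omitted).

Code
library(reactable)
library(htmltools)

table_data <- readr::read_csv("table_data.csv", show_col_types = FALSE)

# Hypothetical helper: a header element whose HTML title attribute acts as a tooltip.
with_tooltip <- function(label, tooltip) span(title = tooltip, label)

reactable(
  table_data,
  columns = list(
    sentiment = colDef(
      header = with_tooltip(
        "Sentiment",
        "Average rating difference when the flavour note was mentioned"
      ),
      format = colFormat(digits = 2)
    )
  ),
  # An expandable panel under each row, here holding the generated description.
  details = function(index) div(class = "row-card", table_data$description[index])
)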