Consider Image-Based Waffle Charts for Your Next Data Story

Author

Filip Reierson

Published

January 26, 2025

How this was made
show_code = "hidden"
To learn more about how the output in this article was created you can choose to show the code I used. Currently the code in this article is  
.

In this article I share how to create an image based waffle chart. With some creativity I believe these types of waffle charts can be imactful, while retaining the simplicity of waffle charts. Typically waffle charts will either be made up of tiles or repeating symbols or images. However, the flavour of waffle chart I present in this article is made by cropping images to be in the desired proportion and superimposing a grid.

To demonstrate my approach I use data about electricity production in Australia. The proportion of Australian electricity generated by each energy source is tabulated in .

Code
library(tidyverse)
library(patchwork)
library(magick)
d <- read_csv('data.csv', show_col_types = FALSE)
waffle_df <- d |>
  drop_na() |>
  filter(!str_detect(Fuel,'Total')) |>
  mutate(category = case_when(
    Fuel == "Black coal" ~ 'black-coal',
    Fuel == "Brown coal" ~ "brown-coal",
    Fuel == "Natural gas" ~ "natural-gas",
    Fuel %in% c("Large-scale solar PV", "Small-scale solar PV") ~ "solar",
    Fuel == "Hydro" ~ "hydro",
    Fuel == "Wind" ~ 'wind',
    TRUE ~ "other"
  )) |>
  group_by(category) |>
  summarise(tot = sum(Quantity)) |>
  mutate(prop = tot/sum(tot) * 100) |>
  arrange(desc(prop)) |>
  mutate(cumulative = cumsum(prop)) |>
  mutate(start = cumulative - prop) |>
  rowid_to_column() |>
  mutate(category_upper = str_to_title(str_replace(category, '-', ' ')),
         category_label = paste0(category_upper, ' (', scales::label_percent(.1)(prop/100),')'))
Code
waffle_df |>
  select(category_upper, tot, prop) |>
  mutate(prop = scales::label_percent(.1)(prop/100),
         tot = scales::label_number(1)(tot)) |>
  knitr::kable(col.names = c('Category', 'Energy (GWh)', 'Proportion'),
               align = 'lrr')
Table 1: Breakdown of electricity production in Australia, from DCCEEW (2024) Australian Energy Statistics, Table O.
Category Energy (GWh) Proportion
Black Coal 96 174 35.0%
Natural Gas 48 865 17.8%
Solar 41 969 15.3%
Brown Coal 31 459 11.5%
Wind 31 385 11.4%
Hydro 16 666 6.1%
Other 7 957 2.9%

There are various libraries that make it easier to create standard waffle charts, but I am not aware of an approach that quite gets the look I am going for. In this article I propose an approach that uses ggplot2, patchwork, and magick.

Code
get_xy <- function(image) {
  matrix_data <- image |>
    as.raster() |>
    as.matrix()
  row_indices <- row(matrix_data)
  col_indices <- col(matrix_data)
  df <- data.frame(
    x = as.vector(col_indices),
    y = max(row_indices)-as.vector(row_indices),
    z = as.vector(matrix_data)
  )
  df
}
apply_target <- function(image,target_width) {
  image_info <- image_info(image)
  image_width <- image_info$width
  num_repeats <- ceiling(target_width / image_width)
  repeated_images <- image_append(rep(image,num_repeats))
  final_image <- image_crop(repeated_images, geometry = paste0(target_width, "x"))
  final_image
}
waffle_img <- waffle_df |>
  mutate(image_name = paste0(category,'.png')) |>
  mutate(target_width = round(prop/100*1000*10)) |>
  mutate(pixel_start = cumsum(target_width) - target_width) |>
  rowwise() |>
  mutate(image = list(image_read(image_name) |>
           image_resize(geometry = "x100") |>
           apply_target(target_width) |>
           get_xy())) |>
  unnest(image) |>
  mutate(x=x + pixel_start) |>
  mutate(y=y/100-0.495,
         x=x/100) 

I identified suitable images and cropped sections with usable patterns. I modified the images using magick such that their height is 100 pixels. Next I repeated the images and cropped them to the desired widths. For example, if the desired width is 700 pixels for each, then the images would look like in . However, for the waffle plot the desired width is determined by the data.

Code
waffle_df |>
  mutate(image_name = paste0(category, '.png')) |>
  mutate(target_width = round(prop / 100 * 400 * 10)) |>
  mutate(pixel_start = cumsum(target_width) - target_width) |>
  rowwise() |>
  mutate(image = list(
    image_read(image_name) |>
      image_resize(geometry = "x100") |>
      apply_target(700) |>
      get_xy()
  )) |>
  unnest(image) |>
  arrange(prop) |>
  (\(x) {
    ggplot(x,aes(
      y = as.numeric(fct_inorder(category_upper)) * 100 + y,
      x = x,
      fill = z
    )) +
      geom_raster() +
      geom_label(
        aes(
          y = as.numeric(fct_inorder(category_upper)) * 100 + 50,
          x = 350,
          label = category_upper
        ),
        data = waffle_df |> arrange(prop),
        fontface = 'bold',
        label.size = NA,
        label.r = unit(0.5, 'mm'),
        color = 'white',
        fill = alpha('black', 0.5)
      ) +
      scale_fill_identity() +
      theme_void() +
      labs(fill = '') +
      theme(legend.position = 'none') +
      coord_equal()
  })()
Figure 1: Some images of patterns that I use to represent the different energy sources. The images were created by cropping pictures found on Wikimedia Commons.

After making the sizes correspond to their contribution to electricity generation, I arrange the images horizontally on a single axis. Next I create 10 variants of the plot which show the x-axis area between 0-10%, 10-20%, …, 90-100% using coord_equal with different xlim parameters. These plots are then stacked on top of each other using patchwork. Finally, I place a very small plot under all the others that I use to label on top of the waffle chart. The reason for this is so that I can set clip='off'. If we set clip='off' for any of the other plots then the xlim trick would fail. The resulting waffle chart is shown in .

Code
p <- waffle_img |>
  ggplot(aes(x, y, fill = z)) +
  geom_raster() +
  scale_fill_identity() +
  theme_void() +
  labs(fill = '') +
  theme(legend.position = 'none')
get_waffle <- function(p,text_between=c(0,100)) {
  p_combine <- list()
  for (i in 0:9) {
    start_pos <- i * 10
    p1 <- p +
      coord_equal(
        xlim = c(start_pos, start_pos + 10),
        ylim = c(-0.5, 0.5),
        expand = F
      ) +
      geom_vline(
        xintercept = (start_pos + 1):(start_pos + 9),
        color = 'white',
        lwd = 0.5
      ) +
      theme(plot.background = element_rect(fill='white',color='white',
                                           linewidth = 0),
            plot.margin=margin(t=0.5))
    # if (i == 0)
    #   p1 <- p1 + theme(plot.margin = margin(t = 12))
    p_combine[i+1] <- list(p1)
  }
  reduce(p_combine, `+`) + plot_layout(ncol = 1)
}

# to account for margins I have to adjust position slightly
fix_pos <- \(y) {return(y*1.005)} 

get_waffle(p) +
  (ggplot() +
  geom_label(
    aes(y = fix_pos(10-floor(start/10)), x = start%%10, label = category_label),
    vjust = 0,
    angle = 0,
    hjust = 0,
    nudge_x = 0.1,
    nudge_y = -0.4,
    size = 3.5,
    fontface = 'bold',
    label.size = NA,
    label.r = unit(0.5, 'mm'),
    color='white',
    fill = alpha('black', 0.5),
    data = waffle_df
  ) +
  scale_x_continuous(breaks=1:10) +
  coord_equal(
        xlim = c(0,10),
        ylim = c(-0.5/10, 0.5/10),
        expand = F,
        clip = 'off'
      ) +
  theme_void()) +
  plot_layout(heights = c(rep(10,10),1),
              widths = 15) +
  plot_annotation(title='Australian Electricity Energy Sources',
                  theme=theme(plot.title = element_text(hjust=0.5)))
Figure 2: The proportion of electricity produced by different energy sources as reported by DCCEEW (2024) Australian Energy Statistics, Table O.