Show Geoms in a Graph’s Legend Using ggplot2

R, Data-visualisation

Author: Filip Reierson
Published: December 15, 2024

To colour or style data in ggplot2, you can either define an aesthetic mapping (aes) or pass the style as an argument to the geom layer, for example geom_point(aes(colour = var1)) or geom_point(colour = 'red'). The latter, however, produces no legend entry. If you would like one, put the desired label inside the aes mapping, e.g., geom_point(aes(colour = 'legend label')). To choose the colour, add scale_colour_manual(values = c('legend label' = 'desired colour')), where 'desired colour' is replaced by a colour such as 'red'. Figure 1 shows an example of this approach. I find it particularly useful when a plot has several geom_smooth layers.

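library(tidyverse)
# Reshape Anscombe's quartet from wide (x1..x4, y1..y4) to long format:
# one row per observation with columns name, x, and y.
anscombe <- datasets::anscombe |>
  rowwise() |>
  mutate(
    # Pair each dataset's x and y values in a list column
    'Dataset 1' = list(c('x' = x1, 'y' = y1)),
    'Dataset 2' = list(c('x' = x2, 'y' = y2)),
    'Dataset 3' = list(c('x' = x3, 'y' = y3)),
    'Dataset 4' = list(c('x' = x4, 'y' = y4))
  ) |>
  select(`Dataset 1`:`Dataset 4`) |>
  pivot_longer(everything()) |>
  # Split each (x, y) pair into separate x and y columns
  unnest_wider(value)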
ggplot(anscombe, aes(x, y)) +
  facet_wrap(~ name) +
  geom_point() +
  # A constant inside aes() becomes the legend label for the layer
  geom_smooth(
    aes(colour = 'Linear model'),
    se = FALSE,
    method = 'lm',
    formula = y ~ x
  ) +
  # Quadratic fit, drawn for Dataset 2 only
  geom_smooth(
    aes(colour = 'Quadratic model'),
    se = FALSE,
    method = 'lm',
    formula = y ~ poly(x, 2),
    data = filter(anscombe, name == 'Dataset 2')
  ) +
  # Linear fit for Dataset 3 with the outlier (y >= 10) excluded
  geom_smooth(
    aes(colour = 'Linear model (no outlier)'),
    se = FALSE,
    method = 'lm',
    formula = y ~ x,
    data = filter(anscombe, name == 'Dataset 3', y < 10)
  ) +
  # Assign the actual colour to each legend label
  scale_colour_manual(
    values = c(
      'Linear model' = 'red',
      'Linear model (no outlier)' = 'blue',
      'Quadratic model' = 'darkgreen'
    )
  ) +
  theme_bw() +
  labs(colour = '') +
  theme(legend.position = 'top',
        strip.background = element_blank())
Figure 1: Datasets with identical, or near-identical, mean, sample variance, and linear regression lines, published by Anscombe (1973). Alternative regression models are presented for the second and third datasets.

Dataset: Anscombe’s quartet

The “Anscombe’s quartet” dataset, shared in Anscombe (1973), can be used to illustrate how summary statistics can miss the bigger picture. While the graphs all look very different, the statistics in Table 1 are almost identical.

# Summary statistics and linear-model fit for each of the four datasets.
# The grouped header is given as a named vector of column spans.
anscombe |>
  group_by(name) |>
  summarise(mean_x = mean(x),
            sd_x = sd(x),
            mean_y = mean(y),
            sd_y = sd(y),
            cor = cor(x, y),
            lm_intercept = coef(lm(y ~ x))[[1]],
            lm_slope = coef(lm(y ~ x))[[2]],
            lm_Rsq = summary(lm(y ~ x))$r.squared
            ) |>
  knitr::kable(digits = 3, col.names = c(
    '', 'mean', 'sd', 'mean', 'sd', 'correlation', 'intercept', 'slope', 'R-squared'
  )) |>
  kableExtra::add_header_above(header = c(' ' = 1, 'x' = 2,
                                          'y' = 2, ' ' = 1, 'linear model' = 3))
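Table 1: Summary statistics for the four plots in Anscombe (1973).

| | x: mean | x: sd | y: mean | y: sd | correlation | linear model: intercept | linear model: slope | linear model: R-squared |
|---|---|---|---|---|---|---|---|---|
| Dataset 1 | 9 | 3.317 | 7.501 | 2.032 | 0.816 | 3.000 | 0.5 | 0.667 |
| Dataset 2 | 9 | 3.317 | 7.501 | 2.032 | 0.816 | 3.001 | 0.5 | 0.666 |
| Dataset 3 | 9 | 3.317 | 7.500 | 2.030 | 0.816 | 3.002 | 0.5 | 0.666 |
| Dataset 4 | 9 | 3.317 | 7.501 | 2.031 | 0.817 | 3.002 | 0.5 | 0.667 |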

In ggplot2, there are multiple ways to apply aesthetics. Table 2 summarises three approaches to adding colour to a geom, noting whether each produces a legend entry and whether it needs a colour scale to customise the colour. The same principles apply if “colour” is replaced with “linetype”. With the first two approaches, the legend entry can be controlled by setting show.legend inside the geom. Note the distinction between the second and third approaches: geom_line(colour = 'red') sets the line colour to red directly, whereas geom_line(aes(colour = 'red')) maps the value 'red' through the default colour scale, resulting in the default colour and a legend entry labelled “red”. To set the colour explicitly when using aes, you must specify a colour scale, such as scale_colour_manual, as shown earlier in Figure 1.
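
The difference between setting and mapping a colour can be checked by inspecting the colours ggplot2 actually computes for a layer. The following is a minimal sketch, assuming a small made-up data frame (df) and a hypothetical legend label ('My line'), neither of which comes from the examples above; layer_data() is used only to read back the computed colours.

```r
library(ggplot2)

# Small illustrative data frame (an assumption, not from the article)
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

# Colour set outside aes(): the line is drawn in red, with no legend entry
p_set <- ggplot(df, aes(x, y)) +
  geom_line(colour = 'red')

# Constant inside aes(): 'red' is treated as a data value, so the line gets
# a colour from the default scale and a legend entry labelled "red"
p_mapped <- ggplot(df, aes(x, y)) +
  geom_line(aes(colour = 'red'))

# Constant inside aes() plus a manual scale: legend entry "My line",
# drawn in the colour chosen in scale_colour_manual()
p_scaled <- ggplot(df, aes(x, y)) +
  geom_line(aes(colour = 'My line')) +
  scale_colour_manual(values = c('My line' = 'red'))

# layer_data() returns the computed aesthetics for a layer
set_colour    <- unique(layer_data(p_set)$colour)
mapped_colour <- unique(layer_data(p_mapped)$colour)
scaled_colour <- unique(layer_data(p_scaled)$colour)
```

Here set_colour and scaled_colour should both be 'red', while mapped_colour should be a colour taken from the default palette, which is why the mapped version needs a manual scale before it shows the intended colour.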

Table 2: Comparison of different approaches for adding colour to a geom. Note that geom_line can be replaced by any appropriate geom such as geom_smooth or geom_point.

| Approach | Legend entry | Requires colour scale to customise |
|---|---|---|
| geom_line(aes(colour = var1)) | Yes | Yes |
| geom_line(aes(colour = 'label')) | Yes | Yes |
| geom_line(colour = 'desired colour') | No | No |
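
The same pattern carries over to linetype. As a rough sketch, assuming the second Anscombe dataset from base R's datasets::anscombe and illustrative legend labels, two geom_smooth layers can share a colour and be told apart by linetype, with scale_linetype_manual assigning the actual line styles:

```r
library(ggplot2)

# Second Anscombe dataset, taken directly from base R
df <- data.frame(x = datasets::anscombe$x2, y = datasets::anscombe$y2)

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # Constant inside aes() gives this layer its legend label
  geom_smooth(
    aes(linetype = 'Linear model'),
    method = 'lm', formula = y ~ x,
    se = FALSE, colour = 'black'
  ) +
  geom_smooth(
    aes(linetype = 'Quadratic model'),
    method = 'lm', formula = y ~ poly(x, 2),
    se = FALSE, colour = 'black'
  ) +
  # Assign the actual line style to each legend label
  scale_linetype_manual(
    values = c('Linear model' = 'solid', 'Quadratic model' = 'dashed')
  ) +
  labs(linetype = '')

# Computed linetypes for the two smooth layers (layers 2 and 3)
linear_type    <- unique(layer_data(p, 2)$linetype)
quadratic_type <- unique(layer_data(p, 3)$linetype)
```

Because colour is set outside aes here, only linetype appears in the legend. Adding show.legend = FALSE to either geom_smooth call would stop that layer from contributing to the legend.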

I have found that the colour mapping system in ggplot2 is flexible, but has a steep learning curve. I hope that this article has shed some light on the ways aesthetics can be added to specific geoms.

References

Anscombe, F. J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27 (1): 17–21. https://doi.org/10.1080/00031305.1973.10478966.