library(tidyverse)
# Reshape the wide anscombe data (x1..x4, y1..y4) into long form:
# one row per point, with columns name, x and y
anscombe <- datasets::anscombe |>
  rowwise() |>
  mutate(
    'Dataset 1' = list(c('x' = x1, 'y' = y1)),
    'Dataset 2' = list(c('x' = x2, 'y' = y2)),
    'Dataset 3' = list(c('x' = x3, 'y' = y3)),
    'Dataset 4' = list(c('x' = x4, 'y' = y4))
  ) |>
  select(`Dataset 1`:`Dataset 4`) |>
  pivot_longer(everything()) |>
  unnest_wider(value)
To colour or style data in ggplot2, you can either define an aesthetic mapping (aes) or pass the style as an argument to the geom layer, for example geom_point(aes(colour = var1)) or geom_point(colour = 'red'). The latter, however, produces no legend entry. If you would like a legend entry, put the desired label in the aes mapping, e.g., geom_point(aes(colour = 'legend label')). To specify the colour, you must then add scale_colour_manual(values = c('legend label' = 'desired colour')), where 'desired colour' may be replaced by something like 'red'. I show an example of this approach in Figure 1. I find it particularly useful with multiple uses of geom_smooth.
ggplot(anscombe, aes(x, y)) +
  facet_wrap(~ name) +
  geom_point() +
  # Linear fit in every panel
  geom_smooth(
    aes(colour = 'Linear model'),
    se = FALSE,
    method = 'lm',
    formula = y ~ x
  ) +
  # Quadratic fit, drawn for Dataset 2 only
  geom_smooth(
    aes(colour = 'Quadratic model'),
    se = FALSE,
    method = 'lm',
    formula = y ~ poly(x, 2),
    data = filter(anscombe, name == 'Dataset 2')
  ) +
  # Linear fit for Dataset 3 with the outlier (y >= 10) removed
  geom_smooth(
    aes(colour = 'Linear model (no outlier)'),
    se = FALSE,
    method = 'lm',
    formula = y ~ x,
    data = filter(anscombe, name == 'Dataset 3', y < 10)
  ) +
  # Assign a colour to each legend label used in aes() above
  scale_colour_manual(
    values = c(
      'Linear model' = 'red',
      'Linear model (no outlier)' = 'blue',
      'Quadratic model' = 'darkgreen'
    )
  ) +
  theme_bw() +
  labs(colour = '') +
  theme(legend.position = 'top',
        strip.background = element_blank())
The “Anscombe’s quartet” dataset, shared in Anscombe (1973), can be used to illustrate how summary statistics can miss the bigger picture. While the graphs all look very different, the statistics in Table 1 are almost identical.
anscombe |>
  group_by(name) |>
  summarise(mean_x = mean(x),
            sd_x = sd(x),
            mean_y = mean(y),
            sd_y = sd(y),
            cor = cor(x, y),
            lm_intercept = coef(lm(y ~ x))[[1]],
            lm_slope = coef(lm(y ~ x))[[2]],
            lm_Rsq = summary(lm(y ~ x))$r.squared
  ) |>
  knitr::kable(digits = 3, col.names = c(
    '', 'mean', 'sd', 'mean', 'sd', 'correlation', 'intercept', 'slope', 'R-squared')
  ) |>
  kableExtra::add_header_above(header = list(' ', 'x' = 2,
    'y' = 2, ' ', 'linear model' = 3))
Dataset | mean (x) | sd (x) | mean (y) | sd (y) | correlation | intercept (linear model) | slope (linear model) | R-squared (linear model) |
---|---|---|---|---|---|---|---|---|
Dataset 1 | 9 | 3.317 | 7.501 | 2.032 | 0.816 | 3.000 | 0.5 | 0.667 |
Dataset 2 | 9 | 3.317 | 7.501 | 2.032 | 0.816 | 3.001 | 0.5 | 0.666 |
Dataset 3 | 9 | 3.317 | 7.500 | 2.030 | 0.816 | 3.002 | 0.5 | 0.666 |
Dataset 4 | 9 | 3.317 | 7.501 | 2.031 | 0.817 | 3.002 | 0.5 | 0.667 |
In ggplot2, there are multiple ways to apply aesthetics. I have summarised different approaches to adding colour to a geom in Table 2, along with their advantages and disadvantages. The same principles apply if “colour” is replaced with “linetype”. With the first two approaches, the legend entry can be controlled by setting show.legend inside the geom. It is important to note the distinction between the second and third approaches. While geom_line(colour = 'red') directly sets the line colour to red, geom_line(aes(colour = 'red')) maps the value 'red' to the default colour scale, resulting in the default colour and a legend entry labelled “red”. To explicitly set the colour when using aes, you must specify a colour scale, such as scale_colour_manual, as shown earlier in Figure 1.
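The difference between setting and mapping a constant can be checked by inspecting the values ggplot2 computes for each layer. The following is a minimal sketch, assuming ggplot2 is installed; the toy data frame df, the label 'My line', and the plot names p_set, p_mapped and p_scaled are made up for illustration and are not part of the Anscombe example.

```r
library(ggplot2)

# Hypothetical toy data, used only for illustration
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

# Colour set as a fixed argument: drawn in red, no legend entry
p_set <- ggplot(df, aes(x, y)) +
  geom_line(colour = 'red')

# Constant mapped inside aes(): 'red' is treated as a data value,
# so the default discrete scale chooses the colour and the legend reads "red"
p_mapped <- ggplot(df, aes(x, y)) +
  geom_line(aes(colour = 'red'))

# Mapped label plus a manual scale: legend entry and the chosen colour
p_scaled <- ggplot(df, aes(x, y)) +
  geom_line(aes(colour = 'My line')) +
  scale_colour_manual(values = c('My line' = 'red'))

# layer_data() returns the computed aesthetics of a layer without drawing it
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_scaled)$colour)
```

Only the first and third plots should report 'red' as the drawn colour; the second should report whatever the default discrete colour scale assigns to a single level.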
geom_line can be replaced by any appropriate geom such as geom_smooth or geom_point.
Approach | Legend Entry | Requires Colour Scale to Customise |
---|---|---|
geom_line(aes(colour = var1)) | Yes | Yes |
geom_line(aes(colour = 'label')) | Yes | Yes |
geom_line(colour = 'desired colour') | No | No |
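The same three approaches carry over to linetype, and show.legend can switch off the legend entry of a mapped layer. The sketch below assumes ggplot2 is installed; the toy data frame df, the label 'Trend', and the plot names are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# Hypothetical toy data, used only for illustration
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

# Mapped label plus a manual scale: dashed line with a legend entry "Trend"
p_legend <- ggplot(df, aes(x, y)) +
  geom_line(aes(linetype = 'Trend')) +
  scale_linetype_manual(values = c('Trend' = 'dashed')) +
  labs(linetype = '')

# Same mapping with show.legend = FALSE: still dashed,
# but this layer no longer contributes a legend entry
p_no_legend <- ggplot(df, aes(x, y)) +
  geom_line(aes(linetype = 'Trend'), show.legend = FALSE) +
  scale_linetype_manual(values = c('Trend' = 'dashed'))

# Fixed argument: dashed line, no legend entry and no scale needed
p_set <- ggplot(df, aes(x, y)) +
  geom_line(linetype = 'dashed')

# All three plots draw the line with the same computed linetype
unique(layer_data(p_legend)$linetype)
unique(layer_data(p_no_legend)$linetype)
unique(layer_data(p_set)$linetype)
```

Printing the three plots side by side shows the trade-off from Table 2: only p_legend carries a legend, while p_set needs the least code.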
I have found that the colour mapping system in ggplot2 is flexible, but has a steep learning curve. I hope that this article has shed some light on the ways aesthetics can be added to specific geoms.