Description
Currently, axis and legend titles default to the column names of the data.frame. Several packages provide functions to attach more comprehensive variable labels to data.frame columns (e.g., Hmisc, tinylabels, labelled, and sjlabelled). While these packages differ in their specific implementations, all of them are compatible in the sense that they attach the variable labels to the data.frame columns via the label attribute. I think it would be useful to use such labels, if available, as they can be expected to be more informative than the bare column names.
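To make the shared mechanism concrete, here is a minimal base-R sketch of what attaching and reading a label attribute looks like without any of the packages. The toy data frame `df` and the `titles` vector are made up for illustration; the fallback to the column name mirrors the default behaviour proposed here and is not existing ggplot2 code.

```r
# Attach a variable label using base R only (toy data for illustration).
df <- data.frame(wt = c(2.62, 2.875), mpg = c(21, 21))
attr(df$wt, "label") <- "Weight (1000 lbs)"

# Read the label back; columns without a label return NULL.
attr(df$wt, "label")
attr(df$mpg, "label")

# Collect one title per column, falling back to the bare column name
# when no label attribute is present.
titles <- vapply(
  names(df),
  function(nm) {
    lab <- attr(df[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else lab
  },
  character(1)
)
titles
```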
Consider the following example. First, I'll create some labelled data (I think tinylabels provides the safest implementation, but any of the below will work).
library("dplyr")
library("ggplot2")

# Set up data & labels using {tinylabels}
library("tinylabels")

mtcars2 <- mtcars |>
  mutate(gear = factor(gear)) |>
  label_variables(
    wt = "Weight (1000 lbs)",
    gear = "Gears",
    vs = "Engine"
  )

# # Set up data & labels using {Hmisc}
# library("Hmisc")
# mtcars2 <- mtcars |>
#   mutate(gear = factor(gear)) |>
#   within({
#     label(gear) <- "Gears"
#     label(wt) <- "Weight (1000 lbs)"
#     label(vs) <- "Engine"
#   })
#
# # Set up data & labels using {sjlabelled}
# library("sjlabelled")
# mtcars2 <- mtcars |>
#   mutate(gear = factor(gear)) |>
#   var_labels(
#     wt = "Weight (1000 lbs)",
#     gear = "Gears",
#     vs = "Engine"
#   )
For demonstration purposes, I slightly modify ggplot_add.labels to use the label attribute, if available:
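ggplot_add.labels <- function(object, plot, object_name) {
  object <- add_variable_labels(object, plot) # Newly added code
  ggplot2::update_labels(plot, object)
}

add_variable_labels <- function(labels, plot) {
  # Column name mapped to each aesthetic; this only handles mappings that
  # are bare column names (e.g. aes(x = wt)), not expressions.
  vars <- sapply(plot$mapping, function(x) rlang::as_name(rlang::f_rhs(x)))
  if (length(vars) == 0) {
    return(labels)
  }

  # Label attribute of each mapped column (NULL where no label is set)
  variable_labels <- lapply(plot$data, attr, "label")[vars]

  # Add variable labels only for aesthetics not set explicitly in labs().
  # Looping over aesthetics (not variables) also covers a variable that is
  # mapped to more than one aesthetic.
  to_add <- !names(vars) %in% names(labels)
  for (aes_name in names(vars)[to_add]) {
    variable_label <- variable_labels[[vars[[aes_name]]]]
    if (!is.null(variable_label)) {
      labels[[aes_name]] <- variable_label
    }
  }

  labels
}

# Patch the method inside the ggplot2 namespace (demonstration only)
assignInNamespace("ggplot_add.labels", ggplot_add.labels, "ggplot2")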
With this change, axis and legend titles default to the label attribute when labs() is called:
p <- ggplot(mtcars2, aes(x = wt, y = mpg, colour = gear)) +
  geom_point()

# Standard behavior (uses column names)
p

# Uses labels, where available
p + labs()

# Overwrite defaults
p + labs(
  y = "Fuel economy (mpg)"
  , color = "Bears"
)
I think this would be a very nice feature and I'd be happy to take a stab at it.
If this is of interest, I see three options for a proper implementation:
- Always default to using labels (after some exploration, the required changes seem manageable).
- Only default to using labels when labs() is called (pretty much what I have implemented above).
- Only default to using labels when labs() is called and labels are requested (e.g., use_labels = TRUE).
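As a rough illustration of the third option, the opt-in logic could look something like the base-R sketch below. Everything in it is an assumption made for illustration: `resolve_titles()` is a hypothetical helper, `use_labels` is only the argument name floated above, and the mapping is simplified to a named character vector rather than ggplot2's actual aesthetic mapping object.

```r
# Hypothetical helper, not part of ggplot2: resolve the title for each
# aesthetic from (1) explicit labels, (2) label attributes if requested,
# (3) the bare column name.
resolve_titles <- function(data, mapping, labels = list(), use_labels = FALSE) {
  # `mapping` is a named character vector: aesthetic -> column name
  titles <- as.list(mapping) # default: bare column names

  if (use_labels) {
    for (aes_name in names(mapping)) {
      lab <- attr(data[[mapping[[aes_name]]]], "label", exact = TRUE)
      if (!is.null(lab)) {
        titles[[aes_name]] <- lab
      }
    }
  }

  # Explicitly supplied labels always take precedence
  utils::modifyList(titles, labels)
}

# Toy data for illustration
df <- data.frame(wt = c(2.62, 2.875), mpg = c(21, 21))
attr(df$wt, "label") <- "Weight (1000 lbs)"
mapping <- c(x = "wt", y = "mpg")

default_titles  <- resolve_titles(df, mapping)
labelled_titles <- resolve_titles(df, mapping, use_labels = TRUE)
override_titles <- resolve_titles(
  df, mapping,
  labels = list(y = "Fuel economy (mpg)"),
  use_labels = TRUE
)
```

With the flag off, nothing changes relative to the current behaviour, which is the main appeal of this option: existing plots made from labelled data would not change silently.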
There are probably other sensible options that I'm not seeing. Either way, I'd be very interested to know whether this is a PR you would be willing to consider, and to hear any thoughts you may have on this.