How to make sure the new legend labels are correct?

Sorry if this is a silly question, but consider this simple example


  library(tidyverse)  # provides tibble(), %>% and ggplot2

  tibble(type = c('a','b','c'),
         x = c(1,2,3),
         y = c(10,0,10)) %>% 
    ggplot(aes(x, y , color = type)) +
    geom_point()+
    scale_color_discrete(labels = c('hello','world','!!!'))

As you can see, I am renaming the legend labels with scale_color_discrete.

However, I am not sure I understand exactly how the labels are mapped. Are the labels applied to type in alphabetical order? That is, does hello replace whichever value comes first when type is sorted (a in this case)? What about other, weirder situations?

Is there a more robust way to do so? For instance by specifying a list like list(old_label = new_label) so that there is no ambiguity in the labeling?

Does that make sense?
Thanks!


It sounds like you got the idea! If you don't supply names for the vector of labels, they are applied in the default order of the legend: alphanumeric for a character column, or the order of the factor levels if the column is a factor.
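
To make the positional matching concrete, here is a minimal sketch. The data frame and the names `df`, `default_order` and `position_map` are made up for illustration, and it assumes `type` is a character column, so the legend follows the sorted unique values.

```r
library(ggplot2)

# Hypothetical data; the rows are deliberately not in alphabetical order.
df <- data.frame(type = c("c", "a", "b"), x = 1:3, y = c(10, 0, 10))

# For a character column, the legend follows the sorted unique values.
default_order <- sort(unique(df$type))

# Unnamed labels are matched to that order by position.
new_labels <- c("hello", "world", "!!!")
position_map <- setNames(new_labels, default_order)
print(position_map)

# The same unnamed labels passed to the scale.
p <- ggplot(df, aes(x, y, color = type)) +
  geom_point() +
  scale_color_discrete(labels = new_labels)
```

So hello ends up on a even though a is not the first row of the data. If `type` were a factor, the order of `levels(df$type)` would apply instead.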

You can supply a named vector to be certain that the correct new label is applied to each old label. This is definitely "safest", since it avoids any ordering surprises.

The named vector could look something like:

scale_color_discrete(labels = c(a = 'hello', c = 'world', b = '!!!'))

(Note I changed the order of how the new labels map to the old labels :wink:.)
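
Put together with the original example, a sketch of the named-vector approach could look like the following. The `label_map` and `unmapped` names are just illustrative, and the `setdiff()` check is an optional extra I'd add to catch values that have no label, not something ggplot2 requires.

```r
library(ggplot2)

df <- data.frame(type = c("a", "b", "c"), x = 1:3, y = c(10, 0, 10))

# Names are the old values, elements are the new labels.
label_map <- c(a = "hello", c = "world", b = "!!!")

# Values in the data with no entry in the lookup; ideally empty.
unmapped <- setdiff(unique(df$type), names(label_map))
if (length(unmapped) > 0) {
  warning("No label supplied for: ", paste(unmapped, collapse = ", "))
}

p <- ggplot(df, aes(x, y, color = type)) +
  geom_point() +
  scale_color_discrete(labels = label_map)
```

Because the labels are matched by name, the order in which you write the entries of `label_map` doesn't matter.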


Haha, wonderful! Thanks! I think another approach would be to define type as a factor and change its labels. Is that right? Do you know how I can do that efficiently?

Yep, I often do this during my data manipulation step.

Using factor(), you can set the levels (which must match the current values and which fix the order of the levels) and the labels (which replace the levels with new values; they must be in the same order as levels to work correctly).

For example, prior to plotting you could do

tibble(type = c('a','b','c'),
       x = c(1,2,3),
       y = c(10,0,10)) %>%
    mutate(type = factor(type,
                         levels = c("a", "b", "c"),
                         labels = c("hello", "world", "again")))
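
Since levels and labels are paired purely by position, it can be worth checking the result. Here is a small base R sketch, with the levels deliberately in a non-alphabetical order; the variable names are made up for illustration.

```r
type <- c("a", "b", "c")

# levels and labels are paired by position: c -> again, a -> hello, b -> world.
relabelled <- factor(type,
                     levels = c("c", "a", "b"),
                     labels = c("again", "hello", "world"))

# Cross-tabulate old against new values to confirm the pairing.
print(table(old = type, new = relabelled))

# The level order (and so the legend order) follows `levels`.
print(levels(relabelled))
```

The cross-tabulation should have exactly one non-zero cell per row; anything else suggests the two vectors were not lined up as intended.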

The forcats package has a lot of handy functions for working with factors. The fct_recode() function is for recoding values to new ones. Note the order is new label = old label (I'm sure I'd get this backwards if I hadn't looked at the documentation just now).

tibble(type = c('a','b','c'),
       x = c(1,2,3),
       y = c(10,0,10)) %>%
    mutate(type = forcats::fct_recode(type,
                         hello = "a",
                         world = "b",
                         again = "c"))
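
If you would rather keep the mapping in one place, fct_recode() also accepts a named vector spliced in with `!!!`. A minimal sketch, where `recode_map` is a hypothetical name for the lookup:

```r
library(forcats)

type <- c("a", "b", "c")

# Lookup in fct_recode() order: names are new labels, values are old ones.
recode_map <- c(hello = "a", world = "b", again = "c")

# !!! splices the named vector into fct_recode()'s arguments.
recoded <- fct_recode(type, !!!recode_map)

print(levels(recoded))
```

Note that this lookup is the other way round from the one used for `labels =` in scale_color_discrete(), where the names are the old values.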

super useful. thanks!
