ggplot match color code column to another column in R dataframe

I have this simple dataframe in the form:

x    y    tissue    color_code
1    2    nerve     #EEEE00
2    3    brain     #33CCCC

There are several thousand rows for each tissue, and all rows for a given tissue share the same color_code (e.g. every row with nerve has the color code #EEEE00, etc.).
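Because every row of a tissue is supposed to carry the same code, it can help to confirm that assumption and collapse the thousands of rows into one (tissue, colour) pair per tissue before plotting. Here is a minimal base-R sketch; the small `df` is hypothetical example data in the shape described above, and `codes_per_tissue` and `lookup` are illustrative names.

```r
# Hypothetical example data in the shape described in the question
df <- data.frame(
  x = 1:6,
  y = c(2, 3, 4, 5, 6, 7),
  tissue = c("nerve", "nerve", "brain", "brain", "nerve", "brain"),
  color_code = c("#EEEE00", "#EEEE00", "#33CCCC", "#33CCCC", "#EEEE00", "#33CCCC"),
  stringsAsFactors = FALSE
)

# Count the distinct colour codes per tissue; every count should be 1
codes_per_tissue <- tapply(df$color_code, df$tissue, function(v) length(unique(v)))
stopifnot(all(codes_per_tissue == 1))

# Keep one row per tissue, in order of first appearance
lookup <- unique(df[, c("tissue", "color_code")])
lookup
```

The small `lookup` table is then all a colour scale needs, no matter how many rows the full data has.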

I would like to make a simple line plot of y vs x and color by tissue using the corresponding color code.

When I attempt to plot this, the two points are red and black, instead of yellow (#EEEE00) and cyan (#33CCCC).

library(ggplot2)

test_df = data.frame("x"=c(1,2),"y"=c(2,3),"tissue"=c("nerve","brain"),"color_code"=c("#EEEE00","#33CCCC"))

ggplot(test_df,aes(x=x,y=y,color=tissue))+
  geom_point()+
  scale_color_manual(values=test_df$color_code)

Hi Andrew, welcome!

I think this is what you are looking for

library(ggplot2)

test_df = data.frame("x"=c(1,2),"y"=c(2,3),"tissue"=c("nerve","brain"),"color_code"=c("#EEEE00","#33CCCC"))

ggplot(test_df,aes(x = x, y = y, color = color_code))+
    geom_point() +
    scale_color_identity(guide = "legend", labels = c("brain", "nerve")) +
    labs(color = "Tissue")

Created on 2019-03-15 by the reprex package (v0.2.1)


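This has to do with color_code being a factor instead of a character vector (data.frame() converted strings to factors by default before R 4.0.0, unless stringsAsFactors = FALSE was set).

See the difference with scale_color_manual(values = as.character(test_df$color_code)).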
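Converting with as.character() gives real hex colours, but unnamed `values` are matched to the scale's levels in order, and the tissue levels sort as brain, nerve. With the question's data, brain would therefore be drawn yellow and nerve cyan. A sketch that matches by name instead relies on scale_color_manual() matching a named `values` vector by its names; `lookup` and `tissue_colors` are illustrative names, not part of the original code.

```r
library(ggplot2)

test_df <- data.frame(x = c(1, 2), y = c(2, 3),
                      tissue = c("nerve", "brain"),
                      color_code = c("#EEEE00", "#33CCCC"),
                      stringsAsFactors = FALSE)

# One colour per tissue, named by tissue, so the scale matches by name, not position
lookup <- unique(test_df[, c("tissue", "color_code")])
tissue_colors <- setNames(lookup$color_code, lookup$tissue)

p <- ggplot(test_df, aes(x = x, y = y, color = tissue)) +
  geom_point() +
  scale_color_manual(values = tissue_colors)
p
```

Because the lookup is keyed by name, the same code keeps working when there are many rows per tissue or when the rows arrive in a different order.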

Andres, I'm not able to generalise your code to more than 2 categories. See below:

library(ggplot2)

test_df = data.frame(x = c(1, 2, 3),
                     y = c(2, 3, 4),
                     tissue = c("nerve", "brain", "heart"),
                     color_code = c("#FF0000", "#00FF00", "#0000FF"))

ggplot(data = test_df,
       mapping = aes(x = x,
                     y = y,
                     color = color_code)) +
  geom_point() +
  scale_color_identity(guide = "legend",
                       labels = c("brain", "heart", "nerve")) +
  labs(color = "Tissue")

Created on 2019-03-16 by the reprex package (v0.2.1)

I think the OP expects nerve in red, brain in green and heart in blue in this case, but that's not what I get. The points themselves are coloured correctly; it's the legend labels that are wrong. Can you please point out my mistake?
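The mismatch comes from the default breaks. For a character (or default factor) variable, the scale's limits, and therefore its default breaks, come out in sorted order. The legend keys are then ordered by hex string rather than by row, and the hand-written labels are attached to them in that order. A quick base-R check using the three colour codes above:

```r
color_code <- c("#FF0000", "#00FF00", "#0000FF")
tissue <- c("nerve", "brain", "heart")

# Default discrete breaks are the sorted unique values
default_breaks <- sort(unique(color_code))
default_breaks
# [1] "#0000FF" "#00FF00" "#FF0000"

# The tissue each default break actually belongs to
tissue[match(default_breaks, color_code)]
# [1] "heart" "brain" "nerve"
```

So labels = c("brain", "heart", "nerve") puts "brain" on the blue key and "heart" on the green one.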

I think the OP is actually just asking for nerve in yellow and brain in cyan. Your problem is the order of the labels, and I think you can fix it by specifying the breaks argument:

library(ggplot2)

test_df = data.frame(x = c(1, 2, 3),
                     y = c(2, 3, 4),
                     tissue = c("nerve", "brain", "heart"),
                     color_code = c("#FF0000", "#00FF00", "#0000FF"))

ggplot(data = test_df,
       mapping = aes(x = x,
                     y = y,
                     color = color_code)) +
    geom_point() +
    scale_color_identity(guide = "legend",
                         labels = c("nerve", "brain", "heart"),
                         breaks = test_df$color_code) +
    labs(color = "Tissue")

Thanks, but I'm sorry, I have one more question about this. I'm just learning ggplot2, so please bear with me.

I'm assuming that the complete dataset will contain repeated tissues. What do you suggest to resolve this?

library(ggplot2)

test_df = data.frame(x = c(1, 2, 3, 5),
                     y = c(2, 3, 4, 6),
                     tissue = c("nerve", "brain", "heart", "nerve"),
                     color_code = c("#FF0000", "#00FF00", "#0000FF", "#FF0000"))

ggplot(data = test_df,
       mapping = aes(x = x,
                     y = y,
                     color = color_code)) +
  geom_point() +
  scale_color_identity(guide = "legend",
                       labels = c("nerve", "brain", "heart"),
                       breaks = test_df$color_code) +
  labs(color = "Tissue")
#> Error: `breaks` and `labels` must have the same length

Created on 2019-03-16 by the reprex package (v0.2.1)
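One way to keep the scale_color_identity() approach when tissues repeat is to reduce the data to one row per tissue first and build `breaks` and `labels` from that reduced table. The following sketch assumes each tissue has exactly one colour; `legend_key` is an illustrative name.

```r
library(ggplot2)

test_df <- data.frame(x = c(1, 2, 3, 5),
                      y = c(2, 3, 4, 6),
                      tissue = c("nerve", "brain", "heart", "nerve"),
                      color_code = c("#FF0000", "#00FF00", "#0000FF", "#FF0000"),
                      stringsAsFactors = FALSE)

# One row per tissue, in order of first appearance
legend_key <- unique(test_df[, c("tissue", "color_code")])

p <- ggplot(test_df, aes(x = x, y = y, color = color_code)) +
  geom_point() +
  # breaks and labels now have the same length and are aligned row by row
  scale_color_identity(guide = "legend",
                       breaks = legend_key$color_code,
                       labels = legend_key$tissue) +
  labs(color = "Tissue")
p
```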

Actually, I find @aosmith's solution much easier for this situation, with a small modification to account for the alphabetical ordering.

library(ggplot2)

test_df <- data.frame(x = c(1, 2, 3, 5),
                      y = c(2, 3, 4, 6),
                      tissue = c("nerve", "brain", "heart", "nerve"),
                      color_code = c("#FF0000", "#00FF00", "#0000FF", "#FF0000"),
                      stringsAsFactors = FALSE)

ggplot(data = test_df,
       mapping = aes(x = x,
                     y = y,
                     color = tissue)) +
  geom_point() +
  # one colour per tissue, sorted by tissue to match the alphabetical levels
  scale_color_manual(values = unique(test_df$color_code[order(test_df$tissue)]))

Created on 2019-03-16 by the reprex package (v0.2.1)


I don't see the additional complication, but one of the nice things about R is that there is more than one way to do everything, so you can choose the one that best fits your needs.

library(ggplot2)

test_df = data.frame(x = c(1, 2, 3, 5),
                     y = c(2, 3, 4, 6),
                     tissue = c("nerve", "brain", "heart", "nerve"),
                     color_code = c("#FF0000", "#00FF00", "#0000FF", "#FF0000"))

ggplot(data = test_df,
       mapping = aes(x = x,
                     y = y,
                     color = color_code)) +
    geom_point() +
    scale_color_identity(guide = "legend",
                         labels = test_df$tissue,
                         breaks = test_df$color_code) +
    labs(color = "Tissue")


I think getting two columns to match up is always a little complicated! :slight_smile:

I see I wasn't paying attention to the order of the two columns, and you came up with a good solution.

I would probably use forcats::fct_inorder() on the tissue column to achieve a similar result (unless there was a strong reason to use tissue in alphabetical order).

ggplot(data = test_df,
       mapping = aes(x = x,
                     y = y,
                     color = forcats::fct_inorder(tissue))) +
    geom_point() +
    # levels and colours are both taken in order of first appearance
    scale_color_manual(name = "Tissue", values = unique(test_df$color_code))


I've never noticed ggplot2 using anything but alphabetical ordering by default, but then I haven't been using it for long. So I guess forcats::fct_inorder() would be preferable in other cases too.

When R converts something into a factor, it defaults to sorting the levels into "increasing order of x" (from the documentation of factor()). For character vectors this is alphabetical order. The same thing happens when ggplot2 converts a character vector to a factor. So if we want a different order, we have to set the order of the levels manually.
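The default sorting, and the manual alternative, can be seen in base R alone. Passing `levels = unique(x)` is shown here as one illustrative way to set the order by hand (first appearance); any explicit vector of levels works the same way.

```r
tissue <- c("nerve", "brain", "heart", "nerve")

# Default: levels sorted in increasing (alphabetical) order
levels(factor(tissue))
# [1] "brain" "heart" "nerve"

# Setting the levels by hand, here in order of first appearance
levels(factor(tissue, levels = unique(tissue)))
# [1] "nerve" "brain" "heart"
```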

Package forcats has a lot of convenient functions for easily setting the order of the levels. I end up using fct_inorder() a fair amount if the variable is already in a nice order in the dataset. This happens when I'm, e.g., trying to plot month names and my data are stored in order by month.
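For the month-name case, here is a small sketch with a hypothetical `month` vector already stored in calendar order. `month.abb` is R's built-in vector of English month abbreviations, used as an explicit order for cases where the data might not be in calendar order.

```r
library(forcats)

# Hypothetical data already stored in calendar order
month <- c("Jan", "Feb", "Mar", "Apr", "Feb")

levels(factor(month))        # alphabetical: "Apr" "Feb" "Jan" "Mar"
levels(fct_inorder(month))   # first appearance: "Jan" "Feb" "Mar" "Apr"

# If months might be missing or out of order, give the calendar order explicitly
levels(factor(month, levels = month.abb))
```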
