strings in rows of a data frame column

Hi! Hoping to solve this.. but need help

I'm trying to plot something in ggplot that is a bit tricky (to me). I've got a subsetted data frame and I'm characterizing each row in it by one or more of 10 "classes" (we'll call them identities). Each row can have any combination of the 10 identities. I want to display a value from each row (another column in the data frame), grouped by identity. So it would be something like a clustered bar graph, where each cluster is an identity, each cluster contains a different number of rows, and the same row often appears in more than one cluster. I tried creating a list of strings for each row and storing it in a new column at the end, to then use as my mapping and fill variable. But R only recognizes the first string in each row's list.

# New column with each row containing a list of strings describing the row (Gene)
CellType$subtype = character(length = length(CellType$GeneID))


CellType$subtype[which(CellType$GeneID=="Gfra2")]= c("Tyrosine Hydroxylase", "Non-peptidergic 2")
CellType$subtype[which(CellType$GeneID=="Mrgpra3")]= c("Non-peptidergic 2")
CellType$subtype[which(CellType$GeneID=="Mrgprd")]= c("Non-peptidergic 1")
CellType$subtype[which(CellType$GeneID=="Sst")]= c("Non-peptidergic 3")
CellType$subtype[which(CellType$GeneID=="Piezo2")]= c("Tyrosine Hydroxylase")
CellType$subtype[which(CellType$GeneID=="Ldhb")]= c("Neurofilament 1", "Neurofilament 2", "Neurofilament 3", "Neurofilament 4", "Neurofilament 5")
CellType$subtype[which(CellType$GeneID=="Cacna1h")]= c("Neurofilament 1", "Neurofilament 2")
CellType$subtype[which(CellType$GeneID=="Necab2")]= c("Neurofilament 2")
CellType$subtype[which(CellType$GeneID=="Fam19a1")]= c("Neurofilament 3", "Peptidergic 2")

Sub_pop_cluster = ggplot(CellType, aes(x = CellType$subtype, y = CellType$log2.FC.)) + geom_bar(aes(fill = CellType$GeneID), position = "dodge", stat = "identity")

So when R plots this, it recognizes the gene in the first example, Gfra2, only as "Tyrosine Hydroxylase", instead of as both "Tyrosine Hydroxylase" and "Non-peptidergic 2".

Is there a way to fix this?

Thanks!

Hi Erik, welcome!
It would be easier to help if you provide some sample data, so could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

Hi! If I'm understanding you correctly, reshaping your data as follows might be what you're looking for.

The second left_join() isn't strictly necessary, but without NAs the columns in your plot might have different widths.

library(tidyverse)
set.seed(123)

gene_to_subtype <- list(Gfra2= c("Tyrosine Hydroxylase", "Non-peptidergic 2"),
                        Mrgpra3 = c("Non-peptidergic 2"),
                        Mrgprd = c("Non-peptidergic 1"),
                        Sst = c("Non-peptidergic 3"),
                        Piezo2 = c("Tyrosine Hydroxylase"),
                        Cacna1h = c("Neurofilament 1", "Neurofilament 2"),
                        Necab2 = c("Neurofilament 2"),
                        Fam19a1 = c("Neurofilament 3", "Peptidergic 2"),
                        Ldhb = c("Neurofilament 1", "Neurofilament 2", "Neurofilament 3", "Neurofilament 4", "Neurofilament 5")) %>% 
  enframe(name = 'gene_id', value = 'subtype') %>% 
  unnest(subtype)  # name the list column; a bare unnest() warns in tidyr >= 1.0.0

dummy_data <- distinct(gene_to_subtype, gene_id) %>% 
  mutate(log2.FC. = rnorm(n = length(gene_id), 1))

(df <- left_join(dummy_data, gene_to_subtype) %>% 
  left_join(expand(gene_to_subtype, gene_id, subtype), .))  # explicit NAs 
#> Joining, by = "gene_id"
#> Joining, by = c("gene_id", "subtype")
#> # A tibble: 90 x 3
#>    gene_id subtype              log2.FC.
#>    <chr>   <chr>                   <dbl>
#>  1 Cacna1h Neurofilament 1          2.72
#>  2 Cacna1h Neurofilament 2          2.72
#>  3 Cacna1h Neurofilament 3         NA   
#>  4 Cacna1h Neurofilament 4         NA   
#>  5 Cacna1h Neurofilament 5         NA   
#>  6 Cacna1h Non-peptidergic 1       NA   
#>  7 Cacna1h Non-peptidergic 2       NA   
#>  8 Cacna1h Non-peptidergic 3       NA   
#>  9 Cacna1h Peptidergic 2           NA   
#> 10 Cacna1h Tyrosine Hydroxylase    NA   
#> # … with 80 more rows

# ggplot(df, aes(x = gene_id, y = log2.FC.)) +
#   geom_col(aes(fill = subtype), position = 'dodge')

Created on 2019-03-13 by the reprex package (v0.2.1)
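
As for why only the first string showed up in the original attempt, here is a minimal base R sketch. The toy data frame and its two genes are made up for illustration (they are not the real CellType data). Assigning two strings to a single matching row makes R warn and keep only the first one, which is why a long layout with one row per gene/subtype pair is probably the easier route.

```r
# Hypothetical stand-in for CellType, with one row per gene
toy <- data.frame(GeneID = c("Gfra2", "Sst"), stringsAsFactors = FALSE)
toy$subtype <- character(nrow(toy))

# One matching row but two replacement values: R would warn
# ("number of items to replace is not a multiple of replacement length")
# and store only the first value. The warning is silenced here on purpose.
suppressWarnings(
  toy$subtype[toy$GeneID == "Gfra2"] <- c("Tyrosine Hydroxylase", "Non-peptidergic 2")
)
toy$subtype

# Long layout: repeat the gene once per subtype it belongs to
long <- data.frame(
  GeneID  = c("Gfra2", "Gfra2", "Sst"),
  subtype = c("Tyrosine Hydroxylase", "Non-peptidergic 2", "Non-peptidergic 3"),
  stringsAsFactors = FALSE
)
long
```

With the long layout, each gene can be drawn under every subtype it is listed for.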


Thanks so much. I think I see. When I tried to reproduce the code, the "enframe" function wasn't being recognized despite having tidyverse or tibble package installed.

Could you please post a reprex of your code? I'm able to run @nathania's code successfully.

library(tidyverse)
set.seed(123)

gene_to_subtype <- list(Gfra2= c("Tyrosine Hydroxylase", "Non-peptidergic 2"),
                        Mrgpra3 = c("Non-peptidergic 2"),
                        Mrgprd = c("Non-peptidergic 1"),
                        Sst = c("Non-peptidergic 3"),
                        Piezo2 = c("Tyrosine Hydroxylase"),
                        Cacna1h = c("Neurofilament 1", "Neurofilament 2"),
                        Necab2 = c("Neurofilament 2"),
                        Fam19a1 = c("Neurofilament 3", "Peptidergic 2"),
                        Ldhb = c("Neurofilament 1", "Neurofilament 2", "Neurofilament 3", "Neurofilament 4", "Neurofilament 5")) %>% 
  enframe(name = 'gene_id', value = 'subtype') %>% 
  unnest(subtype)  # name the list column; a bare unnest() warns in tidyr >= 1.0.0

dummy_data <- distinct(gene_to_subtype, gene_id) %>% 
  mutate(log2.FC. = rnorm(n = length(gene_id), 1))

(df <- left_join(dummy_data, gene_to_subtype) %>% 
    left_join(expand(gene_to_subtype, gene_id, subtype), .))
#> Joining, by = "gene_id"
#> Joining, by = c("gene_id", "subtype")
#> # A tibble: 90 x 3
#>    gene_id subtype              log2.FC.
#>    <chr>   <chr>                   <dbl>
#>  1 Cacna1h Neurofilament 1          2.72
#>  2 Cacna1h Neurofilament 2          2.72
#>  3 Cacna1h Neurofilament 3         NA   
#>  4 Cacna1h Neurofilament 4         NA   
#>  5 Cacna1h Neurofilament 5         NA   
#>  6 Cacna1h Non-peptidergic 1       NA   
#>  7 Cacna1h Non-peptidergic 2       NA   
#>  8 Cacna1h Non-peptidergic 3       NA   
#>  9 Cacna1h Peptidergic 2           NA   
#> 10 Cacna1h Tyrosine Hydroxylase    NA   
#> # … with 80 more rows

Created on 2019-03-15 by the reprex package (v0.2.1.9000)

Note also that you must load/attach the library in each session, so if you haven't run library(tidyverse) (as in the reprex), the function enframe() will not be found.
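
If you'd rather not attach anything, a namespace-qualified call should also work, assuming the tibble package is installed. A small sketch with a made-up two-gene list (the `subtypes` and `nested` names are just for illustration):

```r
# Nothing is attached with library() here; tibble only needs to be installed
subtypes <- list(Gfra2 = c("Tyrosine Hydroxylase", "Non-peptidergic 2"),
                 Sst   = c("Non-peptidergic 3"))

# pkg::fun() looks the function up in the package namespace directly
nested <- tibble::enframe(subtypes, name = "gene_id", value = "subtype")
nested
```

Here `nested` has one row per gene, with `subtype` stored as a list column that still needs unnesting.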
