strings in rows of a data frame column

ggplot2
#1

Hi! I'm hoping to solve this, but I need some help.

I'm trying to plot something in ggplot that is a bit tricky (to me). I have a subsetted data frame, and I'm characterizing each row in it by one or more of 10 "classes" (we'll call them identities). Each row can have any combination of the 10 identities. I want to display a value for each row (another column in the data frame), grouped by identity. The result would be something like a clustered bar graph, where each cluster (identity) holds a different number of rows and the same row can appear in several clusters. I tried creating a set of strings for each row and storing it in a new column at the end of the data frame, to then use as my mapping and fill variable. But R only recognizes the first string of each set in that column.

# New column with each row containing a list of strings describing the row (Gene)
CellType$subtype = character(length = length(CellType$GeneID))


CellType$subtype[which(CellType$GeneID=="Gfra2")]= c("Tyrosine Hydroxylase", "Non-peptidergic 2")
CellType$subtype[which(CellType$GeneID=="Mrgpra3")]= c("Non-peptidergic 2")
CellType$subtype[which(CellType$GeneID=="Mrgprd")]= c("Non-peptidergic 1")
CellType$subtype[which(CellType$GeneID=="Sst")]= c("Non-peptidergic 3")
CellType$subtype[which(CellType$GeneID=="Piezo2")]= c("Tyrosine Hydroxylase")
CellType$subtype[which(CellType$GeneID=="Ldhb")]= c("Neurofilament 1", "Neurofilament 2", "Neurofilament 3", "Neurofilament 4", "Neurofilament 5")
CellType$subtype[which(CellType$GeneID=="Cacna1h")]= c("Neurofilament 1", "Neurofilament 2")
CellType$subtype[which(CellType$GeneID=="Necab2")]= c("Neurofilament 2")
CellType$subtype[which(CellType$GeneID=="Fam19a1")]= c("Neurofilament 3", "Peptidergic 2")

Sub_pop_cluster = ggplot(CellType, aes(x = CellType$subtype, y = CellType$log2.FC.)) + geom_bar(aes(fill = CellType$GeneID), position = "dodge", stat = "identity")

So when R plots this, it recognizes gene Gfra2 (the first example) only as "Tyrosine Hydroxylase", instead of as both "Tyrosine Hydroxylase" and "Non-peptidergic 2".

Is there a way to fix this?

Thanks!


#2

Hi Erik, welcome!
It would be easier to help if you provide some sample data, so could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:


#3

Hi! If I'm understanding you correctly, reshaping your data as follows might be what you're looking for.

The second left_join() isn't strictly necessary, but without NAs the columns in your plot might have different widths.

library(tidyverse)
set.seed(123)

gene_to_subtype <- list(Gfra2= c("Tyrosine Hydroxylase", "Non-peptidergic 2"),
                        Mrgpra3 = c("Non-peptidergic 2"),
                        Mrgprd = c("Non-peptidergic 1"),
                        Sst = c("Non-peptidergic 3"),
                        Piezo2 = c("Tyrosine Hydroxylase"),
                        Cacna1h = c("Neurofilament 1", "Neurofilament 2"),
                        Necab2 = c("Neurofilament 2"),
                        Fam19a1 = c("Neurofilament 3", "Peptidergic 2"),
                        Ldhb = c("Neurofilament 1", "Neurofilament 2", "Neurofilament 3", "Neurofilament 4", "Neurofilament 5")) %>% 
  enframe(name = 'gene_id', value = 'subtype') %>% 
  unnest(subtype)  # name the list-column explicitly; newer tidyr versions expect it

dummy_data <- distinct(gene_to_subtype, gene_id) %>% 
  mutate(log2.FC. = rnorm(n = length(gene_id), 1))

(df <- left_join(dummy_data, gene_to_subtype) %>% 
  left_join(expand(gene_to_subtype, gene_id, subtype), .))  # explicit NAs 
#> Joining, by = "gene_id"
#> Joining, by = c("gene_id", "subtype")
#> # A tibble: 90 x 3
#>    gene_id subtype              log2.FC.
#>    <chr>   <chr>                   <dbl>
#>  1 Cacna1h Neurofilament 1          2.72
#>  2 Cacna1h Neurofilament 2          2.72
#>  3 Cacna1h Neurofilament 3         NA   
#>  4 Cacna1h Neurofilament 4         NA   
#>  5 Cacna1h Neurofilament 5         NA   
#>  6 Cacna1h Non-peptidergic 1       NA   
#>  7 Cacna1h Non-peptidergic 2       NA   
#>  8 Cacna1h Non-peptidergic 3       NA   
#>  9 Cacna1h Peptidergic 2           NA   
#> 10 Cacna1h Tyrosine Hydroxylase    NA   
#> # … with 80 more rows

# ggplot(df, aes(x = gene_id, y = log2.FC.)) +
#   geom_col(aes(fill = subtype), position = 'dodge')

Created on 2019-03-13 by the reprex package (v0.2.1)
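
As a side note on why the original column kept only the first string, here is a minimal base-R sketch. The names `cell_type` and `long` and the two-gene data are made up for illustration and are not the original `CellType` data. An atomic column holds one value per row, so assigning two strings to one matching row stores only the first (with a warning). The long layout gives each gene/subtype pair its own row instead.

```r
# Hypothetical two-gene stand-in for the CellType data frame
cell_type <- data.frame(GeneID = c("Gfra2", "Sst"), stringsAsFactors = FALSE)
cell_type$subtype <- character(nrow(cell_type))

# One matching row but two replacement values: R warns that the lengths
# do not match and stores only the first value
cell_type$subtype[cell_type$GeneID == "Gfra2"] <-
  c("Tyrosine Hydroxylase", "Non-peptidergic 2")

# Long format instead: one row per gene/subtype pair
gene_to_subtype <- list(
  Gfra2 = c("Tyrosine Hydroxylase", "Non-peptidergic 2"),
  Sst   = "Non-peptidergic 3"
)
long <- stack(gene_to_subtype)               # columns: values, ind
names(long) <- c("subtype", "gene_id")
long$gene_id <- as.character(long$gene_id)   # ind is returned as a factor
long$subtype <- as.character(long$subtype)
```

With the data in this shape, `subtype` can be mapped to an aesthetic directly, and a gene that belongs to several subtypes simply shows up once per subtype.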


#4

Thanks so much. I think I see. When I tried to reproduce the code, the enframe() function wasn't recognized, even though I have the tidyverse and tibble packages installed.


#5

Could you please post a reprex of your code? I'm able to run @nathania's code successfully.

library(tidyverse)
set.seed(123)

gene_to_subtype <- list(Gfra2= c("Tyrosine Hydroxylase", "Non-peptidergic 2"),
                        Mrgpra3 = c("Non-peptidergic 2"),
                        Mrgprd = c("Non-peptidergic 1"),
                        Sst = c("Non-peptidergic 3"),
                        Piezo2 = c("Tyrosine Hydroxylase"),
                        Cacna1h = c("Neurofilament 1", "Neurofilament 2"),
                        Necab2 = c("Neurofilament 2"),
                        Fam19a1 = c("Neurofilament 3", "Peptidergic 2"),
                        Ldhb = c("Neurofilament 1", "Neurofilament 2", "Neurofilament 3", "Neurofilament 4", "Neurofilament 5")) %>% 
  enframe(name = 'gene_id', value = 'subtype') %>% 
  unnest(subtype)  # name the list-column explicitly; newer tidyr versions expect it

dummy_data <- distinct(gene_to_subtype, gene_id) %>% 
  mutate(log2.FC. = rnorm(n = length(gene_id), 1))

(df <- left_join(dummy_data, gene_to_subtype) %>% 
    left_join(expand(gene_to_subtype, gene_id, subtype), .))
#> Joining, by = "gene_id"
#> Joining, by = c("gene_id", "subtype")
#> # A tibble: 90 x 3
#>    gene_id subtype              log2.FC.
#>    <chr>   <chr>                   <dbl>
#>  1 Cacna1h Neurofilament 1          2.72
#>  2 Cacna1h Neurofilament 2          2.72
#>  3 Cacna1h Neurofilament 3         NA   
#>  4 Cacna1h Neurofilament 4         NA   
#>  5 Cacna1h Neurofilament 5         NA   
#>  6 Cacna1h Non-peptidergic 1       NA   
#>  7 Cacna1h Non-peptidergic 2       NA   
#>  8 Cacna1h Non-peptidergic 3       NA   
#>  9 Cacna1h Peptidergic 2           NA   
#> 10 Cacna1h Tyrosine Hydroxylase    NA   
#> # … with 80 more rows

Created on 2019-03-15 by the reprex package (v0.2.1.9000)

Note also that you must load/attach the library in each session. If you haven't run library(tidyverse) (as in the reprex), the function enframe() will not be found, even if the package is installed.
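
A quick way to check this in your own session is sketched below. The variable names (`tibble_attached`, `enframe_visible`, `genes`) are just for illustration, and the example assumes `enframe()` comes from the tibble package, which `library(tidyverse)` attaches.

```r
# Installing a package does not attach it; check what this session can see
tibble_attached <- "package:tibble" %in% search()
enframe_visible <- exists("enframe", mode = "function")

# Option 1: attach the package once per session
# library(tibble)   # or library(tidyverse)

# Option 2: call the function through its namespace without attaching
if (requireNamespace("tibble", quietly = TRUE)) {
  genes <- tibble::enframe(
    list(Gfra2 = c("Tyrosine Hydroxylase", "Non-peptidergic 2")),
    name = "gene_id", value = "subtype"
  )
}
```

If `enframe_visible` is FALSE while `requireNamespace()` succeeds, the package is installed but not attached, which would match the error you describe.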

