How can I adjust my heatmap so that it only shows the upper part?

Hello everyone, I was making a heatmap, but I need to adjust it so that it only shows the upper 'triangle'. I tried a lot of code, which led me to a weird heatmap (see figure below). Can anyone help me with this?

image

This is the code I used:
get_upper_tri <- function(correlation.coef){
total[lower.tri(correlation.coef)]<- NA
return(correlation.coef)
}

upper_tri <- get_upper_tri(correlation.coef)
upper_tri

library(reshape2)
melted_total <- melt(upper_tri, na.rm = TRUE)
library(ggplot2)
ggplot(data = upper_tri, aes(x= row, y=col, fill = corr))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+ labs( y="Variables", x="CpGs") +
coord_fixed()

Thank you in advance!

Hi Umare, welcome!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue that includes sample data? Please have a look at this guide to see how to create one.

# make a heatmap that only displays the upper triangle of the correlation matrix
get_upper_tri <- function(data2){
  total[lower.tri(data2)]<- NA
  return(data2)
}

upper_tri <- get_upper_tri(data2)
#> Error in total[lower.tri(data2)] <- NA: object 'total' not found
upper_tri
#> Error in eval(expr, envir, enclos): object 'upper_tri' not found

library(reshape2)
melted_total <- melt(upper_tri, na.rm = TRUE)
#> Error in melt(upper_tri, na.rm = TRUE): object 'upper_tri' not found
library(ggplot2)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
ggplot(data = upper_tri, aes(x= row, y=col, fill = corr))+
  geom_tile(color = "white")+
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1,1), space = "Lab", 
                       name="Pearson\nCorrelation") +
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, 
                                   size = 12, hjust = 1))+ labs( y="Variables", x="CpGs") +
  coord_fixed()
#> Error in ggplot(data = upper_tri, aes(x = row, y = col, fill = corr)): object 'upper_tri' not found

Created on 2019-06-06 by the reprex package (v0.3.0)

Hi @Umare97,

Welcome.

I can't see that this is doing anything.

get_upper_tri <- function(correlation.coef){
total[lower.tri(correlation.coef)]<- NA
return(correlation.coef)
}

You provide correlation.coef as an argument to a function, set values in a data structure called total to NA (based on their correspondence to positions in the lower triangle of correlation.coef), then return correlation.coef (which hasn't been changed).

As @andresrcs said, a reprex would be helpful.

Ron.

Hi again,

Your function fails in your reprex as total does not exist. Did you mean something like:

get_upper_tri <- function(data2){
  total <- data2
  total[lower.tri(total)]<- NA
  return(total)
}

Maybe?

Your reprex should include the data being used (data2, and total if that is supposed to exist, or some usable, logical subset of it/them). Or some made up data that represents the situation, which shouldn't be too hard if one is just a correlation matrix.
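As a rough sketch of what such made-up data could look like: the column names and the random values below are purely hypothetical stand-ins, and the helper is the corrected version from above with the argument renamed to cormat for readability.

```r
# Made-up data standing in for the real measurements (hypothetical names)
set.seed(1)
fake_data <- matrix(rnorm(40), ncol = 4,
                    dimnames = list(NULL, c("cg_a", "cg_b", "cg_c", "cg_d")))

# Symmetric correlation matrix with ones on the diagonal
data2 <- cor(fake_data)

# Corrected helper: blank out the lower triangle of a copy and return that copy
get_upper_tri <- function(cormat) {
  cormat[lower.tri(cormat)] <- NA
  cormat
}

upper_tri <- get_upper_tri(data2)
upper_tri
```

Because R copies the argument when it is modified inside the function, data2 itself stays untouched; only the returned matrix has NA below the diagonal.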

Ron.

I'm sorry for the confusion, I'm just a beginner so I took a little bit of time to understand the whole reprex thing. However, I can't seem to include the data because it says that there is an error and I don't understand what is wrong.

data.frame(row= c(cg02912187,cg22247277,cg01087294,cg23894563,cg10370375, cg10589385,cg22518670, cg27184903, cg21498002, cg00007466 ), col= c(cg15875502,cg15875502,cg15875502,cg15875502,cg15875502,cg15875502,cg02912187,cg02912187,cg02912187,cg02912187),corr=c( -0.0281,0.153,0.137,0.140,0.0960,-0.0513,0.260,-0.0743,0.193,-0.175))
#> Error in data.frame(row = c(cg02912187, cg22247277, cg01087294, cg23894563, : object 'cg02912187' not found
# make a heatmap that only displays the upper triangle of the correlation matrix
get_upper_tri <- function(data2){
  total[lower.tri(data2)]<- NA
  return(data2)
}

upper_tri <- get_upper_tri(data2)
#> Error in total[lower.tri(data2)] <- NA: object 'total' not found
upper_tri
#> Error in eval(expr, envir, enclos): object 'upper_tri' not found

library(reshape2)
melted_total <- melt(upper_tri, na.rm = TRUE)
#> Error in melt(upper_tri, na.rm = TRUE): object 'upper_tri' not found
library(ggplot2)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
ggplot(data = upper_tri, aes(x= row, y=col, fill = corr))+
  geom_tile(color = "white")+
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1,1), space = "Lab", 
                       name="Pearson\nCorrelation") +
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, 
                                   size = 12, hjust = 1))+ labs( y="Variables", x="CpGs") +
  coord_fixed()
#> Error in ggplot(data = upper_tri, aes(x = row, y = col, fill = corr)): object 'upper_tri' not found

Created on 2019-06-06 by the reprex package (v0.3.0)

The issue here is that the first two columns should be strings, but (as they're unquoted) they're just missing objects as far as R is concerned for the reprex.*

df <- data.frame(row= c("cg02912187", "cg22247277", "cg01087294", "cg23894563", "cg10370375", "cg10589385", "cg22518670", "cg27184903", "cg21498002", "cg00007466"), 
                 col= c("cg15875502", "cg15875502", "cg15875502", "cg15875502", "cg15875502", "cg15875502", "cg02912187", "cg02912187", "cg02912187", "cg02912187"),
                 corr=c( -0.0281,0.153,0.137,0.140,0.0960,-0.0513,0.260,-0.0743,0.193,-0.175))

head(df)
#>          row        col    corr
#> 1 cg02912187 cg15875502 -0.0281
#> 2 cg22247277 cg15875502  0.1530
#> 3 cg01087294 cg15875502  0.1370
#> 4 cg23894563 cg15875502  0.1400
#> 5 cg10370375 cg15875502  0.0960
#> 6 cg10589385 cg15875502 -0.0513

Created on 2019-06-06 by the reprex package (v0.3.0)

The next error I run into is that you're referencing something in your function that doesn't exist/you haven't created, total.

get_upper_tri <- function(data2){
  total[lower.tri(data2)]<- NA
  return(data2)
}

upper_tri <- get_upper_tri(data2)
#> Error in total[lower.tri(data2)] <- NA: object 'total' not found

Created on 2019-06-06 by the reprex package (v0.3.0)

As @ron suggested, it's possible you meant to create total from the initial dataframe (or data2, as you might be calling it).

* If you're trying to use an object that exists in your session, you can use dput(<object goes here>) to turn it into text (no < > are necessary, I'm just using that to denote that you fill that bit in).
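A minimal sketch of that dput() workflow, assuming a small data frame called small_df (a hypothetical name, with two rows borrowed from the data above) already exists in your session:

```r
# Hypothetical small object standing in for something in your session
small_df <- data.frame(
  row  = c("cg02912187", "cg22247277"),
  col  = c("cg15875502", "cg15875502"),
  corr = c(-0.0281, 0.153),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object;
# copy that printed text into your reprex
dput(small_df)

# Sanity check: the deparsed text evaluates back to an equivalent object
rebuilt <- eval(parse(text = paste(deparse(small_df), collapse = "\n")))
isTRUE(all.equal(small_df, rebuilt))
```

For a large object it is usually enough to dput() a small subset, e.g. the first few rows.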

Hi @Umare97,

@mara has covered what is wrong with your data.frame call.

However, what is that data.frame? Is it ready to go into ggplot? Or is it what you are going to run lower.tri on?

When you said you had correlations and wanted the upper triangle only, I assumed you had a correlation matrix (i.e. a symmetric matrix with ones on the diagonal and values between -1 and +1 off the diagonal). Is this not the case?

If your data are in a data.frame with character string row and column identifiers, the upper triangle will be defined differently depending on what order you put the strings in when they are converted to a factor (which ggplot will do on the fly, but you might not want them in alphanumeric order).
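To illustrate the ordering point with three made-up identifiers (not your real data), compare the default factor levels with levels given explicitly:

```r
# Identifiers in the order they might appear in a correlation matrix
ids <- c("cg225", "cg271", "cg215")

# Default: levels are sorted, so cg215 comes first
levels(factor(ids))

# Explicit levels: the original order is retained
levels(factor(ids, levels = ids))
```

ggplot lays out a discrete axis in level order, so whichever of these you use decides where each tile ends up.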

Ron.

Yes, we used the data.frame in order to create a heatmap. I don't know why, but the teacher already gave us the code in which they made a 'total' dataset (so in this case total is just data2, a subset of the original 'total' dataset). The code for the normal heatmap was also given, so we just had to run it (without any problems). But for the upper region we had to find the code ourselves… @ron our 'total' dataset consists of this:
image
and they gave us code like this to create it:
data.frame(row=rownames(correlation.coef)[row(correlation.coef)], col=colnames(correlation.coef)[col(correlation.coef)], corr=c(correlation.coef))->total

the correlation.coef dataset is a symmetric matrix that looks like this:


Should I use this then for the upper triangular heatmap?

(I'm really sorry for the inconvenience.. but we directly started with quotes without a steady knowledge of R so that's why I don't really know what I'm doing wrong..)

Hi Umare,

Teacher? Should I be doing your homework?

Never mind, I've started now so here's a little reprex to help you:

library(ggplot2)
#
# Let's start with a little example data, approximating the top left 3x3 of your matrix
correlation.coef <- matrix(c(1, -0.17, 0.04,
                             -0.17, 1, -0.05,
                             0.04,-0.05, 1),
                           dimnames = list(c('cg225', 'cg271', 'cg215'),
                                           c('cg225', 'cg271', 'cg215')),
                           nrow = 3)
#
# locations of upper triangle
ut <- upper.tri(correlation.coef, diag = FALSE)
#
# make a data.frame with the upper triangle only
utdf <- data.frame(row = rownames(correlation.coef)[row(correlation.coef)[ut]],
                   col = colnames(correlation.coef)[col(correlation.coef)[ut]],
                   corr = correlation.coef[ut],
                   stringsAsFactors = FALSE)
#
# If we make a heatmap using this then we will see that ggplot
# has plotted the rows in order (cg225, cg271) and the columns
# in order (cg215, cg271) - whereas they were in order (cg271, cg215)
# in correlation.coef, so the output doesn't look like the upper triangle
# of our matrix. This is because, by default, factors are created in
# alphanumeric order.
ggplot(utdf) + geom_tile(aes(x=col,y=row,fill=corr))

#
# so let's create the factors retaining the correlation.coef ordering,
# remembering that we've dropped the diagonals:
rows_in_utdf <- rownames(correlation.coef)[1:(nrow(correlation.coef)-1)]
cols_in_utdf <- colnames(correlation.coef)[2:ncol(correlation.coef)]
utdf$row1 <- factor(utdf$row, levels = rows_in_utdf)
utdf$col1 <- factor(utdf$col, levels = cols_in_utdf)
ggplot(utdf) + geom_tile(aes(x=col1,y=row1,fill=corr))

#
# the output looks better, but because our correlation.coef is displayed
# with row 1 at the top, and the y-axis on our heatmap has row 1 (level 1
# of our factor) at the bottom, we might want to modify that:
rows_in_utdf <- rev(rownames(correlation.coef)[1:(nrow(correlation.coef)-1)])
cols_in_utdf <- colnames(correlation.coef)[2:ncol(correlation.coef)]
utdf$row2 <- factor(utdf$row, levels = rows_in_utdf)
utdf$col2 <- factor(utdf$col, levels = cols_in_utdf)
ggplot(utdf) + geom_tile(aes(x=col2,y=row2,fill=corr))

Created on 2019-06-07 by the reprex package (v0.2.1)

You can make the heatmap prettier yourself by playing with scale_* and the theme etc.
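For instance, one possible way to combine the upper-triangle data frame with the scale and theme settings from the original question might look like the sketch below. It rebuilds the same 3x3 example matrix so that it runs on its own; the object name p is just for illustration, and limits is spelled out in full.

```r
library(ggplot2)

# Same 3x3 example matrix as in the reprex
correlation.coef <- matrix(c(1, -0.17, 0.04,
                             -0.17, 1, -0.05,
                             0.04, -0.05, 1),
                           nrow = 3,
                           dimnames = list(c("cg225", "cg271", "cg215"),
                                           c("cg225", "cg271", "cg215")))

# Upper triangle (without the diagonal) as a long data frame
ut <- upper.tri(correlation.coef, diag = FALSE)
utdf <- data.frame(row = rownames(correlation.coef)[row(correlation.coef)[ut]],
                   col = colnames(correlation.coef)[col(correlation.coef)[ut]],
                   corr = correlation.coef[ut],
                   stringsAsFactors = FALSE)

# Keep the matrix ordering; reverse the rows so row 1 is drawn at the top
n <- nrow(correlation.coef)
utdf$row <- factor(utdf$row, levels = rev(rownames(correlation.coef)[1:(n - 1)]))
utdf$col <- factor(utdf$col, levels = colnames(correlation.coef)[2:n])

# Styling taken from the code in the original question
p <- ggplot(utdf, aes(x = col, y = row, fill = corr)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1),
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 12)) +
  labs(x = "CpGs", y = "Variables") +
  coord_fixed()
p
```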

I'd suggest that, as it looks like homework, you should go through this line by line so you understand what each step is doing, e.g. look at the output of each step (what does ut look like, what does correlation.coef[ut] produce), rather than just copy and paste the desired code chunk.
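A small sketch of that kind of step-by-step inspection, using the same 3x3 example matrix (the intermediate names row_idx, col_idx and vals are only there for illustration):

```r
correlation.coef <- matrix(c(1, -0.17, 0.04,
                             -0.17, 1, -0.05,
                             0.04, -0.05, 1),
                           nrow = 3,
                           dimnames = list(c("cg225", "cg271", "cg215"),
                                           c("cg225", "cg271", "cg215")))

# Logical matrix: TRUE only for cells above the diagonal
ut <- upper.tri(correlation.coef, diag = FALSE)
ut

# Row and column index of every selected cell (column by column)
row_idx <- row(correlation.coef)[ut]   # 1 1 2
col_idx <- col(correlation.coef)[ut]   # 2 3 3

# The correlations in those cells, in the same order
vals <- correlation.coef[ut]           # -0.17 0.04 -0.05

# Put the pieces side by side to see how they line up
data.frame(row = rownames(correlation.coef)[row_idx],
           col = colnames(correlation.coef)[col_idx],
           corr = vals)
```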

Ron.

I’m sorry, that was not my intention. I thought this was a platform where I could ask questions about things that weren’t clear to me. Thank you for your answer and time.

I'm not sure whether the community has a homework policy, I know some online forums/mailing lists do (eg R-help has quite a strong no-homework policy).

And, of course, it is then tricky to know what you can ask, or how you can ask it in such a way that you get useful information without having had your homework done for you.

I hope I was of some help.

Ron.

I tried to find it without success, but there is a policy; it was pointed out to me in the past when I was answering homework questions. As I recall, it was quite friendly: basically, help with hints, give code tips, help with errors etc., but don't do the task, and expect the OP to show what they have tried. I think there is a thread by @EconomiCurtis about the homework policy? I cannot find it. Maybe it should be pinned?
cheers

I had a look around after I'd already responded to @Umare97, and the homework policy is part of the FAQs.

Thanks, I wasn't able to find it. I am great at searching, aren't I? :sweat: