R Correlation Matrix

I have a data table like this:

dt <- data.table(Sector = c("Agriculture", "Manufacture", "Music", "Agriculture", "Manufacture", "Music", "Agriculture", "Manufacture", "Music"), Region = c("X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"), Year = c("2010", "2011", "2012", "2010", "2011", "2012", "2010", "2011", "2012"), Number = c("238", "75", "1038", "150", "987", "156", "768", "398", "65"), Population = c("200875", "200875", "200875", "375600", "375600", "375600", "492000", "492000", "492000"))

I want to show the relationship between the regions over the years using a correlation matrix. How can I generate a correlation matrix and then plot it with ggplot2? Thank you so much.

Hi @ebru,
Welcome to the RStudio Community Forum.

It's not entirely clear what you want to achieve, and the data you have provided are oddly uniform. However, this example code might get you started:

library(data.table)

dt <- data.table(Sector = c("Agriculture", "Manufacture", "Music", "Agriculture", "Manufacture", "Music", "Agriculture", "Manufacture", "Music"),
                 Region = c("X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"), 
                 Year = c("2010", "2011", "2012", "2010", "2011", "2012", "2010", "2011", "2012"), 
                 Number = c("238", "75", "1038", "150", "987", "156", "768", "398", "65"), 
                 Population = c("200875", "200875", "200875", "375600", "375600", "375600", "492000", "492000", "492000"))
dt

dt$Year <- as.numeric(dt$Year)
dt$Number <- as.numeric(dt$Number)
dt$Population <- as.numeric(dt$Population)

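library(dplyr)
# Caution: inside summarise(), `.` is the whole table, not the current group,
# so cor1 is the overall correlation (about -0.044) repeated in every row.
# A true per-group cor(Number, Population) would be NA here, because each
# Year/Region group contains a single row.
dt %>% 
  group_by(Year, Region) %>% 
  summarise(cor1 = cor(.$Number, .$Population)) -> grp.df
grp.df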

library(ggplot2)
ggplot(data=dt, aes(x=Year, y=Population)) + geom_point()
ggplot(data=grp.df, aes(x=Year, y=cor1)) + geom_point() + facet_grid(~ Region)
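
If what you are after is a region-by-region correlation matrix of Number across the years, one possible approach is to reshape the table so that each region becomes a column, call cor() on it, and draw the result as a heat map. This is only a sketch under that assumption: the names wide, region_cols, cor_mat and cor_long are my own, and with just three years per region the coefficients are for illustration only.

```r
library(data.table)
library(ggplot2)

# Same values as the example table, entered as numbers from the start
dt <- data.table(
  Region = rep(c("X", "Y", "Z"), each = 3),
  Year   = rep(2010:2012, times = 3),
  Number = c(238, 75, 1038, 150, 987, 156, 768, 398, 65)
)

# Reshape to wide: one row per year, one column per region
wide <- dcast(dt, Year ~ Region, value.var = "Number")

# Pairwise Pearson correlations between the region columns
region_cols <- setdiff(names(wide), "Year")
cor_mat <- cor(as.matrix(wide[, .SD, .SDcols = region_cols]))

# Back to long format (one row per pair of regions) for ggplot2
cor_long <- as.data.frame(as.table(cor_mat))
names(cor_long) <- c("Region1", "Region2", "r")

# Heat map with the rounded coefficient printed in each tile
p <- ggplot(cor_long, aes(x = Region1, y = Region2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2()
print(p)
```

The same pattern should work for any other numeric column by changing value.var.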

HTH

@DavoWW, thank you very much for the code. I'm new to R programming, so I apologize for any mistakes.

library(data.table)
dt <- data.table(Sector = c("Agriculture", "Manufacture", "Music", "Agriculture", "Manufacture", "Music",
                            "Agriculture", "Manufacture", "Music"),
                 Region = c("X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"), 
                 Year = c("2010", "2011", "2012", "2010", "2011", "2012", "2010", "2011", "2012"), 
                 Number = c("238", "75", "1038", "150", "987", "156", "768", "398", "65"), 
                 Population = c("200875", "200875", "200875", "375600", "375600", "375600", "492000",
                                "492000", "492000"))

dt[,`:=`(Year=as.numeric(Year),Number = as.numeric(Number),Population=as.numeric(Population))]
str(dt)
dt
dt[,cor(Number,Population),]
#> [1] -0.04352187
dt[,cor.test(Number,Population),]
#> Pearson's product-moment correlation
#> data:  Number and Population
#> t = -0.11526, df = 7, p-value = 0.9115
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.6877646  0.6390714
#> sample estimates:
#>         cor 
#> -0.04352187
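
If you need the individual numbers from that test rather than the printed report, the object returned by cor.test() can be taken apart by name. A minimal sketch using base R only; the variable names number, population, ct, r_est, p_value, ci and df are just for illustration:

```r
# Same Number and Population values as in the table above
number     <- c(238, 75, 1038, 150, 987, 156, 768, 398, 65)
population <- rep(c(200875, 375600, 492000), each = 3)

ct <- cor.test(number, population)

r_est   <- unname(ct$estimate)      # Pearson's r
p_value <- ct$p.value               # two-sided p-value
ci      <- as.numeric(ct$conf.int)  # 95 percent confidence interval by default
df      <- unname(ct$parameter)     # degrees of freedom (n - 2)

# Collect everything in a one-row data frame
data.frame(r = r_est, p = p_value, ci_low = ci[1], ci_high = ci[2], df = df)
```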

Correlation analysis plotting

# Select the two numeric columns by name rather than by position
num <- dt[, .(Number, Population)]
cm <- cor(num)
with(num, plot(Number, Population))
lattice::levelplot(cm)
PerformanceAnalytics::chart.Correlation(num, histogram = TRUE, pch = "+")
# cm is a correlation matrix, so is.corr is left at its default (TRUE)
corrplot::corrplot(cm, win.asp = .7, method = "circle")
corrplot::corrplot.mixed(cm, lower.col = "black", number.cex = .7)
# Matrix of p-values for the pairwise correlation tests
corrplot::cor.mtest(num)$p

Thank you so much @Hermes
