How to get total count of connections and make an adjacency table

I have data similar to this:

df <- data.frame(city1=sample(c("Tokyo","New York","Los Angeles", "Mumbai", "Los Angeles", "Tokyo", "Shanghai", "Kolkata", "Los Angeles", "Tokyo")),
                     city2=sample(c("Tokyo","Tokyo","Tokyo","Tokyo","New York", "Tokyo",'Mumbai', "Los Angeles", "Kolkata", "Shanghai")),
                     city3=sample(c("Los Angeles", "Mumbai", "Shanghai", "New York", "Kolkata", "Los Angeles","Los Angeles","Shanghai","Los Angeles","Los Angeles")),
                     city4=sample(c("Los Angeles", "Kolkata", "Shanghai", "Kolkata", "Shanghai", "Los Angeles", "Tokyo", "Los Angeles", "Shanghai", "Tokyo")))

This gives me

df                    
         city1       city2       city3       city4
1        Tokyo      Mumbai    Shanghai    Shanghai
2      Kolkata       Tokyo      Mumbai Los Angeles
3       Mumbai     Kolkata     Kolkata       Tokyo
4        Tokyo       Tokyo Los Angeles     Kolkata
5        Tokyo       Tokyo Los Angeles     Kolkata
6     New York    Shanghai    Shanghai Los Angeles
7  Los Angeles       Tokyo    New York Los Angeles
8     Shanghai    New York Los Angeles       Tokyo
9  Los Angeles Los Angeles Los Angeles    Shanghai
10 Los Angeles       Tokyo Los Angeles    Shanghai

I want to create two adjacency matrices with the following rules. (1) In the first matrix, count the connections from each city in the first column to the cities in the remaining columns, across all rows, and return the totals as an adjacency matrix (one way: from the first column to the rest). (2) In the second matrix, count the connections between any two cities (two ways). An example of the first matrix, only partly filled in here (I need all cells):

           Kolkata Los Angeles Mumbai New York Shanghai Tokyo
Kolkata      0         1         
Los Angeles  0         4
Mumbai       2         0
New York     0         1
Shanghai     0         1
Tokyo        2         2

Is there an easy way to do this? I would appreciate any help.

Hi,

Welcome to the RStudio community!

First question: what is the goal of this matrix? It seems formatted in a slightly odd way. Is this part of homework or coursework? If so, please let us know, as we have rules about helping with homework.

That aside, I don't fully understand the first matrix. You say to count, for each city in the first column, the cities that appear in the rest of its rows. This is correct for Tokyo, but I don't see how you get a value of 2 for Mumbai...

PJ

Hi @pieterjanvc

I want to build a chord diagram and do a network analysis. This is absolutely NOT homework or coursework.

I want to count the number of connections between the cities in the first column and the cities in the remaining columns. For example, Tokyo is connected with Kolkata two times (rows 4 and 5, column 4). This is what I computed there as an example.
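To make the counting rule concrete, here is a minimal base R sketch. It assumes the data frame is typed in exactly as printed above (the `sample()` calls are random, so rerunning them gives different rows), and the names `from`, `to`, and `one_way` are only illustrative. It also assumes that a city connecting to itself (for example Tokyo to Tokyo in rows 4 and 5) should be counted.

```r
# Fixed copy of the data frame as printed in the question
df <- data.frame(
  city1 = c("Tokyo", "Kolkata", "Mumbai", "Tokyo", "Tokyo", "New York",
            "Los Angeles", "Shanghai", "Los Angeles", "Los Angeles"),
  city2 = c("Mumbai", "Tokyo", "Kolkata", "Tokyo", "Tokyo", "Shanghai",
            "Tokyo", "New York", "Los Angeles", "Tokyo"),
  city3 = c("Shanghai", "Mumbai", "Kolkata", "Los Angeles", "Los Angeles",
            "Shanghai", "New York", "Los Angeles", "Los Angeles", "Los Angeles"),
  city4 = c("Shanghai", "Los Angeles", "Tokyo", "Kolkata", "Kolkata",
            "Los Angeles", "Los Angeles", "Tokyo", "Shanghai", "Shanghai"),
  stringsAsFactors = FALSE
)

# All cities that occur anywhere, so the table is square
cities <- sort(unique(unlist(df)))

# Pair the first column with every value in the remaining columns.
# unlist() stacks city2, city3, city4 column by column, so city1 is
# repeated once per remaining column to stay aligned.
from <- rep(df$city1, times = ncol(df) - 1)
to   <- unlist(df[-1], use.names = FALSE)

# Rows = city in the first column, columns = connected city
one_way <- table(from = factor(from, levels = cities),
                 to   = factor(to,   levels = cities))

one_way["Tokyo", "Kolkata"]   # 2, from rows 4 and 5
```

With the rows as printed, this also gives 4 for Los Angeles to Los Angeles and 2 for Mumbai to Kolkata, matching the partial example table.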

Hi,

What do you think of this implementation:

library(dplyr)
library(tidyr)
library(purrr)

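#Generate data (no factors, so the join later works on plain strings)
#Note: only needed before R 4.0.0, where stringsAsFactors defaulted to TRUE
options(stringsAsFactors = FALSE)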
df <- data.frame(city1=sample(c("Tokyo","New York","Los Angeles", "Mumbai", "Los Angeles", "Tokyo", "Shanghai", "Kolkata", "Los Angeles", "Tokyo")),
                 city2=sample(c("Tokyo","Tokyo","Tokyo","Tokyo","New York", "Tokyo",'Mumbai', "Los Angeles", "Kolkata", "Shanghai")),
                 city3=sample(c("Los Angeles", "Mumbai", "Shanghai", "New York", "Kolkata", "Los Angeles","Los Angeles","Shanghai","Los Angeles","Los Angeles")),
                 city4=sample(c("Los Angeles", "Kolkata", "Shanghai", "Kolkata", "Shanghai", "Los Angeles", "Tokyo", "Los Angeles", "Shanghai", "Tokyo")))


#Get list of all cities in first column
cities = sort(unique(df$city1))
firstCol = data.frame(cities = cities)

#For each city, do the counts
myMatrix = map_dfc(cities, function(city){
  
  #Keep only the rows whose first column (city1) is this city,
   #then count all connected cities
  newCol = data.frame(cities = unlist(df %>% filter(city1 == city) %>% 
                                      select(-city1))) %>% 
    group_by(cities) %>% summarise(n = n())
  
  #Join the data with the column of all cities, and rename it to the city
  newCol = firstCol %>% left_join(newCol, by = "cities") %>% select(n)
  colnames(newCol) = city
  newCol
})

#Replace NA by 0 (some cities do not connect to all others)
myMatrix = myMatrix %>% replace(is.na(.), 0)


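#Display the final result as ...
  #a matrix (columns are the city1 cities, rows the cities they connect to;
  #use t(finalMatrix) for the row = city1 layout shown in the question)
finalMatrix = as.matrix(myMatrix)
rownames(finalMatrix) = cities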

  #OR as a data frame with first column cities
finalDf = cbind(cities, myMatrix)

I feel it could be simplified, but this is what I could come up with at the moment :slight_smile:
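One way it might be simplified, as a rough sketch only: reshape to long format and let `count()` do the tallying. This uses just the first three rows of the printed data to stay short, and `one_way_df` is an illustrative name. Note that cities which never appear in `city1` get no row here, so this is an assumption about what the output should contain.

```r
library(dplyr)
library(tidyr)

# First three rows of the data frame printed in the question
df <- data.frame(
  city1 = c("Tokyo", "Kolkata", "Mumbai"),
  city2 = c("Mumbai", "Tokyo", "Kolkata"),
  city3 = c("Shanghai", "Mumbai", "Kolkata"),
  city4 = c("Shanghai", "Los Angeles", "Tokyo"),
  stringsAsFactors = FALSE
)

one_way_df <- df %>%
  # One row per (city1, connected city) pair
  pivot_longer(-city1, names_to = "column", values_to = "to") %>%
  # Count how often each pair occurs
  count(city1, to) %>%
  # Spread the connected cities into columns, filling gaps with 0
  pivot_wider(names_from = to, values_from = n,
              values_fill = 0, names_sort = TRUE)

one_way_df
```

Here the rows are the `city1` cities and the columns are the cities they connect to.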

Hope this helps,
PJ
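The reply above covers only the first (one-way) matrix. For the second matrix, one possible reading is that any two cities in the same row are connected, whichever columns they sit in. The base R sketch below follows that assumption on the first three rows of the printed data; `pairs`, `a`, `b`, and `two_way` are illustrative names, and a pair of identical cities is counted once on the diagonal.

```r
# First three rows of the data frame printed in the question
df <- data.frame(
  city1 = c("Tokyo", "Kolkata", "Mumbai"),
  city2 = c("Mumbai", "Tokyo", "Kolkata"),
  city3 = c("Shanghai", "Mumbai", "Kolkata"),
  city4 = c("Shanghai", "Los Angeles", "Tokyo"),
  stringsAsFactors = FALSE
)

cities <- sort(unique(unlist(df)))

# Every unordered pair of columns: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs <- combn(ncol(df), 2)

# Stack the left and right member of each column pair, row-aligned
a <- unlist(df[pairs[1, ]], use.names = FALSE)
b <- unlist(df[pairs[2, ]], use.names = FALSE)

# Directed counts first, as a plain integer matrix
counts <- unclass(table(factor(a, levels = cities),
                        factor(b, levels = cities)))

# Make it symmetric; restore the diagonal so self-pairs are not doubled
two_way <- counts + t(counts)
diag(two_way) <- diag(counts)

two_way["Tokyo", "Mumbai"]   # 3, once in each of the three rows
```

If only connections involving the first column should count, symmetrizing the one-way matrix instead would be the alternative reading.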

Thanks @pieterjanvc for your help
