The most common packages for working with graphs are {igraph}, {network}, and {tidygraph}. The best approach is to load your data into one of these packages, which offer a number of algorithms for clustering (called "community detection" in this context). For example, see the cluster_*() functions in igraph and the group_*() functions in tidygraph.
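As a rough illustration of the tidygraph route, the sketch below stores each node's community as a column. The edge list is hypothetical (two separate triangles), not the data from the question, and the column name `community` is my own choice.

```r
library(tidygraph)
library(dplyr)

# Hypothetical edge list (assumption): two disconnected triangles
edges <- data.frame(
  from = c("a", "a", "b", "d", "d", "e"),
  to   = c("b", "c", "c", "e", "f", "f")
)

# Build an undirected graph and add each node's Louvain group as a column
tg <- as_tbl_graph(edges, directed = FALSE) |>
  activate(nodes) |>
  mutate(community = group_louvain())

# Node table with one row per node: name and community id
node_table <- as_tibble(tg)
node_table
```

Keeping the group id next to the node data makes it easy to join the result back to other node attributes with the usual dplyr verbs.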
In your case it looks like you have a directed network, so some algorithms will not work unless you decide to ignore the edge directions. Clustering in general is a hard problem: there is no single best algorithm that always works on every dataset, so you may have to experiment with the existing algorithms.
For example:
library(igraph)
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#>
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#>
#>     union
set.seed(123)
df <- data.frame(start_node = paste0("x", sample(1:7, replace = TRUE)),
                 end_node = paste0("x", sample(1:7, replace = TRUE))) |>
  dplyr::filter(start_node != end_node)
df
#>   start_node end_node
#> 1         x7       x6
#> 2         x7       x3
#> 3         x3       x5
#> 4         x6       x4
#> 5         x3       x6
#> 6         x2       x6
#> 7         x2       x1
# Build a directed graph from the edge list
gr <- igraph::graph_from_data_frame(df,
                                    directed = TRUE)
plot(gr)

cluster_spinglass(gr)
#> IGRAPH clustering spinglass, groups: 2, mod: 0.2
#> + groups:
#> $`1`
#> [1] "x7" "x3" "x5"
#>
#> $`2`
#> [1] "x6" "x2" "x4" "x1"
#>
# Same edges, this time ignoring direction
gr <- igraph::graph_from_data_frame(df,
                                    directed = FALSE)
plot(gr)

cluster_louvain(gr)
#> IGRAPH clustering multi level, groups: 3, mod: 0.21
#> + groups:
#> $`1`
#> [1] "x7" "x3" "x5"
#>
#> $`2`
#> [1] "x6" "x4"
#>
#> $`3`
#> [1] "x2" "x1"
#>
Created on 2022-04-29 by the reprex package (v2.0.1)
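To experiment with several algorithms, one option is to run them on the same graph and compare the number of groups and the modularity score. This is only a sketch: the graph is a hypothetical toy example (two triangles joined by one edge), and the three algorithms are an arbitrary selection, not a recommendation.

```r
library(igraph)

# Hypothetical undirected graph (assumption): two triangles joined by one edge
g <- graph_from_literal(a - b - c - a, d - e - f - d, c - d)

# A few candidate algorithms that accept undirected graphs
algos <- list(
  louvain     = cluster_louvain,
  walktrap    = cluster_walktrap,
  fast_greedy = cluster_fast_greedy
)

# Run every algorithm on the same graph
results <- lapply(algos, function(f) f(g))

# One row per algorithm: number of groups and modularity score
comparison <- data.frame(
  algorithm  = names(results),
  groups     = sapply(results, length),
  modularity = sapply(results, modularity)
)
comparison

# Named vector mapping each node to its group id
membership(results$louvain)
```

A higher modularity does not guarantee a more meaningful partition, so it is worth also checking the groups against what you know about the data.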