The most common packages for working with graphs are `{igraph}`, `{network}`, and `{tidygraph}`. The best approach is to load your data into one of those packages, which offer a number of algorithms for clustering (also called "community detection" in this context). For example, see all the `cluster_*()` functions in igraph and the `group_*()` functions in tidygraph.
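For the tidygraph route, a minimal sketch could look like the following. The edge list and the column name `group` are made up for illustration, and the graph is treated as undirected on the assumption that direction can be ignored here.

```r
library(tidygraph)

# Small illustrative edge list (hypothetical data, not the asker's)
edges <- data.frame(
  from = c("x7", "x7", "x3", "x6", "x3", "x2", "x2"),
  to   = c("x6", "x3", "x5", "x4", "x6", "x6", "x1")
)

# Build an undirected tbl_graph and attach a community label to each node
tg <- as_tbl_graph(edges, directed = FALSE) |>
  activate(nodes) |>
  dplyr::mutate(group = group_louvain())

# Pull the node table out as a regular tibble: one row per node
node_groups <- tibble::as_tibble(tg)
node_groups
```

Swapping `group_louvain()` for another `group_*()` function is usually all it takes to try a different algorithm.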

In your case it looks like you have a directed network, so some algorithms will not work unless you decide to ignore the directionality. Clustering in general is a hard problem: there is no single best algorithm that always works on every dataset, so you may have to experiment with the existing algorithms.
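One way to experiment is to run a few algorithms on the same graph and compare the number of groups and the modularity each one reports. This is only a sketch: the edge list is illustrative, the choice of four algorithms is arbitrary, and modularity is just one possible yardstick, not a definitive measure of quality.

```r
library(igraph)

# Small illustrative edge list (hypothetical data)
edges <- data.frame(
  start_node = c("x7", "x7", "x3", "x6", "x3", "x2", "x2"),
  end_node   = c("x6", "x3", "x5", "x4", "x6", "x6", "x1")
)

# Ignore directionality so that undirected-only algorithms can run
g <- graph_from_data_frame(edges, directed = FALSE)

set.seed(123)  # some of these algorithms are stochastic
results <- list(
  louvain     = cluster_louvain(g),
  walktrap    = cluster_walktrap(g),
  fast_greedy = cluster_fast_greedy(g),
  infomap     = cluster_infomap(g)
)

# Summarise each result: number of groups and modularity score
comparison <- data.frame(
  algorithm  = names(results),
  n_groups   = vapply(results,
                      function(x) length(unique(as.vector(membership(x)))),
                      integer(1)),
  modularity = vapply(results, modularity, numeric(1))
)
comparison
```

On a graph this small the differences mean little; the comparison becomes more informative on real data.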

For example:

```
library(igraph)
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
set.seed(123)
df <- data.frame(start_node = paste0("x", sample(1:7, replace = TRUE)),
                 end_node = paste0("x", sample(1:7, replace = TRUE))) |>
  dplyr::filter(start_node != end_node)
df
#>   start_node end_node
#> 1         x7       x6
#> 2         x7       x3
#> 3         x3       x5
#> 4         x6       x4
#> 5         x3       x6
#> 6         x2       x6
#> 7         x2       x1
# Build a directed graph from the edge list
gr <- igraph::graph_from_data_frame(df,
                                    directed = TRUE)
plot(gr)
```

```
cluster_spinglass(gr)
#> IGRAPH clustering spinglass, groups: 2, mod: 0.2
#> + groups:
#> $`1`
#> [1] "x7" "x3" "x5"
#>
#> $`2`
#> [1] "x6" "x2" "x4" "x1"
#>
# Rebuild the graph without direction before running Louvain
gr <- igraph::graph_from_data_frame(df,
                                    directed = FALSE)
plot(gr)
```

```
cluster_louvain(gr)
#> IGRAPH clustering multi level, groups: 3, mod: 0.21
#> + groups:
#> $`1`
#> [1] "x7" "x3" "x5"
#>
#> $`2`
#> [1] "x6" "x4"
#>
#> $`3`
#> [1] "x2" "x1"
#>
```
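If you need the cluster labels as data rather than printed output, something along these lines should work. It is a sketch that rebuilds the undirected graph from the same edges as above; the names `node_clusters` and `group_sizes` are only illustrative.

```r
library(igraph)

# Same edges as in the example above, written out by hand
edges <- data.frame(
  start_node = c("x7", "x7", "x3", "x6", "x3", "x2", "x2"),
  end_node   = c("x6", "x3", "x5", "x4", "x6", "x6", "x1")
)
g <- graph_from_data_frame(edges, directed = FALSE)

cl <- cluster_louvain(g)

# One row per node with its cluster id (membership is in vertex order)
node_clusters <- data.frame(
  node    = V(g)$name,
  cluster = as.integer(membership(cl))
)

# Number of nodes in each cluster
group_sizes <- sizes(cl)

# Draw the graph with the clusters highlighted
plot(cl, g)
```

`node_clusters` can then be joined back onto your original data by node name.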

<sup>Created on 2022-04-29 by the reprex package (v2.0.1)</sup>