Determining valid directed graph selections

Hello,

There are two things I am hoping to get right:

  1. I have the data below and want to drop invalid selection columns. A selection column is invalid if it selects both "a -> b" and "b -> a", i.e. if the source of one selected path is the target of another and vice versa, because the connection between two nodes can only be directed one way. See the example data below:
df1 <- data.frame(
  stringsAsFactors = FALSE,
            source = c("a", "b", "c", "d", "e", "c", "a", "e", "a"),
            target = c("b", "c", "d", "e", "a", "b", "d", "d", "e"),
         selection1 = c(1, 0, 1, 0, 0, 1, 1, 0, 0),
         selection2 = c(1, 1, 0, 0, 1, 1, 0, 0, 1),
         selection3 = c(1, 1, 1, 1, 1, 0, 0, 0, 0)
)

  2. I also have a second problem, which I believe is more complicated, and I am hoping there is a package for it. I want to exclude selections that contain a contradictory self-reference, i.e. a -> b, b -> c, c -> a. This combination is essentially a triangle: the start and end point are the same node. No selection should contain such a combination. So a -> b together with c -> a would be fine, as would b -> c with either path 1 or path 3, but not the combination of all three. These toy problems are relatively simple, but I would want the approach to work for complicated data frames too.
df2 <- data.frame(
  stringsAsFactors = FALSE,
            source = c("a", "b", "c", "a", "b"),
            target = c("b", "c", "a", "d", "d"),
        selection1 = c(1, 1, 1, 1, 0),
        selection2 = c(1, 0, 1, 0, 0),
        selection3 = c(1, 0, 0, 0, 1)
)

Any help would be massively appreciated!

To start off, I would do something like:

df1 <- data.frame(
  stringsAsFactors = FALSE,
  source = c("a", "b", "c", "d", "e", "c", "a", "e", "a"),
  target = c("b", "c", "d", "e", "a", "b", "d", "d", "e"),
  selection1 = c(1, 0, 1, 0, 0, 1, 1, 0, 0),
  selection2 = c(1, 1, 0, 0, 1, 1, 0, 0, 1),
  selection3 = c(1, 1, 1, 1, 1, 0, 0, 0, 0)
)

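library(dplyr)

# Group rows by their selection pattern and keep only groups with more than one row
(identify_possibles <- group_by(
  df1,
  selection1,
  selection2,
  selection3
) %>% group_modify(
  ~ if (nrow(.) > 1) { data.frame(.) } else { data.frame() }
))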

Some selection patterns are unique and so don't have several source-target combinations to assess, so it is best to strip them out and focus on those that do. The code above produces a new (grouped) table of the selections worth investigating further.
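For the first problem in the question, each selection column can also be tested directly. The following is only a minimal base R sketch, not a tested solution for large data: `has_reciprocal_edge()` is a hypothetical helper (not from any package), and it assumes that the selection columns all start with `selection` and that node names do not themselves contain `->`.

```r
# Data from the question
df1 <- data.frame(
  source = c("a", "b", "c", "d", "e", "c", "a", "e", "a"),
  target = c("b", "c", "d", "e", "a", "b", "d", "d", "e"),
  selection1 = c(1, 0, 1, 0, 0, 1, 1, 0, 0),
  selection2 = c(1, 1, 0, 0, 1, 1, 0, 0, 1),
  selection3 = c(1, 1, 1, 1, 1, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the selected edges contain both x -> y and y -> x.
# Note that a self-loop (x -> x) also counts as reciprocal here.
has_reciprocal_edge <- function(source, target, selected) {
  keep <- selected == 1
  forward  <- paste(source[keep], target[keep], sep = "->")
  backward <- paste(target[keep], source[keep], sep = "->")
  any(forward %in% backward)
}

# Assumption: every selection column name starts with "selection"
selection_cols <- grep("^selection", names(df1), value = TRUE)

reciprocal <- vapply(
  selection_cols,
  function(col) has_reciprocal_edge(df1$source, df1$target, df1[[col]]),
  logical(1)
)

# Keep source, target and only the selection columns without reciprocal edges
df1_valid <- df1[, c("source", "target", names(reciprocal)[!reciprocal])]
```

On the example data this flags only `selection2`, which selects both b -> c and c -> b (as well as e -> a and a -> e). `selection3` passes this check even though it forms a cycle, so the second problem still needs a separate test.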

Then I would try to integrate my solution here:

Conceptually it seems the same; it's just that first name is source and last name is target.

Good luck!

Hello,

I will have a look, thanks. I have realised that what I am looking for relates to graph theory: the selections need to be connected and acyclic. The semPLS package performs these checks as functions on the inserted model, but I want to evaluate the combinations before they are even inserted into the model object. I intend to run thousands of combinations and would like to reduce the set before pushing it to the model. I tried to extract those functions from the package, but they are not straightforward to extract and set up.
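As a rough illustration of what such a pre-filter could look like without semPLS, here is a base R sketch that checks each selection column for cycles and for (weak) connectedness. The helpers `edges_acyclic()` and `edges_connected()` are hypothetical names, not functions from semPLS or any other package, and "connected" is assumed to mean that all nodes touched by the selected edges are linked when edge direction is ignored. The selection columns are spelled `selection1` to `selection3` here.

```r
# Data from the second problem in the question
df2 <- data.frame(
  source = c("a", "b", "c", "a", "b"),
  target = c("b", "c", "a", "d", "d"),
  selection1 = c(1, 1, 1, 1, 0),
  selection2 = c(1, 0, 1, 0, 0),
  selection3 = c(1, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeatedly drop edges whose source node has no incoming
# edge. If edges remain and none can be dropped, they lie on or behind a cycle.
edges_acyclic <- function(source, target) {
  repeat {
    if (length(source) == 0) return(TRUE)
    removable <- !(source %in% target)
    if (!any(removable)) return(FALSE)
    source <- source[!removable]
    target <- target[!removable]
  }
}

# Hypothetical helper: grow the set of reachable nodes from the first node,
# ignoring edge direction, until it stops growing (weak connectedness).
edges_connected <- function(source, target) {
  nodes <- unique(c(source, target))
  if (length(nodes) == 0) return(FALSE)  # assumption: empty selection is invalid
  reached <- nodes[1]
  repeat {
    touching <- source %in% reached | target %in% reached
    grown <- unique(c(reached, source[touching], target[touching]))
    if (length(grown) == length(reached)) break
    reached <- grown
  }
  length(reached) == length(nodes)
}

# Apply one check to the edges selected by one column
check_selection <- function(col, test_fun) {
  keep <- df2[[col]] == 1
  test_fun(df2$source[keep], df2$target[keep])
}

selection_cols <- grep("^selection", names(df2), value = TRUE)
acyclic   <- vapply(selection_cols, check_selection, logical(1), test_fun = edges_acyclic)
connected <- vapply(selection_cols, check_selection, logical(1), test_fun = edges_connected)

# Selections that pass both checks
valid <- selection_cols[acyclic & connected]
```

On the toy data only `selection1` is rejected, because it contains the triangle a -> b, b -> c, c -> a. If adding a dependency is acceptable, the igraph package provides `graph_from_data_frame()` and `is_dag()`, which should cover the acyclicity check for larger graphs.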
