Plot group-switching of individual cases over time ggplot2

I am struggling to plot group affiliation at multiple time points with lines tracing group-switching of individual cases. I have looked into sankey plots unsuccessfully. Can you please help?

Can you clarify what you want help with?
a) deciding what an appropriate plot would be for your data and use case
b) implementing the decided-upon plot

I'd appreciate help with each of those. I am a novice.

Our ability to help is limited without context about the data you are dealing with and the purpose you intend for the plot.
For example: how many individuals are you tracking, switching between how many groups? Do they flow back and forth? How many time periods are there?
Should a viewer be able to track the movement of each individual, or does the plot only need to communicate a general impression of aggregate flows?

The more you can tell us about what you are doing, the better.

The way you have posed your issue so far suggests that you want to track individuals. For human perception to work, that requires enough visual space between individuals, so are there limits on how high or wide the output image can be?

The data relate to switching religious affiliations across different timepoints. The dataset has 3,500-5,000 cases depending on the DV, but only several hundred switch groups. There are 15 groups (including "other" and "multicoded"). I'll work with 2 timepoints for starters.
As for tracking the movement of each individual vs. a general impression, I think a general impression would be fine.
I will also want to explore predictors of group-switching, e.g. life events.

Here is an example to start with

# Load package
library(tidyverse)
library(networkD3)

(my_levels <- expand.grid(a =c("Christian","Muslim"),
                         b = c("(t1)","(t2)")) %>%
  mutate(l = paste(a,b)) %>% pull(l))

(my_readable_data <- tibble::tribble(
  ~source,          ~target, ~value,
  "Christian (t1)", "Christian (t2)",  60,
  "Muslim (t1)"   , "Muslim (t2)"   ,  30,
  "Christian (t1)", "Muslim (t2)"   ,  15,
  "Muslim (t1)"   , "Christian (t2)",   5
))


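# networkD3 expects links to refer to nodes by zero-based index,
# so map each label to its position in my_levels minus 1
(rel <- list(
  links_readable = my_readable_data,
  links = my_readable_data %>%
    mutate(across(where(is.character),
                  ~ as.integer(factor(.x, levels = my_levels)) - 1L)),
  nodes = data.frame(name = my_levels)
))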

sankeyNetwork(Links = rel$links, Nodes = rel$nodes, Source = "source",
                   Target = "target", Value = "value", NodeID = "name",
                   units = "People", fontSize = 40, nodeWidth = 50)

Adapted from
Sankey Diagram for energy consumption – the R Graph Gallery (r-graph-gallery.com)
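
Since the title mentions ggplot2: the same four flows could probably also be drawn as an alluvial plot with the ggalluvial extension. This is only a sketch, assuming ggalluvial is installed (it is not used elsewhere in this thread). The data frame `flows` and its columns `t1`, `t2` and `value` are names made up for the illustration, holding the same toy numbers as above.

```r
library(ggplot2)
library(ggalluvial)  # assumption: installed separately, not part of the tidyverse

# Same toy flows as the networkD3 example, one row per (t1, t2) pair
flows <- data.frame(
  t1    = c("Christian", "Muslim", "Christian", "Muslim"),
  t2    = c("Christian", "Muslim", "Muslim",    "Christian"),
  value = c(60, 30, 15, 5)
)

p <- ggplot(flows, aes(axis1 = t1, axis2 = t2, y = value)) +
  geom_alluvium(aes(fill = t1)) +                       # ribbons coloured by starting group
  geom_stratum(fill = "grey90", colour = "grey40") +    # stacked boxes for each group
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("t1", "t2"), expand = c(0.1, 0.1)) +
  labs(y = "People", fill = "Group at t1")

p
```

Because the result is an ordinary ggplot object, it can be themed and saved with ggsave() like any other plot, whereas sankeyNetwork() produces an interactive HTML widget.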

For specific help with your data, I recommend providing a reprex (a minimal reproducible example).

Here it is

#load packages
library(tidyverse)
library(networkD3)
library(haven)
library(readr)

#import data
bcsreprex <- read_csv("https://app.box.com/s/d4ofbn0sfl76ye8nw21xeddzhvddpv4j")
View(bcsreprex)
d <- bcsreprex
str(d)

#as factors
d <- as.data.frame(unclass(d),             
                       stringsAsFactors = TRUE)

# data viz
(my_levels <- expand.grid(a =c("Not stated","Multicode","Incomplete","Unaffiliated",
                               "Christian (no denomination)","Roman Catholic",
                               "Church of England/Anglican","United Reform Church/Congregational",
                               "Baptist","Methodist","Presbyterian/Church of Scotland",
                               "Other Christian","Hindu","Jewish","Muslim","Sikh",
                               "Buddhist","Other"),
                         b = c("Childhood","Age 42")) %>%
  mutate(l = paste(a,b)) %>% pull(l))
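(my_readable_data <- tibble::tribble(
  ~source,          ~target, ~value,
## These are artificial values that I want to replace with the actual values
## for the full list above.
## Labels must match entries of my_levels exactly, otherwise they become NA below.
  "Christian (no denomination) Childhood", "Christian (no denomination) Age 42",  60,
  "Muslim Childhood"                     , "Muslim Age 42"                     ,  30,
  "Christian (no denomination) Childhood", "Muslim Age 42"                     ,  15,
  "Muslim Childhood"                     , "Christian (no denomination) Age 42",   5
))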
(rel <- list(
  links_readable = my_readable_data,
  links = my_readable_data %>% mutate_if(is.character,
                                         ~as.integer(factor(.x,levels=my_levels))-1),
  nodes =data.frame(name = my_levels)
))
# Source, Target, Value and NodeID must name columns of rel$links / rel$nodes
sankeyNetwork(Links = rel$links, Nodes = rel$nodes, Source = "source",
                   Target = "target", Value = "value", NodeID = "name",
                   units = "People", fontSize = 40, nodeWidth = 50)