Not sure how to best visualize my data with the string types I have and my Objectives


I am working on a personal project to find out whether there is a relationship between taking certain medicines and experiencing side effects. I have a file with 1000 subjects (553 males, 447 females) that indicates whether each subject took any medicines in the run-up to any side effects experienced.

I filtered the dataset to get a shortened table of those who just took Medicine1, or Medicine2 or both. An example is this:

#### Filter Medicine1
Med1 <- survey %>%
  filter(Medicine1 == 'Yes')%>%
  filter(Medicine2 == 'No')%>%
  filter(Medicine3 == 'No')

Since I am looking for a relationship between medicines taken and propensity for side effects, and the variables are strings rather than numbers, I cannot do a scatterplot. I opted for a stacked bar graph (please let me know whether you think this is the best approach). Is it possible to combine two filtered (piped) datasets in one plot, or would I have to create a separate bar graph for each filtered dataset?

Here is an example of the chart I made:

ggplot(data = Med1, aes(x = SideEffects, fill = Gender)) +
  geom_bar() +  # without a geom the plot is empty; fill = Gender stacks the bars by gender
  labs(title = "Side Effects on Ingesting Medicine 1", caption = "Data Collected from Kaggle")

The best representation depends strongly on what you want to show (e.g. a difference between genders? A difference between combinations of medicines? The presence of a particular medicine irrespective of the others?). Very roughly, each statistical test you can do has its own graph that highlights what you are testing.
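As one concrete pairing of test and graph: if the question is whether side effects are associated with which medicine was taken, a chi-squared test of independence goes naturally with a proportional stacked bar chart. This is only a minimal sketch on simulated data; the data frame `sim` and its `medicine` column are hypothetical stand-ins, not your Kaggle file.

```r
library(ggplot2)

set.seed(1)  # simulated data, for illustration only
sim <- data.frame(
  medicine    = sample(c("Medicine1", "Medicine2", "Both"), 300, replace = TRUE),
  SideEffects = sample(c("Yes", "No"), 300, replace = TRUE)
)

# Contingency table: medicine group (rows) vs. side effects (columns)
tab <- table(sim$medicine, sim$SideEffects)

# Chi-squared test of independence between the two categorical variables
test <- chisq.test(tab)
test

# Matching graph: share of subjects with and without side effects in each group
p <- ggplot(sim, aes(x = medicine, fill = SideEffects)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion of subjects")
p
```

With `position = "fill"` every bar has the same height, so groups of different sizes can be compared directly; drop it to get raw counts instead.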

Both are possible. You can combine pre-filtered datasets with bind_rows() (after ensuring they have the same columns, including a column identifying what filtering was applied). Alternatively, you can do all the processing on a single object, for example:


library(tidyverse)

# Simulated data standing in for the survey
Med <- tibble(SideEffects = sample(c("Yes","No"), 200, replace = TRUE),
              Gender = sample(c("M","F"), 200, replace = TRUE),
              Medicine1 = sample(c("Yes","No"), 200, replace = TRUE),
              Medicine2 = sample(c("Yes","No"), 200, replace = TRUE))

Med %>%
  rowwise() %>%
  # one label per subject, e.g. "Yes,No" = took Medicine1 but not Medicine2
  mutate(medicines = paste0(c_across(starts_with("Medicine")), collapse = ",")) %>%
  ggplot() +
  geom_bar(aes(x = medicines))
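
The bind_rows() route mentioned above can be sketched in the same way. This is a rough illustration on simulated data, and the names `survey_sim`, `only_med1`, `only_med2` and `group` are made up for the example. The `.id` argument of bind_rows() adds the column that records which filter each row came from.

```r
library(dplyr)
library(ggplot2)

set.seed(2)  # simulated stand-in for the survey data
survey_sim <- tibble(
  SideEffects = sample(c("Yes", "No"), 200, replace = TRUE),
  Medicine1   = sample(c("Yes", "No"), 200, replace = TRUE),
  Medicine2   = sample(c("Yes", "No"), 200, replace = TRUE)
)

# Two pre-filtered tables, in the style of the question
only_med1 <- survey_sim %>% filter(Medicine1 == "Yes", Medicine2 == "No")
only_med2 <- survey_sim %>% filter(Medicine1 == "No", Medicine2 == "Yes")

# Stack them; .id stores each list name in a new column called `group`
combined <- bind_rows(list(Medicine1 = only_med1, Medicine2 = only_med2),
                      .id = "group")

# One stacked bar per filtered dataset
ggplot(combined, aes(x = group, fill = SideEffects)) +
  geom_bar()
```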

Technically, you can make a scatterplot when x is a string, as long as what you plot on y is not a string but a number, such as a count or a proportion. So a scatterplot is possible; whether it is better suited than a barplot is a different question. For example, here is a scatterplot that highlights the presence of Medicine1:

Med %>%
  mutate(had_medicine1 = (Medicine1 == "Yes"),
         had_medicine2 = (Medicine2 == "Yes")) %>%
  group_by(Gender, had_medicine1, had_medicine2) %>%
  # percentage of subjects in each group who reported a side effect
  summarize(proportion_with_side_effect = 100 * mean(SideEffects == "Yes"),
            .groups = "drop") %>%
  ggplot() +
  geom_point(aes(x = had_medicine1, y = proportion_with_side_effect, color = had_medicine2))
