How to ignore missing values in numeric variables without removing rows in ggplot2

I need to create a scatterplot of two continuous variables (prop_segundos, prop_primeiros), but ggplot2 removes all rows because every row has an NA in one variable or the other. This is my reprex:

# Output of datapasta::tribble_paste(d)
d <- tibble::tribble(
               ~prop, ~rank,    ~prop_segundos, ~votos_segundo,  ~dif,           ~dif_pct,   ~prop_primeiros,
   0.469359963685883,    2L, 0.469359963685883,          1034L,    NA,                 NA,                NA,
   0.530640036314117,    1L,                NA,             NA,  135L, 0.0612800726282343, 0.530640036314117,
  0.0290898886914227,    4L,                NA,             NA,    NA,                 NA,                NA,
   0.137031147694322,    3L,                NA,             NA,    NA,                 NA,                NA,
   0.214947151809934,    2L, 0.214947151809934,          2298L,  833L, 0.0779160041156113,                NA,
   0.618931811804321,    1L,                NA,             NA, 4319L,  0.403984659994388, 0.618931811804321
  )

And this is the ggplot I'm trying to run:

ggplot(
  data = d,
  aes(x = prop_primeiros, y = prop_segundos)) +
  geom_point() +
  geom_smooth() +
  stat_smooth(
    method = "lm",
    color = "#C42126",
    na.rm = TRUE,
    se = FALSE) +
  scale_y_continuous(limits=c(0,1)) +
  scale_x_continuous(limits=c(0,1)) +
  xlab("Porcentagem dos votos do primeiro") +
  ylab("Porcentagem dos votos do segundo") +
  ggtitle("Relação % primeiro-segundo nos casos não judicializados")

Then I get these messages and warnings (not an error):

`geom_smooth()` using method = 'loess' and formula 'y ~ x'
`geom_smooth()` using formula 'y ~ x'
Warning messages:
1: Removed 6 rows containing non-finite values (stat_smooth). 
2: Removed 6 rows containing missing values (geom_point). 

I get a plot with the proper labels and title, but no observations. Is there any way to tell ggplot to ignore missing values and plot only the available observations? Removing rows with NAs is no good, because then all my observations are removed.

You can't plot an (x, y) point if one of its coordinates is NA; this is not specific to ggplot2, it is simply how coordinates work. I recommend rethinking what you are trying to plot.
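To see why nothing is drawn, one can count the rows where both variables are present. A minimal sketch, using only the two relevant columns copied from the reprex (the name n_complete is just illustrative):

```r
# The two plotted columns from the reprex in the question
d <- data.frame(
  prop_segundos = c(0.469359963685883, NA, NA, NA, 0.214947151809934, NA),
  prop_primeiros = c(NA, 0.530640036314117, NA, NA, NA, 0.618931811804321)
)

# complete.cases() is TRUE only for rows with no NA in any column
n_complete <- sum(complete.cases(d))
n_complete
```

No row has both values, so every point is dropped regardless of any na.rm setting.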

You can fill the NAs from neighboring values in either direction, but I don't understand the logic of completing your data that way.

library(tidyverse)

d <- data.frame(
  prop_segundos = c(0.469359963685883, NA, NA, NA, 0.214947151809934, NA),
  prop_primeiros = c(NA, 0.530640036314117, NA, NA, NA, 0.618931811804321)
)

d %>% 
    fill(prop_segundos, .direction = "down") %>% 
    fill(prop_primeiros, .direction = "up") %>% 
    ggplot(
    aes(x = prop_primeiros, y = prop_segundos)) +
    geom_point()

Created on 2020-12-05 by the reprex package (v0.3.0.9001)


Just for some closure, this is how I went about doing it after all:


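# Keep only the first- and second-placed candidates.
# select() on grouped data also keeps the grouping column (municipio).
pct_pri_seg <- cepesp_sem_tre %>%
  filter(rank == 1 | rank == 2) %>%
  group_by(municipio) %>%
  select(prop_primeiros,
         prop_segundos) %>%
  as.data.frame()

# Split into the rows that carry each proportion
prop_primeiros <- pct_pri_seg[!is.na(pct_pri_seg$prop_primeiros), ]
prop_segundos <- pct_pri_seg[!is.na(pct_pri_seg$prop_segundos), ]

# Keep only municipalities that also have a second-place value
prop_primeiros <- semi_join(prop_primeiros,
                            prop_segundos, by = 'municipio')

# Drop the columns with NAs, selecting by name rather than position
prop_primeiros <- prop_primeiros[, c("municipio", "prop_primeiros")]
prop_segundos <- prop_segundos[, c("municipio", "prop_segundos")]

# Join the two halves back together by municipality
props_sem_tre <- left_join(
  prop_primeiros, prop_segundos, by = 'municipio'
)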

After this I ended up with a data frame containing only non-missing values, with both columns the same length. Then I could run my ggplot normally.
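For comparison, a shorter route to the same shape might be to collapse each municipality to a single row with summarise(). This is only a sketch: the toy data frame is hypothetical, and it assumes each municipio has at most one non-missing value per column, which may not hold for the real cepesp_sem_tre data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per candidate within each municipio
toy <- tibble::tribble(
  ~municipio, ~rank, ~prop_primeiros, ~prop_segundos,
  "A",        1L,    0.53,            NA,
  "A",        2L,    NA,              0.47,
  "A",        3L,    NA,              NA,
  "B",        1L,    0.62,            NA,
  "B",        2L,    NA,              0.21,
  "C",        1L,    0.80,            NA
)

props <- toy %>%
  filter(rank %in% c(1, 2)) %>%
  group_by(municipio) %>%
  summarise(
    # First non-missing value in the group; NA if the group has none
    prop_primeiros = prop_primeiros[!is.na(prop_primeiros)][1],
    prop_segundos  = prop_segundos[!is.na(prop_segundos)][1],
    .groups = "drop"
  ) %>%
  # Drop municipalities that lack either value (here, "C")
  filter(!is.na(prop_primeiros), !is.na(prop_segundos))

# One complete (x, y) pair per municipality, ready to plot
p <- ggplot(props, aes(x = prop_primeiros, y = prop_segundos)) +
  geom_point()
```

The design choice here is to pair values by municipio rather than by row position, so the result does not depend on how the rows happen to be ordered.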

I understand. What I'm asking is whether there is any way to tell R to skip the NAs in one coordinate until the next assigned value. For example: in the first row (x = 0.4, y = NA), we don't have a value assigned to y, but the next observation of y has an assigned value (that row could be x = NA, y = 0.5). In this case, the next assigned value wouldn't necessarily be in the next row. There is probably a better way of doing this, though; I'll figure something out.

Thank you for the reply.
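
The pairing described in this reply (each assigned x matched with the next assigned y) could be sketched by dropping the NAs from each column separately and lining up what is left by position. This assumes both columns have the same number of non-missing values and that they occur in matching order, which holds for the six-row example but is not guaranteed in general. pair_by_position is a hypothetical helper name, not a function from any package.

```r
# The two plotted columns from the reprex in the question
d <- data.frame(
  prop_segundos = c(0.469359963685883, NA, NA, NA, 0.214947151809934, NA),
  prop_primeiros = c(NA, 0.530640036314117, NA, NA, NA, 0.618931811804321)
)

# Hypothetical helper: pair the k-th non-missing x with the k-th non-missing y
pair_by_position <- function(x, y) {
  x_ok <- x[!is.na(x)]
  y_ok <- y[!is.na(y)]
  # Stop rather than silently recycle if the counts differ
  stopifnot(length(x_ok) == length(y_ok))
  data.frame(x = x_ok, y = y_ok)
}

paired <- pair_by_position(d$prop_primeiros, d$prop_segundos)
paired
```

Pairing by an explicit key such as municipio is safer whenever one is available, because positional pairing silently mismatches values if a single observation is missing.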