dbplyr Postgres na percentage count trick

I wanted to use dbplyr to push a compute down to a Postgres database. It will time out if I try to download the table it's a public cloud database, so this is my story.

I wanted to see which date columns were mostly empty. Thus, I wanted to calculate the percentage NA for each column with keywords in the column name.

Note: I am using lazy evaluation with a live database connection.

Googled Stackoverflow Common Solution

 x %>% 


The problem with this is the genereated Postgres sql SELECT SUM(colname is NULL) ... is not a valid operation with the version of Postgres I am using. (v11)

Final Solution

data_tbl %>%
  select(contains("date"), contains("month"), contains("year")) %>%
        is.na(.) ~ 1,
        TRUE ~ 0))) %>%

This generates perfect SQL for my use.

I thought i'd share and save the world some trouble for the next person.


This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.