tidyverse and srvyr - select columns

Hello,
I recently noticed that I can use select(ends_with("cv")) on the output of the code below:


library(dplyr)
library(srvyr)
library(survey)
data(api)


dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw, fpc=fpc)

dstrata %>% 
  mutate(s1=if_else(comp.imp=="Yes",fpc,0),
         s2=if_else(stype=="E",fpc,0),
         s3=if_else(dnum>=500,fpc,0)) %>% 
  group_by(cname) %>%
  summarise(
    across(s1:s3, survey_total,vartype=c("cv"))
  ) %>% 
  select(ends_with("cv"))

But if I use a prefix when I compute the CVs, I lose the ability to use select(...):


dstrata %>% 
  mutate(s1=if_else(comp.imp=="Yes",fpc,0),
         s2=if_else(stype=="E",fpc,0),
         s3=if_else(dnum>=500,fpc,0)) %>% 
  group_by(cname) %>%
  summarise(
    number_f=across(s1:s3, survey_total,vartype=c("cv"))
  ) %>% 
  select(ends_with("cv"))
# A tibble: 40 x 0

How can I adapt the latter code so that I can use select(), with all its helpers, to rearrange the output estimates?

By the way, if I run glimpse() on the output of the last code, I get:

Rows: 40
Columns: 4
$ cname      <chr> "Alameda", "Amador", "Butte", "Colusa", "Contra Costa", "El Dorado", "Fresno", "Humboldt", "Inyo", "Ker~
$ number_fs1 <df[,2]> <data.frame[40 x 2]>
$ number_fs2 <df[,2]> <data.frame[40 x 2]>
$ number_fs3 <df[,2]> <data.frame[40 x 2]>

I know it's a bit silly, but I got confused.
Thanks for your time and interest.
Have a nice day

Hm, this is an interesting problem, and I believe it reveals a design flaw in srvyr. I've opened an issue at https://github.com/gergness/srvyr/issues/129

This is happening because when you name the argument that uses across(), dplyr stores the result as data.frame columns instead of ordinary columns. The problem is extra confusing because srvyr tries to be helpful and automatically unpacks one level of data.frame columns for you, so you don't actually get a column named "number_f"; you get data.frame columns named "number_fs1", "number_fs2" and "number_fs3". select(ends_with("cv")) only looks at those top-level names, so it matches nothing.
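To make the data.frame column behavior concrete, here is a minimal sketch using plain dplyr on a made-up tibble (the data and the names toy, flat, packed and total are illustrative assumptions, not part of the survey example). It only shows the dplyr side of the story, not srvyr's extra unpacking step.

```r
library(dplyr)

# Toy data invented for illustration (not the api survey data)
toy <- tibble(g = c("a", "a", "b"), s1 = c(1, 2, 3), s2 = c(4, 5, 6))

# Unnamed across(): the results are spliced in as ordinary columns
flat <- toy %>%
  group_by(g) %>%
  summarise(across(s1:s2, sum))
names(flat)
#> [1] "g"  "s1" "s2"

# Named across(): the whole result is kept as ONE data.frame column
packed <- toy %>%
  group_by(g) %>%
  summarise(total = across(s1:s2, sum))
names(packed)
#> [1] "g"     "total"
names(packed$total)
#> [1] "s1" "s2"

# select() helpers only see the top-level names, so nothing matches
ncol(select(packed, ends_with("s2")))
#> [1] 0
```

The inner names (s1, s2) live inside the total column, which is why a helper such as ends_with() comes back empty until the column is unpacked.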

One way to fix this is to use tidyr::unpack(), as shown here. Note that I've opened an issue in srvyr, so this behavior may change in the future.

library(srvyr)
library(survey)
data(api)
library(dplyr)


dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)


dstrata %>% 
  mutate(s1=if_else(comp.imp=="Yes",fpc,0),
         s2=if_else(stype=="E",fpc,0),
         s3=if_else(dnum>=500,fpc,0)) %>% 
  group_by(cname) %>%
  summarise(
    number_f=across(s1:s3, survey_total,vartype=c("cv"))
  ) %>% 
  tidyr::unpack(starts_with("number_fs"), names_sep = "") %>%
  select(ends_with("cv"))
#> # A tibble: 40 × 3
#>    number_fs1_cv number_fs2_cv number_fs3_cv
#>            <dbl>         <dbl>         <dbl>
#>  1         1.00          0.492         0.553
#>  2       NaN           NaN           NaN    
#>  3       NaN           NaN           NaN    
#>  4         1.00        NaN             1.00 
#>  5         0.502         0.571         0.825
#>  6         0.736       NaN             1.00 
#>  7         0.391         0.341         1.00 
#>  8       NaN           NaN           NaN    
#>  9         0.704         0.704       NaN    
#> 10         0.480         0.438         0.670
#> # … with 30 more rows

Created on 2021-09-04 by the reprex package (v2.0.1)
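A possible alternative, sketched here under assumptions rather than taken from the reply above: leave across() unnamed and build the prefix with its .names argument, so that no extra nesting level is created. This assumes srvyr passes .names through to dplyr::across() unchanged and unpacks the survey_total() results as it does for an unnamed across(); the object name cv_only is made up for the example, and the exact column names may depend on the srvyr version.

```r
library(dplyr)
library(srvyr)
library(survey)
data(api)

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw, fpc = fpc)

cv_only <- dstrata %>%
  mutate(s1 = if_else(comp.imp == "Yes", fpc, 0),
         s2 = if_else(stype == "E", fpc, 0),
         s3 = if_else(dnum >= 500, fpc, 0)) %>%
  group_by(cname) %>%
  summarise(
    # Unnamed across(); the prefix comes from .names instead of `number_f = ...`
    across(s1:s3, ~ survey_total(.x, vartype = "cv"),
           .names = "number_f{.col}")
  ) %>%
  select(ends_with("cv"))

names(cv_only)
```

If this works in your srvyr version, the select() helpers apply directly and tidyr::unpack() is not needed.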

Thanks, gergness.
Your explanation was very clear.
