Use stringr functions in rename_with

Howdy!

I am pulling resource quota data from Kubernetes clusters using the K8s APIs. When pulling storage data, most of the variables have very long names, like "hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage". The important information I want to retain from these names is everything before ".storageclass.storage.k8s.io.requests.storage". In other words, I only wish to keep "hard.local.shared.10gi". As these relate to storage sizes, which vary tremendously, I would like an elegant solution to rename them all at once, rather than one at a time with dplyr::rename("hard.local.shared.10gi" = hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage).

This is my first attempt at using the newer rename_with, and I would like to pass str_remove (or alternatively str_extract) to it to accomplish this. I am getting an error that I would not get if I used the same str_remove call in, say, a mutate. TIA!

I created a small sample df for the reprex:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

df <- structure(list(
  hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    1L,
    2L, 3L, 3L, 4L, 2L, 5L, 5L, 3L, 5L
  ), .Label = c(
    "100Gi", "20Gi",
    "30Gi", "300Gi", "10Gi", "200Gi", "50Gi", "80Gi", "40Gi", "190Gi",
    "180Gi", "90Gi", "400Gi", "60Gi", "70Gi", "250Gi", "1000Ti",
    "380Gi", "46Gi", "120Gi", "1180Gi", "150Gi"
  ), class = "factor"),
  used.local.shared.10gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    1L,
    2L, 3L, 2L, 1L, 2L, 2L, 4L, 3L, 2L
  ), .Label = c(
    "60Gi", "0",
    "30Gi", "10Gi", "20Gi", "40Gi", "190Gi", "180Gi", "50Gi",
    "90Gi", "70Gi", "2Gi", "80Gi", "7Gi", "11Gi", "300Gi", "46Gi",
    "45Gi", "15Gi", "110Gi", "150Gi"
  ), class = "factor"), hard.local.shared.25gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    NA,
    1L, NA, NA, NA, 1L, NA, NA, NA, NA
  ), .Label = c(
    "50Gi", "75Gi",
    "100Gi", "25Gi", "225Gi", "1475Gi", "1250Gi", "150Gi", "175Gi",
    "250Gi", "650Gi", "125Gi", "500Gi", "300Gi", "1000Ti", "325Gi",
    "375Gi", "550Gi"
  ), class = "factor"), hard.local.shared.50gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    1L,
    NA, NA, NA, 1L, NA, NA, NA, NA, NA
  ), .Label = c(
    "200Gi",
    "550Gi", "450Gi", "350Gi", "150Gi", "100Gi", "800Gi", "400Gi",
    "50Gi", "1800Gi", "1550Gi", "600Gi", "300Gi", "2500Gi", "1400Gi",
    "250Gi", "1500Gi", "2950Gi", "1000Ti", "950Gi", "500Gi",
    "8500Gi"
  ), class = "factor")
), .Names = c(
  "hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage",
  "used.local.shared.10gi.storageclass.storage.k8s.io.requests.storage",
  "hard.local.shared.25gi.storageclass.storage.k8s.io.requests.storage",
  "hard.local.shared.50gi.storageclass.storage.k8s.io.requests.storage"
), row.names = c(NA, 10L), class = "data.frame")
df.rename <- df %>% 
  mutate(across(where(is.factor), as.character)) %>% 
  rename_with(~str_remove(., ".storageclass\\.storage\\.k8s\\.io\\.requests\\.storage.*"), where(contains("storageclass.storage.k8s.io.requests.storage")))
#> Error: Can't convert an integer vector to function

Created on 2020-11-22 by the reprex package (v0.3.0)

Maybe this?

library(dplyr, warn.conflicts = FALSE)
library(stringr)

df <- structure(list(
  hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    1L,
    2L, 3L, 3L, 4L, 2L, 5L, 5L, 3L, 5L
  ), .Label = c(
    "100Gi", "20Gi",
    "30Gi", "300Gi", "10Gi", "200Gi", "50Gi", "80Gi", "40Gi", "190Gi",
    "180Gi", "90Gi", "400Gi", "60Gi", "70Gi", "250Gi", "1000Ti",
    "380Gi", "46Gi", "120Gi", "1180Gi", "150Gi"
  ), class = "factor"),
  used.local.shared.10gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    1L,
    2L, 3L, 2L, 1L, 2L, 2L, 4L, 3L, 2L
  ), .Label = c(
    "60Gi", "0",
    "30Gi", "10Gi", "20Gi", "40Gi", "190Gi", "180Gi", "50Gi",
    "90Gi", "70Gi", "2Gi", "80Gi", "7Gi", "11Gi", "300Gi", "46Gi",
    "45Gi", "15Gi", "110Gi", "150Gi"
  ), class = "factor"), hard.local.shared.25gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    NA,
    1L, NA, NA, NA, 1L, NA, NA, NA, NA
  ), .Label = c(
    "50Gi", "75Gi",
    "100Gi", "25Gi", "225Gi", "1475Gi", "1250Gi", "150Gi", "175Gi",
    "250Gi", "650Gi", "125Gi", "500Gi", "300Gi", "1000Ti", "325Gi",
    "375Gi", "550Gi"
  ), class = "factor"), hard.local.shared.50gi.storageclass.storage.k8s.io.requests.storage = structure(c(
    1L,
    NA, NA, NA, 1L, NA, NA, NA, NA, NA
  ), .Label = c(
    "200Gi",
    "550Gi", "450Gi", "350Gi", "150Gi", "100Gi", "800Gi", "400Gi",
    "50Gi", "1800Gi", "1550Gi", "600Gi", "300Gi", "2500Gi", "1400Gi",
    "250Gi", "1500Gi", "2950Gi", "1000Ti", "950Gi", "500Gi",
    "8500Gi"
  ), class = "factor")
), .Names = c(
  "hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage",
  "used.local.shared.10gi.storageclass.storage.k8s.io.requests.storage",
  "hard.local.shared.25gi.storageclass.storage.k8s.io.requests.storage",
  "hard.local.shared.50gi.storageclass.storage.k8s.io.requests.storage"
), row.names = c(NA, 10L), class = "data.frame")

df.rename <- df %>% 
  mutate(across(where(is.factor), as.character)) %>%
  rename_with(~ str_remove(., ".storageclass.*"))

df.rename
#>    hard.local.shared.10gi used.local.shared.10gi hard.local.shared.25gi
#> 1                   100Gi                   60Gi                   <NA>
#> 2                    20Gi                      0                   50Gi
#> 3                    30Gi                   30Gi                   <NA>
#> 4                    30Gi                      0                   <NA>
#> 5                   300Gi                   60Gi                   <NA>
#> 6                    20Gi                      0                   50Gi
#> 7                    10Gi                      0                   <NA>
#> 8                    10Gi                   10Gi                   <NA>
#> 9                    30Gi                   30Gi                   <NA>
#> 10                   10Gi                      0                   <NA>
#>    hard.local.shared.50gi
#> 1                   200Gi
#> 2                    <NA>
#> 3                    <NA>
#> 4                    <NA>
#> 5                   200Gi
#> 6                    <NA>
#> 7                    <NA>
#> 8                    <NA>
#> 9                    <NA>
#> 10                   <NA>

Created on 2020-11-22 by the reprex package (v0.3.0)
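
A side note on the pattern, offered as a minimal sketch rather than a required change: in a regular expression an unescaped . matches any character, so ".storageclass.*" also matches when something other than a literal dot comes before "storageclass". With the column names in this thread both forms give the same result. The name `odd` below is made up purely to show where they differ.

```r
library(stringr)

nm <- "hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage"

# Both patterns strip the suffix from the real column name
loose  <- str_remove(nm, ".storageclass.*")    # "hard.local.shared.10gi"
strict <- str_remove(nm, "\\.storageclass.*")  # "hard.local.shared.10gi"

# Hypothetical name (not from the original data) where "storageclass"
# is not preceded by a literal dot
odd <- "hard.nostorageclass.x"
odd_loose  <- str_remove(odd, ".storageclass.*")    # "hard.n"
odd_strict <- str_remove(odd, "\\.storageclass.*")  # unchanged
```

Escaping the dot is the safer habit if other column names might contain "storageclass" somewhere in the middle.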

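Also, your original code works if you simply drop the where() wrapped around contains(). rename_with() accepts selection helpers such as contains() directly, whereas where() expects a predicate function, which is why you got the error.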

df.rename <- df %>% 
  mutate(across(where(is.factor), as.character)) %>% 
  rename_with(~str_remove(., "\\.storageclass\\.storage\\.k8s\\.io\\.requests\\.storage.*"),
              contains("storageclass.storage.k8s.io.requests.storage"))

I should have included a few more variable names in my sample to show why I need the "contains". Otherwise this does work.
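
For what it's worth, restricting the columns matters more with the str_extract route mentioned in the question. A rough sketch, assuming a made-up data frame `toy` with one unrelated column (`namespace` is hypothetical, not from the real data): str_extract returns NA for names that do not match, and NA is not a usable column name, so contains() keeps the function away from those columns.

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

# Hypothetical sample: one unrelated column plus two quota columns
toy <- data.frame(
  namespace = c("a", "b"),
  hard.local.shared.10gi.storageclass.storage.k8s.io.requests.storage = c("100Gi", "20Gi"),
  used.local.shared.10gi.storageclass.storage.k8s.io.requests.storage = c("60Gi", "0"),
  stringsAsFactors = FALSE
)

# Keep everything before ".storageclass.storage.k8s.io" using a lookahead,
# and only touch the columns whose names contain that text
renamed <- toy %>%
  rename_with(
    ~ str_extract(.x, "^.*(?=\\.storageclass\\.storage\\.k8s\\.io)"),
    contains("storageclass.storage.k8s.io")
  )

names(renamed)
#> [1] "namespace"              "hard.local.shared.10gi" "used.local.shared.10gi"
```

With str_remove the unrelated names would simply pass through unchanged, so there the contains() is a safeguard rather than a requirement.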

Thanks! Looks like this works. I think I was expecting that rename_with needed where in the same way the across functions use it. Validating some more but I think this is the solution.
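
In case it helps anyone landing here later, a small sketch of the where() versus contains() distinction. The data frame `toy` and its column names are invented for illustration. where() looks at each column's contents through a predicate function, while contains() looks at the column names, and nesting one inside the other hands where() something that is not a function.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: one integer column and two character columns
toy <- data.frame(
  id = 1:2,
  label = c("x", "y"),
  size.storageclass = c("10Gi", "20Gi"),
  stringsAsFactors = FALSE
)

# where() applies a predicate to each column's values
by_type <- toy %>% select(where(is.character)) %>% names()
# "label" "size.storageclass"

# contains() matches on the column names themselves
by_name <- toy %>% select(contains("storageclass")) %>% names()
# "size.storageclass"

# Nesting them fails, because contains() does not return a function
nested <- tryCatch(
  toy %>% rename_with(toupper, where(contains("storageclass"))),
  error = function(e) "error"
)
```

The exact wording of the error may differ between package versions, so the sketch only checks that an error is raised.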
