Renaming problem

id country name new_name p123 new_p123 p900 new_p900
7 UK abc ABC 1 100 2 4
8 US def DEF 2 101 3 5

In the above table, some variables have corresponding "new" variables. For example, name and new_name. How can I drop the old ones, and keep the new ones without the new_ prefix?

Here is my desired output:

id country name p123 p900
7 UK ABC 100 4
8 US DEF 101 5
library(tidyverse)
# Toy data
df <- tibble(
  id = 7:8,
  country = c("UK", "US"),
  name = c("abc", "def"),
  new_name = c("ABC", "DEF"),
  p123 = 1:2,
  new_p123 = 100:101,
  p900 = 2:3,
  new_p900 = 4:5
)
DF <- data.frame(
  id = c(7, 8),
  country = c("UK", "US"),
  name = c("abc", "def"),
  new_name = c("ABC", "DEF"),
  p123 = c(1, 2),
  new_p123 = c(100, 101),
  p900 = c(2, 3),
  new_p900 = c(4, 5))

DF
#>   id country name new_name p123 new_p123 p900 new_p900
#> 1  7      UK  abc      ABC    1      100    2        4
#> 2  8      US  def      DEF    2      101    3        5

# directly

DF[,-c(6,8)]
#>   id country name new_name p123 p900
#> 1  7      UK  abc      ABC    1    2
#> 2  8      US  def      DEF    2    3

# for wider cases

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

DF %>% select(!starts_with("new"))
#>   id country name p123 p900
#> 1  7      UK  abc    1    2
#> 2  8      US  def    2    3

Thanks @technocrat ! Unfortunately, this is not what I wanted. Please see my desired output, values are all the new ones.

library(tidyverse)
# Toy data
df <- tibble(
    id = 7:8,
    country = c("UK", "US"),
    name = c("abc", "def"),
    new_name = c("ABC", "DEF"),
    p123 = 1:2,
    new_p123 = 100:101,
    p900 = 2:3,
    new_p900 = 4:5
)

df %>% 
    select(id, country, starts_with("new")) %>% 
    rename_with(.fn = ~str_remove(.x, "^new_"),
                .cols = starts_with("new"))
#> # A tibble: 2 Γ— 5
#>      id country name   p123  p900
#>   <int> <chr>   <chr> <int> <int>
#> 1     7 UK      ABC     100     4
#> 2     8 US      DEF     101     5

Created on 2022-01-29 by the reprex package (v2.0.1)

Same logic, -c(3,5) or select(starts_with(β€œnew”) …

1 Like

@andresrcs Many thanks for the solution! One question: Why are you using .cols = starts_with("new")? Even without it, the code produces the wanted result.

To only apply the renaming function to the columns where is needed, the rest of the columns do not match the extrac pattern but the function is still applied to all of them if you do not restrict the selection. For a small amount of columns doesn't make much of a difference though.

1 Like

@andresrcs Thanks for the explanation! I find it helpful.

I have another question: Suppose there are variables Name and name2, and corresponding new variables are nName and nname2. How would your solution change in this case? Put another way, instead of "new_", new variables's names start with "n".

library(tidyverse)
# Toy data
df <- tibble(
  id = 7:8,
  country = c("UK", "US"),
  Name = c("abc", "def"), 
  nName = c("ABC", "DEF"),
  name2 = c("xyz", "ijk"),
  nname2 = c("xyz", "ijk"),
  p123 = 1:2,
  np123 = 100:101,
  p900 = 2:3,
  np900 = 4:5
)

# What to do?
df %>% 
  select(id, country, starts_with("n")) %>% 
  rename_with(.fn = ~str_remove(.x, "^n"),
              .cols = starts_with("n"))
#> Error: Names must be unique.
#> x These names are duplicated:
#>   * "Name" at locations 3 and 4.

You need the selection to be case sensitive, use starts_with("n", ignore.case = FALSE)

1 Like

Indeed, your code should look like this:

> df
# A tibble: 2 Γ— 10
     id country Name  nName name2 nname2  p123 np123  p900 np900
  <int> <chr>   <chr> <chr> <chr> <chr>  <int> <int> <int> <int>
1     7 UK      abc   ABC   xyz   xyz        1   100     2     4
2     8 US      def   DEF   ijk   ijk        2   101     3     5
> 
> df %>% select(id, country, starts_with("n",ignore.case = FALSE)) %>%
+   rename_with(.fn = ~str_remove(.x, "^n"), 
+               .cols = starts_with("n",ignore.case = FALSE))
# A tibble: 2 Γ— 7
     id country Name  ame2  name2  p123  p900
  <int> <chr>   <chr> <chr> <chr> <int> <int>
1     7 UK      ABC   xyz   xyz     100     4
2     8 US      DEF   ijk   ijk     101     5

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.