Renaming problem

budugulo · January 29, 2022, 9:33pm

id	country	name	new_name	p123	new_p123	p900	new_p900
7	UK	abc	ABC	1	100	2	4
8	US	def	DEF	2	101	3	5

In the above table, some variables have corresponding "new" variables. For example, name and new_name. How can I drop the old ones, and keep the new ones without the new_ prefix?

Here is my desired output:

id	country	name	p123	p900
7	UK	ABC	100	4
8	US	DEF	101	5

library(tidyverse)
# Toy data
df <- tibble(
  id = 7:8,
  country = c("UK", "US"),
  name = c("abc", "def"),
  new_name = c("ABC", "DEF"),
  p123 = 1:2,
  new_p123 = 100:101,
  p900 = 2:3,
  new_p900 = 4:5
)

technocrat · January 29, 2022, 9:50pm

DF <- data.frame(
  id = c(7, 8),
  country = c("UK", "US"),
  name = c("abc", "def"),
  new_name = c("ABC", "DEF"),
  p123 = c(1, 2),
  new_p123 = c(100, 101),
  p900 = c(2, 3),
  new_p900 = c(4, 5))

DF
#>   id country name new_name p123 new_p123 p900 new_p900
#> 1  7      UK  abc      ABC    1      100    2        4
#> 2  8      US  def      DEF    2      101    3        5

# directly

DF[,-c(6,8)]
#>   id country name new_name p123 p900
#> 1  7      UK  abc      ABC    1    2
#> 2  8      US  def      DEF    2    3

# for wider cases

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

DF %>% select(!starts_with("new"))
#>   id country name p123 p900
#> 1  7      UK  abc    1    2
#> 2  8      US  def    2    3

budugulo · January 29, 2022, 10:24pm

Thanks @technocrat ! Unfortunately, this is not what I wanted. Please see my desired output, values are all the new ones.

andresrcs · January 29, 2022, 10:45pm

library(tidyverse)
# Toy data
df <- tibble(
    id = 7:8,
    country = c("UK", "US"),
    name = c("abc", "def"),
    new_name = c("ABC", "DEF"),
    p123 = 1:2,
    new_p123 = 100:101,
    p900 = 2:3,
    new_p900 = 4:5
)

df %>% 
    select(id, country, starts_with("new")) %>% 
    rename_with(.fn = ~str_remove(.x, "^new_"),
                .cols = starts_with("new"))
#> # A tibble: 2 × 5
#>      id country name   p123  p900
#>   <int> <chr>   <chr> <int> <int>
#> 1     7 UK      ABC     100     4
#> 2     8 US      DEF     101     5

^{Created on 2022-01-29 by the reprex package (v2.0.1)}

technocrat · January 29, 2022, 10:52pm

Same logic, -c(3,5) or select(starts_with(“new”) …

budugulo · January 30, 2022, 8:04pm

@andresrcs Many thanks for the solution! One question: Why are you using .cols = starts_with("new")? Even without it, the code produces the wanted result.

andresrcs · January 31, 2022, 12:24am

To only apply the renaming function to the columns where is needed, the rest of the columns do not match the extrac pattern but the function is still applied to all of them if you do not restrict the selection. For a small amount of columns doesn't make much of a difference though.

budugulo · January 31, 2022, 1:54pm

@andresrcs Thanks for the explanation! I find it helpful.

I have another question: Suppose there are variables Name and name2, and corresponding new variables are nName and nname2. How would your solution change in this case? Put another way, instead of "new_", new variables's names start with "n".

library(tidyverse)
# Toy data
df <- tibble(
  id = 7:8,
  country = c("UK", "US"),
  Name = c("abc", "def"), 
  nName = c("ABC", "DEF"),
  name2 = c("xyz", "ijk"),
  nname2 = c("xyz", "ijk"),
  p123 = 1:2,
  np123 = 100:101,
  p900 = 2:3,
  np900 = 4:5
)

# What to do?
df %>% 
  select(id, country, starts_with("n")) %>% 
  rename_with(.fn = ~str_remove(.x, "^n"),
              .cols = starts_with("n"))
#> Error: Names must be unique.
#> x These names are duplicated:
#>   * "Name" at locations 3 and 4.

andresrcs · January 31, 2022, 2:16pm

You need the selection to be case sensitive, use starts_with("n", ignore.case = FALSE)

olyerickson · January 31, 2022, 4:22pm

Indeed, your code should look like this:

> df
# A tibble: 2 × 10
     id country Name  nName name2 nname2  p123 np123  p900 np900
  <int> <chr>   <chr> <chr> <chr> <chr>  <int> <int> <int> <int>
1     7 UK      abc   ABC   xyz   xyz        1   100     2     4
2     8 US      def   DEF   ijk   ijk        2   101     3     5
> 
> df %>% select(id, country, starts_with("n",ignore.case = FALSE)) %>%
+   rename_with(.fn = ~str_remove(.x, "^n"), 
+               .cols = starts_with("n",ignore.case = FALSE))
# A tibble: 2 × 7
     id country Name  ame2  name2  p123  p900
  <int> <chr>   <chr> <chr> <chr> <int> <int>
1     7 UK      ABC   xyz   xyz     100     4
2     8 US      DEF   ijk   ijk     101     5

system · February 7, 2022, 4:23pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.