Variables with brackets

Hi,

I have several variables whose names include brackets with text inside them, and I would like to remove the brackets along with their contents. Is there a way to do this with tidyverse? Examples of variable names:

2020 - 1 (N = 1211)
2019 - 2 (N = 1191)
2019 - 1 (N = 1234)

Basically, I would like to get rid of everything starting from "("

Thank you!

Do you really want to leave the trailing space character that was just before the (?

library(stringr)
TEXT <- c("2020 - 1 (N = 1211)",
"2019 - 2 (N = 1191)",
"2019 - 1 (N = 1234)")
NewTEXT <- str_replace(TEXT, pattern = "\\([^\\)]+\\)", replacement = "")
NewTEXT
#> [1] "2020 - 1 " "2019 - 2 " "2019 - 1 "

Created on 2020-09-09 by the reprex package (v0.3.0)

Thank you! Yes, I don't want the trailing space character before the bracket.
Also, could you please explain how this pattern works?

Here is a modified version that removes the trailing space and has been simplified a little.

NewTEXT <- str_replace(TEXT, pattern = " \\([^)]+\\)", replacement = "")

The pattern argument is a regular expression, which allows defining patterns in text. It looks complicated at first but can be understood by explaining each piece.

In a regular expression, placing text within [^ ] means you want to match any character that is not included within the brackets. So [^)] represents any character that is not a ).

Placing a + after a character or a bracketed set means "one or more of the preceding item". So [^)]+ represents one or more characters that are not ).

What we want to search for is a space, followed by (, followed by one or more characters that are not ), followed by ). You might think that would look like
" ([^)]+)"
However, parentheses are special characters within a regular expression: they are used to make groups of text. To prevent the parentheses from being treated as special characters, each must be preceded by a backslash, and because the backslash is itself an escape character inside an R string, it has to be written as two backslashes. That makes the final regular expression
" \\([^)]+\\)"

(Where a regular expression is not written inside a quoted string that has its own escaping rules, a single backslash before the special character is sufficient.)
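One way to check this reading of the pattern is to try each piece on its own. The sketch below is only an illustration: it reuses two of the strings from above and calls str_extract(), which returns the matched text instead of removing it. The raw-string form at the end assumes R 4.0.0 or later.

```r
library(stringr)

TEXT <- c("2020 - 1 (N = 1211)", "2019 - 2 (N = 1191)")

# [^)]+ on its own: the first run of characters that are not ")"
run <- str_extract(TEXT, "[^)]+")
run
#> [1] "2020 - 1 (N = 1211" "2019 - 2 (N = 1191"

# Escaped parentheses match a literal "(" and ")"
inside <- str_extract(TEXT, "\\([^)]+\\)")
inside
#> [1] "(N = 1211)" "(N = 1191)"

# Adding the leading space makes the match include the blank before "("
matched <- str_extract(TEXT, " \\([^)]+\\)")
matched
#> [1] " (N = 1211)" " (N = 1191)"

# writeLines() prints the pattern with the string escaping removed,
# so each double backslash shows up as a single one
writeLines(" \\([^)]+\\)")

# In R >= 4.0.0 a raw string avoids the doubled backslashes
same <- identical(r"{ \([^)]+\)}", " \\([^)]+\\)")
same
#> [1] TRUE
```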

Learning regular expressions is very helpful in many programming situations.


Perfect! Thank you so much!

Hi,

Because I have several variables, I kept all of them in cols as below, but it doesn't work the way it should. What am I doing wrong when including all the variables?

cols <- c(2:10)
df[cols] <- str_replace(cols, pattern = " \\([^)]+\\)", replacement = "")

Thank you!

The problem is that you are asking str_replace to act on the cols vector, not on any of the columns of df. Try running this code that acts on columns 2:4 of the data frame. I used the mutate function combined with across() to pick which columns are affected.

library(dplyr)
library(stringr)
TEXT <- c("2020 - 1 (N = 1211)",
          "2019 - 2 (N = 1191)",
          "2019 - 1 (N = 1234)")
DF <- data.frame(Name = c("A", "B", "C"), 
                 C1 = TEXT,
                 C2 = TEXT,
                 C3 = TEXT)
DF
# The leading space in the pattern also removes the blank before "("
DF <- DF %>% mutate(across(.cols = 2:4,
                           .fns = ~ str_replace(., pattern = " \\([^)]+\\)", replacement = "")))
DF
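To see why the original attempt could not have changed the text, it may help to look at what str_replace() actually receives in that call. This is a small sketch, with the positions converted to text explicitly via as.character() for clarity; the shorter cols vector is only for illustration.

```r
library(stringr)

# cols holds column positions, not column contents
cols <- 2:4

# The pattern is applied to the text "2", "3", "4"; no bracket is found,
# so the positions come back unchanged and nothing from df is involved
result <- str_replace(as.character(cols), pattern = " \\([^)]+\\)", replacement = "")
result
#> [1] "2" "3" "4"
```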

Sorry, I think I was not clear earlier.
I tried to create a reprex for this as follows, with only a few variables:


df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Brand = c("A", "B", "C"),
  `Total (N = 9719)` = c("0.67400000000000004",
                         "0.56999999999999995",
                         "0.50700000000000001"),
  `2020 - 1 (N = 1211)` = c("0.63200000000000001",
                            "0.57699999999999996",
                            "0.46500000000000002"),
  `2019 - 2 (N = 1191)` = c("0.67900000000000005",
                            "0.56799999999999995",
                            "0.48")
)

It seems like your example works on the values of each variable, where C1, C2, and C3 are the variable names. I am trying to remove the brackets and their text from the variable names themselves. So, in my case the name of C2 is actually 2020 - 1 (N = 1211), the name of C3 is 2019 - 2 (N = 1191), and so on.

Can we still remove the brackets and the text within them from the variable names themselves?

Thanks for all your help!

Here is an example using the data frame you provided.

library(stringr)
df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Brand = c("A", "B", "C"),
  `Total (N = 9719)` = c("0.67400000000000004",
                         "0.56999999999999995",
                         "0.50700000000000001"),
  `2020 - 1 (N = 1211)` = c("0.63200000000000001",
                            "0.57699999999999996",
                            "0.46500000000000002"),
  `2019 - 2 (N = 1191)` = c("0.67900000000000005",
                            "0.56799999999999995",
                            "0.48")
)

Nms <- colnames(df)
# The leading space in the pattern also removes the blank before "("
colnames(df) <- str_replace(Nms, pattern = " \\([^)]+\\)", replacement = "")
colnames(df)
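Since the original question asked for a tidyverse approach, the same renaming can probably also be written inside a pipe. The following is a sketch rather than the method used above: it assumes dplyr 1.0.0 or later for rename_with(), uses str_remove() as shorthand for str_replace() with an empty replacement, and works on a shortened copy of the data frame (small_df and clean are names made up for this example).

```r
library(dplyr)
library(stringr)

# A shortened version of the data frame from the reprex
small_df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Brand = c("A", "B"),
  `Total (N = 9719)` = c("0.674", "0.570"),
  `2020 - 1 (N = 1211)` = c("0.632", "0.577")
)

# rename_with() applies the function to the column names, not the values;
# names without a bracket, such as Brand, are left unchanged
clean <- small_df %>%
  rename_with(~ str_remove(.x, " \\([^)]+\\)"))

colnames(clean)
#> [1] "Brand"    "Total"    "2020 - 1"
```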

Yes, it works! Thank you so much!
