Removing columns by index numbers

Hi,

I have a dataset with column names such as state, education, school_1, school_2, school_3, year_1, year_2...

I want to remove the columns where the number after the underscore is higher than 1.

Please let me know is there is any way to achieve this.

Thank you!

1 Like

select(yourdata, -ends_with("_1")) should do it

Hi,

Thanks so much for your reply.

This method removes the columns that end with _1 and keeps the higher ones (e.g. _2 and _3), which is the opposite of what I want to have!

Also, when I try

select(dd, !ends_with("_1"))

it keeps those with '_1', but removes (unwantedly) columns that do not contain an underscore.

Sorry, I misread your first post!

Would select(dd, !(ends_with("2") | ends_with("3"))) work?

# setup to demonstrate one solution
varnames <- c("state", "education", "school_1", "school_2", "school_3", "year_1", "year_2")
length(varnames)
#> [1] 7
dumb <- mtcars[1:6,1:7]
colnames(dumb) <- varnames
dumb
#>                   state education school_1 school_2 school_3 year_1 year_2
#> Mazda RX4          21.0         6      160      110     3.90  2.620  16.46
#> Mazda RX4 Wag      21.0         6      160      110     3.90  2.875  17.02
#> Datsun 710         22.8         4      108       93     3.85  2.320  18.61
#> Hornet 4 Drive     21.4         6      258      110     3.08  3.215  19.44
#> Hornet Sportabout  18.7         8      360      175     3.15  3.440  17.02
#> Valiant            18.1         6      225      105     2.76  3.460  20.22

# solution
# regular expression to match colnames indexes ending in _1
pat <- "^.*_[1]$"
# subset by indexes of columns that do not match pat
dumb[grep(pat,colnames(dumb),invert = TRUE)]
#>                   state education school_2 school_3 year_2
#> Mazda RX4          21.0         6      110     3.90  16.46
#> Mazda RX4 Wag      21.0         6      110     3.90  17.02
#> Datsun 710         22.8         4       93     3.85  18.61
#> Hornet 4 Drive     21.4         6      110     3.08  19.44
#> Hornet Sportabout  18.7         8      175     3.15  17.02
#> Valiant            18.1         6      105     2.76  20.22

Created on 2022-12-26 with reprex v2.0.2

2 Likes

@technocrat: I knew that using a regular expression was the solution, but it is not something that I have taken the time to learn. With my retirement at the end of fall term, I finally started down that path a couple of weeks ago! Thanks for the example.

1 Like

In retirement you’ll wonder how you ever had time to go to work.

3 Likes

Hi,

Thanks so much for the detailed steps! My goal is actually to keep school_1, year_1... and remove those with _2 and higher. I am sorry, if it was not clear in the description. I look forward to your reply.

Thank you.

Hi,

This seems to be the way to solve this. The only thing is that my target columns are those with *_1 e.g. school_1 and year_1 (not the higher ones e.g. school_2 or year_3) as well as the ones that do not have that underscore pattern.

I look forward to your kind reply.

Thank you!

should be


dumb[grep(pat,colnames(dumb))]

varnames <- c("state", "education", "school_1", "school_2", "school_3", "year_1", "year_2")
dumb <- mtcars[1:6,1:7]
colnames(dumb) <- varnames

# solution
# regular expression to match colnames indexes ending in _1 or a letter
pat <- "^.*_[1]$|[a-z]$"
# subset by indexes of columns that do not match pat
dumb[grep(pat,colnames(dumb))]
#>                   state education school_1 year_1
#> Mazda RX4          21.0         6      160  2.620
#> Mazda RX4 Wag      21.0         6      160  2.875
#> Datsun 710         22.8         4      108  2.320
#> Hornet 4 Drive     21.4         6      258  3.215
#> Hornet Sportabout  18.7         8      360  3.440
#> Valiant            18.1         6      225  3.460
1 Like

That's a great piece of poetry! Thanks so much!

I'd like to ask an additional question that you might know. Is there any way to save the 'pat' function permanently?

Either as a function in a self-designed utility, package match_1() or, since it is an object, and objects can be serialized `saveRDS(pat, file = "somewhere_convenient.Rds").

1 Like

Hello,

First, make up a data set. In fact there are two example data sets below, a matrix and a data.frame. The code to get the columns ending with a number after an underscore is the same for both. Just substitute df1 for mat1.

# make up a data set
mat1 <- matrix(1:(7*4), ncol = 7)
colnames(mat1) <- c("state", " education", " school_1", " school_2", 
                    " school_3", " year_1", " year_2")
# show that it works with data.frames too
df1 <- as.data.frame(mat1)

Created on 2022-12-29 with reprex v2.0.2

Now, the regular expression means:

  • ".*_" any character repeated any number of times followed by an underscore;

  • the capture group "(\\d+$)" is a digit repeated at least once until the end ($)

To substitute this pattern by the first (\\1) and only capture group effectively removes everything else keeping only the numbers. Coerce the numbers to numeric and test for > 1. But be careful, if the search pattern doesn't exist in the column names the coercion will return NA's so test for them too.

The index gives columns with numbers greater than one after the underscore and to negate it gives the wanted columns.

# keep only the numbers after an underscore and coerce to numeric
i_col <- as.numeric(sub(".*_(\\d+$)", "\\1", colnames(mat1)))
#> Warning: NAs introduced by coercion
# this is the logical index giving the answer
i_col <- !is.na(i_col) & i_col > 1

mat1[, !i_col]
#>      state  education  school_1  year_1
#> [1,]     1          5         9      21
#> [2,]     2          6        10      22
#> [3,]     3          7        11      23
#> [4,]     4          8        12      24
df1[, !i_col]
#>   state  education  school_1  year_1
#> 1     1          5         9      21
#> 2     2          6        10      22
#> 3     3          7        11      23
#> 4     4          8        12      24

Created on 2022-12-29 with reprex v2.0.2

1 Like

This is fantastic! Thanks so much for the step-by-step descriptions. I learned a lot from your answer!
Happy new year!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.