Hello,
First, make up a data set. In fact, there are two example data sets below, a matrix and a data.frame. The code to drop the columns ending in an underscore followed by a number greater than one is the same for both; just substitute df1 for mat1.
# make up a data set
mat1 <- matrix(1:(7*4), ncol = 7)
colnames(mat1) <- c("state", "education", "school_1", "school_2",
                    "school_3", "year_1", "year_2")
# show that it works with data.frames too
df1 <- as.data.frame(mat1)
Created on 2022-12-29 with reprex v2.0.2
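Now, the regular expression ".*_(\\d+$)" means:
- .* any character, zero or more times (greedy, so the match runs up to the last underscore);
- _ a literal underscore;
- (\\d+$) a capture group of one or more digits at the end of the name.
Substituting this pattern with its first and only capture group (\\1) removes everything else and keeps only the numbers. Coerce the numbers to numeric and test for > 1. But be careful: if a column name doesn't match the pattern, sub() returns it unchanged, so the coercion returns NA for it (with a warning). Test for NA too.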
The resulting logical index is TRUE for the columns whose number after the underscore is greater than one; negating it selects the wanted columns.
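To see the substitution step in isolation, here is a minimal sketch run on a plain character vector of names. The vector nms is made up for illustration. Matching names keep only their trailing digits, while names without an underscore-plus-digits ending come back unchanged from sub(), which is why as.numeric() turns them into NA.

```r
# made-up example names, not from the original data
nms <- c("state", "education", "school_1", "school_2", "year_10")

# replace the whole name by the captured trailing digits;
# names that don't match are returned unchanged by sub()
digits <- sub(".*_(\\d+$)", "\\1", nms)
print(digits)

# non-numeric strings become NA; suppressWarnings() only silences
# the "NAs introduced by coercion" warning, it does not change the result
num <- suppressWarnings(as.numeric(digits))
print(num)

# the same logical test as in the answer: TRUE only for numbers > 1
print(!is.na(num) & num > 1)
```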
# keep only the numbers after an underscore and coerce to numeric
i_col <- as.numeric(sub(".*_(\\d+$)", "\\1", colnames(mat1)))
#> Warning: NAs introduced by coercion
# this is the logical index giving the answer
i_col <- !is.na(i_col) & i_col > 1
mat1[, !i_col]
#> state education school_1 year_1
#> [1,] 1 5 9 21
#> [2,] 2 6 10 22
#> [3,] 3 7 11 23
#> [4,] 4 8 12 24
df1[, !i_col]
#> state education school_1 year_1
#> 1 1 5 9 21
#> 2 2 6 10 22
#> 3 3 7 11 23
#> 4 4 8 12 24
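If you need this more than once, the same logic can be wrapped in a small helper. This is a sketch of my own variation, not the code above: the name drop_numbered_cols() and its max_keep argument are hypothetical. It checks the pattern with grepl() first, so as.numeric() only ever sees digit strings and no coercion warning is raised, and it uses drop = FALSE so that a result with a single column stays a matrix or data.frame.

```r
# Hypothetical helper: drop columns whose name ends in "_<number>" with
# the number greater than max_keep (default 1, as in the answer above).
drop_numbered_cols <- function(x, max_keep = 1) {
  nms <- colnames(x)
  # TRUE where the name ends with an underscore followed by digits
  has_num <- grepl("_\\d+$", nms)
  # extract the digits only for matching names, so no NAs come from coercion
  num <- rep(NA_real_, length(nms))
  num[has_num] <- as.numeric(sub(".*_(\\d+)$", "\\1", nms[has_num]))
  # FALSE & NA is FALSE, so non-matching names are never dropped
  to_drop <- has_num & num > max_keep
  x[, !to_drop, drop = FALSE]
}

# same made-up data as above
mat1 <- matrix(1:(7*4), ncol = 7)
colnames(mat1) <- c("state", "education", "school_1", "school_2",
                    "school_3", "year_1", "year_2")
df1 <- as.data.frame(mat1)

drop_numbered_cols(mat1)
drop_numbered_cols(df1)
```

With max_keep = 2 the _2 columns would be kept as well, and only school_3 would be dropped.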