@technocrat: I knew that using a regular expression was the solution, but it is not something that I have taken the time to learn. With my retirement at the end of fall term, I finally started down that path a couple of weeks ago! Thanks for the example.
Thanks so much for the detailed steps! My goal is actually to keep `school_1`, `year_1`, ... and remove those ending in `_2` or higher. I am sorry if that was not clear in the description. I look forward to your reply.
This seems to be the way to solve this. The only thing is that my target columns are those ending in `_1`, e.g. `school_1` and `year_1` (not the higher ones, e.g. `school_2` or `year_3`), as well as the ones that do not have that underscore pattern at all.
Either as a function in a self-designed utility package, e.g. `match_1()`, or, since the pattern is an object and objects can be serialized, with `saveRDS(pat, file = "somewhere_convenient.Rds")`.
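A minimal sketch of both options. `match_1()` is only the hypothetical helper name from the comment above, and the pattern stored in `pat` is an assumption (a column name ending in `_2` or any higher number); substitute whatever pattern you actually use.

```r
# Assumed pattern: an underscore followed by 2-9, or by any number with two
# or more digits that does not start with 0, at the end of the name
pat <- "_([2-9]|[1-9][0-9]+)$"

# Option 1: a small helper (hypothetical name) returning TRUE for the
# columns to keep, i.e. those whose names do NOT match the pattern
match_1 <- function(x, pattern = "_([2-9]|[1-9][0-9]+)$") {
  !grepl(pattern, colnames(x))
}

# Option 2: serialize the pattern object and read it back later;
# tempfile() stands in for "somewhere_convenient.Rds"
path <- tempfile(fileext = ".Rds")
saveRDS(pat, file = path)
pat2 <- readRDS(path)

# Try both on a small made-up matrix
m <- matrix(1:12, ncol = 4,
            dimnames = list(NULL, c("state", "school_1", "school_2", "school_10")))
m[, match_1(m, pattern = pat2)]
```

`readRDS()` gives back the same character string that was saved, so the pattern can be reused in a later session. Note that this particular pattern does not catch numbers written with a leading zero, such as `_02`.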
First, make up a data set. In fact, there are two example data sets below: a matrix and a data.frame. The code to get the columns ending with a number after an underscore is the same for both; just substitute `df1` for `mat1`.
```r
# make up a data set: a 4 x 7 integer matrix with named columns
mat1 <- matrix(1:(7*4), ncol = 7)
colnames(mat1) <- c("state", "education", "school_1", "school_2",
                    "school_3", "year_1", "year_2")
# show that it works with data.frames too
df1 <- as.data.frame(mat1)
```

The regex `".*_(\\d+$)"` has two parts:

- `".*_"` matches any character repeated any number of times, followed by an underscore; since `.*` is greedy, it runs up to the last underscore in the name;
- the capture group `"(\\d+$)"` matches one or more digits running to the end of the string (`$`).
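To see what the substitution does before any coercion, here is a small sketch on a made-up vector of names (`nm`, `out` and the extra name `mid_term_3` are illustrative, not from the question):

```r
# Example column names, including one with two underscores
nm <- c("state", "education", "school_1", "school_2", "mid_term_3")

# ".*_" is greedy, so it consumes everything up to the LAST underscore;
# "(\\d+$)" captures the trailing digits, and "\\1" keeps only them.
# Names that do not end in "_<digits>" have no match and come back unchanged.
out <- sub(".*_(\\d+$)", "\\1", nm)
print(out)

# Which names actually match the pattern?
print(grepl(".*_(\\d+$)", nm))
```

`out` is `c("state", "education", "1", "2", "3")`: the matching names are reduced to their trailing number, while `state` and `education` pass through as text.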
Replacing each match with the first (and only) capture group, `\\1`, removes everything else and keeps only the digits. Coerce the result to numeric and test for `> 1`. But be careful: `sub()` returns a column name that does not match the pattern unchanged, so coercing it gives `NA` (with a warning); test for those `NA`s too.
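A short sketch of why the `NA` test matters, on a hypothetical subset of the names (`nm`, `num`, `dat` and `keep` are illustrative). Without it, `num > 1` is `NA` for the non-matching names, and an `NA` in a logical column index does not cleanly keep or drop a column. `suppressWarnings()` only silences the coercion warning; it does not change the result.

```r
nm <- c("state", "education", "school_1", "school_2")

# Non-matching names stay as text, so as.numeric() turns them into NA
num <- suppressWarnings(as.numeric(sub(".*_(\\d+$)", "\\1", nm)))

num > 1                   # NA for "state" and "education"
!is.na(num) & num > 1     # FALSE & NA is FALSE, so the NAs become FALSE

# Negate the index to keep everything except numbers greater than one
dat <- data.frame(state = 1, education = 2, school_1 = 3, school_2 = 4)
keep <- !(!is.na(num) & num > 1)
names(dat[, keep])
```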
The resulting logical index flags the columns with a number greater than one after the underscore; negating it selects the wanted columns.
```r
# keep only the numbers after an underscore and coerce to numeric
i_col <- as.numeric(sub(".*_(\\d+$)", "\\1", colnames(mat1)))
#> Warning: NAs introduced by coercion
# this is the logical index giving the answer
i_col <- !is.na(i_col) & i_col > 1
mat1[, !i_col]
#>      state education school_1 year_1
#> [1,]     1         5        9     21
#> [2,]     2         6       10     22
#> [3,]     3         7       11     23
#> [4,]     4         8       12     24
df1[, !i_col]
#>   state education school_1 year_1
#> 1     1         5        9     21
#> 2     2         6       10     22
#> 3     3         7       11     23
#> 4     4         8       12     24
```
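If the numeric coercion (and its warning) is unwanted, a possible alternative, not the approach used above, is to test the names with two `grepl()` calls: keep a column when its name does not end in `_<digits>` at all, or when it ends in exactly `_1`. A sketch with the same made-up columns (`nm` and `keep` are illustrative names):

```r
# Example data with the same column names as in the answer
df1 <- as.data.frame(matrix(1:28, ncol = 7))
colnames(df1) <- c("state", "education", "school_1", "school_2",
                   "school_3", "year_1", "year_2")

nm <- colnames(df1)
# Keep a column if its name does not end in "_<digits>" at all,
# or if the trailing number is exactly 1
keep <- !grepl("_\\d+$", nm) | grepl("_1$", nm)

df1[, keep]
```

One difference from the numeric comparison: a name such as `x_01` is dropped here, whereas `as.numeric("01")` is 1, so the code above would keep it.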