Shorten column names in a data frame

Hi, I have a large matrix of data with column names like this:


I would like to shorten these to


Unfortunately, the second part (123A) can be 4 or 5 characters long, so I can't cut the names at a fixed length.
Is there a way to use gsub to replace everything from the second - onward with ""? Or is there another solution?

(example_matrix <-structure(c(1, 2), .Dim = 1:2, .Dimnames = list(NULL, c("ABCD-123A-1234-AB1AB1", 

(long_names <- colnames(example_matrix))

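# Split each name at "-", keep the first two pieces, and rejoin them with "_".
# vapply() returns a character vector (lapply() would return a list).
(short_names <- vapply(
  X = strsplit(x = long_names, split = "-", fixed = TRUE),
  FUN = function(x) paste0(head(x, n = 2), collapse = "_"),
  FUN.VALUE = character(1)))
colnames(example_matrix) <- short_names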


Another option is to use regular expressions, a less readable but more direct approach.

example_matrix <- structure(c(1, 2), .Dim = 1:2, .Dimnames = list(NULL, c("ABCD-123A-1234-AB1AB1", 

colnames(example_matrix) <- regmatches(colnames(example_matrix), regexpr("^.{4}-[^-]{4,5}", colnames(example_matrix)))

#>      ABCD-123A ABCD-123AX
#> [1,]         1          2

Created on 2022-07-26 by the reprex package (v2.0.1)
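
Since the question asked about gsub specifically, here is a minimal sketch of that idea using sub() with a capture group. The two names in long_names are assumptions that follow the pattern described in the question, not the original data.

```r
# Hypothetical example names, modelled on the pattern in the question
long_names <- c("ABCD-123A-1234-AB1AB1", "ABCD-123AX-1234-AB1AB1")

# Keep everything before the second hyphen:
#   ^([^-]*-[^-]*)  captures "first part", a hyphen, and "second part"
#   -.*$            matches the second hyphen and the rest, which is dropped
short_names <- sub("^([^-]*-[^-]*)-.*$", "\\1", long_names)

print(short_names)
#> [1] "ABCD-123A"  "ABCD-123AX"
```

Because the pattern counts hyphens rather than characters, it does not matter whether the second part is 4 or 5 characters long. Names with fewer than two hyphens do not match and are returned unchanged.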


Thanks so much, this worked! I was wondering if you could break down how this works so I can adapt it for future use. I'm guessing split is where to cut, and n = 2 says to split at the 2nd instance? But I'm not sure about the collapse part. Or if you have a good resource on this instead, that would be great.
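
A step-by-step sketch of the strsplit approach may help here. It runs the same three functions on a single name, one at a time; the name used is an assumption that follows the pattern in the question. Note that n = 2 does not control where the split happens: strsplit cuts at every "-", and head(n = 2) then keeps the first two pieces.

```r
# Hypothetical column name, following the pattern in the question
long_name <- "ABCD-123A-1234-AB1AB1"

# Step 1: strsplit() cuts the string at every "-".
# It returns a list with one element per input string, so [[1]] extracts
# the character vector of pieces for our single name.
parts <- strsplit(long_name, split = "-", fixed = TRUE)[[1]]
print(parts)
#> [1] "ABCD"   "123A"   "1234"   "AB1AB1"

# Step 2: head(n = 2) keeps only the first two pieces.
first_two <- head(parts, n = 2)
print(first_two)
#> [1] "ABCD" "123A"

# Step 3: paste0(collapse = "_") glues the pieces of one vector into a
# single string, putting "_" between them.
short_name <- paste0(first_two, collapse = "_")
print(short_name)
#> [1] "ABCD_123A"
```

To keep more pieces, change n; to join with a hyphen instead of an underscore, use collapse = "-". The built-in help pages ?strsplit, ?head, ?paste, and ?regex cover each piece.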
