Splitting a vector into different columns if there is a number

Hi there! I'm quite new with R, so this question might seem a bit dumb.
I have the following data frame, where i have at the start some text that would be the name of the variable and then five numbers separated by spaces, than represent different moments of time.

   metric1
1    Total Revenue 112 110 108 100 94
2   Total Revenue 534 523 514 500 485
3      Cost of Revenue 73 72 71 66 62
4 Cost of Revenue 403 394 385 373 361
5         Gross Profit 38 37 37 34 32
6    Gross Profit 131 129 129 126 124

There is any way i can split the column first betweeen text and numbers, and then by number in order to have something like this?

Metric Year1 Year2 Year3 Year4 Year5
Total Revenue 112 110 108 100 94
Total Revenue 534 523 514 500 485
Cost of Revenue 73 72 71 66 62
Cost of Revenue 403 394 385 373 361
Gross Profit 38 37 37 34 32
Gross Profit 131 129 129 126 124

Thank you for your time!

The tidyr package has great tools for data manipulation tasks such as this. We can achieve this by using the extract() and separate() functions that it contains.

library(tidyr)

data <- tibble(metric1 = c("Total Revenue 112 110 108 100 94",
                           "Total Revenue 534 523 514 500 485",
                           "Cost of Revenue 73 72 71 66 62",
                           "Cost of Revenue 403 394 385 373 361",
                           "Gross Profit 38 37 37 34 32",
                           "Gross Profit 131 129 129 126 124"))

data <- data %>% 
  extract(col = metric1, into = c("Metric", "Numbers"), regex = "(\\D+)(.+)") %>% 
  separate(col = Numbers, into = c("Year1", "Year2", "Year3", "Year4", "Year5"), sep = " ")

print(data)
#> # A tibble: 6 x 6
#>   Metric             Year1 Year2 Year3 Year4 Year5
#>   <chr>              <chr> <chr> <chr> <chr> <chr>
#> 1 "Total Revenue "   112   110   108   100   94   
#> 2 "Total Revenue "   534   523   514   500   485  
#> 3 "Cost of Revenue " 73    72    71    66    62   
#> 4 "Cost of Revenue " 403   394   385   373   361  
#> 5 "Gross Profit "    38    37    37    34    32   
#> 6 "Gross Profit "    131   129   129   126   124

Created on 2020-05-29 by the reprex package (v0.3.0)

2 Likes

What you have done is amazing! Thank you so much!
I know all the tools that you have used execpt these terms (\D+)(.+). I know what they mean but i'm not really used to add them into my code. Whats these arguments name? In order to look for them online and learn about them.

Thank you, you are amazing!

The (\\D+)(.+) is called a regular expression (or regex for short). These are typically used for pattern matching in strings.

\D, + and . are metacharacters in regex that each match a specific pattern. For example, \D matches any non-digit character.

If you'd like to learn more, this website has a excellent interactive tutorial on the basics of regular expressions. Hope you find it useful.

1 Like

Thank you, Sidd. I really appreciate your time, you are saving me tons of problems.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.