Hi there! I'm quite new with R, so this question might seem a bit dumb.
I have the following data frame, where i have at the start some text that would be the name of the variable and then five numbers separated by spaces, than represent different moments of time.
metric1
1 Total Revenue 112 110 108 100 94
2 Total Revenue 534 523 514 500 485
3 Cost of Revenue 73 72 71 66 62
4 Cost of Revenue 403 394 385 373 361
5 Gross Profit 38 37 37 34 32
6 Gross Profit 131 129 129 126 124
There is any way i can split the column first betweeen text and numbers, and then by number in order to have something like this?
Metric
Year1
Year2
Year3
Year4
Year5
Total Revenue
112
110
108
100
94
Total Revenue
534
523
514
500
485
Cost of Revenue
73
72
71
66
62
Cost of Revenue
403
394
385
373
361
Gross Profit
38
37
37
34
32
Gross Profit
131
129
129
126
124
Thank you for your time!
The tidyr
package has great tools for data manipulation tasks such as this. We can achieve this by using the extract()
and separate()
functions that it contains.
library(tidyr)
data <- tibble(metric1 = c("Total Revenue 112 110 108 100 94",
"Total Revenue 534 523 514 500 485",
"Cost of Revenue 73 72 71 66 62",
"Cost of Revenue 403 394 385 373 361",
"Gross Profit 38 37 37 34 32",
"Gross Profit 131 129 129 126 124"))
data <- data %>%
extract(col = metric1, into = c("Metric", "Numbers"), regex = "(\\D+)(.+)") %>%
separate(col = Numbers, into = c("Year1", "Year2", "Year3", "Year4", "Year5"), sep = " ")
print(data)
#> # A tibble: 6 x 6
#> Metric Year1 Year2 Year3 Year4 Year5
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 "Total Revenue " 112 110 108 100 94
#> 2 "Total Revenue " 534 523 514 500 485
#> 3 "Cost of Revenue " 73 72 71 66 62
#> 4 "Cost of Revenue " 403 394 385 373 361
#> 5 "Gross Profit " 38 37 37 34 32
#> 6 "Gross Profit " 131 129 129 126 124
Created on 2020-05-29 by the reprex package (v0.3.0)
2 Likes
What you have done is amazing! Thank you so much!
I know all the tools that you have used execpt these terms (\D+)(.+). I know what they mean but i'm not really used to add them into my code. Whats these arguments name? In order to look for them online and learn about them.
Thank you, you are amazing!
The (\\D+)(.+)
is called a regular expression (or regex for short). These are typically used for pattern matching in strings.
\D
, +
and .
are metacharacters in regex that each match a specific pattern. For example, \D
matches any non-digit character.
If you'd like to learn more, this website has a excellent interactive tutorial on the basics of regular expressions. Hope you find it useful.
1 Like
Thank you, Sidd. I really appreciate your time, you are saving me tons of problems.
1 Like
system
Closed
June 5, 2020, 1:52pm
6
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.