I need to convert cross-sectional data to time series. I have no idea where to start, I have the Annual Turnover for companies for the last 5 years and I need to fit these values in a trend line.
Can you please share a small part of the data set in a copy-paste friendly format?
In case you don't know how to do it, there are many options, which include:
Could you apply the dput()
function to the output of this command:
and then place the
dput()
output here between a pair of triple backticks (```), like this?
```
[paste output of dput(head(newCD_Turnover...) here]
```
And maybe use 50 instead of 5 in the head()
command?
'''
[structure(list(TurnoverGBP2019 = c(6.3911e+10, 2.9007e+10, 1.7735e+10,
1.5824e+10, 4763100000, 4167400000, 3213243000, 2957400000, 2221400000,
1878900000, 1818793000, 1.632e+09, 1467900000, 1437100000, 1058700000,
9.79e+08, 9.04e+08, 804438000, 7.24e+08, 488500000, 4.36e+08,
384400000, 382830000, 333600000, 316300000, 2.94e+08, 2.79e+08,
277016000, 250700000, 219197000, 208952000, 179300000, 156161000,
129894000, 125414000, 89215000, 66358000, 59669000, 31189000,
28187000, 25417000, 25162000, 23268000, 22274000, 22263000, 21621000,
19511000, 19033000, 17183000, 13742000), TurnoverGBP2018 = c(5.7491e+10,
2.8456e+10, 1.7262e+10, 1.5574e+10, 4874800000, 4055500000, 2957664000,
2703700000, 2203700000, 3226800000, 1693818000, 1.708e+09, 1534300000,
1464500000, 991200000, 9.6e+08, 6.39e+08, 792872000, 6.95e+08,
470300000, 422100000, 3.8e+08, 303639000, 296900000, 285700000,
3.26e+08, 277700000, 172543000, 233200000, 216887000, 197632000,
165500000, 142885000, 120548000, 116660000, 79781000, 58292000,
60487000, 30178000, 30663000, 20454000, 23207000, 19759000, 21419000,
22317000, 10418000, 18800000, 16137000, 12330000, 12720000),
TurnoverGBP2017 = c(5.5917e+10, 2.6224e+10, 1.6317e+10, 1.5357e+10,
4650200000, 4097300000, 2558561000, 2723500000, 2126300000,
3941200000, 1660750000, 1.783e+09, 1594300000, 1245058000,
8.22e+08, 9.33e+08, 5.89e+08, 740290000, 6.48e+08, 459100000,
398200000, 352100000, 248740000, 348500000, 257200000, 290200000,
257100000, 244111000, 244500000, 211848000, 186512000, 163200000,
140738000, 113968000, 109070000, 60521000, 56369000, 46120000,
19392000, 27796000, 19872000, 21902000, 17593000, 19854000,
21963000, NA, 16160000, 15563000, 12503000, 11179000)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))]
'''
I don't understand why the format won't change
Very close! (And good enough for now ) That's because you used a triple single quotes (''') instead of triple backticks (``` -- on the same key as ~ in the upper left of the keyboard).
Oh, and the opening and closing [
and ]
aren't needed -- I just put them there to contain the instructions on what to do.
What do your rows represent? And could you describe an example of a time series you'd like to see?
The rows represent different companies. I need to convert the turnover values to a time series variable, calculate the trend line and their respective R-squares then take the values that I get for each firm and put it back into my original dataset. Does that make sense?
Are you trying to create an individual time series for each company?
Yes unfortunately I am, so I have create a trend line for each company
Not a problem, just wanted to make sure. I can't help right now, but will post again when I can...
Brilliant, thank you so much!
OK, another question: With so many companies, I assume you might prefer a table you can use to plot trendlines for selections of companies, so I thought I'd help with a time-series-shaped table first:
df <-
structure(list(TurnoverGBP2019 = c(6.3911e+10, 2.9007e+10, 1.7735e+10,
1.5824e+10, 4763100000, 4167400000), TurnoverGBP2018 = c(5.7491e+10,
2.8456e+10, 1.7262e+10, 1.5574e+10, 4874800000, 4055500000),
TurnoverGBP2017 = c(5.5917e+10, 2.6224e+10, 1.6317e+10, 1.5357e+10,
4650200000, 4097300000)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
### end of 'structure()' command
# inspect
df
#> TurnoverGBP2019 TurnoverGBP2018 TurnoverGBP2017
#> 1 6.3911e+10 5.7491e+10 5.5917e+10
#> 2 2.9007e+10 2.8456e+10 2.6224e+10
#> 3 1.7735e+10 1.7262e+10 1.6317e+10
#> 4 1.5824e+10 1.5574e+10 1.5357e+10
#> 5 4.7631e+09 4.8748e+09 4.6502e+09
#> 6 4.1674e+09 4.0555e+09 4.0973e+09
# load tools used below
library(tidyverse)
# pivot to longer form
df.long <-
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_to = 'year')
df.long %>% head()
#> # A tibble: 6 x 3
#> id year value
#> <int> <chr> <dbl>
#> 1 1 TurnoverGBP2019 63911000000
#> 2 1 TurnoverGBP2018 57491000000
#> 3 1 TurnoverGBP2017 55917000000
#> 4 2 TurnoverGBP2019 29007000000
#> 5 2 TurnoverGBP2018 28456000000
#> 6 2 TurnoverGBP2017 26224000000
# time-series form (not a time-series object, though)
df.ts <-
df.long %>%
mutate(year = parse_number(year)) %>%
arrange(year) %>%
pivot_wider(names_from = id, names_prefix = 'Co.')
df.ts %>% select(1:4)
#> # A tibble: 3 x 4
#> year Co.1 Co.2 Co.3
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2017 55917000000 26224000000 16317000000
#> 2 2018 57491000000 28456000000 17262000000
#> 3 2019 63911000000 29007000000 17735000000
Created on 2020-03-09 by the reprex package (v0.3.0)
Does that help?
That helps so much, one further question if you don't mind me asking, how would I then go about creating trend lines for every company?
Since there are so many companies, that depends -- how would you want the data displayed?
Keep the dataframe on a long format (i.e. before the last pivot wider) then nest the data by company and create the linear models using the purrr:map()
function.
If you need more specific help about this I recommend you to open a new topic since it is a different question and we like to keep things tidy around here. Don't forget to include a reprex for this new issue.
I would want it as a column next to the turnover values in the original dataset
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.