Converting Cross-Sectional Data to Time Series

I need to convert cross-sectional data to time series. I have no idea where to start, I have the Annual Turnover for companies for the last 5 years and I need to fit these values in a trend line.

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

Could you apply the dput() function to the output of this command:


and then place the dput() output here between a pair of triple backticks (```), like this?

```
[paste output of dput(head(newCD_Turnover...) here]
```

And maybe use 50 instead of 5 in the head() command?

'''
[structure(list(TurnoverGBP2019 = c(6.3911e+10, 2.9007e+10, 1.7735e+10,
1.5824e+10, 4763100000, 4167400000, 3213243000, 2957400000, 2221400000,
1878900000, 1818793000, 1.632e+09, 1467900000, 1437100000, 1058700000,
9.79e+08, 9.04e+08, 804438000, 7.24e+08, 488500000, 4.36e+08,
384400000, 382830000, 333600000, 316300000, 2.94e+08, 2.79e+08,
277016000, 250700000, 219197000, 208952000, 179300000, 156161000,
129894000, 125414000, 89215000, 66358000, 59669000, 31189000,
28187000, 25417000, 25162000, 23268000, 22274000, 22263000, 21621000,
19511000, 19033000, 17183000, 13742000), TurnoverGBP2018 = c(5.7491e+10,
2.8456e+10, 1.7262e+10, 1.5574e+10, 4874800000, 4055500000, 2957664000,
2703700000, 2203700000, 3226800000, 1693818000, 1.708e+09, 1534300000,
1464500000, 991200000, 9.6e+08, 6.39e+08, 792872000, 6.95e+08,
470300000, 422100000, 3.8e+08, 303639000, 296900000, 285700000,
3.26e+08, 277700000, 172543000, 233200000, 216887000, 197632000,
165500000, 142885000, 120548000, 116660000, 79781000, 58292000,
60487000, 30178000, 30663000, 20454000, 23207000, 19759000, 21419000,
22317000, 10418000, 18800000, 16137000, 12330000, 12720000),
TurnoverGBP2017 = c(5.5917e+10, 2.6224e+10, 1.6317e+10, 1.5357e+10,
4650200000, 4097300000, 2558561000, 2723500000, 2126300000,
3941200000, 1660750000, 1.783e+09, 1594300000, 1245058000,
8.22e+08, 9.33e+08, 5.89e+08, 740290000, 6.48e+08, 459100000,
398200000, 352100000, 248740000, 348500000, 257200000, 290200000,
257100000, 244111000, 244500000, 211848000, 186512000, 163200000,
140738000, 113968000, 109070000, 60521000, 56369000, 46120000,
19392000, 27796000, 19872000, 21902000, 17593000, 19854000,
21963000, NA, 16160000, 15563000, 12503000, 11179000)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))]
'''

I don't understand why the format won't change

Very close! (And good enough for now :slight_smile: ) That's because you used a triple single quotes (''') instead of triple backticks (``` -- on the same key as ~ in the upper left of the keyboard).

Oh, and the opening and closing [ and ] aren't needed -- I just put them there to contain the instructions on what to do.

What do your rows represent? And could you describe an example of a time series you'd like to see?

The rows represent different companies. I need to convert the turnover values to a time series variable, calculate the trend line and their respective R-squares then take the values that I get for each firm and put it back into my original dataset. Does that make sense?

Are you trying to create an individual time series for each company?

Yes unfortunately I am, so I have create a trend line for each company

Not a problem, just wanted to make sure. I can't help right now, but will post again when I can...

Brilliant, thank you so much!

OK, another question: With so many companies, I assume you might prefer a table you can use to plot trendlines for selections of companies, so I thought I'd help with a time-series-shaped table first:

df <- 
  structure(list(TurnoverGBP2019 = c(6.3911e+10, 2.9007e+10, 1.7735e+10, 
                                     1.5824e+10, 4763100000, 4167400000), TurnoverGBP2018 = c(5.7491e+10, 
                                                                                              2.8456e+10, 1.7262e+10, 1.5574e+10, 4874800000, 4055500000), 
                 TurnoverGBP2017 = c(5.5917e+10, 2.6224e+10, 1.6317e+10, 1.5357e+10, 
                                     4650200000, 4097300000)), row.names = c(NA, -6L), class = c("tbl_df", 
                                                                                                 "tbl", "data.frame"))
### end of 'structure()' command

# inspect
df
#>   TurnoverGBP2019 TurnoverGBP2018 TurnoverGBP2017
#> 1      6.3911e+10      5.7491e+10      5.5917e+10
#> 2      2.9007e+10      2.8456e+10      2.6224e+10
#> 3      1.7735e+10      1.7262e+10      1.6317e+10
#> 4      1.5824e+10      1.5574e+10      1.5357e+10
#> 5      4.7631e+09      4.8748e+09      4.6502e+09
#> 6      4.1674e+09      4.0555e+09      4.0973e+09

# load tools used below
library(tidyverse)

# pivot to longer form
df.long <- 
  df %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id, names_to = 'year')
df.long %>% head()
#> # A tibble: 6 x 3
#>      id year                  value
#>   <int> <chr>                 <dbl>
#> 1     1 TurnoverGBP2019 63911000000
#> 2     1 TurnoverGBP2018 57491000000
#> 3     1 TurnoverGBP2017 55917000000
#> 4     2 TurnoverGBP2019 29007000000
#> 5     2 TurnoverGBP2018 28456000000
#> 6     2 TurnoverGBP2017 26224000000


# time-series form (not a time-series object, though)
df.ts <- 
  df.long %>% 
  mutate(year = parse_number(year)) %>% 
  arrange(year) %>% 
  pivot_wider(names_from = id, names_prefix = 'Co.')
df.ts %>% select(1:4)
#> # A tibble: 3 x 4
#>    year        Co.1        Co.2        Co.3
#>   <dbl>       <dbl>       <dbl>       <dbl>
#> 1  2017 55917000000 26224000000 16317000000
#> 2  2018 57491000000 28456000000 17262000000
#> 3  2019 63911000000 29007000000 17735000000

Created on 2020-03-09 by the reprex package (v0.3.0)

Does that help?

That helps so much, one further question if you don't mind me asking, how would I then go about creating trend lines for every company?

Since there are so many companies, that depends -- how would you want the data displayed?

Keep the dataframe on a long format (i.e. before the last pivot wider) then nest the data by company and create the linear models using the purrr:map() function.

If you need more specific help about this I recommend you to open a new topic since it is a different question and we like to keep things tidy around here. Don't forget to include a reprex for this new issue.

I would want it as a column next to the turnover values in the original dataset

Thank you @andresrcs I'll open a new one now

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.