Need help with dividing data into Panels in R

Greetings Friends

I have this problem, i downloaded a dataset from yahoo finance,..you can find it here
https://finance.yahoo.com/quote/^GSPC/history?p=^GSPC
So I have downloaded the data for the period 2013 to 2018,..and i want to analyse the data in panels as follows:

Panel A: January 2013 - December 2013
Panel B: January 2014 - December 2014
..
..
..
Panel F: January 2018 - December 2018

I can divide the data in excel and load the Panels differently, but that will mean lots of work since i have to run this analysis on 6 Different stock indexes. Furthermore, the actual time frame for my analysis is pretty large [1996 -2016]. What i need help with is to run descriptive analysis on the data showing the statistics for each panel,...then combine the datasets and run regression analysis say 'linear regression" on the data but be able to observe the summary output indexed according to the Panels as outlined above.

Any help will really be appreciated,...or any reference manual or book I can read

So you haven't explained what it is you need help with?

Splitting the data into subsets?

Using tidyverse?

You have a number of choices.

For example

require(tidyverse)

# import your data

myData %>%
mutate (year = format( date_field_name, "%Y")

Will add a year column...

You can either then group_by(year) and run your stats using summarise

myData %>%
mutate (year = format( date_field_name, "%Y") %>%
group_by(year) %>%
summarise(mean_share_value = mean(share_price_field_name)

Or you could filter

myData %>%
mutate (year = format( date_field_name, "%Y") %>%
filter(year = 2013) -> myData2013

And run your stats on myData2013

Excel has no role to play...

suppressPackageStartupMessages({
  library(tidyquant)
})

panelA  <- tq_get("^GSPC", get = "stock.prices", from = " 2013-01-01", to = "2013-12-01")
panelA
#> # A tibble: 231 x 8
#>    symbol date        open  high   low close     volume adjusted
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>      <dbl>    <dbl>
#>  1 ^GSPC  2013-01-02 1426. 1462. 1426. 1462. 4202600000    1462.
#>  2 ^GSPC  2013-01-03 1462. 1465. 1456. 1459. 3829730000    1459.
#>  3 ^GSPC  2013-01-04 1459. 1468. 1459. 1466. 3424290000    1466.
#>  4 ^GSPC  2013-01-07 1466. 1466. 1457. 1462. 3304970000    1462.
#>  5 ^GSPC  2013-01-08 1462. 1462. 1452. 1457. 3601600000    1457.
#>  6 ^GSPC  2013-01-09 1457. 1465. 1457. 1461. 3674390000    1461.
#>  7 ^GSPC  2013-01-10 1461. 1472. 1461. 1472. 4081840000    1472.
#>  8 ^GSPC  2013-01-11 1472. 1473. 1468. 1472. 3340650000    1472.
#>  9 ^GSPC  2013-01-14 1472. 1472. 1466. 1471. 3003010000    1471.
#> 10 ^GSPC  2013-01-15 1471. 1473. 1464. 1472. 3135350000    1472.
#> # … with 221 more rows

Thank you Calum_Polwart

I have considered the code you sent,..i should admit my understanding of R is still a little challenged. When i try to run the code, am getting the following error,..

Error: Problem with mutate() input year.
x invalid 'trim' argument
i Input year is format(Data3$Date, "%m/%d/%y").
Run rlang::last_error() to see where the error occurred.

I think its being caused by either me entering the wrong value for "year", or the date_field_name,

if possible please try and plug in the example values, i think this code is just what i need at this juncture

Greetings Technocrat

Your suggestion is very straightforward and simple to implement,..but my challenge is I am running an analysis for a 20 year period,..1996 through to 2016,..so what i intend to do is to download the Dataset as a complete set and then run the analysis but using code that takes into consideration the Panel distributions as outlined earlier.

please advise if there is a possibility

This will get you all the data for a specified stock list for a duration.

suppressPackageStartupMessages({
  library(tidyquant)
  library(tsibble)
  
})

indices <- c("AAPL","MSFT","AMZN","GOOGL","FB","TSM","TSLA","BABA","BRK.A","JPM","V","JNJ","WMT","MA","UNH","DIS","NVDA","BAC","HD","PG")

start_year <- "1996-01-01"
to_year <- "2016-12-31"
series  <- tq_get(indices, get = "stock.prices", from = base_year, to = to_year)

then the tsibble package has tidyverse-like subsetting, can deal with irregular time series such as stock prices, and is the base class for a rich suite of time series and forecasting analytic tools. See the online text.

1 Like

Your date format is wrong. Your date is in iso format so change %d/%m/%y to %Y-%m-%d. (note capital Y as 2016 rather than 16)