Selecting the best values of a data frame

Teun · March 2, 2021, 8:38am

Hi guys!

I am writing my thesis, but I get stuck. Can anyone help me?

My dataframe looks like this:

	MAE	RMSE	MAPE	MASE
Mean
Naive
Snaive
Additive
multiplicative
...

What I want is a new table that looks like the following:

The table shows what is the best forecasting method according to the MAE, RMSE, MAPE and MASE.

	Best method	Lowest value
MAE
RMSE
MAPE
MASE

Does anyone have some advice on how I can achieve this?

lars · March 2, 2021, 8:48am

Hi Teun,

In order to help you quicker and better, can you please provide a reprex? It especially helps if you provide a small sample of your dataset and what the end result should look like.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Teun · March 2, 2021, 9:34am

Hi Lars!

I hope this helps?

data<-	read_excel()
liters<-data[,"Liters"]
tsl<-ts(liters, frequency=12, start=c(2014,1))
head(tsl,12)
#>          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
#>2014  93.6128 147.5358 238.6348 162.5478 238.3869 407.9096 465.0899 427.7015 487.9975 420.4079 424.8690 485.3066

training<-	subset(tsl,end=length(tsl)-6)
test<-		subset(tsl,start=length(tsl)-5)

meanl<-		meanf(training, h=6)
naivel<-	naive(training, h=6)
snaivel<-	snaive(training, h=6)
#...
#a few more forecasting methods

ntb<-	matrix(c(
	    accuracy(meanl,tsl)["Test set", c("MAE", "RMSE", "MAPE", "MASE")],
	    accuracy(naivel,tsl)["Test set", c("MAE", "RMSE", "MAPE", "MASE")],
	    accuracy(snaivel,tsl)["Test set", c("MAE", "RMSE", "MAPE", "MASE")],
        #...
        #the other forecasting methods
        ncol=4,byrow=TRUE)

colnames(ntb)<- c("MAE", "RMSE", "MAPE", "MASE")
rownames(ntb)<- c("Mean","Naive","Snaive","Drift", "addf", "mulf", "fit_ar", "fit_sar", "txt", "txc", "txtotaal")
ntb1<-		    as.data.frame(ntb)
ntb1
#>                 MAE      RMSE       MAPE      MASE
#>Mean      5088.79487  5385.926   78.46649  6.274855
#>Naive     1519.16667  1797.191   30.20313  1.873243
#>Snaive    3319.16667  3518.989   57.34892  4.092775
#>...
#>Other methods

The next step is to develop the following table.
With the best prediction method in the first column and the corresponding value in the second.

#>       Best method       Lowest value
#>MAE         
#>RMSE        
#>MAPE       
#>MASE

Hopefully someone can help me with this.

Alvast bedankt

nirgrahamuk · March 2, 2021, 9:38am

I'm afraid your latest post didnt help much.
following the advice of the guide, which I will summarise.
please post the result of

dput(ntb1)

as ntb1 is the object you want to manipulate with our help, and which we don't otherwise have access to.

lars · March 2, 2021, 9:57am

I agree with nirgrahamuk, but - both being from the "lowlands" - I took the effort in trying to recreate a dataset and tried with a possible approach.

Is this what you're looking for?

library(tidyverse)

df <- tribble(~forecasting_method, ~MAE, ~RMSE, ~MAPE, ~MASE,
              "Mean", 5088.79487, 5385.926, 78.46649, 6.274855,
              "Naive", 1519.16667, 1797.191, 30.20313, 1.873243, 
              "Snaive", 3319.16667, 3518.989, 57.34892, 4.092775,
             )

df %>% 
  pivot_longer(cols = MAE:MASE,names_to = "col_name", values_to = "value") %>% 
  group_by(col_name) %>% 
  mutate(lowest_value = min(value)) %>% 
  filter(value == lowest_value) %>% 
  select(col_name, forecasting_method, lowest_value)
#> # A tibble: 4 x 3
#> # Groups:   col_name [4]
#>   col_name forecasting_method lowest_value
#>   <chr>    <chr>                     <dbl>
#> 1 MAE      Naive                   1519.  
#> 2 RMSE     Naive                   1797.  
#> 3 MAPE     Naive                     30.2 
#> 4 MASE     Naive                      1.87

^{Created on 2021-03-02 by the reprex package (v1.0.0)}

Graag gedaan,
lars

Teun · March 2, 2021, 11:18am

Okay thanks! I'm getting closer!

I try to automate the program completely, so the first step is not optimal.

I can still do this with the help of your code, but I cannot add the forecast_method

ntb1%>% 
  pivot_longer(cols = MAE:MASE,names_to = "col_name", values_to = "value") %>% 
  group_by(col_name) %>% 
  mutate(lowest_value = min(value)) %>% 
  filter(value == lowest_value) %>% 
  select(col_name, lowest_value)

#   A tibble: 4 x 2
#   Groups:   col_name [4]
#>  col_name lowest_value
#> <chr>           <dbl>
#> 1 RMSE          1797.  
#> 2 MAPE            30.2 
#> 3 MASE             1.87
#> 4 MAE             46.9

The ntb1 table is as follows. sorry for not adding.
First time asking a question here...

dput(ntb1)

structure(list(MAE = c(5088.79487179487, 1519.16666666667, 3319.16666666667, 
1647.37550817214, 1831.48661617148, 1936.12400158197, 46.8793232245741, 
3605.31989627243, 28618.3341970992, 3485.67626446658, 3548.85326904943
), RMSE = c(5385.92567587919, 1797.19137359752, 3518.98917778008, 
2008.47348725738, 2045.7584641563, 2016.76935543134, 3543.89304061995, 
4189.20277632183, 51736.605754416, 4155.50054910879, 4177.41498689721
), MAPE = c(78.4664901470441, 30.2031258777449, 57.3489202049514, 
33.8897040352631, 34.497985767558, 34.1979173250488, 3125.78437656345, 
56.214461999567, 638.54777455328, 54.2022817924072, 50.1330768372487
), MASE = c(6.27485518749612, 1.87324328827628, 4.09277455688669, 
2.03133413973946, 2.25835656257448, 2.38738208967683, 3.85432009039803, 
4.44562235728883, 35.2884931144467, 4.29809303402039, 4.37599488798082
)), class = "data.frame", row.names = c("Mean", "Naive", "Snaive", 
"Drift", "addf", "mulf", "fit_ar", "fit_sar", "txt", "txc", "txtotaal"
))

Bedankt voor de hulp!

lars · March 2, 2021, 11:44am

Hi Teun,

In the dataset I created there's a column forecasting_method where you can act upon. In your dataset it is missing because you have a 'non-tidy' dataset.

See below how you could 'fix' it

library(tidyverse)

df <- structure(list(MAE = c(5088.79487179487, 1519.16666666667, 3319.16666666667, 
                             1647.37550817214, 1831.48661617148, 1936.12400158197, 46.8793232245741, 
                             3605.31989627243, 28618.3341970992, 3485.67626446658, 3548.85326904943
                             ), 
                     RMSE = c(5385.92567587919, 1797.19137359752, 3518.98917778008,
                              2008.47348725738, 2045.7584641563, 2016.76935543134, 3543.89304061995, 
                              4189.20277632183, 51736.605754416, 4155.50054910879, 4177.41498689721
                              ),
                     MAPE = c(78.4664901470441, 30.2031258777449, 57.3489202049514, 
                              33.8897040352631, 34.497985767558, 34.1979173250488, 3125.78437656345, 
                              56.214461999567, 638.54777455328, 54.2022817924072, 50.1330768372487
                              ),
                      MASE = c(6.27485518749612, 1.87324328827628, 4.09277455688669,
                               2.03133413973946, 2.25835656257448, 2.38738208967683, 3.85432009039803,
                               4.44562235728883, 35.2884931144467, 4.29809303402039, 4.37599488798082
                               )
                     ), 
                class = "data.frame", 
                row.names = c("Mean", "Naive", "Snaive", "Drift", "addf", "mulf", "fit_ar", "fit_sar", "txt", "txc", "txtotaal")
                )

## dataset is not tidy as it contains values as a rowname and not as a separate column
## The values of the rownames appears to be missing
df %>% glimpse()
#> Rows: 11
#> Columns: 4
#> $ MAE  <dbl> 5088.79487, 1519.16667, 3319.16667, 1647.37551, 1831.48662, 19...
#> $ RMSE <dbl> 5385.926, 1797.191, 3518.989, 2008.473, 2045.758, 2016.769, 35...
#> $ MAPE <dbl> 78.46649, 30.20313, 57.34892, 33.88970, 34.49799, 34.19792, 31...
#> $ MASE <dbl> 6.274855, 1.873243, 4.092775, 2.031334, 2.258357, 2.387382, 3....

## let's convert the rownames to a new column and specify the new column name
df_tidy <- df %>% 
  rownames_to_column(var = "forecasting_method")
df_tidy %>% glimpse
#> Rows: 11
#> Columns: 5
#> $ forecasting_method <chr> "Mean", "Naive", "Snaive", "Drift", "addf", "mul...
#> $ MAE                <dbl> 5088.79487, 1519.16667, 3319.16667, 1647.37551, ...
#> $ RMSE               <dbl> 5385.926, 1797.191, 3518.989, 2008.473, 2045.758...
#> $ MAPE               <dbl> 78.46649, 30.20313, 57.34892, 33.88970, 34.49799...
#> $ MASE               <dbl> 6.274855, 1.873243, 4.092775, 2.031334, 2.258357...

## now the dataset can be considered as tidied and will be much easier to work with
df_tidy %>% 
  pivot_longer(cols = MAE:MASE,names_to = "col_name", values_to = "value") %>% 
  group_by(col_name) %>% 
  mutate(lowest_value = min(value)) %>% 
  filter(value == lowest_value) %>% 
  select(col_name, forecasting_method, lowest_value) 
#> # A tibble: 4 x 3
#> # Groups:   col_name [4]
#>   col_name forecasting_method lowest_value
#>   <chr>    <chr>                     <dbl>
#> 1 RMSE     Naive                   1797.  
#> 2 MAPE     Naive                     30.2 
#> 3 MASE     Naive                      1.87
#> 4 MAE      fit_ar                    46.9

^{Created on 2021-03-02 by the reprex package (v1.0.0)}

I can highly recommend the following book if you want to get a better understanding of the tidyverse approach.

Teun · March 2, 2021, 1:32pm

Hi lars!

Ik doe het even in het nederlands hoor, wel zo makkelijk.

Het probleem is inmiddels opgelost.
Heb het uiteindelijk zo opgelost:

Method<-c("Mean", "Naive", "Snaive", "Drift", "addf", "mulf", "fit_ar", "fit_sar", "txt", "txc", "txtotaal")
tib<-	as_tibble(cbind(Method,ntb1))

best<-	tib%>%
	pivot_longer(cols = MAE:MASE,names_to = "col_name", values_to = "value") %>% 
  	group_by(col_name) %>% 
  	mutate(lowest_value = min(value)) %>% 
  	filter(value == lowest_value) %>% 
  	select(col_name, Method, lowest_value) 
colnames(best)<- c("KPI", "Best method", "Value")

best<-	as.data.frame(best)
best
#>   KPI Best method       Value
#>1 RMSE       Naive 1797.191374
#>2 MAPE       Naive   30.203126
#>3 MASE       Naive    1.873243
#>4  MAE      fit_ar   46.879323

Zonder je hulp was het zeker niet gelukt.

Thanks!

Teun

system · March 23, 2021, 1:32pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.