Cleaning dataframe

Hi, I am trying to clean and sort my data.
My data is monthly price data for several firms, where month(the period) is written 1996m7. It is not chronological, and I will need this, ie. jan 2010, feb 2010........ How can I sort this?

It is a big dataframe, with several companies:
Some of the companies don´t have data for all of the year/months, so I have removed the empty values (so each company might have different start and end time)

see:

Company 1 ; 1996m1 ; price
Company 1 ; 1996m11 ; price
Company 1 ;1996m12 ; price
Company 1; 1996m2 ; price
Company 1; 1996m3 ; price
.
.
Company1;2000m9 ; price

Company2; 1996m1 ; price
.
Company N

To sort the data properly, I would split the Year-Month column into two and make each numeric.

DF <- data.frame(YearMonth = c("1996m1", "1996m11", "1996m12", "1996m2"),
                 Price = 1:4)
DF
#>   YearMonth Price
#> 1    1996m1     1
#> 2   1996m11     2
#> 3   1996m12     3
#> 4    1996m2     4
library(tidyr)
library(dplyr)
DF <- DF |> separate(col = YearMonth, into = c("Year", "Month"), sep = "m")
#Year and month columns are characters and will sort alphabetically!
str(DF)
#> 'data.frame':    4 obs. of  3 variables:
#>  $ Year : chr  "1996" "1996" "1996" "1996"
#>  $ Month: chr  "1" "11" "12" "2"
#>  $ Price: int  1 2 3 4
DF <- DF |> mutate(across(Year:Month, as.numeric))
#Now Year and Month are numeric
str(DF)
#> 'data.frame':    4 obs. of  3 variables:
#>  $ Year : num  1996 1996 1996 1996
#>  $ Month: num  1 11 12 2
#>  $ Price: int  1 2 3 4

Created on 2022-02-28 by the reprex package (v2.0.1)

1 Like

This is perfect, thanks!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.