Data transformation

Hi Community,

I have a data set where the first column is a list of days of the month, and then the other columns are values for the year. It looks like this

Day 2019A 2019B 2018A 2018B
24-jun 10 12 14 16
25-jun 9 8 7 6

I am wondering how I can transform the data so that I get

Date A B
24-june-2019 10 12
25-june-2019 9 8
24-june-2018 14 16
24-june-2019 7 6

Appreciate any guidance on this as always.

Thanks,
Ben

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

Thank you, I am new to the community and realised that did not paste through well, so have put the samples into code as below.

I am not sure how to share the data. I have a CSV file and it is public data, but I have done a few steps already to subset it for my needs.

From this

sample1 <- matrix(c(51,43,22,92,28,21,68,22),ncol=4,byrow=TRUE)
colnames(sample1) <- c("2019A","2019B","2018A", "2018B")
rownames(sample1) <- c("24-jun","25-jun")
sample1 <- as.table(sample1)
sample1

       2019A 2019B 2018A 2018B
24-jun    51    43    22    92
25-jun    28    21    68    22

To this

sample2 <- matrix(c(51,43,28,21,22,92,68,22),ncol=2,byrow=TRUE)
colnames(sample2) <- c("A","B")
rownames(sample2) <- c("24-june-2019","25-june-2019","24-june-2018","25-june-2018")
sample2 <- as.table(sample2)
sample2

              A  B
24-june-2019 51 43
25-june-2019 28 21
24-june-2018 22 92
25-june-2018 68 22

Thanks for the reference to DataPasta that is great, I have struggled with this in the past sharing data. This is a sample from the data.

tibble::tribble(
      ~Day, ~`2019.Natural.Snow.Depth`, ~`2019.Average.depth.in.snow-making.area.(61.ha)`, ~`2018.Natural.Snow.Depth`, ~`2018.Average.depth.in.snow-making.area.(61.ha)`,
  "24-Jun",                        10L,                                               45L,                        50L,                                               57L,
  "25-Jun",                        10L,                                               45L,                        48L,                                               57L,
  "26-Jun",                         9L,                                               46L,                        48L,                                               57L,
  "27-Jun",                         8L,                                               46L,                        48L,                                               57L
  )

Here is an example of how to do this sort of data wranglin

library(tidyverse)

sample1 <- data.frame(
  check.names = FALSE,
    row.names = c("24-jun", "25-jun"),
      `2019A` = c(51, 28),
      `2019B` = c(43, 21),
      `2018A` = c(22, 68),
      `2018B` = c(92, 22)
)

sample1 %>%
    rownames_to_column(var = "date") %>% 
    pivot_longer(cols = -date,
                 names_to = c("year", "var"),
                 names_pattern = "(\\d{4})(.)") %>% 
    unite(col = date, date, year, sep = "-") %>% 
    pivot_wider(id_cols = date, names_from = var, values_from = value)
#> # A tibble: 4 x 3
#>   date            A     B
#>   <chr>       <dbl> <dbl>
#> 1 24-jun-2019    51    43
#> 2 24-jun-2018    22    92
#> 3 25-jun-2019    28    21
#> 4 25-jun-2018    68    22

Created on 2020-11-22 by the reprex package (v0.3.0.9001)

And with the other sample data

library(tidyverse)

sample <- tibble::tribble(
    ~Day, ~`2019.Natural.Snow.Depth`, ~`2019.Average.depth.in.snow-making.area.(61.ha)`, ~`2018.Natural.Snow.Depth`, ~`2018.Average.depth.in.snow-making.area.(61.ha)`,
    "24-Jun",                        10L,                                               45L,                        50L,                                               57L,
    "25-Jun",                        10L,                                               45L,                        48L,                                               57L,
    "26-Jun",                         9L,                                               46L,                        48L,                                               57L,
    "27-Jun",                         8L,                                               46L,                        48L,                                               57L
)

sample %>%
    pivot_longer(cols = -Day,
                 names_to = c("year", "var"),
                 names_pattern = "(^\\d{4})(.+)") %>%
    unite(col = date, Day, year, sep = "-") %>% 
    pivot_wider(id_cols = date, names_from = var, values_from = value)
#> # A tibble: 8 x 3
#>   date        .Natural.Snow.Depth `.Average.depth.in.snow-making.area.(61.ha)`
#>   <chr>                     <int>                                        <int>
#> 1 24-Jun-2019                  10                                           45
#> 2 24-Jun-2018                  50                                           57
#> 3 25-Jun-2019                  10                                           45
#> 4 25-Jun-2018                  48                                           57
#> 5 26-Jun-2019                   9                                           46
#> 6 26-Jun-2018                  48                                           57
#> 7 27-Jun-2019                   8                                           46
#> 8 27-Jun-2018                  48                                           57

Created on 2020-11-22 by the reprex package (v0.3.0.9001)

1 Like

Thanks you for this, bit for me to get my head around, really appreciate the quick response