Data transformation

beniam · November 22, 2020, 12:50am

Hi Community,

I have a data set where the first column is a list of days of the month, and then the other columns are values for the year. It looks like this

Day 2019A 2019B 2018A 2018B
24-jun 10 12 14 16
25-jun 9 8 7 6

I am wondering how I can transform the data so that I get

Date A B
24-june-2019 10 12
25-june-2019 9 8
24-june-2018 14 16
24-june-2019 7 6

Appreciate any guidance on this as always.

Thanks,
Ben

andresrcs · November 22, 2020, 12:56am

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

If you have stored the data set in some R object, dput function is very handy.
In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

beniam · November 22, 2020, 1:00am

Thank you, I am new to the community and realised that did not paste through well, so have put the samples into code as below.

I am not sure how to share the data. I have a CSV file and it is public data, but I have done a few steps already to subset it for my needs.

From this

sample1 <- matrix(c(51,43,22,92,28,21,68,22),ncol=4,byrow=TRUE)
colnames(sample1) <- c("2019A","2019B","2018A", "2018B")
rownames(sample1) <- c("24-jun","25-jun")
sample1 <- as.table(sample1)
sample1

       2019A 2019B 2018A 2018B
24-jun    51    43    22    92
25-jun    28    21    68    22

To this

sample2 <- matrix(c(51,43,28,21,22,92,68,22),ncol=2,byrow=TRUE)
colnames(sample2) <- c("A","B")
rownames(sample2) <- c("24-june-2019","25-june-2019","24-june-2018","25-june-2018")
sample2 <- as.table(sample2)
sample2

              A  B
24-june-2019 51 43
25-june-2019 28 21
24-june-2018 22 92
25-june-2018 68 22

Thanks for the reference to DataPasta that is great, I have struggled with this in the past sharing data. This is a sample from the data.

tibble::tribble(
      ~Day, ~`2019.Natural.Snow.Depth`, ~`2019.Average.depth.in.snow-making.area.(61.ha)`, ~`2018.Natural.Snow.Depth`, ~`2018.Average.depth.in.snow-making.area.(61.ha)`,
  "24-Jun",                        10L,                                               45L,                        50L,                                               57L,
  "25-Jun",                        10L,                                               45L,                        48L,                                               57L,
  "26-Jun",                         9L,                                               46L,                        48L,                                               57L,
  "27-Jun",                         8L,                                               46L,                        48L,                                               57L
  )

andresrcs · November 22, 2020, 1:19am

Here is an example of how to do this sort of data wranglin

library(tidyverse)

sample1 <- data.frame(
  check.names = FALSE,
    row.names = c("24-jun", "25-jun"),
      `2019A` = c(51, 28),
      `2019B` = c(43, 21),
      `2018A` = c(22, 68),
      `2018B` = c(92, 22)
)

sample1 %>%
    rownames_to_column(var = "date") %>% 
    pivot_longer(cols = -date,
                 names_to = c("year", "var"),
                 names_pattern = "(\\d{4})(.)") %>% 
    unite(col = date, date, year, sep = "-") %>% 
    pivot_wider(id_cols = date, names_from = var, values_from = value)
#> # A tibble: 4 x 3
#>   date            A     B
#>   <chr>       <dbl> <dbl>
#> 1 24-jun-2019    51    43
#> 2 24-jun-2018    22    92
#> 3 25-jun-2019    28    21
#> 4 25-jun-2018    68    22

^{Created on 2020-11-22 by the reprex package (v0.3.0.9001)}

And with the other sample data

library(tidyverse)

sample <- tibble::tribble(
    ~Day, ~`2019.Natural.Snow.Depth`, ~`2019.Average.depth.in.snow-making.area.(61.ha)`, ~`2018.Natural.Snow.Depth`, ~`2018.Average.depth.in.snow-making.area.(61.ha)`,
    "24-Jun",                        10L,                                               45L,                        50L,                                               57L,
    "25-Jun",                        10L,                                               45L,                        48L,                                               57L,
    "26-Jun",                         9L,                                               46L,                        48L,                                               57L,
    "27-Jun",                         8L,                                               46L,                        48L,                                               57L
)

sample %>%
    pivot_longer(cols = -Day,
                 names_to = c("year", "var"),
                 names_pattern = "(^\\d{4})(.+)") %>%
    unite(col = date, Day, year, sep = "-") %>% 
    pivot_wider(id_cols = date, names_from = var, values_from = value)
#> # A tibble: 8 x 3
#>   date        .Natural.Snow.Depth `.Average.depth.in.snow-making.area.(61.ha)`
#>   <chr>                     <int>                                        <int>
#> 1 24-Jun-2019                  10                                           45
#> 2 24-Jun-2018                  50                                           57
#> 3 25-Jun-2019                  10                                           45
#> 4 25-Jun-2018                  48                                           57
#> 5 26-Jun-2019                   9                                           46
#> 6 26-Jun-2018                  48                                           57
#> 7 27-Jun-2019                   8                                           46
#> 8 27-Jun-2018                  48                                           57

^{Created on 2020-11-22 by the reprex package (v0.3.0.9001)}

beniam · November 22, 2020, 1:54am

andresrcs:

sample %>%
    pivot_longer(cols = -Day,
                 names_to = c("year", "var"),
                 names_pattern = "(^\\d{4})(.+)") %>%
    unite(col = date, Day, year, sep = "-") %>% 
    pivot_wider(id_cols = date, names_from = var, values_from = value)

Thanks you for this, bit for me to get my head around, really appreciate the quick response

beniam · November 29, 2020, 12:09am

II have been trying to take this another step further, so create a 'depth' and 'type' column so that I can use the 'type' as a colour variable when visualising.

I am not having much luck with this, and appreciate any guidance. The next unit in my course is data wrangling, which would would be really helpful to have a better understanding of this now.

This is what I would like the data to look like.

sample2 <- matrix(c("24-june-2019",51,"Made","24-june-2019",43,"Natural","25-june-2019",28,"Made","25-june-2019",21,"Natural","24-june-2018",22,"Made","24-june-2018",92,"Natural","25-june-2018",68,"Made","25-june-2018",22,"Natural"),ncol=3,byrow=TRUE)
colnames(sample2) <- c("Date", "Depth", "type")
sample2 <- as.data.frame(sample2)
sample2

Date          Depth Type
24-june-2019	51	Made		
24-june-2019	43	Natural		
25-june-2019	28	Made		
25-june-2019	21	Natural		
24-june-2018	22	Made		
24-june-2018	92	Natural		
25-june-2018	68	Made		
25-june-2018	22	Natural		
8 rows

beniam · November 29, 2020, 1:32am

I have managed to figure this out, found a great link for the data wrangling here:
https://uc-r.github.io/tidyr

I ended up going back to an earlier transformation of the data, and then using, gather, separate and unite managed to get the data into the shape I wanted.

df
df1 <- df %>% gather(Year, Depth, "2019Nat":"2010Made") #can also use col numbers ie 4:20

df2  <- df1 %>% separate(Year, c("Year", "Type"), sep = 4) #other value for sep "."

df3  <- df2 %>% unite(Date, Day, Year, sep = "-")

df3$Date <- as.Date(df3$Date, format="%d-%b-%Y")
df3[order(as.Date(df3$Date, format="%d/%m/%Y")),]
df3

system · December 20, 2020, 1:32am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.