Loading in and transposing excel table to data frame

Dear R users,

I have the following excel table (it spans about 2 years worth of data):

I want to take the England level data (last row) for columns ‘total road accidents’, total accidents attended’, ‘emergency accidents’ and ‘emergency accidents attended’ and transpose it into a data frame in R, with 5 columns (date, total road accidents, total accidents attended, emergency accidents, and emergency accidents attended).

I have managed this (see solution below) but in what feels like an extremely clunky way. I really want to improve my R skills in terms of loading in and transposing data as I am working with a lot of excel data that is not properly formatted for R.

Please note I also posted this on stackoverflow. I am aware of your cross posting policy and so I have reformulated here. However, as some images may be helpful, I will leave the link to the stackoverflow post. I tried to paste all the images in here but I believe users are only allowed to embed 1 image

Transposing excel table into data frame in R - Stack Overflow

Here is my solution at present:

Start by generating list of dates for date column (this proved to be simpler than reading the dates from the excel file)

start_date <- as.Date("2021-10-25")
end_date <- as.Date("2021-11-31") #in my code this includes dates up to 2023 but the reprex below only includes first two weeks
date_range <- as.character(seq(start_date, end_date, "weeks"))

Read in the data (an excel file in my code, substituted here with a reprex including the first two weeks of data)

accidents <- structure(list(`Week Commencing` = c("City", "Birmingham", "Bradford", 
                                     "Brighton and Hove", "Bristol", "Cambridge", "Canterbury", "Carlisle", 
                                     "Chelmsford", "Chester", "Chichester", "Colchester", "Coventry", 
                                     "Derby", "Doncaster", "Durham", "ENGLAND"), `44494...2` = c("Total road accidents", 
                                                                                                 "1058", "369", "471", "1025", "717", "1495", "3350", "1006", 
                                                                                                 "1696", "2194", "73", "66", "1221", "128", "629", "15498"), `44494...3` = c("Total accidents Attended", 
                                                                                                                                                                             "856", "310", "391", "818", "599", "1193", "2530", "834", "1347", 
                                                                                                                                                                             "1703", "54", "52", "996", "106", "223", "12012"), `44494...4` = c("Total accidents Conveyed", 
                                                                                                                                                                                                                                                "616", "234", "298", "577", "458", "798", "1965", "609", "1035", 
                                                                                                                                                                                                                                                "1198", "49", "44", "660", "81", "140", "8762"), `44494...5` = c("Emergency accidents", 
                                                                                                                                                                                                                                                                                                                 "880", "270", "376", "936", "626", "1243", "2815", "815", "1496", 
                                                                                                                                                                                                                                                                                                                 "1892", "33", "34", "989", "63", "615", "13083"), `44494...6` = c("Emergency accidents attended", 
                                                                                                                                                                                                                                                                                                                                                                                   "671", "214", "327", "692", "489", "948", "2070", "661", "1071", 
                                                                                                                                                                                                                                                                                                                                                                                   "1364", "26", "28", "758", "54", "203", "9576"), `44494...7` = c("Emergency accidents conveyed", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                    "484", "148", "245", "442", "357", "591", "1532", "452", "765", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                    "946", "19", "25", "422", "40", "130", "6598"), `44501...8` = c("Total road accidents", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "974", "339", "461", "1027", "709", "1323", "3214", "1135", "1636", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "2146", "62", "68", "1092", "119", "614", "14919"), `44501...9` = c("Total accidents Attended", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "780", "289", "379", "812", "581", "1063", "2516", "900", "1300", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "1665", "44", "49", "895", "94", "213", "11580"), `44501...10` = c("Total accidents Conveyed", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           "578", "218", "305", "533", "445", "730", "1901", "652", "1008", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           "1180", "38", "44", "573", "73", "141", "8419"), `44501...11` = c("Emergency accidents", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             "974", "297", "427", "1047", "758", "1481", "2848", "818", "1504", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             "1929", "30", "37", "1012", "62", "557", "13781"), `44501...12` = c("Emergency accidents attended", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 "713", "250", "351", "752", "560", "1086", "2159", "697", "1151", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 "1418", "24", "28", "793", "46", "164", "10192"), `44501...13` = c("Emergency accidents conveyed", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "494", "164", "263", "476", "392", "664", "1559", "470", "867", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "944", "18", "24", "442", "35", "109", "6921")), class = c("tbl_df", 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               "tbl", "data.frame"), row.names = c(NA, -17L))

select England level data and remove first column

england_accidents <- accidents[17, -1]

Extract the columns desired for analysis:

total_accidents <- (england_accidents[, c(1, seq(7, ncol(england_accidents), 
                     by = 6))]) 
total_accidents_attended <- (england_accidents[, 
                            seq(2, ncol(england_accidents), 6)]) 
total_emergencies <- (england_accidents[, 
                            seq(4, ncol(england_accidents), 6)])
total_emergencies_attended <- (england_accidents[, 
                            seq(5, ncol(england_accidents), 6)])

Transpose the vectors from row to column format:

total_accidents <- t(total_accidents) 
total_accidents_attended <- t(total_accidents_attended) 
total_emergencies <- t(total_emergencies) 
total_emergencies_attended <- t(total_emergencies_attended)

make df

accidents_df <- as.data.frame(cbind(date_range, total_accidents, 
                      total_accidents_attended, total_emergencies, 
                      total_emergencies_attended))

final changes to fix column/row names

accidents_columns <- c("total_accidents", 
                       "total_accidents_attended" ,"total_emergencies",
                       "total_emergencies_attended") 
names(accidents_df)[2:5] <-accidents_columns 
rownames(accidents_df) <- 1:nrow(accidents_df)

Please familiarize yourself with our cross-posting policy. In short, it is not considered OK to just drop a link to another help site, you need to properly present the question here as well.

Hi Andresrcs,

Thank you for letting me know about this. I will edit the question.

Best wishes,

Gerard

You have not provided representative data that would allow your code to run, (to demonstrate it works) , or which the forum users might try to run their own attempts to improve on your code.
Please review the best practice for providing representative data to the forum as per this document

link

The short of it will be that where in your code you read an excel file (that we dont have) :

accidents <- read_xlsx("input\\accidents data.xlsx", sheet = "accidents", 
                        range = cell_limits(c(3, 1), c(NA, NA)))

this is the point at which you would provide "data as code", by using dput() on accidents, so that you can substitute this for the read_xlsx call.

Thankyou, I will do this!

Hi nirgrahamuk,

I have used the repex as you suggested (hopefully correctly) - thanks for making me aware of this.

I don't suppose you have any advice on how to carry out this transposition more efficiently?

Thanks for your initial suggestion anyway.

Gerard

library(tidyverse)

accidents <- structure(list(`Week Commencing` = c(
  "City", "Birmingham", "Bradford",
  "Brighton and Hove", "Bristol", "Cambridge", "Canterbury", "Carlisle",
  "Chelmsford", "Chester", "Chichester", "Colchester", "Coventry",
  "Derby", "Doncaster", "Durham", "ENGLAND"
), `44494...2` = c(
  "Total road accidents",
  "1058", "369", "471", "1025", "717", "1495", "3350", "1006",
  "1696", "2194", "73", "66", "1221", "128", "629", "15498"
), `44494...3` = c(
  "Total accidents Attended",
  "856", "310", "391", "818", "599", "1193", "2530", "834", "1347",
  "1703", "54", "52", "996", "106", "223", "12012"
), `44494...4` = c(
  "Total accidents Conveyed",
  "616", "234", "298", "577", "458", "798", "1965", "609", "1035",
  "1198", "49", "44", "660", "81", "140", "8762"
), `44494...5` = c(
  "Emergency accidents",
  "880", "270", "376", "936", "626", "1243", "2815", "815", "1496",
  "1892", "33", "34", "989", "63", "615", "13083"
), `44494...6` = c(
  "Emergency accidents attended",
  "671", "214", "327", "692", "489", "948", "2070", "661", "1071",
  "1364", "26", "28", "758", "54", "203", "9576"
), `44494...7` = c(
  "Emergency accidents conveyed",
  "484", "148", "245", "442", "357", "591", "1532", "452", "765",
  "946", "19", "25", "422", "40", "130", "6598"
), `44501...8` = c(
  "Total road accidents",
  "974", "339", "461", "1027", "709", "1323", "3214", "1135", "1636",
  "2146", "62", "68", "1092", "119", "614", "14919"
), `44501...9` = c(
  "Total accidents Attended",
  "780", "289", "379", "812", "581", "1063", "2516", "900", "1300",
  "1665", "44", "49", "895", "94", "213", "11580"
), `44501...10` = c(
  "Total accidents Conveyed",
  "578", "218", "305", "533", "445", "730", "1901", "652", "1008",
  "1180", "38", "44", "573", "73", "141", "8419"
), `44501...11` = c(
  "Emergency accidents",
  "974", "297", "427", "1047", "758", "1481", "2848", "818", "1504",
  "1929", "30", "37", "1012", "62", "557", "13781"
), `44501...12` = c(
  "Emergency accidents attended",
  "713", "250", "351", "752", "560", "1086", "2159", "697", "1151",
  "1418", "24", "28", "793", "46", "164", "10192"
), `44501...13` = c(
  "Emergency accidents conveyed",
  "494", "164", "263", "476", "392", "664", "1559", "470", "867",
  "944", "18", "24", "442", "35", "109", "6921"
)), class = c(
  "tbl_df",
  "tbl", "data.frame"
), row.names = c(NA, -17L))


(s_acc <- accidents |> rename(wc = `Week Commencing`) |>
  filter(wc %in% c("City", "ENGLAND")) |>
  pivot_longer(cols = -wc) |> split(~wc))

(res2 <- inner_join(s_acc[[1]],
  s_acc[[2]],
  by = "name"
) |> select(-wc.x, -wc.y) |>
  mutate(name = as.Date(as.integer(
    sub(".*?([0-9]+).*", "\\1", name, perl = TRUE)),
    origin = "1900-01-01" # default origin for excel dates
  )) |> pivot_wider(
    names_from = "value.x",
    values_from = "value.y"
  ))
1 Like

Wow! Thank you very much.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.