restructuring a dataframe - collapsing large number of columns

campbellg · April 25, 2023, 2:39pm

Dear R experts,

I have read the below excel file into R as a dataframe:

It currently contains many columns of data (but these columns are just a repetition of the same 6 columns: total road accidents, total road accidents attended, total accidents conveyed, emergency accidents, emergency accidents attended, emergency accidents conveyed).

I want to re-structure this dataframe to have only 7 columns (1 column for the city, and another 6 which groups all the data together under the relevant columns i.e.,1 column for all the 'total road accidents', 1 column for all the 'total accidents attended' and so on.

When collapsing these columns its important that the order of the data remains the same (i.e., the second column of 'total road accidents' should be placed directly under the first column of this heading, and the third underneath the second (and so on).

I have searched online but cannot find a solution to this. I have pasted a repex of the data frame below (the actual df has a lot more columns than this).

accidents <- structure(list(City = c("Birmingham", "Bradford", "Brighton and Hove", 
                        "Bristol", "Cambridge", "Canterbury", "Carlisle", "Chelmsford", 
                        "Chester", "Chichester", "Colchester", "Coventry", "Derby", "Doncaster", 
                        "Durham", "ENGLAND"), `Total road accidents...2` = c(1058, 369, 
                                                                             471, 1025, 717, 1495, 3350, 1006, 1696, 2194, 73, 66, 1221, 128, 
                                                                             629, 15498), `Total accidents Attended...3` = c(856, 310, 391, 
                                                                                                                             818, 599, 1193, 2530, 834, 1347, 1703, 54, 52, 996, 106, 223, 
                                                                                                                             12012), `Total accidents Conveyed...4` = c(616, 234, 298, 577, 
                                                                                                                                                                        458, 798, 1965, 609, 1035, 1198, 49, 44, 660, 81, 140, 8762), 
               `Emergency accidents...5` = c(880, 270, 376, 936, 626, 1243, 
                                             2815, 815, 1496, 1892, 33, 34, 989, 63, 615, 13083), `Emergency accidents attended...6` = c(671, 
                                                                                                                                         214, 327, 692, 489, 948, 2070, 661, 1071, 1364, 26, 28, 758, 
                                                                                                                                         54, 203, 9576), `Emergency accidents conveyed...7` = c(484, 
                                                                                                                                                                                                148, 245, 442, 357, 591, 1532, 452, 765, 946, 19, 25, 422, 
                                                                                                                                                                                                40, 130, 6598), `Total road accidents...8` = c(974, 339, 
                                                                                                                                                                                                                                               461, 1027, 709, 1323, 3214, 1135, 1636, 2146, 62, 68, 1092, 
                                                                                                                                                                                                                                               119, 614, 14919), `Total accidents Attended...9` = c(780, 
                                                                                                                                                                                                                                                                                                    289, 379, 812, 581, 1063, 2516, 900, 1300, 1665, 44, 49, 
                                                                                                                                                                                                                                                                                                    895, 94, 213, 11580), `Total accidents Conveyed...10` = c(578, 
                                                                                                                                                                                                                                                                                                                                                              218, 305, 533, 445, 730, 1901, 652, 1008, 1180, 38, 44, 573, 
                                                                                                                                                                                                                                                                                                                                                              73, 141, 8419), `Emergency accidents...11` = c(974, 297, 
                                                                                                                                                                                                                                                                                                                                                                                                             427, 1047, 758, 1481, 2848, 818, 1504, 1929, 30, 37, 1012, 
                                                                                                                                                                                                                                                                                                                                                                                                             62, 557, 13781), `Emergency accidents attended...12` = c(713, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                      250, 351, 752, 560, 1086, 2159, 697, 1151, 1418, 24, 28, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                      793, 46, 164, 10192), `Emergency accidents conveyed...13` = c(494, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    164, 263, 476, 392, 664, 1559, 470, 867, 944, 18, 24, 442, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    35, 109, 6921), `Total road accidents...14` = c(1072, 363, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    402, 1111, 765, 1551, 3236, 1080, 1713, 2298, 57, 43, 1129, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    122, 622, 15564), `Total accidents Attended...15` = c(864, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          290, 341, 871, 625, 1214, 2478, 872, 1338, 1754, 45, 33, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          925, 105, 217, 11972), `Total accidents Conveyed...16` = c(604, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     221, 272, 593, 461, 774, 1897, 627, 1035, 1182, 41, 26, 582, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     90, 155, 8560), `Emergency accidents...17` = c(921, 288, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    324, 967, 671, 1345, 2840, 805, 1452, 2005, 33, 25, 928, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    69, 599, 13272), `Emergency accidents attended...18` = c(743, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             235, 269, 742, 542, 1053, 2147, 664, 1116, 1508, 26, 21, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             749, 64, 204, 10083), `Emergency accidents conveyed...19` = c(493, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           172, 203, 474, 381, 625, 1597, 436, 823, 961, 22, 14, 421, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           49, 145, 6816)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    -16L))
```r

Any help is extremely appreciated here!

technocrat · April 26, 2023, 7:22am

By hand. If there are a large number, might need to adapt

d <- data.frame(
  City = c(
    "Birmingham", "Bradford", "Brighton and Hove",
    "Bristol", "Cambridge", "Canterbury", "Carlisle", "Chelmsford",
    "Chester", "Chichester", "Colchester", "Coventry", "Derby", "Doncaster",
    "Durham", "ENGLAND"
  ), `Total road accidents...2` = c(
    1058, 369,
    471, 1025, 717, 1495, 3350, 1006, 1696, 2194, 73, 66, 1221, 128,
    629, 15498
  ), `Total accidents Attended...3` = c(
    856, 310, 391,
    818, 599, 1193, 2530, 834, 1347, 1703, 54, 52, 996, 106, 223,
    12012
  ), `Total accidents Conveyed...4` = c(
    616, 234, 298, 577,
    458, 798, 1965, 609, 1035, 1198, 49, 44, 660, 81, 140, 8762
  ),
  `Emergency accidents...5` = c(
    880, 270, 376, 936, 626, 1243,
    2815, 815, 1496, 1892, 33, 34, 989, 63, 615, 13083
  ), `Emergency accidents attended...6` = c(
    671,
    214, 327, 692, 489, 948, 2070, 661, 1071, 1364, 26, 28, 758,
    54, 203, 9576
  ), `Emergency accidents conveyed...7` = c(
    484,
    148, 245, 442, 357, 591, 1532, 452, 765, 946, 19, 25, 422,
    40, 130, 6598
  ), `Total road accidents...8` = c(
    974, 339,
    461, 1027, 709, 1323, 3214, 1135, 1636, 2146, 62, 68, 1092,
    119, 614, 14919
  ), `Total accidents Attended...9` = c(
    780,
    289, 379, 812, 581, 1063, 2516, 900, 1300, 1665, 44, 49,
    895, 94, 213, 11580
  ), `Total accidents Conveyed...10` = c(
    578,
    218, 305, 533, 445, 730, 1901, 652, 1008, 1180, 38, 44, 573,
    73, 141, 8419
  ), `Emergency accidents...11` = c(
    974, 297,
    427, 1047, 758, 1481, 2848, 818, 1504, 1929, 30, 37, 1012,
    62, 557, 13781
  ), `Emergency accidents attended...12` = c(
    713,
    250, 351, 752, 560, 1086, 2159, 697, 1151, 1418, 24, 28,
    793, 46, 164, 10192
  ), `Emergency accidents conveyed...13` = c(
    494,
    164, 263, 476, 392, 664, 1559, 470, 867, 944, 18, 24, 442,
    35, 109, 6921
  ), `Total road accidents...14` = c(
    1072, 363,
    402, 1111, 765, 1551, 3236, 1080, 1713, 2298, 57, 43, 1129,
    122, 622, 15564
  ), `Total accidents Attended...15` = c(
    864,
    290, 341, 871, 625, 1214, 2478, 872, 1338, 1754, 45, 33,
    925, 105, 217, 11972
  ), `Total accidents Conveyed...16` = c(
    604,
    221, 272, 593, 461, 774, 1897, 627, 1035, 1182, 41, 26, 582,
    90, 155, 8560
  ), `Emergency accidents...17` = c(
    921, 288,
    324, 967, 671, 1345, 2840, 805, 1452, 2005, 33, 25, 928,
    69, 599, 13272
  ), `Emergency accidents attended...18` = c(
    743,
    235, 269, 742, 542, 1053, 2147, 664, 1116, 1508, 26, 21,
    749, 64, 204, 10083
  ), `Emergency accidents conveyed...19` = c(
    493,
    172, 203, 474, 381, 625, 1597, 436, 823, 961, 22, 14, 421,
    49, 145, 6816
  )
)

# 16 observations of 19 variables
dim(d)
#> [1] 16 19

# the names of the variables
colnames(d)
#>  [1] "City"                              "Total.road.accidents...2"         
#>  [3] "Total.accidents.Attended...3"      "Total.accidents.Conveyed...4"     
#>  [5] "Emergency.accidents...5"           "Emergency.accidents.attended...6" 
#>  [7] "Emergency.accidents.conveyed...7"  "Total.road.accidents...8"         
#>  [9] "Total.accidents.Attended...9"      "Total.accidents.Conveyed...10"    
#> [11] "Emergency.accidents...11"          "Emergency.accidents.attended...12"
#> [13] "Emergency.accidents.conveyed...13" "Total.road.accidents...14"        
#> [15] "Total.accidents.Attended...15"     "Total.accidents.Conveyed...16"    
#> [17] "Emergency.accidents...17"          "Emergency.accidents.attended...18"
#> [19] "Emergency.accidents.conveyed...19"

# save out cities
cities <- d[,1]

# drop cities
d <- d[,-1]    

rolled <- data.frame(
  a = rowSums(d[,c(1,7,13)]),
  b = rowSums(d[,c(2,8,14)]),
  c = rowSums(d[,c(3,9,15)]),
  d = rowSums(d[,c(4,10,16)]),
  e = rowSums(d[,c(5,11,17)]),
  f = rowSums(d[,c(6,12,18)]))
colnames(rolled) <- gsub("....$","",colnames(d)[1:6])

(new <- cbind(cities,rolled))
#>               cities Total.road.accidents Total.accidents.Attended
#> 1         Birmingham                 3104                     2500
#> 2           Bradford                 1071                      889
#> 3  Brighton and Hove                 1334                     1111
#> 4            Bristol                 3163                     2501
#> 5          Cambridge                 2191                     1805
#> 6         Canterbury                 4369                     3470
#> 7           Carlisle                 9800                     7524
#> 8         Chelmsford                 3221                     2606
#> 9            Chester                 5045                     3985
#> 10        Chichester                 6638                     5122
#> 11        Colchester                  192                      143
#> 12          Coventry                  177                      134
#> 13             Derby                 3442                     2816
#> 14         Doncaster                  369                      305
#> 15            Durham                 1865                      653
#> 16           ENGLAND                45981                    35564
#>    Total.accidents.Conveyed Emergency.accidents Emergency.accidents.attended
#> 1                      1798                2775                         2127
#> 2                       673                 855                          699
#> 3                       875                1127                          947
#> 4                      1703                2950                         2186
#> 5                      1364                2055                         1591
#> 6                      2302                4069                         3087
#> 7                      5763                8503                         6376
#> 8                      1888                2438                         2022
#> 9                      3078                4452                         3338
#> 10                     3560                5826                         4290
#> 11                      128                  96                           76
#> 12                      114                  96                           77
#> 13                     1815                2929                         2300
#> 14                      244                 194                          164
#> 15                      436                1771                          571
#> 16                    25741               40136                        29851
#>    Emergency.accidents.conveyed
#> 1                          1471
#> 2                           484
#> 3                           711
#> 4                          1392
#> 5                          1130
#> 6                          1880
#> 7                          4688
#> 8                          1358
#> 9                          2455
#> 10                         2851
#> 11                           59
#> 12                           63
#> 13                         1285
#> 14                          124
#> 15                          384
#> 16                        20335

^{Created on 2023-04-26 with reprex v2.0.2}

nirgrahamuk · April 26, 2023, 8:55am

I would do it like this :

library(tidyverse)

(long_1 <- pivot_longer(accidents, cols = -City) |>
  mutate(name = str_replace(name, fixed("..."), "@")) |>
  separate(col = "name", sep = "@", into = letters[1:2]))

(summary_2 <- group_by(long_1,
                       City,
                       a) |> summarise(value=sum(value))|>ungroup())

(result_3 <- pivot_wider(summary_2,
                         id_cols="City",
                         names_from="a",
                         values_from="value"))

campbellg · April 26, 2023, 2:32pm

Thanks for this. Apologies but I don't think I have communicated what it is I am trying to do.

I don't want to sum across the 6 columns. I just want to re-organise the original data frame so that all entries for 'total accidents' are in one column, all entries for 'total accidents attended' are in another' and so on. Does that make sense?

nirgrahamuk · April 26, 2023, 2:39pm

So each city would have 6 rows ?

campbellg · April 26, 2023, 2:41pm

Yes that's right (they are for separate dates which I will add in - did not include that here).

nirgrahamuk · April 26, 2023, 2:51pm

The repetitions are 3 not 6 , so each city would have 6 rows, like this :


library(tidyverse)

(long_1 <- pivot_longer(accidents, cols = -City) |>
    mutate(name = str_replace(name, fixed("..."), "@")) |>
    separate(col = "name", sep = "@", into = letters[1:2]))

(reform_2 <- group_by(long_1,
                       City,
                       a) |> mutate(rn=row_number()) |>
              ungroup())

(result_3 <- pivot_wider(reform_2,
                         id_cols=c("City","rn"),
                         names_from="a",
                         values_from="value"))

campbellg · April 26, 2023, 3:04pm

Thanks very much. I will look at your solution soon.

I managed to get it working using your previous solution (by filtering through the different column names in 'long_1', making a new_df for each one, then merging them.

Andrzej · April 27, 2023, 5:27am

It gives me an error:

Error in `dplyr::mutate()`:
ℹ In argument: `rn = ungroup(row_number())`.
ℹ In group 1: `City = "Birmingham"`, `a = "Emergency accidents"`.
Caused by error in `UseMethod()`:
! no applicable method for 'ungroup' applied to an object of class "c('integer', 'numeric')"

Why is that ?

nirgrahamuk · April 27, 2023, 7:01am

Your code seems different?

system · May 18, 2023, 7:02am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.