Reformatting/Summarizing a Data Table

I want to add a column on the left with 2 rows - 1 yes, 1 no and have the total number of 1's be counted for the yes row and 0's for the no 1.

I was thinking it was pivot_wider or longer, but I can't seem to figure it out, any help would be hugely appreciated!

I think you can do this with mutate() and rowSums():

tibble(a = sample(c(0,1), replace = TRUE, size=5),
              b = sample(c(0,1), replace = TRUE, size=5),
              c = sample(c(0,1), replace = TRUE, size=5)) %>% 
  mutate(yes = rowSums(.), no = ncol(.) - rowSums(.))

The mutate() is the operative portion. That's summing all of the 1s and then, essentially, using subtraction to figure out the sum of all of the 0s.

So, you want to have "yes" and "no" columns that total each row, but also add two additional rows for "yes" and "no?" It's starting to be a non-ideal structure for a typical data frame, but it's doable (if I'm understanding correctly):


# The base tibble with the yes/no totals calculated by row
df2 <-tibble(Survival = sample(c(0,1), replace = TRUE, size=470),
             PainB4 = sample(c(0,1), replace = TRUE, size=470),
             Haem = sample(c(0,1), replace = TRUE, size=470),
             Dysp = sample(c(0,1), replace = TRUE, size=470),
             Cough = sample(c(0,1), replace = TRUE, size=470),
             Weak = sample(c(0,1), replace = TRUE, size=470),
             DM = sample(c(0,1), replace = TRUE, size=470),
             HeartA = sample(c(0,1), replace = TRUE, size=470),
             Periph = sample(c(0,1), replace = TRUE, size=470),
             Smoke = sample(c(0,1), replace = TRUE, size=470),
             Asthma = sample(c(0,1), replace = TRUE, size=470)) %>% 
  # Caclculate the total yes and no for each row
  mutate(yes = rowSums(.), no = ncol(.) - rowSums(.)) %>% 
  # move yes and no to be the first columns
  relocate(yes, no)

# Make a data frame that totals up each column and has a row for "yes"
# and a row for "no." We'll make the "yes" total and then back into the "no"
# total.

# Make the "yes" row
summary_df <- tibble(summarise(df2, across(everything(), sum))) 

# Add a "no" row by subtracting the yes row from the total rows in df2. This will
# make non-sensical counts for "yes" and "no" column totals in the second row. We'll
# address that next
summary_df <- summary_df %>% 
  bind_rows(nrow(df2) - summary_df[1,])

# We'll shuffle things around so the "total totals" are NA for the yes/no cells.
summary_df$yes[2] <- NA
summary_df$no[2] <- summary_df$no[1]
summary_df$no[1] <- NA

# Add the summary rows to the original df2
df2 <- df2 %>% 
  bind_rows(summary_df)

# Remove the temporary summary_df
rm(summary_df)

If you go to the very bottom of the data frame, you should see two rows added. the "yes" and "no" values for those two rows should total up both for the row and the column.

Oh this works!!! Thank you so much for your help!!

You're welcome! I suspect there are some more elegant approaches, but I'm glad to know this does the trick for you.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

Thank you!!!!! Do you know how I can switch it so Yes and No are rows and it just shows the grand total under each column?

df2 <-tibble(Survival = sample(c(0,1), replace = TRUE, size=470),
PainB4 = sample(c(0,1), replace = TRUE, size=470),
Haem = sample(c(0,1), replace = TRUE, size=470),
Dysp = sample(c(0,1), replace = TRUE, size=470),
Cough = sample(c(0,1), replace = TRUE, size=470),
Weak = sample(c(0,1), replace = TRUE, size=470),
DM = sample(c(0,1), replace = TRUE, size=470),
HeartA = sample(c(0,1), replace = TRUE, size=470),
Periph = sample(c(0,1), replace = TRUE, size=470),
Smoke = sample(c(0,1), replace = TRUE, size=470),
Asthma = sample(c(0,1), replace = TRUE, size=470)) %>%
mutate(yes = rowSums(.), no = ncol(.) - rowSums(.))
df2[, c(12, 13, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)]