pivot_wider() or something else?

I am trying to make df_raw wide

> df_raw
# A tibble: 12 x 2
   X1             X2                                                                                
   <chr>          <chr>                                                                             
 1 "Trading Code~ "CNW"                                                                             
 2 "News Title:"  "CNW: Q3 Financials of NRB Commercial Bank Limited"                               
 3 "News:"        "(Continuation news of CNW: Q3 Financials of NRB Commercial Bank Limited): Pre-IP~
 4 "Post Date:"   "2021-03-21"                                                                      
 5 ""             ""                                                                                
 6 ""             ""                                                                                
 7 "Trading Code~ "CNW"                                                                             
 8 "News Title:"  "CNW: Q3 Financials of NRB Commercial Bank Limited"                               
 9 "News:"        "(Continuation news of CNW: Q3 Financials of NRB Commercial Bank Limited): Pre-IP~
10 "Post Date:"   "2021-03-21"                                                                      
11 ""             ""                                                                                
12 ""             ""  

I have tried the following code:

df_raw %>%
  filter(X1 != "") %>%
  mutate(id = row_number()) %>% # unique identifier
  pivot_wider(names_from = X1,
              values_from = X2)
# A tibble: 8 x 5
     id `Trading Code:` `News Title:`             `News:`                               `Post Date:`
  <int> <chr>           <chr>                     <chr>                                 <chr>       
1     1 CNW             NA                        NA                                    NA          
2     2 NA              CNW: Q3 Financials of NR~ NA                                    NA          
3     3 NA              NA                        (Continuation news of CNW: Q3 Financ~ NA          
4     4 NA              NA                        NA                                    2021-03-21  
5     5 CNW             NA                        NA                                    NA          
6     6 NA              CNW: Q3 Financials of NR~ NA                                    NA          
7     7 NA              NA                        (Continuation news of CNW: Q3 Financ~ NA          
8     8 NA              NA                        NA                                    2021-03-21  

Desired output:

# A tibble: 2 x 4
  `Trading Code:` `News Title:`             `News:`                               `Post Date:`
  <chr>           <chr>                     <chr>                                 <chr>       
1 CNW             CNW: Q3 Financials of NR~ (Continuation news of CNW: Q3 Financ~ 2021-03-21  
2 CNW             CNW: Q3 Financials of NR~ (Continuation news of CNW: Q3 Financ~ 2021-03-21  

Data

df_raw <- structure(list(X1 = c("Trading Code:", "News Title:", "News:", 
"Post Date:", "", "", "Trading Code:", "News Title:", "News:", 
"Post Date:", "", ""), X2 = c("CNW", "CNW: Q3 Financials of NRB Commercial Bank Limited", 
"(Continuation news of CNW: Q3 Financials of NRB Commercial Bank Limited): Pre-IPO weighted average paid-up number of shares for 3 (three) months (July-September 2020) and 9 (nine) months (January-September 2020) period ended on 30 September 2020 was 582,516,998 which was same for the same period of the previous year and Post-IPO paid-up number of shares would be 702,516,998. (end)", 
"2021-03-21", "", "", "CNW", "CNW: Q3 Financials of NRB Commercial Bank Limited", 
"(Continuation news of CNW: Q3 Financials of NRB Commercial Bank Limited): Pre-IPO Net Asset Value (NAV) per share with revaluation (considering Pre-IPO paid up shares) would be Tk. 16.01 as on 30 September 2020 and the same would be Tk. 14.98 (considering Post-IPO paid up shares). (cont. 3)", 
"2021-03-21", "", "")), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

Key to prevent the NA's is to assign the same unique ID to all related rows. I've added some comments where the code has been modified:

library(tidyverse)

df_raw <- tibble(X1 = c("Trading Code:", "News Title:", "News:", "Post Date:", "", "",
                        "Trading Code:", "News Title:", "News:", "Post Date:", "", ""
                        ),
                 X2 = c("CNW", "CNW: Q3 Financials of NRB Commercial Bank Limited",
                        "(Continuation news of CNW: Q3 Financials of NRB Commercial Bank Limited): Pre-IPO weighted average paid-up number of shares for 3 (three) months (July-September 2020) and 9 (nine) months (January-September 2020) period ended on 30 September 2020 was 582,516,998 which was same for the same period of the previous year and Post-IPO paid-up number of shares would be 702,516,998. (end)",
                        "2021-03-21", "", "", "CNW", "CNW: Q3 Financials of NRB Commercial Bank Limited",
                        "(Continuation news of CNW: Q3 Financials of NRB Commercial Bank Limited): Pre-IPO Net Asset Value (NAV) per share with revaluation (considering Pre-IPO paid up shares) would be Tk. 16.01 as on 30 September 2020 and the same would be Tk. 14.98 (considering Post-IPO paid up shares). (cont. 3)",
                        "2021-03-21", "", ""
                        )
                 )

df_raw %>%
  filter(X1 != "") %>%
  ## assign a (temporary) unique identifier to each trading code ..
  mutate(id = ifelse(X1 == "Trading Code:", row_number(), NA) ) %>% 
  ## .. and assign it also to the consecutive rows
  fill(id) %>% 
  pivot_wider(names_from = X1,
              values_from = X2,
              ) %>% 
  ## remove the temporary id
  select(-id)
#> # A tibble: 2 x 4
#>   `Trading Code:` `News Title:`           `News:`                   `Post Date:`
#>   <chr>           <chr>                   <chr>                     <chr>       
#> 1 CNW             CNW: Q3 Financials of ~ (Continuation news of CN~ 2021-03-21  
#> 2 CNW             CNW: Q3 Financials of ~ (Continuation news of CN~ 2021-03-21

Created on 2021-03-21 by the reprex package (v1.0.0)

Reading again the vignette("pivot"), section "Contact list" explains that there's a more concise approach with cumsum().
And if you wish to improve your column names, then it's a one-line job with the janitor package.

df_raw %>%
  filter(X1 != "") %>%
  ## assign a (temporary) unique identifier to each trading code ..
  mutate(id = cumsum(X1 == "Trading Code:")) %>% 
  pivot_wider(names_from = X1,
              values_from = X2,
              ) %>% 
  ## remove the temporary id
  select(-id) %>%
  ## clean the column names w/ the janitor package
  janitor::clean_names()

Many thanks for the elegant solution! Learned some cool tricks :smile: Thank you again!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.