getting one row from multiple rows

aureta · July 13, 2020, 5:44pm

hi, I have a data frame with 6 variables: Date,A,B,C,D,E.
Often, dates are repeated two or three times (in two or three rows) in which for example there are values for variables A and B in one row, values for C and D in a second row and values for variable E in a third row....how to code in order to get all five values (for each variable in a single row) ? thank you very much !!!!

Date	A	B	C	D	E
21/09/2015	1	5
21/09/2015			2	4
21/09/2015					7
22/09/2015	7	6	2
22/09/2015				9	1

mfherman · July 13, 2020, 5:46pm

Can you share a sample of your data? The easiest way is to do something like:

dput(head(your_data_frame, 20))

And paste the output here.

aureta · July 13, 2020, 5:59pm

Hi, thanks for your reply...I added a sample of the df...

Leon · July 13, 2020, 6:11pm

You can do something along the lines of this:

library("tidyverse")
d <- tribble(
  ~Date,    ~A, ~B, ~C, ~D, ~E,
  "210915",	 1,  5, "", "", "",			
  "210915", "", "",	2,   4, "",	
  "210915", "", "", "", "",  7,
  "220915",  7,  6,  2, "", "",		
  "220915", "", "", "",  9,  1
)
d_clean <- d %>%
  pivot_longer(cols = -Date,
               names_to = "cat",
               values_to = "value") %>%
  filter(value != "") %>%
  pivot_wider(id_cols = Date,
              names_from = cat,
              values_from = value)

Yielding:

> d
# A tibble: 5 x 6
  Date   A     B     C     D     E    
  <chr>  <chr> <chr> <chr> <chr> <chr>
1 210915 1     5     ""    ""    ""   
2 210915 ""    ""    2     4     ""   
3 210915 ""    ""    ""    ""    7    
4 220915 7     6     2     ""    ""   
5 220915 ""    ""    ""    9     1    
> d_clean
# A tibble: 2 x 6
  Date   A     B     C     D     E    
  <chr>  <chr> <chr> <chr> <chr> <chr>
1 210915 1     5     2     4     7    
2 220915 7     6     2     9     1

Hope it helps

mfherman · July 13, 2020, 6:16pm

@Leon's answer is a great one. Here's another approach that uses the new across() function from dplyr 1.0.

library(tidyverse)

df <- structure(list(Date = structure(c(16699, 16699, 16699, 16700, 
16700), class = "Date"), A = c(1L, NA, NA, 7L, NA), B = c(5L, 
NA, NA, 6L, NA), C = c(NA, 2L, NA, 2L, NA), D = c(NA, 4L, NA, 
NA, 9L), E = c(NA, NA, 7L, NA, 1L)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

df %>% 
  group_by(Date) %>% 
  summarize(across(.f = sum, na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 6
#>   Date           A     B     C     D     E
#>   <date>     <int> <int> <int> <int> <int>
#> 1 2015-09-21     1     5     2     4     7
#> 2 2015-09-22     7     6     2     9     1

^{Created on 2020-07-13 by the reprex package (v0.3.0)}

aureta · July 13, 2020, 6:34pm

wonderful!!!!! works great...

HerClau · July 16, 2020, 8:42am

Here's another approach that uses the data.table:

#1
DT<-setDT(df)
 DT<-melt(DT,id=1)[value!="NA"]
DT<-dcast(DT,Date ~ variable)
DT
         Date A B C D E
1: 2015-09-21 1 5 2 4 7
2: 2015-09-22 7 6 2 9 1
#2
DT<-setDT(df)
DT[ , lapply(.SD, sum,na.rm = TRUE), by = "Date"]
         Date A B C D E
1: 2015-09-21 1 5 2 4 7
2: 2015-09-22 7 6 2 9 1

system · August 6, 2020, 8:42am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.