Associate Earliest Dates with Names

Hi, I have a dataset in which each name appears on several rows, with a corresponding date in a second column:

John G. 2001-03-14
Jane D. 2002-04-22
John G. 2003-06-24
Jane D. 2005-07-27

I'd like to create a new column with the earliest dates for each name. How do I do this?

library(tidyverse)
library(lubridate)

df <- tibble(
  name = c("John G.", "Jane D.", "John G.", "Jane D."),
  date = c("14/03/2001", "22/04/2002", "24/06/2003", "27/07/2005")) %>% 
  mutate(date = as_date(date, format = "%d/%m/%Y"))  # %Y (four-digit year), not %y

# Keep the earliest row for each name
df2 <- df %>% 
  group_by(name) %>% 
  arrange(date) %>% 
  slice(1) %>% 
  rename(first_date = date)

# Join the earliest date back onto every row
df3 <- df %>% 
  left_join(df2, by = "name")

df3

# A tibble: 4 x 3
  name    date       first_date
  <chr>   <date>     <date>    
1 John G. 2001-03-14 2001-03-14
2 Jane D. 2002-04-22 2002-04-22
3 John G. 2003-06-24 2001-03-14
4 Jane D. 2005-07-27 2002-04-22

Here's another solution using the minimum function:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

df <- tibble(
  name = c("John G.", "Jane D.", "John G.", "Jane D."),
  date = c("14/03/2001", "22/04/2002", "24/06/2003", "27/07/2005")) %>% 
  mutate(date = as_date(date, format = "%d/%m/%Y"))  # %Y (four-digit year), not %y

# min() is evaluated within each name group and repeated on every row of that group
df %>%
  group_by(name) %>%
  mutate(first_date = min(date))
#> # A tibble: 4 x 3
#> # Groups:   name [2]
#>   name    date       first_date
#>   <chr>   <date>     <date>    
#> 1 John G. 2001-03-14 2001-03-14
#> 2 Jane D. 2002-04-22 2002-04-22
#> 3 John G. 2003-06-24 2001-03-14
#> 4 Jane D. 2005-07-27 2002-04-22

Created on 2021-02-11 by the reprex package (v0.3.0)
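
One thing to watch with the grouped mutate() approach: the result is still grouped by name (note the "# Groups: name [2]" line in the output), so later verbs will keep operating per name. A minimal sketch of dropping the grouping afterwards, assuming only dplyr is loaded and the dates are already in ISO format; the object name `result` is just for illustration:

```r
library(dplyr)

# Same toy data as above, with dates already in ISO format
df <- tibble(
  name = c("John G.", "Jane D.", "John G.", "Jane D."),
  date = as.Date(c("2001-03-14", "2002-04-22", "2003-06-24", "2005-07-27"))
)

result <- df %>%
  group_by(name) %>%
  mutate(first_date = min(date)) %>%  # earliest date within each name
  ungroup()                           # drop the grouping so later steps see a plain tibble

result
```

Whether ungroup() is needed depends on what comes next; it is harmless if the data is only being printed or exported.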

@StatSteph Hi there. Thanks for your assistance. This was a huge help!

Afterwards I used a merge to bring the earliest dates into the larger data frame.

Cheers!
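
For anyone landing here later, the merge step might look roughly like the sketch below. The larger data frame is not shown in this thread, so `big_df` and its `visit` column are hypothetical stand-ins, and `first_dates` is just an illustrative name for the per-name lookup table:

```r
library(dplyr)

# Hypothetical larger data frame; only name and date come from the thread
big_df <- tibble(
  name  = c("John G.", "Jane D.", "John G.", "Jane D."),
  date  = as.Date(c("2001-03-14", "2002-04-22", "2003-06-24", "2005-07-27")),
  visit = c("a", "b", "c", "d")
)

# One row per name holding its earliest date
first_dates <- big_df %>%
  group_by(name) %>%
  summarise(first_date = min(date), .groups = "drop")

# Base R merge() on the shared key; by default it sorts the result by name
merged <- merge(big_df, first_dates, by = "name")

merged
```

If the original row order matters, left_join(big_df, first_dates, by = "name") keeps it, whereas merge() reorders rows by the key unless sort = FALSE is passed.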

@williaml Thanks for your assistance. This was a huge help!

Afterwards I used a merge to bring the earliest dates into the larger data frame.

Cheers!
