Joining time-series data with different dimensions and variables

Hi, I have two dataframes that I want to join. The first dataset has growth time series for two different sites, each site having two individuals. The second dataset has temperature time series for the two sites. I want to join both of these to have one table with climate and growth time series for each individual at each site. Here is a screenshot to show what I want my final joined data to look like.

image

I am guessing dplyr is the easiest way to go, but I am at a complete loss for how to do this. Thanks a lot for reading this and I'd appreciate any advice!

#growth data
#individuals from site one
site1ind1<-c(2,2.2,3.2,1)
site1ind2<-c(1,1.2,2.2,1.2)
#individuals from site two
site2ind1<-c(5,1,3.2,1)
site2ind2<-c(1,2,3,1.2)
df<-data.frame(site1ind1,site1ind2,site2ind1,site2ind2)
row.names(df)<-c(2000,2001,2002,2003)

#climate data
year<-c(2000,2001,2002,2003,2000,2001,2002,2003)
site<-c("site1","site1","site1","site1","site2","site2","site2","site2")
temp<-c(19,28,22,21,18,30,18,22)
climate<-data.frame(year,temp,site)

So like this?

df %>%
    mutate(year = as.numeric(row.names(.))) %>%
    pivot_longer(all_of(colnames(df))) %>%
    mutate(site = unlist(str_split(name, "ind"))[1]) %>%
    mutate(individual = unlist(str_split(name, "site[0-9]"))[2])%>%
    select(-name)%>%
    mutate(growth = value) %>%
    left_join(climate)
1 Like

Thanks a lot for the quick response. It's close, but when I try your code I get this:

I can't figure out why the output only has site1 and individual1. Any ideas?

Edit: I used the following packages to run your code:
library(dplyr)
library(tidyr)
library(stringr)

Ah. It's using the first value I suspect in str_split.
I think a rowwise will fix it. Will post a solution later

1 Like

Final solution:

library(dplyr)
library(tidyr)
library(stringr)

year<-c(2000,2001,2002,2003,2000,2001,2002,2003)
site<-c("site1","site1","site1","site1","site2","site2","site2","site2")
temp<-c(19,28,22,21,18,30,18,22)
climate<-data.frame(year,temp,site)

df %>%
  mutate(year = as.numeric(row.names(.))) %>%
  pivot_longer(all_of(colnames(df))) %>%
  rowwise() %>%
  mutate(site = unlist(str_split(name, "ind"))[1]) %>%
  mutate(individual = unlist(str_split(name, "site[0-9]"))[2])%>%
  select(-name)%>%
  mutate(growth = value) %>%
  left_join(climate)

1 Like

Hi again! When I run the code I am still getting this:

image

With just site 1 and individual 1.

It seems things go weird in the "mutate(site = unlist(str_split(name, "ind"))[1])" line. Any ideas what's going wrong?

Thanks again!

Did you add the rowwise?

Yes, I ran this code:

library(dplyr)
library(tidyr)
library(stringr)

#growth data
#individuals from site one
site1ind1<-c(2,2.2,3.2,1)
site1ind2<-c(1,1.2,2.2,1.2)
#individuals from site two
site2ind1<-c(5,1,3.2,1)
site2ind2<-c(1,2,3,1.2)
df<-data.frame(site1ind1,site1ind2,site2ind1,site2ind2)
row.names(df)<-c(2000,2001,2002,2003)

#climate
year<-c(2000,2001,2002,2003,2000,2001,2002,2003)
site<-c("site1","site1","site1","site1","site2","site2","site2","site2")
temp<-c(19,28,22,21,18,30,18,22)
climate<-data.frame(year,temp,site)

df %>%
mutate(year = as.numeric(row.names(.))) %>%
pivot_longer(all_of(colnames(df))) %>%
rowwise() %>%
mutate(site = unlist(str_split(name, "ind"))[1]) %>%
mutate(individual = unlist(str_split(name, "site[0-9]"))[2])%>%
select(-name)%>%
mutate(growth = value) %>%
left_join(climate)

Does it work okay for you?

This is my output:

> library(dplyr)
> library(tidyr)
> library(stringr)
> 
> year<-c(2000,2001,2002,2003,2000,2001,2002,2003)
> site<-c("site1","site1","site1","site1","site2","site2","site2","site2")
> temp<-c(19,28,22,21,18,30,18,22)
> climate<-data.frame(year,temp,site)
> 
> df %>%
+   mutate(year = as.numeric(row.names(.))) %>%
+   pivot_longer(all_of(colnames(df))) %>%
+   rowwise() %>%
+   mutate(site = unlist(str_split(name, "ind"))[1]) %>%
+   mutate(individual = unlist(str_split(name, "site[0-9]"))[2])%>%
+   select(-name)%>%
+   mutate(growth = value) %>%
+   left_join(climate)
Joining, by = c("year", "site")
# A tibble: 16 x 6
# Rowwise: 
    year value site  individual growth  temp
   <dbl> <dbl> <chr> <chr>       <dbl> <dbl>
 1  2000   2   site1 ind1          2      19
 2  2000   1   site1 ind2          1      19
 3  2000   5   site2 ind1          5      18
 4  2000   1   site2 ind2          1      18
 5  2001   2.2 site1 ind1          2.2    28
 6  2001   1.2 site1 ind2          1.2    28
 7  2001   1   site2 ind1          1      30
 8  2001   2   site2 ind2          2      30
 9  2002   3.2 site1 ind1          3.2    22
10  2002   2.2 site1 ind2          2.2    22
11  2002   3.2 site2 ind1          3.2    18
12  2002   3   site2 ind2          3      18
13  2003   1   site1 ind1          1      21
14  2003   1.2 site1 ind2          1.2    21
15  2003   1   site2 ind1          1      22
16  2003   1.2 site2 ind2          1.2    22

So is it a version issue?

version.string R version 4.0.3 (2020-10-10)

packageVersion("dplyr")
[1] ‘1.0.3’
packageVersion("tidyr")
[1] ‘1.1.2’
packageVersion("stringr")
[1] ‘1.4.0’

Hi again, I updated my packages and now it works perfectly! Thanks so much, you've just brought an end to weeks of frustration :slight_smile:

1 Like

Why left-join() rather than full_join() ? The full_join() would keep all columns in both dataframes.

In the reprex (unless I'm mistaken) there is no change in the result?

Yes, when I replace left_join with full_join I get the same output for this example.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.