How to compare two Data Frames to Add Missing Data based on Specific Condition

Hello - I'm new to coding and trying to write a loop (not sure if that's the best way). I've tried various ways but still haven't found a solution.

Using the example data shown below, I want my loop to:

  1. Look at S$ID.

  2. Find all S$ID that match and:
    A1) Compare all Label_Chr recorded for that ID with the Master_DataFrame$Label_Chr
    A2) Identify all those that are missing.
    A3) For those that are missing add a new row to Data.Frame S, filled with: The Bird_ID, Missing Label_Chr and Columns Duration and Count as 0's.

  3. Keep looping until all matched groups are complete

    Below are examples of the data:

    Data.Frame S
    ID Label_Chr Duration Count
    OP_Pen12_R_30mins Walk. 3.310 1
    OP_Pen12_R_30mins Run. 0.250 1
    OP_Pen12_R_30mins Talk. 1.960 1
    OP_Pen12_R_Immediate Sit. 1.830 1
    OP_Pen12_R_Immediate Talk. 3.850 2

    Master_DataFrame
    Label_Chr. Duration Count
    Walk. 0 0
    Run 0 0
    Sit 0 0
    Talk 0 0

    The end S data frame would look like this:
    ID Label_Chr Duration Count
    OP_Pen12_R_30mins Walk. 3.310 1
    OP_Pen12_R_30mins Run. 0.250 1
    OP_Pen12_R_30mins Talk. 1.960 1
    OP_Pen12_R_30mins Sit. 0 0

    OP_Pen12_R_Immediate Sit. 1.830 1
    OP_Pen12_R_Immediate Talk. 3.850 2
    OP_Pen12_R_Immediate Walk. 0 0
    OP_Pen12_R_Immediate Run. 0 0

I've tried various things such as:

  k <- 1
    for(i in 1:length(S))
      for (ii in 1:length(MasterBehaviours))
      {Bird_Comparison <- strcmp(S[i,1], S[i+1,1])
      Behaviour_Comparison <- strcmp(S[i,3], MasterBehaviours[ii+1,1]) }
    if(Bird_Comparison == TRUE & Behaviour_Comparison == FALSE) # {print(Bird_Comparison) 
    {ZZ <- rbind(S, MasterBehaviours[1+k,1])}
    print(ZZ)}


    for(i in 1:length(S[,]))
    {x <- S[i,1]
    y <- S[i+1,1]
    labelCheck <- (x==y)}
    if(labelCheck == TRUE)
    {Behaviour_Comparison <- strcmp(S[i+1,3], MasterBehaviours[1+k,1])}
    if (Behaviour_Comparison == FALSE)
    {rbind(S, MasterBehaviours[1+k,1])
      print(S)}

I've been working on this same section for almost 3 weeks whilst trying to learn R so if anyone has any tips or suggested solutions I could try I'd really appreciate :slight_smile: Thank you!

The problem with providing example in this way are that forum users can not copy and paste the example and directly use them , but would have to go through additional steps. To get the best help, it's best to remove as many barriers as possible to others to work with what you have to help you. Therefore you can use datapasta package or base function dput() to represent data examples in a directly runnable way.
FAQ: How to do a minimal reproducible example ( reprex ) for beginners

1 Like

This would be much simpler than a for loop (Note: Please notice the way I'm posting the answer, that would be a proper reproducible example and you should make your questions providing one in the future).

library(tidyverse)

# Sample data on a copy/paste friendly format
s <- data.frame(
  stringsAsFactors = FALSE,
                ID = c("OP_Pen12_R_30mins",
                       "OP_Pen12_R_30mins","OP_Pen12_R_30mins","OP_Pen12_R_Immediate",
                       "OP_Pen12_R_Immediate"),
         Label_Chr = c("Walk", "Run", "Talk", "Sit", "Talk"),
          Duration = c(3.31, 0.25, 1.96, 1.83, 3.85),
             Count = c(1L, 1L, 1L, 1L, 2L)
)

master <- data.frame(
  stringsAsFactors = FALSE,
         Label_Chr = c("Walk", "Run", "Sit", "Talk"),
          Duration = c(0, 0, 0, 0),
             Count = c(0, 0, 0, 0)
)

s %>% 
    mutate(Label_Chr = factor(Label_Chr, levels = master$Label_Chr)) %>% 
    group_by(ID) %>% 
    complete(Label_Chr) %>% 
    replace_na(list(Duration = 0, Count = 0))
#> # A tibble: 8 x 4
#> # Groups:   ID [2]
#>   ID                   Label_Chr Duration Count
#>   <chr>                <fct>        <dbl> <dbl>
#> 1 OP_Pen12_R_30mins    Walk          3.31     1
#> 2 OP_Pen12_R_30mins    Run           0.25     1
#> 3 OP_Pen12_R_30mins    Sit           0        0
#> 4 OP_Pen12_R_30mins    Talk          1.96     1
#> 5 OP_Pen12_R_Immediate Walk          0        0
#> 6 OP_Pen12_R_Immediate Run           0        0
#> 7 OP_Pen12_R_Immediate Sit           1.83     1
#> 8 OP_Pen12_R_Immediate Talk          3.85     2

Created on 2020-05-02 by the reprex package (v0.3.0)

1 Like

Yes, you're right, a for loop wasn't the best way. I think it might be in future but I realise now with small data sets it isn't necessary. Thank you for your reply, and I will know for future to adapt by question so that people can run the code themselves - thanks!

I ended up writing this which worked (much longer and perhaps less efficient than yours but it works so another option):

#clear all workspace
rm(list = ls())
#Clear console <- ctrl+L

installing packages code <- install.packages("readxl")

Loading

library("readxl")
library("dplyr")
library("readr")
library("pracma")
library("arsenal")
library("compare")
library("writexl")
library("tidyr")

#Read in data sheet
A <- read_excel("Aggregated_All_Habituation_Data_R.xlsx")

Makes a list of BirdID

Bird <- data.frame(A$Bird_ID)
#Renames column to keep column name consistent as Bird_ID
Bird <- Bird%>% select(Bird_ID = A.Bird_ID)
Bird <- tbl_df(unique(A[,5]))
#Makes a list of behaviours
Behaviour_Etho <- tbl_df(unique(A[,6]))

#Make dataframe for each behaviour and which birds performed said behaviour
Preen_Sitting <- subset(A, Tier_Labels == 'Preen_Sitting')
Shuffle <- subset(A, Tier_Labels == 'Shuffle')
Sit_Inactive <- subset(A, Tier_Labels == 'Sit_Inactive')
Stand_Inactive <- subset(A, Tier_Labels == 'Stand_Inactive')
Feather_Ruffle <- subset(A, Tier_Labels == 'Feather_Ruffle')
Headshake <- subset(A, Tier_Labels == 'Headshake')
Out_of_View <- subset(A, Tier_Labels == 'Out_of_View')
Peck_Accel <- subset(A, Tier_Labels == 'Peck_Accel')
Peck_litter <- subset(A, Tier_Labels == 'Peck_litter')
Preen_Standing <- subset(A, Tier_Labels == 'Preen_Standing')
Walking <- subset(A, Tier_Labels == 'Walking')
Crawling <- subset(A, Tier_Labels == 'Crawling')
Crouch <- subset(A, Tier_Labels == 'Crouch')
Feather_peck <- subset(A, Tier_Labels == 'Feather_peck')
Reverse_Behaviour <- subset(A, Tier_Labels == 'Reverse_Behaviour')
Running <- subset(A, Tier_Labels == 'Running')
Drink_Standing <- subset(A, Tier_Labels == 'Drink_Standing')
Peck_mark <- subset(A, Tier_Labels == 'Peck_mark')
Feed_Standing <- subset(A, Tier_Labels == 'Feed_Standing')
Preen_Sidelying <- subset(A, Tier_Labels == 'Preen_Sidelying')
Hop_Jump <- subset(A, Tier_Labels == 'Hop_Jump')
Wing_flap <- subset(A, Tier_Labels == 'Wing_flap')
Foraging <- subset(A, Tier_Labels == 'Foraging')
Peck_beak <- subset(A, Tier_Labels == 'Peck_beak')
Peck_wall <- subset(A, Tier_Labels == 'Peck_wall')
Sidelying_Inactive <- subset(A, Tier_Labels == 'Sidelying_Inactive')
Head_Scratch <- subset(A, Tier_Labels == 'Head_Scratch')
Leg_Stretch <- subset(A, Tier_Labels == 'Leg_Stretch')
Self_Peck_Mark <- subset(A, Tier_Labels == 'Self_Peck_Mark')
Wing_Stretch <- subset(A, Tier_Labels == 'Wing_Stretch')
Self_Peck_Accel <- subset(A, Tier_Labels == 'Self_Peck_Accel')
Leg_Wing_Stretch <- subset(A, Tier_Labels == 'Leg&Wing_Stretch')
Wing_assisted_run <- subset(A, Tier_Labels == 'Wing_assisted_run')

#Create individual data sets for specific behaviours adding in missing behaviours to individuals
data1 <- merge(Walking, Bird, by="Bird_ID", all=T)
data1$Tier_Labels <- 'Walking'
data1$Tier_Numbers <- '1'

data2 <- merge(Preen_Standing, Bird, by="Bird_ID", all=T)
data2$Tier_Labels <- 'Preen_Standing'
data2$Tier_Numbers <- '2'

data3 <- merge(Stand_Inactive, Bird, by="Bird_ID", all=T)
data3$Tier_Labels <- 'Stand_Inactive'
data3$Tier_Numbers <- '3'

data4 <- merge(Wing_assisted_run, Bird, by="Bird_ID", all=T)
data4$Tier_Labels <- 'Wing_assisted_run'
data4$Tier_Numbers <- '4'

data5 <- merge(Shuffle, Bird, by="Bird_ID", all=T)
data5$Tier_Labels <- 'Shuffle'
data5$Tier_Numbers <- '5'

data6 <- merge(Preen_Sitting, Bird, by="Bird_ID", all=T)
data6$Tier_Labels <- 'Preen_Sitting'
data6$Tier_Numbers <- '6'

data7 <- merge(Sit_Inactive, Bird, by="Bird_ID", all=T)
data7$Tier_Labels <- 'Sit_Inactive'
data7$Tier_Numbers <- '7'

data8 <- merge(Sidelying_Inactive, Bird, by="Bird_ID", all=T)
data8$Tier_Labels <- 'Sidelying_Inactive'
data8$Tier_Numbers <- '8'

data9 <- merge(Preen_Sidelying, Bird, by="Bird_ID", all=T)
data9$Tier_Labels <- 'Preen_Sidelying'
data9$Tier_Numbers <- '9'

data10 <- merge(Reverse_Behaviour, Bird, by="Bird_ID", all=T)
data10$Tier_Labels <- 'Reverse_Behaviour'
data10$Tier_Numbers <- '10'

data11 <- merge(Crouch, Bird, by="Bird_ID", all=T)
data11$Tier_Labels <- 'Crouch'
data11$Tier_Numbers <- '11'

data12 <- merge(Peck_beak, Bird, by="Bird_ID", all=T)
data12$Tier_Labels <- 'Peck_beak'
data12$Tier_Numbers <- '12'

data13 <- merge(Running, Bird, by="Bird_ID", all=T)
data13$Tier_Labels <- 'Running'
data13$Tier_Numbers <- '13'

data14 <- merge(Feather_peck, Bird, by="Bird_ID", all=T)
data14$Tier_Labels <- 'Feather_peck'
data14$Tier_Numbers <- '14'

data15 <- merge(Peck_litter, Bird, by="Bird_ID", all=T)
data15$Tier_Labels <- 'Peck_litter'
data15$Tier_Numbers <- '15'

data16 <- merge(Headshake, Bird, by="Bird_ID", all=T)
data16$Tier_Labels <- 'Headshake'
data16$Tier_Numbers <- '16'

data17 <- merge(Feather_Ruffle, Bird, by="Bird_ID", all=T)
data17$Tier_Labels <- 'Feather_Ruffle'
data17$Tier_Numbers <- '17'

data18 <- merge(Wing_flap, Bird, by="Bird_ID", all=T)
data18$Tier_Labels <- 'Wing_flap'
data18$Tier_Numbers <- '18'

data19 <- merge(Peck_Accel, Bird, by="Bird_ID", all=T)
data19$Tier_Labels <- 'Peck_Accel'
data19$Tier_Numbers <- '19'

data20 <- merge(Leg_Stretch, Bird, by="Bird_ID", all=T)
data20$Tier_Labels <- 'Leg_Stretch'
data20$Tier_Numbers <- '20'

data21 <- merge(Feed_Standing, Bird, by="Bird_ID", all=T)
data21$Tier_Labels <- 'Feed_Standing'
data21$Tier_Numbers <- '21'

data22 <- merge(Hop_Jump, Bird, by="Bird_ID", all=T)
data22$Tier_Labels <- 'Hop_Jump'
data22$Tier_Numbers <- '22'

data23 <- merge(Head_Scratch, Bird, by="Bird_ID", all=T)
data23$Tier_Labels <- 'Head_Scratch'
data23$Tier_Numbers <- '23'

data24 <- merge(Out_of_View, Bird, by="Bird_ID", all=T)
data24$Tier_Labels <- 'Out_of_View'
data24$Tier_Numbers <- '24'

data25 <- merge(Peck_mark, Bird, by="Bird_ID", all=T)
data25$Tier_Labels <- 'Peck_mark'
data25$Tier_Numbers <- '25'

data26 <- merge(Drink_Standing, Bird, by="Bird_ID", all=T)
data26$Tier_Labels <- 'Drink_Standing'
data26$Tier_Numbers <- '26'

data27 <- merge(Self_Peck_Accel, Bird, by="Bird_ID", all=T)
data27$Tier_Labels <- 'Self_Peck_Accel'
data27$Tier_Numbers <- '27'

data28 <- merge(Self_Peck_Mark, Bird, by="Bird_ID", all=T)
data28$Tier_Labels <- 'Self_Peck_Mark'
data28$Tier_Numbers <- '28'

data29 <- merge(Leg_Wing_Stretch, Bird, by="Bird_ID", all=T)
data29$Tier_Labels <- 'Leg_Wing_Stretch'
data29$Tier_Numbers <- '29'

data30 <- merge(Wing_Stretch, Bird, by="Bird_ID", all=T)
data30$Tier_Labels <- 'Wing_Stretch'
data30$Tier_Numbers <- '30'

data31 <- merge(Crawling, Bird, by="Bird_ID", all=T)
data31$Tier_Labels <- 'Crawling'
data31$Tier_Numbers <- '31'

data32 <- merge(Peck_wall, Bird, by="Bird_ID", all=T)
data32$Tier_Labels <- 'Peck_wall'
data32$Tier_Numbers <- '32'

data33 <- merge(Foraging, Bird, by="Bird_ID", all=T)
data33$Tier_Labels <- 'Foraging'
data33$Tier_Numbers <- '33'

#Join together each individual behaviour data set
final <- rbind(data1, data2,data3,data4,data5,data6,data7, data8, data9, data10, data11, data12, data13, data14, data15, data16,
data17, data18, data19, data20, data21, data22, data23, data24, data25, data26, data27, data28, data29, data30, data31,
data32,data33)

final$Bird_ID_2 <- final$Bird_ID
final <- final%>% separate(Bird_ID_2, c('Bird','Pen','Strain','Habituation'),'_')

#Fill missing data (NA) with count 0
final$CumSum_Duration[is.na(final$CumSum_Duration)] <- 0
final$Count_Tier[is.na(final$Count_Tier)] <- 0
final$Reattached:Y/N[is.na(final$Reattached:Y/N)] <- NA

final

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.