Combining two pieces of data

I am currently trying to complete my google analytics capstone project however I want to see the connection of how sleep effects the amount of steps an individual takes.

I want to get the steps to be matched up on the same rows as the individuals(Id) on the same day(SleepDay and ActivityDay matched up).

Manually I can do what I am envisioning however I was wondering if there was a faster way with R that I can do to matching up the steps individuals have taken on a day to the amount of sleep recorded.


I am starting by using a IF statement to showcase what I'm trying to achieve with this data. I want A and B to match with F and G and to add steps data (H) alongside the rest of the data. Essentially I want to manipulate the data to find out the amount of hours and steps individuals took/got in the day.


*Here's the correct photo of the IF statement that I'm starting out with.

It would help if you would show us a little bit of the original data, preferably using dput() rather than a photo.

Sorry I'm still new to R, had to youtube how to use dput. Would this work?

structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1644430081, 1644430081, 1644430081,
1644430081, 1844505072), SleepDay = c("4/12/2016 0:00", "4/13/2016 0:00",
"4/15/2016 0:00", "4/16/2016 0:00", "4/17/2016 0:00", "4/19/2016 0:00",
"4/20/2016 0:00", "4/21/2016 0:00", "4/23/2016 0:00", "4/24/2016 0:00",
"4/25/2016 0:00", "4/26/2016 0:00", "4/28/2016 0:00", "4/29/2016 0:00",
"4/30/2016 0:00", "5/1/2016 0:00", "5/2/2016 0:00", "5/3/2016 0:00",
"5/5/2016 0:00", "5/6/2016 0:00", "5/7/2016 0:00", "5/8/2016 0:00",
"5/9/2016 0:00", "5/10/2016 0:00", "5/11/2016 0:00", "4/29/2016 0:00",
"4/30/2016 0:00", "5/2/2016 0:00", "5/8/2016 0:00", "4/15/2016 0:00"
), TotalSleepRecords = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), TotalMinutesAsleep = c(327L, 384L, 412L,
340L, 700L, 304L, 360L, 325L, 361L, 430L, 277L, 245L, 366L, 341L,
404L, 369L, 277L, 273L, 247L, 334L, 331L, 594L, 338L, 383L, 285L,
119L, 124L, 796L, 137L, 644L), TotalTimeInBed = c(346L, 407L,
442L, 367L, 712L, 320L, 377L, 364L, 384L, 449L, 323L, 274L, 393L,
354L, 425L, 396L, 309L, 296L, 264L, 367L, 349L, 611L, 342L, 403L,
306L, 127L, 142L, 961L, 154L, 961L), X = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), Id.1 = c(1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366), ActivityDate = c("4/12/2016",
"4/13/2016", "4/14/2016", "4/15/2016", "4/16/2016", "4/17/2016",
"4/18/2016", "4/19/2016", "4/20/2016", "4/21/2016", "4/22/2016",
"4/23/2016", "4/24/2016", "4/25/2016", "4/26/2016", "4/27/2016",
"4/28/2016", "4/29/2016", "4/30/2016", "5/1/2016", "5/2/2016",
"5/3/2016", "5/4/2016", "5/5/2016", "5/6/2016", "5/7/2016", "5/8/2016",
"5/9/2016", "5/10/2016", "5/11/2016"), TotalSteps = c(13162L,
10735L, 10460L, 9762L, 12669L, 9705L, 13019L, 15506L, 10544L,
9819L, 12764L, 14371L, 10039L, 15355L, 13755L, 18134L, 13154L,
11181L, 14673L, 10602L, 14727L, 15103L, 11100L, 14070L, 12159L,
11992L, 10060L, 12022L, 12207L, 12770L)), row.names = c(NA, 30L
), class = "data.frame")

I wounded up manually filling in the data and used my IF statements in excel to verify my manual copying and pasting. So I can show a picture of how I want it to end up.

I would still like to know how I could reorganize this cluttered data in R for future references.

The following code seems to work, although I haven't checked it carefully. The steps are

  1. Make a new dataframe with just he Id, ActivityDate, and TotalSteps.
  2. Convert both SleepDay and ActivityDate to dates rather than character strings.
  3. Do a left join, matching on Id and the sleep date and the activity date.

Assume the original data is in the dataframe df. (Which you supplied perfectly.)

library(tidyverse)
library(lubridate)
df2 <- df |> select(Id, ActivityDate, TotalSteps)
df2 <- df2 |> mutate(ActivityDate = mdy(ActivityDate))
df1 <-
  df |> rename(oldActivityData = ActivityDate, OldTotalSteps = TotalSteps)
df1 <- df1 |> mutate(SleepDate = mdy_hm(SleepDay))
df1 <- df1 |> left_join(df2, by = c("Id" = "Id",
                                    "SleepDate" = "ActivityDate"))
2 Likes

That seemed to have helped me! Thank you so much.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.