Separating data

Richard_ML · May 13, 2022, 4:41pm

I need to separate my data frame into multiple tables based on the timestamp (one table per day). How would I do that?

I have multiple data points throughout multiple days.

dvetsch75 · May 13, 2022, 8:16pm

I suppose it would depend on what you plan on doing next... for starters, I would recommend taking a look at dplyr::group_split, but maybe if you could provide a MWE that would make it easier to come up with something more specific?

Richard_ML · May 16, 2022, 12:30pm

I sadly am not able to copy paste anything into here because I can only open this website on my phone for some reason.

Basically I'll have a timestamp (date, hour, minutes, seconds) and a few more columns. My .CSV files span roughly a week at a time, so I'd like to be able to separate these tables into smaller tables, then take the sum of the other columns (per day) and do other mathematical stuff like counting instances of change.

dvetsch75 · May 16, 2022, 4:37pm

I'm still a little unclear on what exactly you need, so here are a few general methods that I think would get you close to what you are trying to do. I am using this as a dummy dataset to stand in for your real data:


library(dplyr)
library(lubridate)
library(purrr)

# Generating some timestamps
ts <- expand.grid(
    2021,
    1:6,
    1:6,
    1:6,
    1:3,
    1:2
) %>% 
    rowwise() %>% 
    mutate(
        timestamp = paste0(
            Var1,
            '-',
            Var2,
            '-',
            Var3,
            ' ',
            Var4,
            ':',
            Var5,
            ':',
            Var6
        ) %>% 
            ymd_hms
    ) %>% 
    pull(timestamp)

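# Some dummy data: ten columns of random numbers (X1..X10), one row per timestamp
# ts is already a date-time vector, so it does not need to be parsed again
df <- data.frame(
    timestamp = ts,
    replicate(10, runif(length(ts)))
)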

1. Extract the day, then split on day, apply functions, and combine.


df %>% 
    mutate(
        dt_by_day = ISOdate(
            year = year(timestamp),
            month = month(timestamp),
            day = day(timestamp)
        )
    ) %>% 
    select(-timestamp) %>% 
    group_by(dt_by_day) %>% 
    summarize(
        across(
            everything(),
            list(
                mean = mean,
                sd = sd
            )
        )
    )
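
The same group-and-summarize pattern should also cover the per-day sums and the "counting instances of change" mentioned earlier in the thread. This is only a sketch on made-up data: the `readings` data frame and its `value` and `state` columns are assumptions standing in for the real CSV, and "change" is assumed to mean a row whose value differs from the previous row within the same day.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in data: one reading every 6 hours over three days.
# `value` and `state` are made-up column names, not from the original post.
readings <- data.frame(
    timestamp = as.POSIXct('2022-05-09 00:00:00', tz = 'UTC') + 3600 * seq(0, 66, by = 6),
    value = 1:12,
    state = c('a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'a', 'b', 'a', 'b')
)

daily <- readings %>%
    mutate(day = as_date(timestamp)) %>%   # drop the time of day, keep the date
    group_by(day) %>%
    summarize(
        total = sum(value),
        # count rows whose state differs from the previous row of the same day;
        # the first row of each day compares against NA and is ignored
        n_changes = sum(state != dplyr::lag(state), na.rm = TRUE)
    )

daily
```

On this toy data the daily totals are 10, 26 and 42, with 1, 1 and 3 changes. Note that a change that happens across midnight is not counted, because the comparison restarts for each day; if those should count, compute the change flag before grouping.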

2. If you need to do something with more side effects, then you might want to look at dplyr::group_split. For example, this would write out data summaries for each Friday. There might be easier ways to do this, but without a sample of your data this would at least work.

df %>% 
    mutate(
        timestamp = ISOdate(
            year = year(timestamp),
            month = month(timestamp),
            day = day(timestamp)
        )
    ) %>% 
    group_split(timestamp) %>% 
    walk(  # purrr::walk is meant for side effects; map() would also print a list of NULLs
        .f = function(df) {
            dt <- df$timestamp[1]
            day_of_week <- weekdays(dt)
            if(day_of_week == 'Friday') {
                tmp <- df %>% 
                    summarize(
                        across(
                            where(is.numeric),
                            .fns = list(
                                mean = mean,
                                sd = sd
                            )
                        )
                    )
                # format() keeps only the date, so the file name has no spaces or colons
                write.csv(tmp, paste0('data_', format(dt, '%Y-%m-%d'), '.csv'))
            }
        }
    )
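
If the goal is simply to keep one table per day around for later work, as in the original question, a named list of data frames might be the most convenient shape. The sketch below uses base R's split(), which names each element after its date (dplyr::group_split returns an unnamed list). The `readings` data frame and its `value` column are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in data (column names are made up for illustration)
readings <- data.frame(
    timestamp = as.POSIXct('2022-05-09 00:00:00', tz = 'UTC') + 3600 * seq(0, 66, by = 6),
    value = 1:12
)

# Add a date-only column, then split into one data frame per calendar day
with_day <- readings %>%
    mutate(day = as_date(timestamp))
by_day <- split(with_day, with_day$day)

names(by_day)             # the dates, as character strings
by_day[['2022-05-10']]    # the table for a single day

# Apply any function to each day's table, e.g. the sum of `value`
sapply(by_day, function(d) sum(d$value))
```

Each element of `by_day` is an ordinary data frame, so it can be passed to whatever per-day calculation comes next.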
