How can I count the number of rows in a specific time slot?

MathieuVandamme · October 7, 2022, 1:28pm

Hi all

I have a CSV file with samples and their result times. The file has 27000 rows, and the result times cover one day (0:00:00 until 23:59:59).

I want to make a histogram that shows the distribution of the number of samples over time, so that it is visible that, e.g., most of the samples are processed around 14:00.

I need to have a file that says:
[0-1h] = 278 samples
[1-2h] = 28 samples
...
[14-15h] = 7096 samples
...
[23-0h] = 55 samples

I can then put this information into a histogram.

How can I get from a file that lists all the samples and their (result) times to a file like the one above? I need to select the samples in each hour, count them, and list them in a new data.frame that I can use for the histogram.

I have already read a lot online and tried a lot (a couple of hours went by). One of the things I tried is the following:

#Calculate the amount of FirstScan samples each hour by using the While loop and a counter:
install.packages("lubridate",repos = "http://cran.us.r-project.org")
DataHourMinutesSeconds <- df1 %>%
  separate(FirstScanTime, sep = ":", into = c("Hours", "Minutes", "Seconds")) %>%
  mutate_at(c("Hours", "Minutes", "Seconds"), as.numeric)

counter = 0
df3 = data.frame()

while (counter < 24) {
  sum <- count(DataHourMinutesSeconds,
               filter(Hours == counter))
  df3[nrow(df3) + 1, ] = c(counter, sum)
  counter = counter + 1
}

Thanks in advance!

andresrcs · October 8, 2022, 4:30pm

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

If you have stored the data set in some R object, the dput() function is very handy.
In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

EconProf · October 8, 2022, 10:43pm

First, I echo the request for an example of your data set, which makes it much easier for us to help you.

If the FirstScanTime variable is in "03:48:12" format, then the hms() and hour() functions from {lubridate} will give you the hour in numeric format.

library(lubridate)

hms("18:58:02")
#> [1] "18H 58M 2S"
hour(hms("18:58:02"))
#> [1] 18

After that, count(hours) will give the number of rows in each hour:

df1 %>% mutate(hours = hour(hms(FirstScanTime))) %>% count(hours)

It should not be necessary to download and install the {lubridate} package in your library each time you run this code. Once it is installed, library(lubridate) will load it for that R session.
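
For reference, here is a minimal end-to-end sketch of the approach above on toy data. The data frame below is only a stand-in for df1: the SampleID column and the six time values are made up for illustration, and the sketch assumes FirstScanTime is stored as "HH:MM:SS" text.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for df1 (hypothetical values; the real file has ~27000 rows)
df1 <- data.frame(
  SampleID = 1:6,
  FirstScanTime = c("00:15:02", "00:47:30", "14:05:11",
                    "14:20:45", "14:59:59", "23:10:00")
)

# Parse each time string, extract the hour, then count rows per hour
hourly_counts <- df1 %>%
  mutate(hours = hour(hms(FirstScanTime))) %>%
  count(hours)

print(hourly_counts)
```

Note that count() only returns hours that actually occur in the data, so an hour without any samples will be missing from the result rather than listed with a count of zero.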

MathieuVandamme · October 12, 2022, 7:57am

Hi all

Thank you for your responses. I used them as a starting point and let it rest for a few days. In my case, the solution was:

#Calculate the amount of FirstScan-samples each hour:
library(lubridate)

df3 <- as.data.frame(hour(hms(df1$FirstScanTime)))
df_aggr_First_scan <- aggregate(df3, by=list(df3$`hour(hms(df1$FirstScanTime))`), FUN = length)
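
A possible base R alternative, in case it helps someone else: the sketch below avoids extra packages and also keeps hours that have no samples. It assumes the times are zero-padded "HH:MM:SS" strings, and the vector `times` is made-up toy data standing in for df1$FirstScanTime.

```r
# Toy stand-in for df1$FirstScanTime (hypothetical values)
times <- c("00:15:02", "00:47:30", "14:05:11",
           "14:20:45", "14:59:59", "23:10:00")

# With zero-padded "HH:MM:SS" strings, the first two characters are the hour
hours <- as.integer(substr(times, 1, 2))

# A factor with explicit levels 0..23 makes table() report empty hours as 0
counts <- table(factor(hours, levels = 0:23))

# One row per hour, ready for plotting or writing to a file
hourly <- data.frame(hour = 0:23, samples = as.vector(counts))

print(hourly[hourly$samples > 0, ])
```

The result could then be plotted with, for example, barplot(hourly$samples, names.arg = hourly$hour).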

Thank you
