Strange Behavior of group_by and summarize

I have a straightforward chunk of code:

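library(tidyverse)
library(readxl)     # read_excel() is not attached by tidyverse 1.3.0
library(lubridate)  # ymd_hms()

f.fullattendee <- read_excel("Full Attendee List.xlsx") %>%
  mutate(start_time = ymd_hms(starttime),
         end_time = ymd_hms(endtime))

# One row per attendee: earliest start, latest end, and the span in minutes.
# Explicit units: a bare difftime picks its own units, so dividing by 60 is not reliably minutes.
f.uniqatt <- f.fullattendee %>%
  group_by(fullname) %>%
  summarize(minStart = min(start_time),
            maxEnd = max(end_time)) %>%
  mutate(durationMin = round(as.numeric(difftime(maxEnd, minStart, units = "mins")), digits = 2))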

When running it (using tidyverse 1.3.0 and NOT running plyr), f.uniqatt contained a single record instead of the number of unique names (474 in my data).

This behavior persisted through multiple attempts until I quit and restarted the RStudio session. After the restart, this code worked properly.

This is less of a question and more of a package stability issue. Why wouldn't this very basic tidyverse code work without a restart?
Sorry for the rant.
AB

How many levels of fullname are there? Are you sure it is 474? It is hard for us to verify without looking at the data. Either share the data or a subset of it.

f.fullattendee  %>%
  pull(fullname) %>%
  unique() %>%
  length()

Thanks for the reply @StatSteph . There are 474 unique names in the dataset. Here is a redacted sample:

fullname start_time end_time
Name1 2020/10/28 13:25:09 2020/10/28 15:00:21
Name1 2020/10/28 08:34:08 2020/10/28 09:29:44
Name1 2020/10/28 10:48:53 2020/10/28 11:30:23
Name1 2020/10/28 10:33:56 2020/10/28 10:47:49
Name1 2020/10/28 11:34:32 2020/10/28 11:56:44
Name2 2020/10/28 13:00:06 2020/10/28 13:14:50
Name2 2020/10/28 08:24:25 2020/10/28 15:27:37
Name2 2020/10/28 08:24:52 2020/10/28 12:04:29
Name2 2020/10/28 14:43:40 2020/10/28 15:27:37
Name3 2020/10/28 08:24:35 2020/10/28 08:59:29
Name3 2020/10/28 09:45:39 2020/10/28 15:27:37
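
For anyone who wants to reproduce this, here is a minimal sketch that types the redacted sample above into a data frame and runs the same group_by/summarize steps on it. The object names sample_att and sample_uniq are made up for illustration, and the explicit units = "mins" in difftime() is an assumption about what durationMin is meant to hold.

```r
library(dplyr)
library(lubridate)

# Redacted sample from this post, entered by hand (hypothetical object name)
sample_att <- data.frame(
  fullname = c(rep("Name1", 5), rep("Name2", 4), rep("Name3", 2)),
  start_time = c("2020/10/28 13:25:09", "2020/10/28 08:34:08", "2020/10/28 10:48:53",
                 "2020/10/28 10:33:56", "2020/10/28 11:34:32", "2020/10/28 13:00:06",
                 "2020/10/28 08:24:25", "2020/10/28 08:24:52", "2020/10/28 14:43:40",
                 "2020/10/28 08:24:35", "2020/10/28 09:45:39"),
  end_time = c("2020/10/28 15:00:21", "2020/10/28 09:29:44", "2020/10/28 11:30:23",
               "2020/10/28 10:47:49", "2020/10/28 11:56:44", "2020/10/28 13:14:50",
               "2020/10/28 15:27:37", "2020/10/28 12:04:29", "2020/10/28 15:27:37",
               "2020/10/28 08:59:29", "2020/10/28 15:27:37")
) %>%
  mutate(start_time = ymd_hms(start_time),
         end_time = ymd_hms(end_time))

# Same steps as the original chunk, with the time difference taken in explicit minutes
sample_uniq <- sample_att %>%
  group_by(fullname) %>%
  summarize(minStart = min(start_time),
            maxEnd = max(end_time)) %>%
  mutate(durationMin = round(as.numeric(difftime(maxEnd, minStart, units = "mins")), digits = 2))

nrow(sample_uniq)  # 3, one row per name
```

In a fresh session this gives one row per name, which is the behavior expected on the full data as well.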

I suppose my question is really about the reliability of the package. As I was developing the code, things worked during the initial coding on a sample data set. Then it didn't work on the full data set. After I restarted my R session things worked again.

The only change I made between the initial coding and the runs where things didn't work was to use a larger dataset. This is a confusing reliability issue, certainly one that shouldn't exist. If the underlying code remains stable, shouldn't the result?

AB

When it gives a wrong result the first time you try it, but after a restart it's correct every time, the betting money is on your having done something interactive in your environment that spoiled your data or your session the first time, which you are not doing subsequently.
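
A minimal sketch of how one might check a session for this, assuming the culprit is a masked summarize (for example from a package attached after dplyr) rather than anything confirmed in this thread. The toy data frame and the names toy and res are hypothetical.

```r
library(dplyr)

# Which package does the `summarize` found on the search path come from?
# "dplyr" is expected; anything else means it has been masked.
environmentName(environment(summarize))

# Packages attached in this session, in lookup order
search()

# Namespacing the calls sidesteps masking entirely
toy <- data.frame(fullname = c("Name1", "Name1", "Name2"),
                  start = c(3, 1, 2))
res <- toy %>%
  dplyr::group_by(fullname) %>%
  dplyr::summarize(minStart = min(start))

nrow(res)  # 2, one row per name
```

If the first call reports something other than "dplyr", a summarize that ignores the grouping and collapses everything into a single row would be consistent with what was described above, and restarting the session clears it.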
