Hi there, this would be my proposed workflow:
Step 1: count() the number of records per VendorNum and keep the top vendors
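library(dplyr)            # as_tibble(), count(), slice_max(), inner_join(), group_nest()
library(purrr)            # map(), map_chr(), pluck()
library(benford.analysis) # benford() and the corporate.payment data
data(corporate.payment)

corporate.payment <- as_tibble(corporate.payment)

top_vendors <-
  corporate.payment %>%
  count(VendorNum, sort = TRUE) %>%
  slice_max(order_by = n, n = 4)

top_vendors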
#> # A tibble: 4 x 2
#>   VendorNum     n
#>   <chr>     <int>
#> 1 3630      13973
#> 2 6661       4947
#> 3 2001       4736
#> 4 4984       4321
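One thing worth checking in Step 1: slice_max() keeps ties by default (with_ties = TRUE), so if two vendors share the 4th-highest count you can get more than 4 rows back. A small sketch on made-up toy data (the vendor IDs "A" to "F" are hypothetical, not from corporate.payment):

```r
library(dplyr)

# Hypothetical toy data: vendors "D" and "E" tie for 4th place with 2 records each
toy <- tibble(VendorNum = c(rep("A", 5), rep("B", 4), rep("C", 3),
                            rep("D", 2), rep("E", 2), "F"))

counts <- toy %>% count(VendorNum, sort = TRUE)

# Default with_ties = TRUE: both tied vendors are kept
top_with_ties <- counts %>% slice_max(order_by = n, n = 4)

# with_ties = FALSE: exactly 4 rows, one of the tied vendors is dropped
top_exact <- counts %>% slice_max(order_by = n, n = 4, with_ties = FALSE)

nrow(top_with_ties) # 5
nrow(top_exact)     # 4
```

With the real data the top 4 counts are all distinct, so the default is fine, but with_ties = FALSE is a cheap safeguard if you need exactly n vendors.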
Step 2: instead of using split(), work in a nested data frame
results <-
  corporate.payment %>%
  inner_join(top_vendors, by = "VendorNum") %>%
  group_nest(VendorNum, n) %>%
  mutate(
    ben_res = map(
      .x = data,
      .f = ~ benford(
        data = .x$Amount,
        number.of.digits = 1,
        discrete = TRUE,
        sign = "positive"
      )
    ),
    MAD.conformity = map_chr(ben_res, pluck, "MAD.conformity")
  ) %>%
  select(VendorNum, n, MAD.conformity)

results
#> # A tibble: 4 x 3
#>   VendorNum     n MAD.conformity
#>   <chr>     <int> <chr>
#> 1 2001       4736 Nonconformity
#> 2 3630      13973 Acceptable conformity
#> 3 4984       4321 Acceptable conformity
#> 4 6661       4947 Nonconformity
The call to inner_join() keeps the columns from both the left-hand and right-hand sides of the join, but only the rows whose keys match. In effect, this filters the larger data frame down to the records of the top vendors (4, in this case) while also bringing in their counts as the n column.
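To see the filtering-plus-columns behaviour in isolation, here is a tiny sketch with a hypothetical payments table and a hypothetical "top vendors" table (neither is the real data). semi_join() is included for contrast: it filters the same way but does not add any columns from the right-hand side.

```r
library(dplyr)

# Hypothetical toy payments table
payments <- tibble(
  VendorNum = c("A", "A", "B", "C", "C", "C"),
  Amount    = c(10.5, 220, 37, 4.2, 19, 8)
)

# Hypothetical "top vendors" table: only A and C, with their counts
top <- tibble(VendorNum = c("A", "C"), n = c(2L, 3L))

# inner_join(): vendor B has no match and is dropped; n is added
joined <- payments %>% inner_join(top, by = "VendorNum")

# semi_join(): same rows kept, but only the columns of `payments`
filtered <- payments %>% semi_join(top, by = "VendorNum")

joined
filtered
```

Because group_nest(VendorNum, n) in Step 2 uses n as a grouping key, the inner_join() version is the one that fits that pipeline.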
Rather than splitting, group_nest() lets you operate by group, as split() does, but keeps everything in one data frame: it returns one row per group (here, per VendorNum and n) plus a list-column called data that holds each group's remaining columns. We can map benford() over that column and then combine map_chr() with pluck() to extract the MAD.conformity result you wanted as a regular character column.
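The same nest, map, then pluck pattern can be tried without benford.analysis installed. In this sketch, summarise_amounts() is a hypothetical stand-in for benford(): like a benford result, it returns a named list, so pluck() can pull one element out by name.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data, already joined with the counts
payments <- tibble(
  VendorNum = c("A", "A", "C", "C", "C"),
  Amount    = c(10.5, 220, 4.2, 19, 8),
  n         = c(2L, 2L, 3L, 3L, 3L)
)

# Hypothetical stand-in for benford(): returns a named list
summarise_amounts <- function(x) {
  list(mean = mean(x), label = if (mean(x) > 50) "large" else "small")
}

# One row per (VendorNum, n); `data` holds each group's Amount column
nested <- payments %>% group_nest(VendorNum, n)

out <- nested %>%
  mutate(
    res   = map(data, ~ summarise_amounts(.x$Amount)), # list-column of results
    label = map_chr(res, pluck, "label")               # extract one element as chr
  ) %>%
  select(VendorNum, n, label)

out

# For comparison, split() returns a plain named list, so the counts and
# results would have to be stitched back together by hand
split_version <- split(payments$Amount, payments$VendorNum)
```

Keeping the results in a list-column means the group keys, the counts and the extracted label stay aligned row by row, which is the main advantage over the split() route.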
Hope this helps!