Thank you kindly, but:
I think you're right, I might've oversimplified. x is read in as data; in the code above I have to store it so that I can reference it in the colnames(x) call.
I was having trouble with the nested lambda functions (~) all using . (dot) to reference the input data, but discovered I can write out the function fully to give it another name.
The full challenge is looping over multiple datasets. In this (non-reproducible) example using map, naming the df argument lets me reference both colnames(df) and the column name in parse_number(.).
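library(tidyverse)  # map(), as_tibble(), rename_at(), parse_number(), reduce()

path <- fs::dir_ls("~/Downloads/", glob = "*.asp")
# Read each file; duplicate "rse" headers get a position suffix from name repair
x <- map(path, ~ rio::import(., "html") %>%
           as_tibble(.name_repair = "universal") %>%
           rename_all(snakecase::to_any_case)) %>%
  # Named argument df, so the inner lambda's . is the column name only
  map(., function(df) {
    rename_at(df, vars(contains("rse")),
              ~ paste0(colnames(df)[parse_number(.) - 1], "_rse")) %>%
      mutate_at(vars(-region), as.numeric)
  }) %>%
  reduce(left_join, by = c("region", "year"))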
> colnames(x)
[1] "region" "year" "off_farm_contracts"
[4] "off_farm_contracts_rse" "canola_receipts" "canola_receipts_rse"
[7] "field_peas_receipts" "field_peas_receipts_rse" "lupins_receipts"
[10] "lupins_receipts_rse" "cotton_receipts" "cotton_receipts_rse"
[13] "barley_receipts" "barley_receipts_rse" "grain_legumes_receipts"
[16] "grain_legumes_receipts_rse" "oats_receipts" "oats_receipts_rse"
[19] "off_farm_sharefarming" "off_farm_sharefarming_rse" "oilseeds_receipts"
[22] "oilseeds_receipts_rse" "rice_receipts" "rice_receipts_rse"
[25] "sorghum_receipts" "sorghum_receipts_rse" "total_crop_gross_receipts"
[28] "total_crop_gross_receipts_rse" "wheat_receipts" "wheat_receipts_rse"
[31] "total_cash_receipts" "total_cash_receipts_rse" "beef_cattle_sold"
[34] "beef_cattle_sold_rse" "sheep_sold" "sheep_sold_rse"
[37] "livestock_transfers_outward" "livestock_transfers_outward_rse" "other_farm_income"
[40] "other_farm_income_rse" "other_livestock_sold" "other_livestock_sold_rse"
[43] "total_wool_gross_receipts" "total_wool_gross_receipts_rse" "sheep_and_lambs_shorn_no"
[46] "sheep_and_lambs_shorn_no_rse" "sheep_flock_at_30_june_no" "sheep_flock_at_30_june_no_rse"
[49] "sheep_purchased_no" "sheep_purchased_no_rse" "sheep_sold_no"
[52] "sheep_sold_no_rse" "ewes_at_30_june_no" "ewes_at_30_june_no_rse"
[55] "lambs_at_30_june_no" "lambs_at_30_june_no_rse" "rams_at_30_june_no"
[58] "rams_at_30_june_no_rse" "wethers_at_30_june_no" "wethers_at_30_june_no_rse"
[61] "total_wool_sold_kg" "total_wool_sold_kg_rse" "total_wool_produced_kg"
[64] "total_wool_produced_kg_rse" "wool_cut_per_head_kg" "wool_cut_per_head_kg_rse"
Talk about impenetrable code though. Any suggestions that improve clarity are welcome.
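One thing that might help readability is pulling the renaming step out into its own named function, so the map call only has to mention it by name. This is a minimal sketch on made-up data: label_rse and toy are hypothetical names, and the rse_4 / rse_6 columns assume the position suffix survives name repair and snake-casing in that form.

```r
library(dplyr)
library(readr)

# Hypothetical helper: rename each "rse_<n>" column after the column to its left.
# parse_number() pulls <n> (the column position) out of the repaired name.
label_rse <- function(df) {
  rename_at(df, vars(contains("rse")),
            ~ paste0(colnames(df)[parse_number(.) - 1], "_rse"))
}

# Made-up stand-in for one imported table
toy <- tibble::tibble(
  region = "NSW", year = 2019,
  wheat_receipts = 10, rse_4 = 2,
  barley_receipts = 5, rse_6 = 3
)

colnames(label_rse(toy))
# [1] "region"              "year"                "wheat_receipts"
# [4] "wheat_receipts_rse"  "barley_receipts"     "barley_receipts_rse"
```

With a helper like this, the second map step could become map(label_rse), with the mutate_at call left as its own step.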
@dromano Appreciate the suggestion, but the 30_june suffix is really just to indicate that the data reference financial years. Will definitely investigate parsing the column name when pivoting to a long format.
One last tangent: .name_repair = "universal" strips out some units ($ and %) from the column name. I can't pre-process the names because the rse column names are not unique.
Is there a way to alter this behaviour? Once the symbols are gone, there's no way to infer what unit the data are in.
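One possibility, since .name_repair in as_tibble also accepts a function: swap the symbols for words first, then hand the result to the usual universal repair. This is only a sketch. keep_units is a hypothetical helper, the replacement words "dollars" and "pct" are my own choice, and the column names below are made up rather than taken from the real files.

```r
library(tibble)

# Hypothetical repair function: spell out the unit symbols, then apply
# the same "universal" repair via vctrs so duplicates still get suffixes.
keep_units <- function(nms) {
  nms <- gsub("$", "dollars", nms, fixed = TRUE)
  nms <- gsub("%", "pct", nms, fixed = TRUE)
  vctrs::vec_as_names(nms, repair = "universal", quiet = TRUE)
}

# Made-up table with unit symbols and duplicated "rse" headers
raw <- data.frame(1, 2, 3, 4, 5)
names(raw) <- c("region", "receipts ($)", "rse", "rate (%)", "rse")

repaired <- as_tibble(raw, .name_repair = keep_units)
names(repaired)
```

The repaired names should then carry "dollars" or "pct" where the symbol used to be, while the duplicated rse columns are still made unique.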