Cannot get na.locf to work

icruick · May 25, 2022, 4:53pm

I'm having an issue filling in NA's with the value to the left.

The job at hand means I have to read in a csv with over a million records. I then have to create a cumulative total per week by city. I have got to the stage where i have done cumsum and pivot_wider.

My current output is as follows

||City|05/03/2020|12/03/2020|19/03/2020|26/03/2020|..........
|---|---|---|---|---|---|
|1|Hull|NA|16776|17599|17840|
|2|Chester|NA|2665|2693|NA|
|3|Wakefield|NA|2594|2854|3081|
|4|Cardiff|NA|NA|NA|NA|
|5|Greenock|450|864|2237|2301|
|6|Inverness|NA|23957|23960|NA|
|7|London|2282|NA|4930|5342|

However for the purposes of the output I need it to look like the following.

||City|05/03/2020|12/03/2020|19/03/2020|26/03/2020|..........
|---|---|---|---|---|---|
|1|Hull|NA|16776|17599|17840|
|2|Chester|NA|2665|2693|2693|
|3|Wakefield|NA|2594|2854|3081|
|4|Cardiff|NA|NA|NA|NA|
|5|Greenock|450|864|2237|2301|
|6|Inverness|NA|23957|23960|23960|
|7|London|2282|2282|4930|5342|

The towns/cites always remain the same. The dates for col names go up to today and will update on a weekly basis. I'm happy with the first NA's in a row remaining but once the first value appears in a row, the remaining NA's need to be filled with the value to its left.

Iv tried variations of na.locf and na.locf0 with no success.

if its of any use - the section of code that got it to this stage is as follow s

mutate(CumulativeSum = cumsum(RFS_count))%>%
group_by(City, Reporting_Week)%>%
summarise(n = n(),
Cumulativetotal = sum(CumulativeSum))%>%
group_by(City)%>%
arrange(Reporting_Week)%>%
mutate(TotalSum = cumsum(n))%>%
select(-n, -Cumulativetotal)%>%
pivot_wider(names_from=Reporting_Week, values_from=TotalSum)

It was a struggle to get it even to this stage so i know the above is probably longwinded and inefficient

Any help would be much appreciated

Sanjmeh · May 26, 2022, 3:44am

Hi
Your question is rather simple but you have made it complicated by including unnecessary details.
Have you read what a reprex is? If not please read here FAQ: What's a reproducible example (`reprex`) and how do I create one?
It will help you ask questions in a way that peoiple can instantly help you.
For your current question the two major improvements should be:

Use code formatting.
Reduce the problem to a minimim (use a toy data instead of all columns of your real data)

Best wishes
S

nirgrahamuk · May 26, 2022, 2:13pm

I could do more to help you after you follow the other posters advice about a reprex; but it seems to me the best time to tackle unwanted NA's per city is immediately before you pivot wider.

library(tidyverse)

df <- tibble(
  group = c(1,1,2,2),
  value = c(1, NA, 3, NA)
)
df

(step1 <- group_by(df,
         group) %>% mutate(rep_week=row_number()) %>% 
  fill(value) %>% ungroup())

step1 %>% pivot_wider(names_from="rep_week",
                      values_from="value")

icruick · May 26, 2022, 8:10pm

Hi,

Thanks for your reply.

I've tried my best to simplify the ask and use code formatting.

So the Raw data I receive only has two headers and would look similar to the below.

df <- tibble(
Report_date = c("01-01-2022", "01-01-2022", "01-01-2022", "01-01-2022", "01-01-2022", "02-01-2022", "02-01-2022", "02-01-2022", "02-01-2022", "02-02-2022", "03-01-2022", "03-01-2022", "03-01-2022", "03-01-2022", "03-01-2022"),
City = c("London","Edinburgh","Edinburgh","Chester", "Wick", "London", "London", "Brighton", "Chester", "Swindon", "Edinburgh", "Wick", "Wick", "Brighton", "Swindon")

)
df

Report_date City

1 01-01-2022 London
2 01-01-2022 Edinburgh
3 01-01-2022 Edinburgh
4 01-01-2022 Chester
5 01-01-2022 Wick
6 02-01-2022 London
7 02-01-2022 London
8 02-01-2022 Brighton
9 02-01-2022 Chester
10 02-02-2022 Swindon
11 03-01-2022 Edinburgh
12 03-01-2022 Wick
13 03-01-2022 Wick
14 03-01-2022 Brighton
15 03-01-2022 Swindon

What i am looking to achieve is have the rows as the cities, the columns as the dates, and the values as the running cumulative total of City. So it will look something like this

City 01-01-2022 02-01-2022 03-01-2022

1 Brighton NA 1 2
2 Chester 1 2 2
3 Edinburgh 2 3 4
4 London 1 3 3
5 Swindon NA 1 2
6 Wick 1 1 4

My attempt to get this to work is

df3 <- df %>%
mutate(numeric_count = 1)%>%
group_by(City)%>%
arrange(Report_date)%>%
mutate(CumulativeSum = cumsum(numeric_count))%>%
group_by(City, Report_date)%>%
summarise(n = n(),
Cumulativetotal = sum(CumulativeSum))%>%
group_by(City)%>%
arrange(Report_date)%>%
mutate(TotalSum = cumsum(n))%>%
select(-n, -Cumulativetotal)%>%
group_by(Report_date)%>%
pivot_wider(names_from=Report_date, values_from=TotalSum)%>%
arrange(City)

This gives me the following output

City 01-01-2022 02-01-2022 02-02-2022 03-01-2022

1 Brighton NA 1 NA 2
2 Chester 1 2 NA NA
3 Edinburgh 2 NA NA 3
4 London 1 3 NA NA
5 Swindon NA NA 1 2
6 Wick 1 NA NA 3

But I essentially need the NA's to be filled with the value to the left. its ok if the NA's before the first value remain though

system · June 16, 2022, 8:10pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.