So, I am studying a period from 2015 to 2018, but there are missing values for some Rif-year pairs (Rif_year).
I would like to calculate the weighted mean for Var1 and Var2 with the following weights:
2018 = 0.5 -- 2017 = 0.3 -- 2016 = 0.15 -- 2015 = 0.05
For some Rif (Name2, Name3, Name4, Name5), Var1 and Var2 are missing in some years.
I would like to get all years for each Rif. Where a value is missing, I would like a row (or rows) containing the Rif, the missing year(s), and the modes of Var1 and Var2, where the modes are computed on that specific Rif.
I hope this is clear enough.
Thank you for your support!
Do you want me to explain my code? I'll try, but I'm not good at it.
From what I understand from the question, you need to summarise observations corresponding to each Rif. Hence the 1st line: group_by(Rif).
Next, the relevant records are only in the columns Var1 and Var2. So, I'll summarise observations only in those two columns. To summarise specific columns of my choice, I've used summarise_at. Now, how shall I tell R to locate the columns I want? I'll tell it to choose those columns whose names start with "Var". Hence the part: vars(starts_with(match = "Var")).
Now comes the main summarisation part. If observations on all four years are available, you'll use the weighted mean, otherwise the "mode". So, I first check whether there are exactly 4 observations in each Rif group with this part: test = length(.) == 4. If there are, I'll summarise using the weighted mean. Otherwise (fewer than 4 observations), I'll take the maximum of the observations using this part: no = max(., na.rm = TRUE).
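For reference, here is a minimal sketch of how the three pieces described above fit together. The data frame df, its values and the name year_weights are made up for illustration. The weights are assumed to be listed in year order (2015 to 2018), so the rows are sorted by year first so that each weight lines up with its year.

```r
library(dplyr)

# Made-up data: Name1 has all four years, Name2 only has 2015 and 2018
df <- data.frame(
  Rif  = c("Name1", "Name1", "Name1", "Name1", "Name2", "Name2"),
  year = c(2015, 2016, 2017, 2018, 2015, 2018),
  Var1 = c(10, 20, 30, 40, 5, 7),
  Var2 = c(1, 2, 3, 4, 8, 6)
)

# Weights from the question, in year order 2015, 2016, 2017, 2018
year_weights <- c(0.05, 0.15, 0.30, 0.50)

result <- df %>%
  arrange(Rif, year) %>%                 # align rows with year_weights
  group_by(Rif) %>%                      # one summary row per Rif
  summarise_at(
    vars(starts_with(match = "Var")),    # only Var1 and Var2
    ~ ifelse(test = length(.) == 4,      # all four years present?
             yes  = weighted.mean(., w = year_weights),
             no   = max(., na.rm = TRUE))
  )

print(result)
```

With this toy data, Name1 gets the weighted means of Var1 and Var2, while Name2 (two years only) falls back to the maximum of each column.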
I think it's necessary to mention that for this particular problem, starts_with is unnecessary. You can simply specify the column names yourself. Also, though ifelse leads to correct results (provided I understand your question correctly), it is a vectorised function meant for element-wise tests: when the test vector mixes TRUE and FALSE, it computes both the yes and no vectors and then chooses values based on the test vector. Since the test here is a single value per group, a plain if (...) ... else ... would be enough.
Does this help?
Note
I realised a mistake in my code only after checking @jlacko's code below. I reported only the maximum value, which is not the definition of the mode. I used it because, over 4 years, it is unlikely that two or more records will be exactly the same. If you want to use the mode, either use @jlacko's solution, or replace this part no = max(., na.rm = TRUE) with no = sort(unique(.))[which.max(x = table(.))]. (The index returned by which.max refers to the distinct values counted by table, hence the unique.)
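As a small sketch of the mode calculation mentioned above, the same idea can be wrapped in a helper. The name stat_mode is hypothetical and not from any package. Note that which.max returns a position among the distinct values counted by table, so it is used to index sort(unique(x)) rather than sort(x).

```r
# Hypothetical helper: most frequent non-missing value of a numeric vector.
# Ties go to the smallest value, because table() orders its entries by value
# and which.max() returns the first maximum.
stat_mode <- function(x) {
  counts <- table(x)                   # NAs are dropped by default
  sort(unique(x))[which.max(counts)]   # sort() also drops NAs
}

stat_mode(c(1, 1, 2, 2, 2))   # 2
stat_mode(c(3, NA, 3, 1))     # 3
```

Inside the pipeline, this would be used as no = stat_mode(.) in place of no = max(., na.rm = TRUE).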
But I agree with your observation that there is unlikely to be a modal value in fewer than four observations (as four observations is the maximum possible, and in that case there would be no NAs to replace).
The mean might still be a better option if dropping NAs entirely is not desirable.