# Creating a loop to filter data and create a summary

I want to create a loop that filters the data by each value of `Goal` (`c("published", "pending", "not designed")`), calculates the percentage distribution of the values in the `Name` column for each subset, and collects the results in a list of summaries.

```r
df <- data.frame(
  Name = c("ABC","DCA","GOL",NA,"MNA",NA,"VAN","KDA","JHA","MNA","LKO","HUN","GOL","DCA","JHA"),
  Goal = c("published","pending","not designed","published","not designed","not designed",
           "pending","pending","published","pending","not designed","pending","pending",
           "pending","not designed"),
  Target_1 = c(3734,2639,2604,NA,2793,2688,2403,7612,8653,8653,8765,3645,5976,4362,7593),
  Target_2 = c(3322,2016,2310,NA,3236,3898,2309,5632,7846,5863,5936,4067,6876,6876,5582),
  Target_3 = c(3785,2585,3750,NA,2781,3589,2830,6785,8636,7548,9065,7954,8576,9989,4892)
)
```

What have you tried?

I can't work out the logic for how to do this.

What does `calculate the percentage distribution between text of Name column` mean?

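I assume you meant this:

```r
library(tidyverse)

# Number of rows per Goal
(master_vol <- df %>%
  group_by(Goal) %>%
  summarise(n_master = n()))

# Number of rows per Goal/Name combination
(sub_vol <- df %>%
  group_by(Goal, Name) %>%
  summarise(n_sub = n()))

# Share of each Name within its Goal
inner_join(master_vol, sub_vol, by = "Goal") %>%
  mutate(n_perc = n_sub / n_master)
```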
```
# A tibble: 14 x 5
   Goal         n_master Name  n_sub n_perc
   <fct>           <int> <fct> <int>  <dbl>
 1 not designed        5 GOL       1  0.2
 2 not designed        5 JHA       1  0.2
 3 not designed        5 LKO       1  0.2
 4 not designed        5 MNA       1  0.2
 5 not designed        5 NA        1  0.2
 6 pending             7 DCA       2  0.286
 7 pending             7 GOL       1  0.143
 8 pending             7 HUN       1  0.143
 9 pending             7 KDA       1  0.143
10 pending             7 MNA       1  0.143
11 pending             7 VAN       1  0.143
12 published           3 ABC       1  0.333
13 published           3 JHA       1  0.333
14 published           3 NA        1  0.333
```

Actually, I want to use a loop here, because I want to wrap it in a function that I can apply to a list of different variables. By percentage distribution I mean a table like the one below, for each of the three filtered subsets of the data.

So the loop would filter the data for, say, "not designed" and create a summary like the one below.

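List of summaries:

| not designed | n | percent |
|---|---|---|
| JHA | 22 | 44% |
| LKO | 12 | 24% |
| MNA | 6 | 12% |
| NA | 4 | 8% |
| DCA | 6 | 12% |
| Total | 50 | 100% |

| pending | n | percent |
|---|---|---|
| JHA | 5 | 18% |
| LKO | 12 | 43% |
| MNA | 4 | 14% |
| NA | 5 | 18% |
| DCA | 2 | 7% |
| Total | 28 | 100% |

| published | n | percent |
|---|---|---|
| JHA | 1 | 7% |
| LKO | 3 | 20% |
| MNA | 5 | 33% |
| NA | 2 | 13% |
| DCA | 4 | 27% |
| Total | 15 | 100% |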
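
For what it's worth, here is a minimal sketch of the kind of loop-based function described above, written in base R so that it needs no packages. The function name `summarise_by_group`, its arguments, and the output column names (`name`, `n`, `percent`) are made up for illustration and do not come from the thread. It assumes that missing names should be counted as their own row and that a "Total" row is wanted at the bottom of each summary.

```r
# Example data from the question (Target_* columns left out; they are not used here).
df <- data.frame(
  Name = c("ABC","DCA","GOL",NA,"MNA",NA,"VAN","KDA","JHA","MNA","LKO","HUN","GOL","DCA","JHA"),
  Goal = c("published","pending","not designed","published","not designed","not designed",
           "pending","pending","published","pending","not designed","pending","pending",
           "pending","not designed"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a named list with one summary table per group.
#   data      - a data frame
#   group_col - name of the column to filter on (e.g. "Goal")
#   name_col  - name of the column whose values are counted (e.g. "Name")
#   groups    - which group values to loop over (defaults to all of them)
summarise_by_group <- function(data, group_col, name_col,
                               groups = unique(data[[group_col]])) {
  out <- list()
  for (g in groups) {
    # Keep only the rows that belong to the current group.
    rows <- data[data[[group_col]] %in% g, , drop = FALSE]

    # Count each name; useNA = "ifany" keeps missing names as a separate row.
    counts <- table(rows[[name_col]], useNA = "ifany")
    n <- as.integer(counts)
    nm <- names(counts)
    nm[is.na(nm)] <- "NA"  # show missing names as the text "NA"

    # Append a total row, then express every count as a share of the total.
    res <- data.frame(name = c(nm, "Total"),
                      n = c(n, sum(n)),
                      stringsAsFactors = FALSE)
    res$percent <- paste0(round(100 * res$n / sum(n)), "%")

    out[[g]] <- res
  }
  out
}

summaries <- summarise_by_group(df, "Goal", "Name",
                                groups = c("published", "pending", "not designed"))
summaries[["pending"]]
```

Because the column names are passed in as strings, the same function could be reused for other grouping or label columns, for example with `lapply()` over a vector of column names. A tidyverse version built on `group_split()` or `count()` would work just as well; base R is used here only to keep the sketch self-contained.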
