# mutate new values in df based on condition that relates to different df

Hello everyone,

I am currently involved in a tiny research project and I am facing a mutate-problem with dplyr relatively early in the process. I am trying to calculate the percentage of of a sub-group on a main group. The data looks like this:

Dataset 1:

area_id status count

1 01001 1 93928
2 01001 2 99368
3 01001 3 74463
4 01002 1 215774
5 01002 2 218672
6 01002 3 192941
7 01003 1 181151
8 01003 2 192095
9 01003 3 180232

Dataset 2

are_id count

1 01001 267759
2 01002 627387
3 01003 553478
4 01004 243065
5 01051 331176

So as you can see, table 1 splits the data of table 2 into various sub-groups, depending on a variable called "status". Therefore, table 1 has three times as much as rows as table 2. My goal is to compute a new column in table 1 that contains the percentage of each subgroup on the total count per area unit that is only included in table 2. I want it to look like this:
area_id status count percentage

1 01001 1 93928 0.3507967986
2 01001 2 99368 0.3711098413
3 01001 3 74463 etc.

I tried this, but it is not working:

table1 %>%
mutate(percentage=ifelse(table1\$area_id == table2\$area_id, table1\$count/table2\$count, "FALSE")

Can somebody help me on this one?

Thanks a lot.

Can you provide minimal reproducible data for your case?

Just refer to https://reprex.tidyverse.org/
and https://reprex.tidyverse.org/

In general problems involving relationships between two or more tables are approached by applying an appropriate table join.

You don't need table2 at all, you can calculate all you need directly from table1

``````library(dplyr)

# Sample data on a copy/paste friendly format, replace this with your own data frame
table1 <- data.frame(
stringsAsFactors = FALSE,
area_id = c("01001","01001","01001",
"01002","01002","01002","01003","01003","01003"),
status = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
count = c(93928,99368,74463,215774,
218672,192941,181151,192095,180232)
)

# Relevant code
table1 %>%
group_by(area_id) %>%
mutate(percentage = count / sum(count))
#> # A tibble: 9 × 4
#> # Groups:   area_id [3]
#>   area_id status  count percentage
#>   <chr>    <dbl>  <dbl>      <dbl>
#> 1 01001        1  93928      0.351
#> 2 01001        2  99368      0.371
#> 3 01001        3  74463      0.278
#> 4 01002        1 215774      0.344
#> 5 01002        2 218672      0.349
#> 6 01002        3 192941      0.308
#> 7 01003        1 181151      0.327
#> 8 01003        2 192095      0.347
#> 9 01003        3 180232      0.326
``````

Created on 2022-04-12 by the reprex package (v2.0.1)

Note: Next time please provide a proper REPRoducible EXample (reprex) illustrating your issue.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.