# How can I create a matrix by group

Sorry, I don't know how to generalize my question or improve the title.

But, my problem is:

I have this structure

``````firm <- c(11212, 11212,11212,11212, 2632, 2632, 3552, 4218, 536, 6667, 6667, 6667, 7278, 7278)
id_p_s <- c(1, 2, 2,2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2)
state <- c("SP", "MG","RJ","RJ", "PI", "ES","PI", "PI", "ES", "ES",
"SP", "SP", "MG", "ES")
enterprises <- data.frame(firm = firm, id_p_s = id_p_s,
state = state)
``````

If the firm is a parent, then `id_p_s==1` otherwise is `2` . Every firm has 1 parent and can has 0, 1 or more subsidiaries, which can be in the same or in another State.

I need to see how many firms (unique code) has in the state and the states it has subsidiaries (unique code peer firm).

Something like this:

``````      SP  MG  RJ  ES  PI
SP    1   1    1   0   0
MG    0   1    0   1   0
RJ    0   0    0   0   0
ES    1   0    0   2   0
PI    0   0    0   1   3
``````

PI has 3 parent firms, but only one has subsidiary in ES. In SP I have 1 parent firm with subsidiary in MG and RJ.

Thank so much for you atention!

You can use `left_join()` to attach a column with the state of the parent. Then all you have to do is count the pairs and reshape to get the kind of table you want.

``````library(tidyverse)
enter_parents <- enterprises %>%
filter(id_p_s == 1)

left_join(enterprises, enter_parents, by="firm") %>%
rename(enter_state = state.x,
parent_state = state.y) %>%
group_by(enter_state, parent_state) %>%
summarize(n=n()) %>%
ungroup() %>%
pivot_wider(names_from = enter_state, values_from = n) %>%
replace_na(replace = set_names(as.list(rep(0L, ncol(.))), colnames(.)))
``````

Or a base R solution (edit: corrected mistake)

``````enterprises2 <- merge(enterprises, enterprises[enterprises\$id_p_s == 1,], by="firm", all.x=TRUE)
table(enterprises2\$state.y, enterprises2\$state.x)
``````
2 Likes

There is one discrepancy with your data: firm `6667` is twice with a subsidiary in `SP`. You didn't count it in your question. You can remove it before the merges using `dplyr::distinct()` with the tidyverse or `enterprises <- enterprises[! duplicated(enterprises), ]` in base R.

``````
enterprises %>%
distinct() %>%
left_join(enter_parents, by="firm") %>%
rename(enter_state = state.x,
parent_state = state.y) %>%
group_by(enter_state, parent_state) %>%
summarize(n=n()) %>%
ungroup() %>%
pivot_wider(names_from = enter_state, values_from = n) %>%
replace_na(replace = set_names(as.list(rep(0L, ncol(.))), colnames(.)))

# With base R
enterprises2 <- enterprises[! duplicated(enterprises),]
enterprises3 <- merge(enterprises2, enterprises2[enterprises2\$id_p_s == 1,], by="firm", all.x=TRUE)
table(enterprises3\$state.y, enterprises3\$state.x)
``````
1 Like

Sorry! I forgot. And many thanks for the code, helped a lot!

But the results still not right in my real data. When I use the `dplyr` solution, the diagonal values are a few bigger and the row names are in different position than the column names.

When I use the `base R` , the names are in order, but the diagonal values are also a few bigger.

I can't find the difference.

EDIT ---
I did a modification and the diagonal values and row order is now right!

``````enterprises %>%  select(firm,state) %>%
distinct() %>%
left_join(enter_parents, by="firm") %>%
rename(enter_state = state.x,
parent_state = state.y) %>%
group_by(enter_state, parent_state) %>%
summarize(n=n()) %>%
ungroup() %>%
pivot_wider(names_from = enter_state, values_from = n) %>%
replace_na(replace = set_names(as.list(rep(0L, ncol(.))), colnames(.))) %>% arrange(parent_state)
``````

If you want the lines with 0 companies (e.g. RJ), you can also use `dplyr::complete()` before the summarizing. `arrange()` is perfect for reordering the rows.

Apart from the order, I don't think the values on the diagonal are different from your example.

1 Like

Thanks again for the `complete()`!

Yes, worked fine in the example. I don't know what change to the real data Is the same columns.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.