How can I create a matrix by group

rxtheodoro · August 7, 2020, 5:23pm

Sorry, I don't know how to generalize my question or improve the title.

But, my problem is:

I have this structure

firm <- c(11212, 11212,11212,11212, 2632, 2632, 3552, 4218, 536, 6667, 6667, 6667, 7278, 7278)
id_p_s <- c(1, 2, 2,2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2)
state <- c("SP", "MG","RJ","RJ", "PI", "ES","PI", "PI", "ES", "ES",
               "SP", "SP", "MG", "ES")
enterprises <- data.frame(firm = firm, id_p_s = id_p_s,
                       state = state)

If the firm is a parent, then id_p_s==1 otherwise is 2 . Every firm has 1 parent and can has 0, 1 or more subsidiaries, which can be in the same or in another State.

I need to see how many firms (unique code) has in the state and the states it has subsidiaries (unique code peer firm).

Something like this:

      SP  MG  RJ  ES  PI
SP    1   1    1   0   0
MG    0   1    0   1   0
RJ    0   0    0   0   0
ES    1   0    0   2   0
PI    0   0    0   1   3

PI has 3 parent firms, but only one has subsidiary in ES. In SP I have 1 parent firm with subsidiary in MG and RJ.

Thank so much for you atention!

AlexisW · August 7, 2020, 6:38pm

You can use left_join() to attach a column with the state of the parent. Then all you have to do is count the pairs and reshape to get the kind of table you want.

library(tidyverse)
enter_parents <- enterprises %>%
  filter(id_p_s == 1)

left_join(enterprises, enter_parents, by="firm") %>%
  rename(enter_state = state.x,
         parent_state = state.y) %>%
  group_by(enter_state, parent_state) %>%
  summarize(n=n()) %>%
  ungroup() %>%
  pivot_wider(names_from = enter_state, values_from = n) %>%
  replace_na(replace = set_names(as.list(rep(0L, ncol(.))), colnames(.)))

Or a base R solution (edit: corrected mistake)

enterprises2 <- merge(enterprises, enterprises[enterprises$id_p_s == 1,], by="firm", all.x=TRUE)
table(enterprises2$state.y, enterprises2$state.x)

AlexisW · August 7, 2020, 6:51pm

There is one discrepancy with your data: firm 6667 is twice with a subsidiary in SP. You didn't count it in your question. You can remove it before the merges using dplyr::distinct() with the tidyverse or enterprises <- enterprises[! duplicated(enterprises), ] in base R.


enterprises %>%
  distinct() %>%
  left_join(enter_parents, by="firm") %>%
  rename(enter_state = state.x,
         parent_state = state.y) %>%
  group_by(enter_state, parent_state) %>%
  summarize(n=n()) %>%
  ungroup() %>%
  pivot_wider(names_from = enter_state, values_from = n) %>%
  replace_na(replace = set_names(as.list(rep(0L, ncol(.))), colnames(.)))

# With base R
enterprises2 <- enterprises[! duplicated(enterprises),]
enterprises3 <- merge(enterprises2, enterprises2[enterprises2$id_p_s == 1,], by="firm", all.x=TRUE)
table(enterprises3$state.y, enterprises3$state.x)

rxtheodoro · August 7, 2020, 8:40pm

Sorry! I forgot.

Thanks for the advice !

And many thanks for the code, helped a lot!

But the results still not right in my real data. When I use the dplyr solution, the diagonal values are a few bigger and the row names are in different position than the column names.

When I use the base R , the names are in order, but the diagonal values are also a few bigger.

I can't find the difference.

EDIT ---
I did a modification and the diagonal values and row order is now right!

enterprises %>%  select(firm,state) %>%
  distinct() %>%
  left_join(enter_parents, by="firm") %>%
  rename(enter_state = state.x,
         parent_state = state.y) %>%
  group_by(enter_state, parent_state) %>%
  summarize(n=n()) %>%
  ungroup() %>%
  pivot_wider(names_from = enter_state, values_from = n) %>%
  replace_na(replace = set_names(as.list(rep(0L, ncol(.))), colnames(.))) %>% arrange(parent_state)

AlexisW · August 8, 2020, 10:16pm

If you want the lines with 0 companies (e.g. RJ), you can also use dplyr::complete() before the summarizing. arrange() is perfect for reordering the rows.

Apart from the order, I don't think the values on the diagonal are different from your example.

rxtheodoro · August 10, 2020, 4:53pm

Thanks again for the complete()!

Yes, worked fine in the example. I don't know what change to the real data
Is the same columns.

system · August 17, 2020, 4:53pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.