How to recode a factor in order to make a pie chart

#1

I have a column called Publisher in a data frame where each row represents a published paper. It's a factor. I used this code to produce a frequency table:
table(data$Publisher) %>%
sort(decreasing=TRUE)

I got a frequency table like this:
BMC              28
PloS             18
Springer Nature   9
Elsevier          5
BMJ               3
(here I omitted many other values with frequency less than 3)

I'm trying to regroup all the publishers with a frequency below 3 into "Other", so that publishers with only a few papers show up as a single slice in a pie chart. Otherwise the pie chart gets too crowded to be meaningful. But I'm stuck on recoding Publisher. Can anybody help?

#2

You can use fct_lump() or fct_other() from the forcats package.
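
As a rough sketch of what those two functions do (the publisher counts below are made up to resemble the table in the question, and "Wiley" and "Sage" are placeholder names for the omitted low-frequency publishers), something along these lines should work. It uses fct_lump_min(), a member of the fct_lump() family in forcats 0.5.0 and later, because the question asks for a frequency cutoff rather than a fixed number of levels:

```r
library(forcats)

# Made-up data: one entry per paper, loosely mirroring the question's table.
# "Wiley" and "Sage" are hypothetical stand-ins for the omitted publishers.
publisher <- factor(rep(
  c("BMC", "PloS", "Springer Nature", "Elsevier", "BMJ", "Wiley", "Sage"),
  times = c(28, 18, 9, 5, 3, 2, 1)
))

# Lump every publisher that appears fewer than 3 times into "Other"
lumped <- fct_lump_min(publisher, min = 3)
table(lumped)

# Alternatively, name the publishers to keep; everything else becomes "Other"
kept <- fct_other(publisher, keep = c("BMC", "PloS"))
table(kept)
```

Which one fits better depends on whether you want the cutoff driven by the data (fct_lump_min) or by a list of names you choose yourself (fct_other).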

If you need more specific help, please turn this into a self-contained REPRoducible EXample (reprex). A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

#3

Thanks so much for your help! I've never heard of forcats, but it sounds like exactly what I need! I'll try it out.
Thanks also for the info about reproducible examples. I'll make one if I still can't figure this issue out.

#4

It worked! Here is my code:

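library(dplyr)
library(forcats)

data %>%
    # keep the 6 most frequent publishers and lump the rest into "Other"
    mutate(Publisher = fct_lump(Publisher, n = 6)) %>%
    count(Publisher)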
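
For anyone who wants to go from those counts to the pie chart itself, here is a minimal sketch using base R's pie(). The data frame is invented for illustration (the real data isn't shown in this thread, and "Wiley" and "Sage" are placeholder names), and it uses n = 5 only because the toy data has just seven publishers:

```r
library(dplyr)
library(forcats)

# Invented stand-in for the real data: one row per published paper
data <- data.frame(Publisher = factor(rep(
  c("BMC", "PloS", "Springer Nature", "Elsevier", "BMJ", "Wiley", "Sage"),
  times = c(28, 18, 9, 5, 3, 2, 1)
)))

# Keep the 5 most frequent publishers, lump the rest, then count papers
counts <- data %>%
  mutate(Publisher = fct_lump_n(Publisher, n = 5)) %>%
  count(Publisher, sort = TRUE)

# One slice per remaining level, labelled with the publisher and its count
pie(
  counts$n,
  labels = paste0(counts$Publisher, " (", counts$n, ")"),
  main = "Papers by publisher"
)
```

Note that n keeps a fixed number of levels, so it only matches a "fewer than 3 papers" rule if the counts happen to line up that way; fct_lump_min() is the variant that takes a frequency cutoff directly.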

Thanks again!

#5

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

#6

Done! Thanks for the reminder! And many thanks for introducing me to forcats!
