Population pyramid: ordering of age groups

I am trying to create a population pyramid using the following code:

HH_rep$AGEcut <- cut(HH_rep$age,seq(0,100,5))

ggplot(data=HH_rep,aes(x= AGEcut,fill=sex)) +
  geom_bar(data=subset(HH_rep,sex=="female")) +
  geom_bar(data=subset(HH_rep,sex=="male"),aes(y=..count..*(-1))) +
  scale_y_continuous(breaks=seq(-1000,1000,200),labels=abs(seq(-1000,1000,200))) +
  coord_flip()

I get a pyramid plot (see image):

[Image: PopPyr]

However, every time, the 5-10 yrs age group jumps into the middle of the graph instead of appearing in sequential order. Even after releveling, the same problem occurs. Any idea how I can overcome this issue?
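For reference, one way to relevel explicitly is to rebuild the factor from a level vector written out in numeric order. This is a minimal base-R sketch, not the poster's code: age_breaks, age_levels and the example ages are illustrative names and values, and the labels are constructed to match the default labels cut() gives these breaks.

```r
# Breaks used in the question: 0, 5, ..., 100
age_breaks <- seq(0, 100, 5)

# Interval labels in numeric order: "(0,5]", "(5,10]", ..., "(95,100]"
# (built to match cut()'s default labels for right-closed intervals)
age_levels <- paste0("(", head(age_breaks, -1), ",", tail(age_breaks, -1), "]")

# 'ages' is hypothetical example data
ages <- c(44, 43, 18, 8, 13, 41, 32, 11, 5, 1)
age_cut <- cut(ages, breaks = age_breaks)

# Relevel explicitly: going through character discards any existing level
# order, and the levels argument imposes the numeric order
age_cut <- factor(as.character(age_cut), levels = age_levels)

levels(age_cut)[1:4]
# "(0,5]" "(5,10]" "(10,15]" "(15,20]"
```

Because the level vector is written out in full, every age group keeps its place even if no one in the data falls into it.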

Can you share your code? If I use cut() to bin an Age variable, I get the bins in the correct order.

library(ggplot2)
DF <- data.frame(Age = runif(n =500, min = 0, max = 100))
DF$Agebin <- cut(x = DF$Age, breaks = seq(0, 100, 5))
ggplot(DF, aes(Agebin)) + geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Created on 2020-07-02 by the reprex package (v0.3.0)

HH_rep$AGEcut <- cut(HH_rep$age,seq(0,100,5))

ggplot(data=HH_rep,aes(x= AGEcut,fill=sex)) + 
  geom_bar(data=subset(HH_rep,sex=="female")) + 
  geom_bar(data=subset(HH_rep,sex=="male"),aes(y=..count..*(-1))) + 
  scale_y_continuous(breaks=seq(-1000,1000,200),labels=abs(seq(-1000,1000,200))) + 
  coord_flip()

This code produces a correctly ordered y axis for me. Does it work for you?

library(ggplot2)
HH_rep <- data.frame(Age = runif(n =500, min = 0, max = 100),
                 sex = sample(c("female", "male"), 500, replace = TRUE))
HH_rep$AGEcut <- cut(x = HH_rep$Age, breaks = seq(0, 100, 5))


ggplot(data=HH_rep,aes(x= AGEcut,fill=sex)) + 
  geom_bar(data=subset(HH_rep,sex=="female")) + 
  geom_bar(data=subset(HH_rep,sex=="male"),aes(y=..count..*(-1))) + 
  scale_y_continuous(breaks=seq(-1000,1000,200),labels=abs(seq(-1000,1000,200))) + 
  coord_flip()

Hi FJCC,

Thanks, yes, it is ordered now, but the structure has changed. Now (I think) it shows the percentage of men and women in each age category, whereas I would like the count of men and women in each age group from the whole sample. The plot should look like a pyramid, with fewer men and women in the older age groups.

The data in my example are just random numbers, so you should not worry that the shape is not what you expect. We now have to figure out why the categories are not correctly ordered with your data.
What is the result of running

str(HH_rep)

on your original HH_rep data frame?
How is HH_rep made? Do you read it in from a file?
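On the shape question: uniformly distributed random ages give roughly equal bars, so a reprex will only look like a pyramid if the ages thin out with age. This is a hypothetical sketch; the exponential distribution with a mean of 30 years, the cap at 99.9 and the name HH_sim are arbitrary illustrative choices, not properties of the real HH_rep.

```r
set.seed(1)  # reproducible random draws
n <- 1000

# Hypothetical data: exponential ages (mean 30) become rarer with age;
# pmin() caps them just below 100 so every value falls inside the breaks
HH_sim <- data.frame(
  age = pmin(rexp(n, rate = 1 / 30), 99.9),
  sex = sample(c("female", "male"), n, replace = TRUE)
)
HH_sim$AGEcut <- cut(HH_sim$age, breaks = seq(0, 100, 5))

# Counts (not percentages) per age group and sex over the whole sample;
# with these data the younger groups are the largest
counts <- table(HH_sim$AGEcut, HH_sim$sex)
head(counts)
```

These are the same per-group counts that geom_bar() draws by default, since its default statistic counts rows.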

HH_rep is read in from an Excel file.

When I run str(HH_rep), this is what I get for the variables of interest:

age : num 44 43 18 8 13 41 32 11 5 1 ...
sex : chr "male" "female" "male" "female" ...
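One detail worth checking with whole-number ages read from Excel: cut() builds right-closed intervals by default, so an age of exactly 0 falls outside "(0,5]" and becomes NA, as does anything above 100. Whether the real data contain such ages is not known from the thread, and this is a separate issue from the ordering. A small base-R check; the example ages are hypothetical.

```r
ages <- c(0, 1, 5, 8, 100, 101)  # hypothetical ages, including edge cases

# Default: right-closed intervals starting at (0,5], so age 0 is not binned
default_bins <- cut(ages, breaks = seq(0, 100, 5))
default_bins
# ages 0 and 101 are NA; 100 falls in (95,100]

# include.lowest = TRUE closes the first interval on the left as well: [0,5]
fixed_bins <- cut(ages, breaks = seq(0, 100, 5), include.lowest = TRUE)
levels(fixed_bins)[1]
# "[0,5]"
```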

I cannot see why your original graph does not order the levels correctly. I must be missing something. Are the results of the levels() function the same if you run

DF <- data.frame(Age = runif(n =500, min = 0, max = 100),
                 sex = sample(c("female", "male"), 500, replace = TRUE))
DF$AGEcut <- cut(x = DF$Age, breaks = seq(0, 100, 5))
levels(DF$AGEcut)

and, using your original HH_rep,

HH_rep$AGEcut <- cut(HH_rep$age,seq(0,100,5))
levels(HH_rep$AGEcut)
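Instead of comparing the two printed level lists by eye, identical() can compare them directly. A minimal sketch: the HH_rep built here is hypothetical stand-in data (the ages and sexes shown in the str() output above), so with the real data only the last few lines would be needed.

```r
set.seed(42)
DF <- data.frame(Age = runif(n = 500, min = 0, max = 100),
                 sex = sample(c("female", "male"), 500, replace = TRUE))
DF$AGEcut <- cut(x = DF$Age, breaks = seq(0, 100, 5))

# Hypothetical stand-in for the real HH_rep read from Excel
HH_rep <- data.frame(age = c(44, 43, 18, 8, 13, 41, 32, 11, 5, 1),
                     sex = rep(c("male", "female"), times = 5))
HH_rep$AGEcut <- cut(HH_rep$age, seq(0, 100, 5))

# TRUE when both factors carry the same levels in the same order
same_levels <- identical(levels(DF$AGEcut), levels(HH_rep$AGEcut))
same_levels

# Also confirm the column is still a factor: ggplot2 sorts a character
# x variable alphabetically instead of using level order
is.factor(HH_rep$AGEcut)
```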

Hi FJCC, my apologies for the delay in responding.

If I run the code you provided above on my dataset, the age groups are in the correct order, so I suspect the problem is in my code for the population pyramid. Do you think it is creating a mode or something in the graph? It seems to put 5-10 yrs in the middle, and that is the age group with the most people in it...
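Bar heights should not affect the order: for a factor, ggplot2 places the groups in level order by default. The symptom does, however, match what sorting the interval labels as text would produce, which suggests (an assumption, not confirmed in this thread) that somewhere the labels are being ordered as strings. A quick base-R illustration with hypothetical ages:

```r
age_cut <- cut(c(2, 7, 12, 52), breaks = seq(0, 100, 5))

# Factor order: numeric, as cut() creates it
factor_order <- levels(age_cut)

# Text order: what sorting the same labels as character strings gives
text_order <- sort(factor_order)

# "(5,10]" is 2nd in factor order; sorted as text it lands after all labels
# starting with "1" to "4" (the exact position depends on the locale)
which(factor_order == "(5,10]")
which(text_order == "(5,10]")
```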

Did you try the following? What was the result?

[Image: plot2]

This is what I get when I run that.

I was looking for something like the following. I simply ran the code and copied what appeared in the console.

> DF <- data.frame(Age = runif(n =500, min = 0, max = 100),
+                  sex = sample(c("female", "male"), 500, replace = TRUE))
> DF$AGEcut <- cut(x = DF$Age, breaks = seq(0, 100, 5))
> levels(DF$AGEcut)
 [1] "(0,5]"    "(5,10]"   "(10,15]"  "(15,20]"  "(20,25]"  "(25,30]"  "(30,35]"  "(35,40]" 
 [9] "(40,45]"  "(45,50]"  "(50,55]"  "(55,60]"  "(60,65]"  "(65,70]"  "(70,75]"  "(75,80]" 
[17] "(80,85]"  "(85,90]"  "(90,95]"  "(95,100]"
> 

What does running this on your system produce?

HH_rep$AGEcut <- cut(HH_rep$age,seq(0,100,5))
levels(HH_rep$AGEcut)
> HH_rep  <- data.frame(age = runif(n =1000, min = 0, max = 100),
+                  sex = sample(c("female", "male"), 1000, replace = TRUE))
> HH_rep$AGEcut <- cut(x = HH_rep$age, breaks = seq(0, 100, 5))
> levels(HH_rep$AGEcut) 
 [1] "(0,5]"    "(5,10]"   "(10,15]"  "(15,20]"  "(20,25]"  "(25,30]"  "(30,35]"  "(35,40]"  "(40,45]" 
[10] "(45,50]"  "(50,55]"  "(55,60]"  "(60,65]"  "(65,70]"  "(70,75]"  "(75,80]"  "(80,85]"  "(85,90]" 
[19] "(90,95]"  "(95,100]"
> 

So the factor is ordered correctly. What happens if you run

ggplot(data=HH_rep,aes(x= AGEcut,fill=sex)) +
  geom_bar(data=subset(HH_rep,sex=="female"))

If that looks correct, incrementally add features until you determine what causes the problem. I don't know what we will do then, but the problem will be better defined.

The problem arises at this step: aes(y=..count..*(-1))
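One possible explanation, offered as an assumption rather than something confirmed in this thread: by default scale_x_discrete() drops factor levels that have no observations, and the x scale is trained on each layer in turn. If one sex has nobody in some age group, the second layer introduces a level the first did not have, and depending on the ggplot2/scales version the combined levels can then be re-sorted as text. The uniform reprex data fill every group for both sexes, which would explain why they do not reproduce the problem. This base-R check lists age groups that are empty for one sex only; the stand-in data are hypothetical.

```r
# Hypothetical stand-in data: no females in the (10,15] group
HH_rep <- data.frame(age = c(2, 7, 8, 22, 3, 6, 12, 24),
                     sex = rep(c("female", "male"), each = 4))
HH_rep$AGEcut <- cut(HH_rep$age, seq(0, 100, 5))

FEMALES <- subset(HH_rep, sex == "female")
MALES <- subset(HH_rep, sex == "male")

# Age groups that actually contain observations in each subset
used_f <- levels(droplevels(FEMALES$AGEcut))
used_m <- levels(droplevels(MALES$AGEcut))

# A non-empty result means the two layers carry different sets of groups
setdiff(used_m, used_f)
setdiff(used_f, used_m)
```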

Please try this simplified version of the plotting code. I will try to reproduce your problem tomorrow but I am out of time today.

FEMALES <- subset(HH_rep,sex=="female")
MALES <- subset(HH_rep,sex=="male")
ggplot(mapping = aes(x= AGEcut,fill=sex)) + 
  geom_bar(data = FEMALES) + 
  geom_bar(data = MALES, aes(y=..count..*(-1))) + 
  scale_y_continuous(breaks=seq(-1000,1000,200),labels=abs(seq(-1000,1000,200))) + 
  coord_flip()
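One possible change to this simplified code, assuming the ordering problem comes from the x scale combining different sets of age groups from the two layers (not confirmed in the thread), is to hand the scale the full level set explicitly. Then the order no longer depends on which groups each layer contains. In this sketch, limits in scale_x_discrete() fixes both which groups appear and their order, after_stat(count) is the newer spelling of ..count.. (ggplot2 3.3.0 or later), and the stand-in data are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in data; replace with the real HH_rep
HH_rep <- data.frame(age = pmin(rexp(1000, rate = 1 / 30), 99.9),
                     sex = sample(c("female", "male"), 1000, replace = TRUE))
HH_rep$AGEcut <- cut(HH_rep$age, seq(0, 100, 5))

FEMALES <- subset(HH_rep, sex == "female")
MALES <- subset(HH_rep, sex == "male")

p <- ggplot(mapping = aes(x = AGEcut, fill = sex)) +
  geom_bar(data = FEMALES) +
  # Male counts are negated so their bars point the other way
  geom_bar(data = MALES, aes(y = after_stat(count) * (-1))) +
  # Pin the age groups and their order to the levels created by cut()
  scale_x_discrete(limits = levels(HH_rep$AGEcut)) +
  scale_y_continuous(breaks = seq(-1000, 1000, 200),
                     labels = abs(seq(-1000, 1000, 200))) +
  coord_flip()
p
```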

Thanks, it has the same problem.
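A different route avoids depending on how the scale merges two layers: count first, then plot a single data frame with one layer. table() keeps every level of the factor, including empty age groups, so both sexes share the same correctly ordered set of groups. This is a sketch under that idea, not a confirmed fix for the original data; pyr and the stand-in data are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in data; replace with the real HH_rep
HH_rep <- data.frame(age = pmin(rexp(1000, rate = 1 / 30), 99.9),
                     sex = sample(c("female", "male"), 1000, replace = TRUE))
HH_rep$AGEcut <- cut(HH_rep$age, seq(0, 100, 5))

# One row per age group x sex (columns AGEcut, sex, Freq);
# zero counts are kept for empty groups
pyr <- as.data.frame(table(AGEcut = HH_rep$AGEcut, sex = HH_rep$sex))

# Negate the male counts so those bars point the other way
pyr$n <- ifelse(pyr$sex == "male", -pyr$Freq, pyr$Freq)

p <- ggplot(pyr, aes(x = AGEcut, y = n, fill = sex)) +
  geom_col() +
  # Show absolute counts on the axis for both sides
  scale_y_continuous(labels = function(b) as.character(abs(b))) +
  coord_flip()
p
```

Counting up front also makes the numbers easy to inspect with print(pyr) before plotting.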
