I'm trying to plot a graph from the data frame andmed40, with the International Prostate Symptom Score (IPSS) on the y-axis and age (Vanus) on the x-axis:
ggplot(data = andmed40, mapping = aes(x = Vanus, y = IPSS, group=1)) +
geom_boxplot()
Why don't I get a separate box for each age here?
How can I get the same graph with boxes for age ranges (e.g. 20-25, 26-30, 31-35, etc.) on the x-axis? Can I somehow apply the cut() function inside the original line of code?
When the x-axis variable is numeric, ggplot by default treats all of the x-values as a single group, so you get one boxplot containing all the data (group=1 does the same thing explicitly). To get a boxplot at each x-value in the data, or for ranges of x-values, you need to tell ggplot how to group the x-values. Here are some examples:
library(ggplot2)

# x-values form a single group
ggplot(mtcars, aes(hp, mpg)) +
geom_boxplot()
# Each unique x-value in the data is a separate group
# (you wouldn't normally do this with a continuous variable, but
# you might do this with an integer variable like age if you wanted a boxplot
# for each individual age value)
ggplot(mtcars, aes(hp, mpg, group=hp)) +
geom_boxplot()
# Set group ranges using cut and the quantile function
ggplot(mtcars, aes(cut(hp,quantile(hp, na.rm=TRUE), include.lowest=TRUE), mpg)) +
geom_boxplot()
# Set group ranges using cut with specific breakpoints
ggplot(mtcars, aes(cut(hp, seq(0,400,50), include.lowest=TRUE), mpg)) +
geom_boxplot()
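Applied to the question's data, cut() can go directly inside aes(), as in the examples above. The sketch below uses a small simulated stand-in for andmed40 (the real data aren't shown, so the values are made up for illustration) and assumes Vanus holds whole-number ages. Because cut() closes intervals on the right by default, breaks at 19, 25, 30, 35, ... give the bins 20-25, 26-30, 31-35, ... for integer ages; the labels argument just makes the axis read that way. As a second option, ggplot2's cut_width() can supply the group while keeping the x-axis continuous, though its bin edges won't match the hand-picked labels exactly.

```r
library(ggplot2)

# Hypothetical stand-in for andmed40: simulated integer ages (Vanus) and IPSS scores (0-35)
set.seed(1)
andmed40 <- data.frame(
  Vanus = sample(20:60, 200, replace = TRUE),
  IPSS  = sample(0:35, 200, replace = TRUE)
)

# Right-closed intervals (19,25], (25,30], (30,35], ... i.e. 20-25, 26-30, 31-35, ...
age_breaks <- c(19, seq(25, 60, by = 5))
age_labels <- paste(head(age_breaks, -1) + 1, age_breaks[-1], sep = "-")

# cut() applied inside aes(), directly in the original plotting call
p_binned <- ggplot(andmed40, aes(x = cut(Vanus, breaks = age_breaks, labels = age_labels),
                                 y = IPSS)) +
  geom_boxplot() +
  labs(x = "Age group (years)", y = "IPSS")
print(p_binned)

# Alternative: keep x numeric and let cut_width() define 5-year groups
p_width <- ggplot(andmed40, aes(x = Vanus, y = IPSS)) +
  geom_boxplot(aes(group = cut_width(Vanus, width = 5)))
print(p_width)
```

Any age outside the chosen breaks would become NA and show up as a separate NA box, so extend age_breaks to cover the full age range of the real data.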
On the other hand, if the x-axis variable is already categorical (that is, character class or factor class), ggplot will automatically create a separate boxplot for each category:
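For example, cyl in mtcars is stored as a number, but wrapping it in factor() makes it categorical, so ggplot draws one box per cylinder count (4, 6 and 8) without any group aesthetic:

```r
library(ggplot2)

# factor() turns the numeric cyl column into a categorical variable,
# so each distinct value becomes its own box on a discrete x-axis
p_cyl <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Number of cylinders")
print(p_cyl)
```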