How can I clean this plot

Hello!
I'm experimenting with R. I'm new to it and we're currently studying it at Uni. I discovered that I really enjoy making plots in RStudio. I didn't even know what R was about until I got my hands on it, and now I'm kinda obsessed haha.

I need help with this: I'm making a scatter plot from a dataset and I wanna get rid of the noisy, jumbled text that sits at the bottom of the graphic. This is my code and, below it, the resulting plot:

ggplot(data = equiposUTNFRSR, aes(...4,...5)) +
  geom_point(aes(color = ...5), size = 2, alpha = 0.8) +
  xlab('ejeX') +
  ylab('ejeY') +
  ggtitle('Estoy probando',
          subtitle = "UTN FRSR ESTADÍSTICA") +
  labs(caption = "fuente de los datos: Base de datos UTN estudiantes")+
  theme_minimal()

The outcome:

It seems you are making a scatter plot between two categorical variables (which is usually something you don't want to do); that is why you get the crowded x-axis. It is not clear what kind of variable you are mapping to the x-axis, because of the very uninformative variable name (...4) and the unreadable tick labels. To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

Hello, I'm using a database that isn't mine, and I noticed the problem with the columns being named with numbers. It's really confusing and the wrong way to do it, so I renamed the columns using dplyr's rename().

This dataset contains information about groups of students (60 groups of 5-10 people each) and where they live, and I wanna see exactly that information in a plot, amount of people in those provinces and states, see the predominant one, or see how scattered the students are in those places... I don't know if I'm clear enough to explain myself...

As I'm new, I don't know which plot is ideal for which dataset... From what I've seen, picking the wrong kind of plot is often the cause of the error.
I will check datapasta tomorrow and try to do a reprex. Thank you for your info.

So far this is what I have:

Packages I'm using: ggplot2, dplyr and readxl

I hope this fills the gap in the info you needed to help me; if not, just tell me which graphic is best for categorical variables.
In this case, I'm trying to plot 2 variables, "State" and "Province".
I wanna see how many students there are for each value of those 2 variables and put that into a plot.

Thank you Andres

You could try:
theme(axis.text.x = element_text(angle = 45, hjust = 0.95, vjust=0.95))

and you could hide the legend by adding

show.legend = FALSE

in geom_point()
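
Put together, the two suggestions above might look like the sketch below. The `students` data frame and its `city` and `province` columns are made-up stand-ins (the thread's real data is not shown), so treat this as an illustration rather than the exact fix for the original file.

```r
library(ggplot2)

# Hypothetical toy data standing in for the thread's data frame
students <- data.frame(
  city     = c("San Rafael", "Malargue", "General Alvear", "San Rafael", "Merlo"),
  province = c("Mendoza", "Mendoza", "Mendoza", "Mendoza", "San Luis")
)

p <- ggplot(students, aes(x = city, y = province)) +
  # show.legend = FALSE drops the colour legend for this layer
  geom_point(aes(color = province), size = 2, alpha = 0.8, show.legend = FALSE) +
  theme_minimal() +
  # rotate the x tick labels so long category names stop overlapping
  theme(axis.text.x = element_text(angle = 45, hjust = 0.95, vjust = 0.95))

p
```

The `theme()` call goes after `theme_minimal()`: a complete theme replaces any earlier theme settings, so the rotation would be lost if the order were reversed.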

Thank you Flm, that worked perfect. Is there any way to make the Xlab font smaller?

To resize ejeX add this line in theme():
axis.title.x = element_text(size=8)

To rename x, y and legend you can use labs, for example:

... +
labs(title = "mytitle",
     subtitle = "my sub",
     caption = "my cap",
     x = "x-axis",
     y = "y-axis",
     color = "legend title") # color or fill

p.s.:

you should also remove the first row of the data frame (it holds the column names, not data)
equiposUTNFRSR <- equiposUTNFRSR[-1,]

You're a genius. I can't thank you enough. I'm shocked about all the things you can do with R. Stunning. I'm already working with all the help you gave me, Flm. <3

The problem is that you haven't read the data correctly from the xlsx file. That is why the column names ended up in the first row, which will definitely cause problems down the road. I strongly recommend fixing this at the reading step instead of modifying the data frame later.
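
As a rough sketch of what fixing it at the reading step can look like, the example below uses `deaths.xlsx`, a sample workbook bundled with readxl whose first rows are notes rather than data. The layout of the original poster's file is not known from the thread, so the `skip` value here applies only to this sample file and is an assumption for anything else.

```r
library(readxl)

# Sample workbook shipped with readxl; its real header sits in row 5
path <- readxl_example("deaths.xlsx")

# Naive read: row 1 is taken as the header, and blank header cells
# get placeholder names of the form ...2, ...3, and so on
raw <- read_excel(path)
names(raw)

# Skipping the leading non-data rows lets the true header row become
# the column names (4 is specific to this sample file)
clean <- read_excel(path, skip = 4)
names(clean)
```

For another workbook, open it in a spreadsheet program, find the row that holds the real header, and set `skip` to the number of rows above it.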

I think something like this will make more sense.

library(dplyr)
library(ggplot2)

equiposUTNFRSR %>% 
    count(Province, State) %>% 
    ggplot(aes(x = Province, y = n, fill = State)) +
    geom_col(position = position_dodge())
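
To see what the `count()` step hands to `ggplot()`, here is a small sketch on invented data. The `students` data frame and its values are hypothetical; only the column names `Province` and `State` come from the thread.

```r
library(dplyr)

# Hypothetical data: one row per student
students <- data.frame(
  Province = c("Mendoza", "Mendoza", "Mendoza", "San Luis", "San Luis"),
  State    = c("San Rafael", "San Rafael", "Malargue", "Merlo", "Merlo")
)

# One row per Province/State combination, with the number of students in n
counts <- students %>%
  count(Province, State)

print(counts)
```

Each row of `counts` becomes one bar in the `geom_col()` plot, with `n` as its height.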

Or even better, use a map to make a choropleth
