How to create a readable plot in R with 10000+ values in a dataframe

How can I create a readable, legible plot in R with 10k+ values? I have a dataframe with 17,298 records and two columns: Machine Name (character) and Region (character). I want a readable plot with Region on the x-axis and Machine Name on the y-axis. How do I do that using ggplot or some other way? Please help.

Can you share your code and plots that you've made so far, and why they don't do what you're looking for? I'm guessing things aren't legible because there are too many names all aligned together. Is that the problem?

How many different values are there for Machine Name and for Region?

As a suggestion, when I'm making plots that show a lot of data (e.g., 80 scatterplot facets, each with ~1000 data points), I tend to make a HUGE image with something like ggsave("big_plot.png", your_plot, height = 30, width = 30) and inspect it myself (the file name is just a placeholder; ggsave() takes the file name first and the plot second). Those kinds of images I tend to keep to myself because they are really hard to interpret without a lot of explanation. Perhaps making the image big can help, if you just need something for yourself.

These are the two snippets I have used so far:
ggplot(df3_machine_region,aes(Region,Machine.Name)) +
geom_count()

[Image: 1st Plot]

ggplot(df3_machine_region,aes(Region,Machine.Name)) +
geom_jitter(aes(colour=Region))

I have to present the plot to my stakeholders, which is why it needs to be readable and legible.

I have attached the output plots for your reference. Below is a snapshot of the data.

|Machine.Name|Region|
|---|---|
|0460-EPBS1.sga-res.com|Europe|
|04821-EABS1.sga-res.com|Europe|
|10429-EDABS1.sga-res.com|Europe|
|1042619-ESWEBS1.sga-res.com|Europe|
|ABE-L-98769.europe.shell.com|Americas|
|AB-L-98769.europe.shell.com|APAC|
|AB-L-98769.europe.shell.com|Europe|
|ABE-L-98769.europe.shell.com (2)|Americas|
|ABE-L-98769.europe.shell.com (2)|Europe|
|ABE-L-98840.europe.shell.com|Americas|
|AB-L-98840.europe.shell.com|APAC|
|ABE-L-98840.europe.shell.com|Europe|
|AB-L-98854.europe.shell.com|Americas|
|ABE-L-98854.europe.shell.com|Europe|
|ABE-L-98862.europe.shell.com|Americas|

This is now much more in the domain of data visualization than R programming, but we can try. What points do you want to communicate to your stakeholders with the graphs you want to make?

Basically, I want to show the number of machines per region. This is just one of the requirements; there are a few other columns with similar requirements. It's the sheer number of records for these combinations that makes the plots hard to create, lay out, read, and explain.

So the few thousand machine names and their scatter points can be replaced with a bar chart over regions, with a single bar per region.

But how is that feasible? The points (machines) would get crowded over regions with more values, making the plot less readable and less appealing in terms of layout.

The point is: don't use points.

I tried a bar plot, but it shows something like this, especially at the beginning. Can you help me with better code that shows the data clearly and legibly?

ggplot(df3_machine_region,aes(Region,Machine.Name)) +
geom_bar(stat="identity", fill="steelblue")+
theme_minimal()

I have attached the output.

ggplot(df3_machine_region,aes(x=Region)) + 
  geom_bar(fill="steelblue")+
  theme_minimal()
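
If it helps, here is a slightly more explicit sketch of the same idea: count the machines per region first, then plot the counts with geom_col() and print the number on each bar. The small data frame below is only a hypothetical stand-in built from a few rows of the sample data, and the names unique_pairs, region_counts and n_machines are made up for illustration. It assumes that repeated Machine.Name/Region rows should be counted once; drop the unique() step if every record should count.

```r
library(ggplot2)

# Hypothetical stand-in for df3_machine_region (a few rows from the sample above)
df3_machine_region <- data.frame(
  Machine.Name = c("0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
                   "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
                   "AB-L-98769.europe.shell.com", "ABE-L-98840.europe.shell.com"),
  Region = c("Europe", "Europe", "Americas", "APAC", "Europe", "Americas"),
  stringsAsFactors = FALSE
)

# Keep each machine/region pair once (assumption: duplicates should not be double counted)
unique_pairs <- unique(df3_machine_region[, c("Machine.Name", "Region")])

# One row per region with the number of machines in it
region_counts <- as.data.frame(table(Region = unique_pairs$Region),
                               responseName = "n_machines")

# One bar per region, ordered from largest to smallest, with the count printed above
p <- ggplot(region_counts, aes(x = reorder(Region, -n_machines), y = n_machines)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = n_machines), vjust = -0.5) +
  labs(x = "Region", y = "Number of machines") +
  theme_minimal()
print(p)
```

Printing the counts on the bars means stakeholders do not have to read values off the axis, and the same pattern should carry over to the other columns by swapping Region for another column name.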

Have you thought about using the bar graph with a log scale on the y-axis? There are a few ways you could do this, but try first adding scale_y_log10() to your bar graph, and see how that looks.
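
As a rough sketch of that suggestion, the example below adds scale_y_log10() to the counting bar chart. The data frame df and its region sizes are invented purely to show a case where one region dwarfs the others; they are not taken from the real data.

```r
library(ggplot2)

# Hypothetical, deliberately uneven data: one region has far more machines than the others
df <- data.frame(
  Region = rep(c("Europe", "Americas", "APAC"), times = c(5000, 400, 30))
)

# geom_bar() counts the rows per region; scale_y_log10() puts those counts on a log axis
p_log <- ggplot(df, aes(x = Region)) +
  geom_bar(fill = "steelblue") +
  scale_y_log10() +
  labs(x = "Region", y = "Number of machines (log scale)") +
  theme_minimal()
print(p_log)
```

One caveat worth mentioning to stakeholders: on a log axis the bars start at 1 rather than 0, so bar heights are no longer proportional to the counts. It is mainly useful when the small regions would otherwise be invisible next to the large ones.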
