I’m confused about how to combine variables

Hello! I’m new to this so I’m sorry if it’s a bit confusing.

Basically, I have a dataset with a lot of variables in different columns. To keep things simple, the only 2 columns I’m currently trying to use are “name” and “mass”. The mass column has around 100 numbers.

I created a plot in ggplot with name on the x axis and mass on the y axis, and used geom_point so it shows one point at the mass of each name. The problem is that there are so many points that the graph looks terrible. What I want instead is a bar graph that shows how many points fall into each mass range. It would have about 4 bars, with each bar’s height based on the number of points in that range. So one bar would cover mass 1-5, with its height set by the number of named points that have a mass between 1 and 5, the second bar would cover 6-10, the third 11-15, and so on.

Please let me know if you know how to do this! Thank you!

Here is a simple example of making that kind of plot with geom_histogram. I invented a small data set named DF; if the steps to do that are confusing, don't worry about them. The call to ggplot() is all you need.

library(ggplot2)
#Invent data
set.seed(123)
DF <- data.frame(Name = paste("A", 1:100, sep = "_"),
                 Mass = rnorm(100, mean = 10, sd = 3.5))
head(DF)
#>   Name      Mass
#> 1  A_1  8.038335
#> 2  A_2  9.194379
#> 3  A_3 15.455479
#> 4  A_4 10.246779
#> 5  A_5 10.452507
#> 6  A_6 16.002727
ggplot(DF, aes(Mass)) + 
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white")

Created on 2022-05-06 by the reprex package (v2.0.1)
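
If the bars need to line up with specific ranges (0-5, 5-10, and so on), here is a rough sketch of two ways to do that, reusing the invented DF from above. As far as I know, geom_histogram() by default centers its bins on multiples of binwidth, so adding boundary = 0 moves the bin edges onto multiples of 5 instead. The second option bins the values explicitly with cut() and counts them with geom_bar(). The column name MassBin and the way the breaks are computed are my own choices for illustration, not something from your data.

```r
library(ggplot2)

# Same invented data as in the example above
set.seed(123)
DF <- data.frame(Name = paste("A", 1:100, sep = "_"),
                 Mass = rnorm(100, mean = 10, sd = 3.5))

# Option 1: histogram with bin edges on multiples of 5
p_hist <- ggplot(DF, aes(Mass)) +
  geom_histogram(binwidth = 5, boundary = 0,
                 fill = "steelblue", color = "white")

# Option 2: assign each row to an explicit bin, then count rows per bin.
# The breaks are multiples of 5 that cover the whole range of Mass.
breaks <- seq(floor(min(DF$Mass) / 5) * 5,
              ceiling(max(DF$Mass) / 5) * 5,
              by = 5)
DF$MassBin <- cut(DF$Mass, breaks = breaks, include.lowest = TRUE)  # hypothetical column name

p_bar <- ggplot(DF, aes(MassBin)) +
  geom_bar(fill = "steelblue") +
  labs(x = "Mass range", y = "Number of points")

# Counts per bin, in case you want the numbers as well as the plot
table(DF$MassBin)

p_hist
p_bar
```

With your own data you would replace DF, Mass and the bin width with your data frame, your mass column and whatever range size you want.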
