ggplot y axis totals NumSP across rows: how to plot one value per team and year

Newbie here. In ggplot I have run into a problem I haven't been able to fix. My code is below, along with a screenshot of the data frame I am mapping to.

How can I edit it so the y axis does NOT total up the NumSP column for each team? For example, the 1997 ANA used 11 starting pitchers for the season (NumSP = 11). That single number (11) is what I want on the y axis of the plot. Instead, my code below sums NumSP over the 11 rows for that team, one row per distinct pitcher, for a total of 11 x 11 = 121.

I realize my data frame is off. How could I use dplyr to select only one row for each team and year? Or is there a way to do it in ggplot around the stat = "identity" part?

Appreciate any guidance!

'''
ggplot(data = FinalGS, mapping = aes(x = yearID, y = NumSP, color = teamID)) +
  geom_bar(stat = "identity") +
  labs(title = "Amount of Total Starters Used per Team, 1990-2021") +
  ylab("Number of Starting Pitchers") +
  xlab("Year") +
  theme(axis.title.y = element_text(color = "#993333", size = 13, face = "bold")) +
  theme(axis.title.x = element_text(color = "#993333", size = 13, face = "bold")) +
  theme(plot.title = element_text(color = "Dark Red", size = 14, face = "bold.italic")) +
  theme(axis.text.x = element_text(color = "dark red", size = 9, face = "bold"))
'''
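
To illustrate the arithmetic described above: with stat = "identity", geom_bar() stacks one segment per row, so a team-season with 11 rows that each carry NumSP = 11 ends up with a bar height of 121. A minimal sketch with made-up rows (the `toy` data frame and its player IDs are hypothetical, not the real FinalGS):

```r
# Hypothetical stand-in for FinalGS: one row per pitcher, NumSP repeated on each row
toy <- data.frame(
  yearID   = 1997,
  teamID   = "ANA",
  playerID = paste0("pitcher", 1:11),  # made-up player IDs
  NumSP    = 11
)

# Stacked bars add NumSP across all rows of a year/team,
# which is the same as summing the column within each group
stacked <- aggregate(NumSP ~ yearID + teamID, data = toy, FUN = sum)
stacked$NumSP  # 121, not 11
```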

Hey,

welcome to the forum. Please provide your data (e.g. the data from your screenshot) as a reproducible example, so that other users can more easily reproduce your problem and help find a solution.

Check this guide for making a reproducible example. It helps the whole community give you better help.

Another way is to paste the output you get when you run this in the console:

dput(FinalGS[1:40, ])
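
As a rough illustration of why dput() output is useful to paste: it prints R code that rebuilds the object, so anyone can recreate the data without the original files. A small sketch, assuming a made-up data frame `toy` in place of FinalGS:

```r
# Hypothetical small data frame standing in for FinalGS
toy <- data.frame(
  teamID = c("ANA", "BOS"),
  yearID = c(1997L, 1997L),
  NumSP  = c(11L, 9L)
)

# dput() prints R code that recreates the object; capture that text here
txt <- capture.output(dput(toy))

# Evaluating the captured text rebuilds the data frame
rebuilt <- eval(parse(text = txt))
identical(toy, rebuilt)
```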

Thanks for that. Here is more of my reproducible code below.

The issue is with the FinalGS data frame (screenshot below): it lists EVERY playerID that appeared in a game for that team. NumSP is the column I need on the y axis of the ggplot. How do I reduce the rows for every unique playerID to just ONE row per team and year in the FinalGS data frame?

library(Lahman)
library(tidyverse)
library(dplyr)
library(tidyr)
library(purrr)
library(ggrepel)
View(LahmanData)
View(Pitching)
View(Teams)
Totals = merge(Teams, Pitching, by=c("yearID","teamID"))
View(Totals)

#To see how many total pitchers had a GS on the 2011 Milwaukee Brewers, 6 total
MIL <- filter(Totals, yearID == 2011, teamID == "MIL", GS > 0)
View(MIL)
#This below keeps the 1990-2021 seasons only, throwing 2022 out as the season is not complete
GSPitching1 <- filter(Totals, yearID < 2022, yearID > 1989)
GSPitching1 <- as_tibble(GSPitching1)
View(GSPitching1)

#This below keeps one row per INDIVIDUAL arm, with team stats (the .x columns) and that pitcher's GS, in the teamGSUSE data frame
teamGSUSE <- GSPitching1 %>%
  select(yearID, teamID, playerID, G.x, GS, W.x, L.x, ERA.x)
View(teamGSUSE)
#This below uses dplyr group_by and summarise to get total games started (TGS) per team and year
TeamGS <- GSPitching1 %>%
  group_by(yearID, teamID) %>%
  summarise(TGS = sum(GS), .groups = "drop")
head(TeamGS)
View(TeamGS)
head(teamGSUSE)
##The issue here is bringing TGS from TeamGS over to teamGSUSE b/c the row counts differ:
#one row per team and year vs. one row per pitcher, e.g. a 13-man pitching staff for 1 team
#solved w/ a merge ON TWO COLUMNS below
merged <- merge(TeamGS, teamGSUSE, by = c('teamID', 'yearID'))

View(merged)

#This below creates a new DataFrame where a pitcher started a game/had a GS in that season, throws out 100% relief pitchers
GSmerged <- filter(merged, GS > 0)
View(GSmerged)

#Stumbled here and figured it out (hat tip to "ML")
#Group by TEAM and YEAR, then summarise with n(). This works below!
#n() counts the rows in each group and puts the count into the new column NumSP
newmerged <- GSmerged %>%
  group_by(teamID, yearID) %>%
  summarise(NumSP = n(), .groups = "drop")
View(newmerged)

#This merged the last two DataFrames into one for analysis and ggplot use with team stats and individual GS numbers by staff

FinalGS <- merge(newmerged, GSmerged, by = c('teamID', 'yearID'))

View(FinalGS)
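
One possible way to get the single row per team and year asked about above is dplyr's distinct(). The sketch below is not a confirmed solution from this thread; it uses a small made-up data frame `toy` (hypothetical values and player IDs) with the same teamID, yearID and NumSP columns as FinalGS, and assumes NumSP is constant within each team and year. Note that in the code above, newmerged already holds one row per team and year, so plotting it directly may be enough.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for FinalGS: NumSP repeats on every pitcher row
toy <- data.frame(
  teamID   = c("ANA", "ANA", "ANA", "BOS", "BOS"),
  yearID   = c(1997, 1997, 1997, 1997, 1997),
  playerID = c("p1", "p2", "p3", "p4", "p5"),  # made-up IDs
  NumSP    = c(3, 3, 3, 2, 2)
)

# Keep one row per team and year; the per-pitcher columns are dropped
one_per_team <- toy %>%
  distinct(teamID, yearID, NumSP)

# geom_col() draws the values as given; position = "dodge" places teams
# side by side within a year instead of stacking them on top of each other
p <- ggplot(one_per_team, aes(x = factor(yearID), y = NumSP, fill = teamID)) +
  geom_col(position = "dodge")
```

With one row per team and year there is nothing left to add up within a team, and dodging keeps different teams in the same year from being stacked into one bar.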
