Need help graphing

General look of the data:

[Image: screenshot of the data table]

Hi there,
I want to create a line graph from the data above in which each year is a separate trendline.
The x-axis should be the column values (1, 2, 3, 4, 5, 6, 7) and the y-axis should be the observations as percentages, so the y-axis should range from 1 to 100%. Because the years do not all have the same number of observations, the ends of the rows for 2016 and 2018 are padded with zeros. Ideally, the trendlines for those years should end where the zeros begin.
Can anyone give me an idea of how to graph this?

Picture showing the zeros:

[Image: screenshot of the rows padded with zeros]

Hello,

Here is an example of how you could fix this:

library(tidyr)
library(dplyr)   #Needed for mutate() and filter()
library(ggplot2)

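#Create fake data: 3 years (rows) by 10 observations (columns)
myData = as.data.frame(matrix(runif(30), nrow = 3))
#Pad the ends of the 2nd and 3rd rows with zeros, as in the real data
myData[2, 9:10] = 0
myData[3, 7:10] = 0
#Add the year column and name the observation columns 1 to 10
myData = cbind(list(year = 2015:2017), myData)
colnames(myData)[2:11] = 1:10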

#Convert data to long format
myData = myData %>% gather("x", "y", -year) 

#Set the data type for each column to get best plot
myData = myData %>% mutate(year = as.factor(year), x = as.integer(x)) 

#Ignore observations of 0.0
myData = myData %>% filter(y > 0.0)

#Plot the graph
ggplot(myData, aes(x = x, y = y, color = year)) + geom_line()


The key to solving this is converting the data frame from wide to long format using the gather function. This way you can remove the unwanted values (0) without breaking the data frame itself.
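In recent versions of tidyr, gather() is marked as superseded in favor of pivot_longer(). As a rough sketch, the same reshaping could look like the code below. The data frame `wide` and its values are made up for illustration, and this assumes tidyr 1.0.0 or later.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per year, one column per position
wide <- data.frame(
  year = 2015:2017,
  `1` = c(0.40, 0.55, 0.30),
  `2` = c(0.50, 0.60, 0.00),
  `3` = c(0.65, 0.00, 0.00),
  check.names = FALSE   # keep "1", "2", "3" as column names
)

long <- wide %>%
  # Every column except year becomes an (x, y) pair
  pivot_longer(-year, names_to = "x", values_to = "y") %>%
  mutate(year = as.factor(year), x = as.integer(x)) %>%
  # Drop the padding zeros
  filter(y > 0)
```

The result has one row per year and position, which is the shape ggplot expects.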

Then you can plot with ggplot and have it draw one line per year by mapping the color aesthetic to year (as a factor).
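If you also want the y-axis shown in percent, something along these lines might work. It is only a sketch: the small `long` data frame is invented, and it assumes y is stored as a proportion between 0 and 1 (as in the fake data). If your values are already on a 0 to 100 scale, you would set the limits to c(0, 100) and label the axis differently.

```r
library(ggplot2)

# Hypothetical long-format data with y as a proportion (0 to 1)
long <- data.frame(
  year = factor(rep(c(2015, 2016), each = 3)),
  x = rep(1:3, times = 2),
  y = c(0.40, 0.50, 0.65, 0.55, 0.60, 0.70)
)

p <- ggplot(long, aes(x = x, y = y, color = year)) +
  geom_line() +
  geom_point() +
  # One tick per column position
  scale_x_continuous(breaks = 1:3) +
  # Show proportions as percentages and fix the axis range
  scale_y_continuous(labels = scales::percent, limits = c(0, 1))
p
```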

The different filtering and cleaning steps can be merged into a single pipeline if you like (I only split them to make them easier to follow):

#All at once (run this on the wide data, instead of the three separate steps)
myData = myData %>% gather("x", "y", -year) %>% 
  mutate(year = as.factor(year), x = as.integer(x)) %>%
  filter(y > 0.0)
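One caveat: filter(y > 0.0) drops every zero, including a genuine zero in the middle of a series. If that matters for your data, a possible variant is to remove only the zeros at the end of each year. This is a sketch on invented data, and it assumes every year has at least one non-zero value (otherwise max() returns -Inf with a warning and that year is dropped entirely).

```r
library(dplyr)

# Hypothetical long-format data
long <- data.frame(
  year = factor(rep(c(2015, 2016), each = 5)),
  x = rep(1:5, times = 2),
  y = c(0.4, 0.0, 0.6, 0.7, 0.8,   # 2015: a genuine zero at x = 2
        0.5, 0.6, 0.7, 0.0, 0.0)   # 2016: two padding zeros at the end
)

trimmed <- long %>%
  group_by(year) %>%
  # Keep everything up to the last non-zero observation of each year
  filter(x <= max(x[y > 0])) %>%
  ungroup()
```

Here the zero at x = 2 in 2015 is kept, while the two trailing zeros in 2016 are removed.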

Hope this helps,
PJ
