Use ggplot() to visually display a scatter (i.e. point) plot of variables

ggplot2

#1

I have 4 variables that I would like to display in a scatter plot, colour-coding the points that are greater than 0. Here's a snapshot of the data:

|Country|Year|solar_mtoe|solar_twh|wind_mtoe|wind_twh|
|---|---|---|---|---|---|
|Total World|1965|0|0|0|0|
|Total World|1966|0|0|0|0|
|Total World|1967|0|0|0|0|
|Total World|1968|0|0|0|0|
|Total World|1969|0|0|0|0|
|Total World|1970|0|0|0|0|
|Total World|1971|0|0|0|0|
|Total World|1972|0|0|0|0|
|Total World|1973|0|0|0|0|
|Total World|1974|0|0|0|0|
|Total World|1975|0|0|0|0|
|Total World|1976|0|0|0|0|
|Total World|1977|0|0|0|0|
|Total World|1978|0|0|0.000678825|0.003|
|Total World|1979|0|0|0.00135765|0.006|
|Total World|1980|0|0|0.002375888|0.0105|
|Total World|1981|0|0|0.002375888|0.0105|
|Total World|1982|0|0|0.004186089|0.0185|
|Total World|1983|0.000678825|0.003|0.007420679|0.032794949|
|Total World|1984|0.001428047|0.006311111|0.010127066|0.044755556|

Country and Year are used for the x and y axes, and the last 4 variables should be colour-coded where they are >0.

Here's my code:

# Use ggplot2 to visualize Solar and Wind

geom_point<-ggplot(data=BP_Stats_Data, mapping=aes(x = Country, y= Year)) + geom_point(aes(col =(solar_mtoe)>0, (solar_twh)>0, wind_mtoe>0, wind_twh>0))

When I run this code I receive the following message:
Warning: Ignoring unknown aesthetics:

What do I have to add/amend to achieve the colour coded plot?


#2

Hi there!
I tried running your code just now, and I have a question: in your data, what is an observation, and what are the variables? (here is what I mean: https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf)

If your observations and variables are lined up correctly in this table, then how would you go about color coding observations where 2 or more variables are >0?

Are you just trying to have a binary color palette: one color for zeros, and one for non-zeros? Do the variables' colors matter (i.e. does solar_mtoe have to be colored differently from solar_twh, etc.)?


#3

While I'm not sure exactly what you want, my guess is that you could first gather all variables into 1 column, with another column indicating which variable each value came from, then create a binary color column, and then plot it, mapping the binary column onto color and the variable onto, let's say, shape.
You'd also want to use geom_jitter(): your x is Country and your y is Year, but you have 4 values for each country-year pair, so you would otherwise get 4 overlapping data points.

library(tidyverse)
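BP_Stats_Data %>% 
  # wide -> long: one row per Country/Year/variable
  gather(var, value, -c(Country, Year)) %>% 
  # binary flag: '0' for zeros, '1' for everything else
  mutate(color = if_else(value == 0, '0', '1')) %>% 
  ggplot(mapping = aes(x = Country, 
                       y = Year,
                       col = color,
                       shape = var)) + 
  # jitter so the 4 points per Country/Year pair don't overlap
  geom_jitter() +
  scale_color_manual(values = c("0" = "grey", "1" = "blue"))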


You could also map var to facet_wrap() and have 4 distinct charts (1 for each variable). I think I'd personally prefer that over shape
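A minimal sketch of that faceted variant, for illustration only. It assumes a tidyr version that provides pivot_longer() (1.0.0 or later, which superseded gather()), and it builds a small stand-in for BP_Stats_Data from three rows of the table in the question; the name df_long is made up here.

```r
library(tidyverse)

# Stand-in for BP_Stats_Data: three rows copied from the table in the question
BP_Stats_Data <- tibble(
  Country    = "Total World",
  Year       = c(1977, 1978, 1983),
  solar_mtoe = c(0, 0, 0.000678825),
  solar_twh  = c(0, 0, 0.003),
  wind_mtoe  = c(0, 0.000678825, 0.007420679),
  wind_twh   = c(0, 0.003, 0.032794949)
)

# Wide -> long: one row per Country/Year/variable, plus a binary colour flag
df_long <- BP_Stats_Data %>%
  pivot_longer(-c(Country, Year), names_to = "var", values_to = "value") %>%
  mutate(color = if_else(value == 0, "0", "1"))

# One panel per variable, so no jitter or shape mapping is needed
p <- ggplot(df_long, aes(x = Country, y = Year, col = color)) +
  geom_point() +
  facet_wrap(~ var) +
  scale_color_manual(values = c("0" = "grey", "1" = "blue"))
p
```

With one panel per variable, each Country/Year pair has a single point per panel, which is why geom_point() is enough here.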


#4

The variables are Country, Year, solar_mtoe, solar_twh, wind_mtoe and wind_twh.
The rows of data in the table are the observations.

I was thinking to use a multi-color palette, one color for zeros, a second colour for both solar_mtoe and solar_twh and a third colour for wind_mtoe and wind_twh.


#5

Based on what you describe (the use of the final outcome escapes me, but I'll trust your judgment), I'd engineer another column and map it to color.

Something like this:

library(tidyverse)
df_gathered <- BP_Stats_Data %>% 
  gather(var, value, -c(Country, Year)) %>% 
  mutate(color = case_when(value == 0 ~ "zero",
                           var == "solar_mtoe" | var == "solar_twh" ~ "solar",
                           var == "wind_mtoe" | var == "wind_twh" ~ "wind",
                           TRUE ~ "na"))

ggplot(df_gathered, aes(x = Country, 
                   y= Year,
                   col = color,
                   shape = var)) + 
  geom_jitter() +
  scale_color_manual(values = c("zero" = "grey40", "solar" = "orange", "wind" = "blue")) +
  theme_minimal()

## also, try facetting by variable
ggplot(df_gathered, aes(x = Country, 
                        y= Year,
                        col = color)) + 
  geom_point() +
  facet_wrap(~ var) +
  scale_color_manual(values = c("zero" = "grey40", "solar" = "orange", "wind" = "blue")) +
  theme_minimal()

#6

In the future, please try creating a minimal reproducible example (e.g. via the reprex package): you may have better luck getting help from more users.


#7

Thank you, I didn't know I could do that.


#8

If you learn nothing else from your university class using R, you'll be winning if you learn to create a reproducible example (reprex): [Video] Reproducible Examples and the `reprex` package

You're basically asking us to do your homework for you, so the least you can do is to make it easier for us to help you by creating a well formatted reprex.


#9

It's also worth noting that the forum does have a homework policy: FAQ: Homework Policy

In short, homework-inspired questions are OK, but try to pare your question down to the bit you're hung up on and make a reprex of just that bit. Also make sure you format your posts so they are readable. You'll find that helpers are more forthcoming if you don't make them jump through a lot of hoops to see what you're doing (thus the reprex).


#10

Thanks, I have installed reprex and followed the video. However, it doesn't work. It seems that the copy function is not copying to the clipboard to create the markdown. This is a reprex of the error:

No user-supplied code found … so we’ve made some up. You’re welcome!

sprintf("Happy %s!", weekdays(Sys.Date()))
#> [1] "Happy Thursday!"

Created on 2018-10-18 by the reprex package (v0.2.1)


#11

If you run into problems with access to your clipboard, you can specify an infile and outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

#12

Thank you, here's the output:

Version: 1.0
#> Error in eval(expr, envir, enclos): object 'Version' not found

RestoreWorkspace: Default
#> Error in eval(expr, envir, enclos): object 'RestoreWorkspace' not found
SaveWorkspace: Default
#> Error in eval(expr, envir, enclos): object 'SaveWorkspace' not found
AlwaysSaveHistory: Default
#> Error in eval(expr, envir, enclos): object 'AlwaysSaveHistory' not found

EnableCodeIndexing: Yes
#> Error in eval(expr, envir, enclos): object 'EnableCodeIndexing' not found
UseSpacesForTab: Yes
#> Error in eval(expr, envir, enclos): object 'UseSpacesForTab' not found
NumSpacesForTab: 2
#> Error in eval(expr, envir, enclos): object 'NumSpacesForTab' not found
Encoding: UTF-8
#> Error in eval(expr, envir, enclos): object 'UTF' not found

RnwWeave: Sweave
#> Error in eval(expr, envir, enclos): object 'RnwWeave' not found
LaTeX: pdfLaTeX
#> Error in eval(expr, envir, enclos): object 'LaTeX' not found

Created on 2018-10-18 by the reprex package (v0.2.1)


#13

I have managed to render a reprex:

# Use ggplot2 to visualize Solar and Wind 
BP_Stats_Data <- BP_Stats_Data %>% mutate(Solar = solar_mtoe + solar_twh)
#> Error in BP_Stats_Data %>% mutate(Solar = solar_mtoe + solar_twh): could not find function "%>%"

#14

You're getting this error because a reprex needs to contain everything required to reproduce the script, including the library calls. Without library(tidyverse) (or dplyr and ggplot2 loaded individually; either works), R doesn't "have" the %>% operator. The same goes for your data: it has to be created inside the reprex. Please see the aforementioned guides, which should help you with this.
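As a rough sketch of what a self-contained version could look like (the three data rows are copied from the table in the first post; inlining them with data.frame() is just one way to do it, not the only one):

```r
# Load every package the script uses; reprex runs the code in a fresh R session
library(dplyr)

# Inline a small sample of the data (rows for 1982-1984 from the first post)
BP_Stats_Data <- data.frame(
  Country    = "Total World",
  Year       = c(1982, 1983, 1984),
  solar_mtoe = c(0, 0.000678825, 0.001428047),
  solar_twh  = c(0, 0.003, 0.006311111),
  wind_mtoe  = c(0.004186089, 0.007420679, 0.010127066),
  wind_twh   = c(0.0185, 0.032794949, 0.044755556)
)

# The line from post #13: %>% and mutate() are now found because dplyr is loaded
BP_Stats_Data <- BP_Stats_Data %>% mutate(Solar = solar_mtoe + solar_twh)
BP_Stats_Data$Solar
```

To get pasteable code for your real data, you can run something like dput(head(BP_Stats_Data, 20)) in your own session and copy its output into the reprex in place of the data.frame() call.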