Creating multiple columns from one column in a large dataset

I am new to coding in RStudio and would like to separate the species into categories and compare them on a scatterplot, with Sepal.Length on the y-axis and Sepal.Width on the x-axis. What is the best way to approach and practice something like this?

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1          5.1         3.5          1.4         0.2     setosa
#> 2          4.9         3.0          1.4         0.2  Virginica
#> 3          4.7         3.2          1.3         0.2     setosa
#> 4          4.6         3.1          1.5         0.2 Versicolor
#> 5          5.0         3.6          1.4         0.2     setosa
#> 6          5.4         3.9          1.7         0.4  Virginica

Hi @correo_e, welcome to RStudio Community.

It would really help if you could post a reproducible example (or reprex) so that we have an idea of what your data looks like. Please see this thread for instructions on how to prepare one.

@siddharthprabhu
I updated the post. Please let me know if the update is acceptable.

Thank you. Using built-in datasets to illustrate your problem is great because it saves others the effort of trying to recreate your data.

There are many ways you could go about this; here are two options using the ggplot2 package:

  1. Show all observations on a single plot and use colour to differentiate the species.
library(ggplot2)

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(colour = Species))

Created on 2020-04-23 by the reprex package (v0.3.0)

  2. Show each species in its own plot (facet).
library(ggplot2)

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() + 
  facet_wrap(~ Species)

Created on 2020-04-23 by the reprex package (v0.3.0)
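
If you also want to look at the species as separate groups in the data itself, and not only in the plot, here is a minimal sketch using base R's split() and aggregate() on the built-in iris data. The object names by_species and species_means are just placeholders I made up for illustration; treat this as one possible approach, not the only one.

```r
# Split the built-in iris data frame into a named list,
# with one data frame per level of the Species column.
by_species <- split(iris, iris$Species)

names(by_species)          # the species names used as list names
sapply(by_species, nrow)   # number of observations in each group

# Mean sepal measurements per species, to compare the groups numerically.
# by_species and species_means are illustrative names, not part of any package.
species_means <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                           data = iris, FUN = mean)
species_means
```

You can then pull out a single group with, for example, by_species$setosa and pass it to ggplot() in the same way as the full dataset.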

You can study such examples from this book:
