Combining values to use in graph

Hey everyone,

I combined a bunch of values from one variable into subsections. I wanted the subsections to appear on a graph (grouped together). What code would I use?

Ex. My data set is about dog bites in NY. The data set has a variable called Age; however, it has too many distinct entries, which makes it look weird on the graph. My instructor told me to use the c() function to group the ages. However, I don't know how it would show up on the graph. Please help since it is for a final project!

Hi @Maddye_Perry, remember to include a reproducible example so we can help you better.

For example, if the data frame is called data:

# for first 30 rows and all columns
dput(data[1:30,])

So I was trying to graph using this code.

ggplot(data = dogbite) + 
     geom_point(mapping = aes(x = Gender, y = Age))

However, since the ages vary so much, the graph is illegible. I was told by my professor to create subgroups with the c() function. Here are the subgroups I made.

Young <- c("0.2", "0.6", "04M", "1", "1 & 3", "1 & 8", "1 1/2 YRS", "1 Y", "1 YR", "1 YR 8 Mon", "1 YRS", "1.3", "1.5", "1.6", "1.8", "1/12M", "10 M", "10 MOS", "10 MTHS", "10 MTHS &", "10 WKS", "10M", "10W", "10wks", "11 M", "11 MONS", "11 MOS", "11 MTHS", "11 WKS", "11m", "11M", "11MOS", "11MTHS", "11W", "11WKS", "12W", "12WKS", "12WKSKS", "13 WKS", "13M", "13WK", "14M", "14WKS", "15M", "16 MONS", "16 MTHS", "16M", "16W", "17M", "18 M", "18M", "18MTHS", "18W", "1M", "1Y", "1YR", "1YR 8MONS", "2", "2(2) & (1", "2 & 9MTHS", "2 1/2", "2 MONS .", "2 MTH", "2 MTHS", "2 YRS", "2-3 YR", "2-3 YRS", "2-3M", "2-3MOS", "2-3YRS", "2.5", "2.6", "21M", "22 MTHS", "2M", "2y", "2Y", "2YRS", "2YRS (MALE", "3", "3 & 4", "3 1/2 YRS", "3 M", "3 MONS", "3 MOS", "3 MTHS", "3 YR", "3 yrs", "3 YRS", "3.5", "3.6", "3M", "3MTH", "3mths", "3MTHS", "3Q", "3y", "3Y", "3YR", "3YRS", "4 M", "4 MONS", "4 MTHS", "4-6MOS", "4m", "4M", "4MO", "4MTH", "4MTHS", "5 MTHS", "5m", "5M", "6 MTHS", "6M", "6MO", "6MTH", "6MTHS", "7 M", "7 mons", "7 MOS", "7 MTHS", "7-8M", "7m", "7M", "7MOS", "7MTH", "7MTHS", "7W", "8 M", "8 MONS", "8 MOS", "8 MTHS", "8M", "8MTHS", "8W", "8WKS", "9 M", "9 MONS", "9 MTHS", "9M", "9MTHS", "9WK")
Adult <- c("4","4 YRS", "4 yrs 8 mo", "4.5", "4.6", "4y", "4Y", "5", "5 yrs", "5 YRS", "5Y", "5YR", "5YRS", "6", "6 & 4", "6 YRS", "6.5", "6.5 YRS", "6.5Y", "6", "6y", "6Y", "7", "7 YRS", "7Y", "7YRS","8","8 YRS", "8Y", "8YRS & 8 M", "9", "9 YRS", "9Y")
Elder <- c("10", "10 &9", "10yrs", "10YRS", "10.5", "10+", "10y", "10Y","11", "11YRS", "11-12YRS", "11Y", "12", "12 YRS","12Y", "13", "13 yrs", "13 YRS","13Y", "14", "14YRS", "15", "15 YRS", "15.5","16", "17", "17y", "19", "20", "21", "41", "68 yrs")

Then, I tried to code the graph with the subgroups.

ggplot(data = dogbite) + 
     geom_point(mapping = aes(x = Gender, y = Young, y = Adult, y = Elder))

However, it comes up with an error saying: Error in aes(x = Gender, y = Young, y = Adult, y = Elder) :
formal argument "y" matched by multiple actual arguments
How do I fix this and is it even possible to add c() function objects into a graph?

We don't really have enough info to help you out. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

head(DOHMH_Dog_Bite_Data)
# A tibble: 6 × 9
  UniqueID DateOfBite      Species Breed Age   Gender SpayNeuter Borough
     <dbl> <chr>           <chr>   <chr> <chr> <chr>  <lgl>      <chr>  
1        1 January 01 2018 DOG     UNKN… NA    U      FALSE      Brookl…
2        2 January 04 2018 DOG     UNKN… NA    U      FALSE      Brookl…
3        3 January 06 2018 DOG     Pit … NA    U      FALSE      Brookl…
4        4 January 08 2018 DOG     Mixe… 4     M      FALSE      Brookl…
5        5 January 09 2018 DOG     Pit … NA    U      FALSE      Brookl…
6        6 January 03 2018 DOG     BASE… 4Y    M      FALSE      Brookl…
# … with 1 more variable: ZipCode <chr>
data.frame(
  stringsAsFactors = FALSE,
          UniqueID = c(1, 2, 3, 4, 5, 6),
        DateOfBite = c("January 01 2018",
                       "January 04 2018","January 06 2018","January 08 2018",
                       "January 09 2018","January 03 2018"),
           Species = c("DOG", "DOG", "DOG", "DOG", "DOG", "DOG"),
             Breed = c("UNKNOWN","UNKNOWN",
                       "Pit Bull","Mixed/Other","Pit Bull","BASENJI"),
               Age = c(NA, NA, NA, "4", NA, "4Y"),
            Gender = c("U", "U", "U", "M", "U", "M"),
        SpayNeuter = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
           Borough = c("Brooklyn","Brooklyn",
                       "Brooklyn","Brooklyn","Brooklyn","Brooklyn"),
           ZipCode = c("11220", NA, "11224", "11231", "11224", "11231")
)

Is this it? I'm at a very beginner level, so I'm not sure if I have done it correctly.

You are including too few rows, and it is not clear what kind of plot you expect to get from plotting points across two categorical variables. This would be the result, and it doesn't make much sense.

library(tidyverse)

dogbite <- data.frame(
    stringsAsFactors = FALSE,
    UniqueID = c(1, 2, 3, 4, 5, 6),
    DateOfBite = c("January 01 2018",
                   "January 04 2018","January 06 2018","January 08 2018",
                   "January 09 2018","January 03 2018"),
    Species = c("DOG", "DOG", "DOG", "DOG", "DOG", "DOG"),
    Breed = c("UNKNOWN","UNKNOWN",
              "Pit Bull","Mixed/Other","Pit Bull","BASENJI"),
    Age = c(NA, NA, NA, "4", NA, "4Y"),
    Gender = c("U", "U", "U", "M", "U", "M"),
    SpayNeuter = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
    Borough = c("Brooklyn","Brooklyn",
                "Brooklyn","Brooklyn","Brooklyn","Brooklyn"),
    ZipCode = c("11220", NA, "11224", "11231", "11224", "11231")
)

Young <- c("0.2", "0.6", "04M", "1", "1 & 3", "1 & 8", "1 1/2 YRS",
           "1 Y", "1 YR", "1 YR 8 Mon", "1 YRS", "1.3", "1.5", "1.6",
           "1.8", "1/12M", "10 M", "10 MOS", "10 MTHS", "10 MTHS &",
           "10 WKS", "10M", "10W", "10wks", "11 M", "11 MONS", "11 MOS",
           "11 MTHS", "11 WKS", "11m", "11M", "11MOS", "11MTHS", "11W",
           "11WKS", "12W", "12WKS", "12WKSKS", "13 WKS", "13M", "13WK",
           "14M", "14WKS", "15M", "16 MONS", "16 MTHS", "16M", "16W",
           "17M", "18 M", "18M", "18MTHS", "18W", "1M", "1Y", "1YR",
           "1YR 8MONS", "2", "2(2) & (1", "2 & 9MTHS", "2 1/2", "2 MONS .",
           "2 MTH", "2 MTHS", "2 YRS", "2-3 YR", "2-3 YRS", "2-3M", "2-3MOS",
           "2-3YRS", "2.5", "2.6", "21M", "22 MTHS", "2M", "2y", "2Y", "2YRS",
           "2YRS (MALE", "3", "3 & 4", "3 1/2 YRS", "3 M", "3 MONS", "3 MOS",
           "3 MTHS", "3 YR", "3 yrs", "3 YRS", "3.5", "3.6", "3M", "3MTH",
           "3mths", "3MTHS", "3Q", "3y", "3Y", "3YR", "3YRS", "4 M", "4 MONS",
           "4 MTHS", "4-6MOS", "4m", "4M", "4MO", "4MTH", "4MTHS", "5 MTHS",
           "5m", "5M", "6 MTHS", "6M", "6MO", "6MTH", "6MTHS", "7 M", "7 mons",
           "7 MOS", "7 MTHS", "7-8M", "7m", "7M", "7MOS", "7MTH", "7MTHS",
           "7W", "8 M", "8 MONS", "8 MOS", "8 MTHS", "8M", "8MTHS", "8W",
           "8WKS", "9 M", "9 MONS", "9 MTHS", "9M", "9MTHS", "9WK")
Adult <- c("4","4 YRS", "4 yrs 8 mo", "4.5", "4.6", "4y", "4Y", "5", "5 yrs",
           "5 YRS", "5Y", "5YR", "5YRS", "6", "6 & 4", "6 YRS", "6.5",
           "6.5 YRS", "6.5Y", "6", "6y", "6Y", "7", "7 YRS", "7Y", "7YRS","8",
           "8 YRS", "8Y", "8YRS & 8 M", "9", "9 YRS", "9Y")
Elder <- c("10", "10 &9", "10yrs", "10YRS", "10.5", "10+", "10y", "10Y","11",
           "11YRS", "11-12YRS", "11Y", "12", "12 YRS","12Y", "13", "13 yrs",
           "13 YRS","13Y", "14", "14YRS", "15", "15 YRS", "15.5","16", "17",
           "17y", "19", "20", "21", "41", "68 yrs")

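# Recode each raw Age string into the group whose lookup vector contains it.
# Entries that match none of the vectors (including NA) become NA.
dogbite %>% 
    mutate(Age = case_when(
        Age %in% Young ~ "Young",
        Age %in% Adult ~ "Adult",
        Age %in% Elder ~ "Elder"
    )) %>% 
    # Both axes are categorical, so points stack on a few grid positions
    ggplot(aes(Gender, Age)) +
    geom_point()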

Created on 2022-06-09 by the reprex package (v2.0.1)
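
If what you want is to compare how many bites fall in each age group, a bar chart of counts may be closer to it. This is only a rough sketch under my own assumptions: the toy data frame and the shortened Young/Adult/Elder vectors below are made up for illustration (they are not your real data), and the "Unknown" label for missing or unlisted ages is my choice, not something from your assignment.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the real dogbite data (values invented for illustration)
toy <- data.frame(
    Gender = c("M", "F", "M", "U", "F", "M"),
    Age = c("4", "2Y", "10YRS", NA, "7M", "3 yrs"),
    stringsAsFactors = FALSE
)

# Shortened stand-ins for the lookup vectors from the post above
Young <- c("2Y", "7M", "3 yrs")
Adult <- c("4", "4Y")
Elder <- c("10YRS")

grouped <- toy %>%
    mutate(AgeGroup = case_when(
        Age %in% Young ~ "Young",
        Age %in% Adult ~ "Adult",
        Age %in% Elder ~ "Elder",
        TRUE ~ "Unknown"  # NA or entries not listed in any vector
    )) %>%
    # Fix the display order of the groups on the x axis
    mutate(AgeGroup = factor(AgeGroup,
                             levels = c("Young", "Adult", "Elder", "Unknown")))

# One bar per age group, split by gender; bar height is the number of rows
ggplot(grouped, aes(x = AgeGroup, fill = Gender)) +
    geom_bar(position = "dodge")
```

The c() vectors are never passed to aes() here. They are only used to build a new column (AgeGroup), and that single column is what gets mapped to an axis.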

Can you please clarify?
