Error: ggplot2 doesn't know how to deal with data of class character

ggplot2

#1

I'm pretty new to R. I want to plot a time series of programming languages used over the years.
Here is a sample of my data frame:

   PL   | year
1  C    | 2007
2  C    | 2007
3  Java | 2010
4  Ruby | 2011
5  Ruby | 2011

(year is an int variable)

This is my code:

library(RODBC)
library(tidyr)
library(dplyr)
library(ggplot2)
conn <- odbcDriverConnect(dbConnection)
df <- sqlQuery(conn, iconv(paste(readLines('pl.sql', encoding = 'UTF-8', warn = FALSE), collapse = '\n'), from = 'UTF-8', to = 'ASCII', sub = ''))
df$Year <- as.Date(as.character(df$Year))
ggplot(df) + geom_line()

but I got the error in the title.


#2

Try ggplot(df, aes(year, group = PL)) + geom_line()

Or use geom_bar(position = "jitter") instead.


#3

ggplot(df,aes(Year,group= PL)) + geom_line()
still the same error

ggplot(df)+geom_bar(position = "jitter")
also same error


#4

It's not quite clear to me what you'd like to show in your plot. Perhaps a count of how many mentions there are of each PL each year? I took a stab at using data like your example data to make a plot counting how many occurrences of each PL there are by year.

The chapter introducing ggplot2 here (http://r4ds.had.co.nz/data-visualisation.html#introduction-1) has lots more examples of how it can be used. I'd also suggest the sections on data transformation and making your data "tidy" format, which makes it simpler to plot.

df <-
  tibble::tribble(
      ~row,  ~PL, ~year,
     1, "C", 2009L,
     2, "C", 2010L,
     3, "Java", 2010L,
     4, "Ruby", 2010L,
     5, "Ruby", 2011L,
     6, "Ruby", 2011L,
     7, "C", 2010L
  )
# df$year_date <- as.Date(paste0(df$year, "-01-01"))
# I wasn't sure why you needed the year as a date, but this works.
  
library(tidyverse)

# This counts the occurrences of each PL-year combo and fills in 
# zeros for the combos that didn't appear.
df_counts <-
  df %>% 
  group_by(PL, year) %>% 
  tally() %>% ungroup() %>% 
  complete(PL,year) %>% 
  replace_na(list(n=0))

# This uses the transformed data, with year on x, n (count) on y, 
# and each PL getting a different color.
ggplot(df_counts, aes(year, n, color=PL)) + 
  geom_line() +
  scale_x_continuous(breaks = 2009:2011, minor_breaks = NULL)
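
On the commented-out date conversion above: a minimal sketch of why a bare year needs a month and day appended before `as.Date()` will accept it. The `year` vector here is made-up data, not the original query result.

```r
# Made-up integer years, standing in for the year column.
year <- c(2007L, 2010L, 2011L)

# A bare year such as "2007" is not an unambiguous date string,
# so as.Date() raises an error; tryCatch() captures that here.
attempt <- tryCatch(
  as.Date(as.character(year)),
  error = function(e) "error"
)

# Appending a month and day gives an ISO-style string that parses.
year_date <- as.Date(paste0(year, "-01-01"))
format(year_date, "%Y")
```

For the line plot itself the integer year is enough, which is why the conversion is left commented out.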


#5

Yes, that is exactly what I want. Your code runs correctly on your data,
but when I try it on my data it throws this error:

Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "character"

my df is result of sql query
year-> smallint
PL -> varchar


#6

It sounds like you need to convert the varchar PL column from your query into the kind of character type that ggplot (and the tidyverse, and R in general, I presume) can use.

It might suffice to run:
df$PL <- as.character(df$PL)
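
A small sketch of what that conversion does, using toy data rather than the real query result (the values and the `stringsAsFactors = TRUE` setting are assumptions for illustration). Note that `$` extracts the column vector, while single-bracket `[` returns a one-column data frame, and `as.character()` treats the two very differently.

```r
# Toy data standing in for the query result (values are made up).
df <- data.frame(
  PL = c("C", "C", "Java"),
  year = c(2007L, 2007L, 2010L),
  stringsAsFactors = TRUE
)

class(df$PL)     # the column vector itself
class(df["PL"])  # a one-column data frame, not a vector

# Convert the column vector, keeping one value per row.
df$PL <- as.character(df$PL)
class(df$PL)

# By contrast, as.character() on a one-column data frame collapses
# the whole column into a single string, which is rarely intended.
collapsed <- as.character(df["PL"])
length(collapsed)
```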


#7

Thanks, jonspring.
This runs without error:
df["PL"] <- as.character(df["PL"])
but this line still throws the same error:
df_counts <- df %>% group_by(PL, year) %>% tally() %>% ungroup() %>% complete(PL,year) %>% replace_na(list(n=0))

Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "character"

#8

This is an instance when you want strings to work as factors, so you need to convert your PL column into the type factor.

You may want to look at the forcats package:
http://forcats.tidyverse.org/
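
As a rough sketch of the conversion, base R's `factor()` is enough for a simple case; `forcats::as_factor()` is an alternative that keeps levels in order of first appearance. The data below is made up to resemble the example in the question.

```r
# Made-up data resembling the example in the question.
df <- data.frame(
  PL = c("C", "C", "Java", "Ruby", "Ruby"),
  year = c(2007L, 2007L, 2010L, 2011L, 2011L),
  stringsAsFactors = FALSE
)

# Convert the character column to a factor.
df$PL <- factor(df$PL)
levels(df$PL)

# Count occurrences of each language per year.
counts <- table(df$PL, df$year)
counts
```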


#9

Thanks very much, jonspring.
It now executes correctly without any error. The error occurred because I had executed the SQL query file without saving it.
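
For anyone hitting the same message: as far as I know, RODBC's `sqlQuery()` returns a character vector of error messages instead of a data frame when the query fails, which would explain why `df` had class "character". A minimal sketch of a guard follows; `check_query_result()` is a hypothetical helper, and the "bad" result is a made-up message rather than real driver output.

```r
# Hypothetical helper: stop early if a query result is not a data frame.
check_query_result <- function(result) {
  if (!is.data.frame(result)) {
    stop(
      "Query did not return a data frame: ",
      paste(as.character(result), collapse = "\n"),
      call. = FALSE
    )
  }
  result
}

# Simulated outcomes, so the sketch runs without a database.
good <- data.frame(PL = c("C", "Ruby"), year = c(2007L, 2011L))
bad <- "[made-up driver message] could not run the statement"

ok <- check_query_result(good)
msg <- tryCatch(
  check_query_result(bad),
  error = function(e) conditionMessage(e)
)
msg
```

Wrapping the `sqlQuery()` call in a check like this surfaces the failure at the query step rather than later in `ggplot()` or `group_by()`.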