Adding data from predict() to plot that contains data from CSV

Beau_Waldrop · April 7, 2020, 5:37pm

Hello,

I want to plot the values I generated with predict() together with the data that came from a CSV.
Right now, I have them in separate plots. Also, abline() isn't adding a regression line to the plot. Is there a better way to go about doing this? Thank you.

library(ggplot2)
library(tidyr)
library(tidyverse)

pops <- read_csv("nst-est2019-popchg2010_2019.csv")
OK_pops <- filter(pops, NAME == "Oklahoma")
pop_OK <- pivot_longer(OK_pops,
  cols = starts_with("POP"),
  names_to = "Year",
  names_prefix = "POPESTIMATE",
  values_to = "Population"
)

options(digits = 4)
pop_OK <- transform(pop_OK, Population = as.numeric(Population))
pop_OK <- transform(pop_OK, Year = as.numeric(Year))

str(pop_OK)

ggplot(pop_OK) + geom_point(aes(x=Year, y=Population))
abline(pop_OK)


model <- lm(formula = Population ~ Year, data = pop_OK)
summary(model)
pred <- predict(model, newdata=data.frame(Year=2020:2024))
setNames(pred, 2020:2024)

plot(pred, pch = 16, col = "blue" )
scale_x_discrete(breaks=c("1", "2", "3", "4", "5"),
                      labels=c("2020","2021","2022","2023","2024"))

technocrat · April 7, 2020, 9:58pm

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? A reprex, complete with representative data, will attract more answers, and more quickly.

Here, without the data needed for

pops <- read_csv("nst-est2019-popchg2010_2019.csv")

there's a high hurdle to attracting a good answer: the problem has to be reverse engineered from other data. That's sometimes possible, but it is a deterrent for the large majority.

Beau_Waldrop · April 9, 2020, 6:37pm

Here's the output from the above code when I try to combine the two with rbind:

final_plot <- rbind(pop_OK, pred)

You'll notice that rbind just created one new row and filled it with the population predictions, repeating them over and over until the end of the row (you can't see this in the output only because I brought over just the final two columns, the ones that I'm plotting, for brevity).

tibble::tribble(
             ~Year,      ~Population,
              2010,          3759944,
              2011,          3788379,
              2012,          3818814,
              2013,          3853214,
              2014,          3878187,
              2015,          3909500,
              2016,          3926331,
              2017,          3931316,
              2018,          3940235,
              2019,          3956971,
  4042171.32727273, 4064288.95757575
  )

How would I adjust rbind so that it adds the population predictions to the Population column and the years to the Year column? Thanks!

Beau_Waldrop · April 9, 2020, 6:40pm

Thank you. That thread taught this newbie about datapasta, something I'll be using a lot of!

technocrat · April 9, 2020, 7:24pm

I'm going to be lazy and assume that the base data frame and the predictions are identical; the approach should work the same with real predictions.

base <- tibble::tribble(
    ~Year,      ~Population,
    2010,          3759944,
    2011,          3788379,
    2012,          3818814,
    2013,          3853214,
    2014,          3878187,
    2015,          3909500,
    2016,          3926331,
    2017,          3931316,
    2018,          3940235,
    2019,          3956971 #,
    #4042171.32727273, 4064288.95757575 unsure about these
)
predicted <- tibble::tribble(
    ~Pred_year,      ~Pred_Population,
    2010,          3759944,
    2011,          3788379,
    2012,          3818814,
    2013,          3853214,
    2014,          3878187,
    2015,          3909500,
    2016,          3926331,
    2017,          3931316,
    2018,          3940235,
    2019,          3956971 #,
    #4042171.32727273, 4064288.95757575
)

cbind(base,predicted)
#>    Year Population Pred_year Pred_Population
#> 1  2010    3759944      2010         3759944
#> 2  2011    3788379      2011         3788379
#> 3  2012    3818814      2012         3818814
#> 4  2013    3853214      2013         3853214
#> 5  2014    3878187      2014         3878187
#> 6  2015    3909500      2015         3909500
#> 7  2016    3926331      2016         3926331
#> 8  2017    3931316      2017         3931316
#> 9  2018    3940235      2018         3940235
#> 10 2019    3956971      2019         3956971

Created on 2020-04-09 by the reprex package (v0.3.0)
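For the original goal of getting the predicted years underneath the observed ones, a row-wise bind of two data frames with matching column names may be closer to what's wanted than cbind(). What follows is only a sketch built on the ten rows posted above. The names observed, future, combined and the Source column are made up for illustration, and it uses base graphics so that abline() has a base plot to add to.

```r
# Observed estimates, copied from the tribble posted in this thread
observed <- data.frame(
  Year = 2010:2019,
  Population = c(3759944, 3788379, 3818814, 3853214, 3878187,
                 3909500, 3926331, 3931316, 3940235, 3956971)
)

# Same linear model as in the original post
model <- lm(Population ~ Year, data = observed)

# Put the predictions in a data frame whose columns match `observed`,
# instead of handing rbind() a bare numeric vector
future <- data.frame(Year = 2020:2024)
future$Population <- unname(predict(model, newdata = future))

# Label each block of rows (hypothetical helper column), then stack them
observed$Source <- "estimate"
future$Source <- "prediction"
combined <- rbind(observed, future)

# One base-graphics plot holding both sets of points
plot(combined$Year, combined$Population,
     pch = 16,
     col = ifelse(combined$Source == "prediction", "blue", "black"),
     xlab = "Year", ylab = "Population")

# abline() accepts a fitted lm object and adds its line to the base plot
abline(model, lty = 2)
```

abline() belongs to base graphics and does not add layers to a ggplot. If the plot stays in ggplot2, the same combined data frame could be passed to ggplot() with geom_point(), and geom_smooth(method = "lm") or geom_abline() would be the layers to look at for the line.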

Everyone starts off as a newbie in R. Paradoxically, those who seem to have trouble getting traction are often already fluent in a different style of programming language.

One of the hard things to get used to in R is the concept that everything is an object that has properties. Some objects have properties that allow them to operate on other objects to produce new objects. Those are functions.

Think of R as school algebra writ large: f(x) = y, where the objects are f, a function; x, an object (there may be several) termed the argument; and y, an object termed the value, which can be as simple as a single number (an atomic vector of length one) or a densely packed object with a multitude of data and labels.
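To make the f(x) = y picture concrete, here is a small sketch contrasting a function whose value is a single number with one whose value is a much richer object. It borrows a few rows of the data from this thread; the names x, y_simple and y_packed are only for illustration.

```r
# A few rows of the data posted above
x <- data.frame(
  Year = c(2010, 2011, 2012, 2013),
  Population = c(3759944, 3788379, 3818814, 3853214)
)

# f(x) = y where y is a single number (an atomic vector of length one)
y_simple <- mean(x$Population)
length(y_simple)
is.atomic(y_simple)

# f(x) = y where y is a packed object: a fitted model with many components
y_packed <- lm(Population ~ Year, data = x)
class(y_packed)
names(y_packed)

# Other functions can take that object as their argument in turn
coef(y_packed)
```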

And, because functions are also objects, they can be arguments to other functions, like the old g(f(x)) = y. (Trivia: this is what it means for functions to be first-class objects.)
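A toy illustration of functions as arguments, with made-up names f, g and apply_twice:

```r
f <- function(x) x^2    # square
g <- function(x) x + 1  # add one

# Plain composition, g(f(x))
g(f(3))            # 10

# A function that takes another function as an argument
apply_twice <- function(fn, x) fn(fn(x))
apply_twice(f, 3)  # 81

# Built-in functions such as sapply() accept functions the same way
sapply(1:3, f)     # 1 4 9
```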

Although there are function objects in R that operate like control statements in imperative/procedural languages, they are best used "under the hood." As it presents to users interactively, R is a functional programming language. Instead of saying

take this, take that, do this, then do that, then if the result is this one thing, do this other thing, but if not do something else and give me the answer

in the style of most common programming languages, R allows the user to say

use this function to take this argument and turn it into the value I want for a result
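As a rough side-by-side of the two styles, here is the year-over-year change for the first few population figures from this thread, computed once with an explicit loop and once with a single function call. The variable names are just for illustration.

```r
pop <- c(3759944, 3788379, 3818814, 3853214)

# "Take this, do that" style: an explicit loop over an index
change_loop <- numeric(length(pop) - 1)
for (i in seq_along(change_loop)) {
  change_loop[i] <- pop[i + 1] - pop[i]
}

# "Use this function on this argument" style
change_fn <- diff(pop)

# Both routes give the same differences
identical(change_loop, change_fn)
```

Both work in R; the second is usually shorter to write and easier to read at the console.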
