Error Preventing Knitting


#1

Thanks in advance for any help. I am an RStudio newbie trying to knit an .html file to submit a final assignment for a Coursera class that uses RStudio.

I am receiving an error message that prevents knitting to .html. The message is as follows: "Line 268 Error in unique(natenvir) : object 'natenvir' not found Calls: ... eval -> barplot -> barplot.default -> levels -> unique"

The code chunk that appears to trigger the problem is below:

# Steps for generating a barplot:

# sets the margins around the barplot
par(mar = c(11, 11, 5, 2) + 0.1, las = 2)

# draws barplot using the table (s3) with a legend in the bottom right
barplot(s3, legend = levels(unique(natenvir)), args.legend = list(x = 'bottomright'))

# y axis label
mtext(text = "Proportion of Respondents", side = 2, line = 4, las = 3)

# x axis label
mtext(text = "Religious Affiliation", side = 1, line = 8, las = 1)


#2

I think you need to include some more information in order to fully diagnose this problem.

Can you include the rest of your code, especially the part where natenvir is created? (Keep in mind this site’s homework policy, and don’t include any of the verbatim text of your assignment, though)

In the meantime, as a guess: is natenvir a variable (column) in a data frame? If so, you need to refer to it with dataframename$variablename syntax. Also, if natenvir is a factor already, you don’t need to use unique() inside your call to levels() (levels on its own gives you a unique list of the factor levels). However, be aware that neither of those bits of advice may help — I need to see more of your code to know for sure!
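As a minimal sketch of those two guesses, here is a small made-up data frame (`df` and its values are hypothetical, not the data from the assignment). It only illustrates the `dataframename$variablename` syntax and the fact that `levels()` already returns each level once:

```r
# Hypothetical stand-in for a data frame with a factor column
df <- data.frame(
  natenvir = factor(c("Too Little", "About Right", "Too Much", "Too Little"))
)

# A bare column name is not an object in the workspace, so
# levels(natenvir) would fail with "object 'natenvir' not found".
# Refer to the column through its data frame instead:
lv <- levels(df$natenvir)
lv

# levels() already lists each level once, so wrapping the factor
# in unique() does not change the result
same <- identical(lv, levels(unique(df$natenvir)))
same
```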

Finally, a tip to help your helpers: it’s really hard to read code that isn’t formatted as code, especially since this site will misinterpret special characters as other kinds of formatting (that’s why all your comments came out as headings!). Here’s how you can make the code you post here look nice :sparkles:


#3

Hi jcblum,

Thanks very much for your response! natenvir is a factor in the original data frame, gss. When the barplot causing the problem is called, I am trying to refer to a table created with prop.table.

Thanks for referring me to the style guide as well. My assignment is complete and runs fine in R (I think) - the issue seems to be the knitting process in RStudio, I assume.

We are required to knit to .html in order to submit the final project for the course. There shouldn't be any issues with the homework policy since the assignment is complete and I am just looking for help with RStudio. I suspect the issue I am having is beyond the scope of what the instructors expect from the students. The support forum for the Coursera course is not always super-responsive, hence my attempt to find help here, which I greatly appreciate. The full code for the assignment is below. Thanks!

#load packages
library(ggplot2)
library(dplyr)
library(statsr)

#load data
load("gss.Rdata")

# creating a barplot for factor relig:

# sets the margins
par(mar = c(11, 11, 5, 2) + 0.1)

# plots the barchart and adjusts the y axis scale
barplot(table(gss$relig), ylim = c(0, 35000), ylab = " ", las=2)

# yaxis label and position
mtext(text = "Number of Respondents", side = 2, line = 4)

# x axis label and position
mtext(text = "Religious Affiliation", side = 1, line = 7)

# creating a barplot for factor natenvir:

# adjusts the margins to accommodate axis labels
par(mar = c(11, 7, 5, 2) + 0.1)

# produces the barplot and adjusts the y-axis scale
barplot(table(gss$natenvir),ylim = c(0, 20000), ylab = " ", las=2)

# y axis label
mtext(text = "Number of Respondents", side = 2, line = 4)

# x axis label
mtext(text = "Spending on the Environment", side = 1, line = 6)

#Determining the levels of factor relig:

levels(gss$relig)

#Determining the levels of factor natenvir:
levels(gss$natenvir)

#Determine the structure of the factors:
str(gss$relig)
str(gss$natenvir)

#Create a table for the relig variable:
relig_table <- table(gss$relig)
relig_table

#Create a table for the natenvir variable:
natenvir_table <- table(gss$natenvir)
natenvir_table

#Display proportions for the relig variable table:
prop.table(relig_table)

#Display proportions for the natenvir table:
prop.table(natenvir_table)

# cell counts for relig and natenvir:
cell_counts_table <-table(gss$relig, gss$natenvir)
cell_counts_table

# Removing levels of the factor relig with <5 in one or more cells for natenvir
# removing the level "Hinduism" from relig
 y<-data.frame(subset(gss[gss$relig!="Hinduism",]))

# removing the level "Buddhism" from relig
y2<-data.frame(subset(y[y$relig!="Buddhism",]))

# removing the level "Other Eastern" from relig
y3<-data.frame(subset(y2[y2$relig!="Other Eastern",]))

# removing the level "Native American" from relig
y4<-data.frame(subset(y3[y3$relig!="Native American",]))

# removing the level "Inter-Nondenominational" from relig
 y5<-data.frame(subset(y4[y4$relig!="Inter-Nondenominational",]))

# removing the level "Orthodox-Christian" from relig (filter y5, not y4, so the previous removal is kept)
y5<-data.frame(subset(y5[y5$relig!="Orthodox-Christian",]))

 # making sure relig is converted back into a factor
y5$relig <- factor(y5$relig)

# chi-square test for independence
chisq.test(y5$relig, y5$natenvir)

# Manipulate data to present in a table and then produce a barplot:

# make a table using just the relig and natenvir factors originally from gss (now housed in y5)
s = table(y5$relig, y5$natenvir)

# make a table of proportions for relig and natenvir factors
s2 = prop.table(s, 1)

# transpose the table of proportions so that natenvir is the dependent variable and relig is the independent variable in the barplot
s3 = t(prop.table(s,1))

# displays the transposed table of proportions showing responses to current spending on the environment (natenvir) within each level of religious affiliation (relig)
s3

# Steps for generating a barplot showing proportions of relig relative to natenvir (proportion of respondents in each religion that answered one of three ways):

# sets the margins around the barplot
par(mar = c(11, 11, 5, 2) + 0.1, las = 2)

# draws barplot using the table (s3) with a legend in the bottom right
barplot(s3, legend = levels(unique(natenvir)), args.legend = list(x = 'bottomright'))

# y axis label
mtext(text = "Proportion of Respondents", side = 2, line = 4, las = 3)

# x axis label
mtext(text = "Religious Affiliation", side = 1, line = 8, las = 1)

#count and print the number of respondents in each relig factor reporting "Too Much" for the natenvir factor

sf <- y5 %>% count(relig, natenvir)
print(sf, n = 33)

# conduct test of two different proportions
jp <- prop.test(x = c(38, 1852), n = c(633, 18778))
jp

#4

Thanks, that's helpful! When I run your code on my computer, the same line throws an error (no knitting involved). That makes sense because, in fact, natenvir is not defined as an object anywhere in your code. Here's what I suspect is going on:

Often when you've been working on something for a while, there will be lots of random R objects littering your workspace that you created (on purpose or accidentally) while you were experimenting with bits of code. When knitr starts rendering, it runs in a fresh session with its own workspace — it can only "see" objects that you create in the code you give it. So when you're getting an "object not found" error while knitting that you don't get while running your code outside of knitr, it's a clue that it's time to clear up the clutter because knitr just found a bug in your code :beetle: .

It's a good idea to clear your workspace (= "global environment") and start a new R session periodically while you're working to prevent this kind of bug from creeping in. In RStudio, you can use the little broom icon on the Environment pane to clear the workspace, and you can restart R from the Session menu. If you do this and try running your code again, I think you'll find that you get an error at the same spot without trying to knit anything.
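Here is a rough sketch of how a leftover object can hide this kind of bug. The stray `natenvir` object below is made up for illustration, and `rm()` only imitates what a fresh session or a knitr run would look like:

```r
# Simulate workspace clutter: an object created while experimenting
natenvir <- factor(c("Too Little", "About Right", "Too Much"))
levels(natenvir)   # works, but only because the stray object exists

# A fresh session starts without it. rm() imitates that here;
# rm(list = ls()) would clear the whole workspace, roughly what
# the broom icon does.
rm(natenvir)

# The same call now fails; capture the message instead of stopping
msg <- tryCatch(
  levels(natenvir),
  error = function(e) conditionMessage(e)
)
msg
```

If `msg` mentions that the object was not found, the code was relying on something that only existed in the old workspace.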

OK, so how to fix your code? TL;DR: Since the names in s3 came from y5, I would just use:

barplot(s3, legend = levels(y5$natenvir), args.legend = list(x = 'bottomright'))

But I really want to get the names from s3!

To pull the names directly from s3, you'll need to use different syntax because s3 is not a data frame :open_mouth: As you can read in the documentation, table() returns an array, which is a different data structure that works differently. You can check this yourself by running:

is.data.frame(s3)
is.array(s3)

table() doesn't preserve any information about the names of the variables that were used to make the table, so the term natenvir means nothing to s3. The names for the table dimensions are stored in an attribute called dimnames (which is accessed with the function dimnames()). The dimnames are stored as a list of character vectors, one for each dimension of the table. You can see all this by examining the result of str(s3). natenvir corresponds to the first dimension of s3 (the rows), so you would pull out those row names with:

dimnames(s3)[[1]]
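To see this without the course dataset, here is a small sketch with made-up data (`toy` and its values are hypothetical stand-ins for `y5`). It builds the table the same way as the code above and pulls the legend labels from the table itself:

```r
# Made-up data standing in for y5 (values are hypothetical)
toy <- data.frame(
  relig    = factor(c("Catholic", "Catholic", "Jewish", "Jewish", "None", "None")),
  natenvir = factor(c("Too Little", "Too Much", "Too Little", "About Right",
                      "Too Much", "Too Little"))
)

s  <- table(toy$relig, toy$natenvir)
s3 <- t(prop.table(s, 1))   # rows: natenvir, columns: relig

is.data.frame(s3)  # FALSE
is.array(s3)       # TRUE

# Row names of the transposed table are the natenvir levels
legend_labels <- dimnames(s3)[[1]]
legend_labels

# Use them for the legend (legend.text is the full name of the argument
# that "legend =" partially matches in barplot())
barplot(s3, legend.text = legend_labels, args.legend = list(x = "bottomright"))
```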

Good luck with the class!

P.S. In this case, I was able to find the teaching dataset you're using by googling around, but in general it's a good idea to provide at least a small sample of the data you're working with when seeking help, so that other people have all the parts they need to quickly test out your code. Lots more info on best practices for sharing code to get help here.
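One common way to share a small sample is base R's `dput()`. This is only a sketch with a made-up data frame (`df` is hypothetical), not necessarily the approach described in the linked guide:

```r
# Hypothetical data frame standing in for the real dataset
df <- data.frame(
  relig    = factor(c("Catholic", "Jewish", "None", "Catholic")),
  natenvir = factor(c("Too Little", "Too Much", "About Right", "Too Little"))
)

# Keep only the columns and rows needed to reproduce the problem
small <- head(df[, c("relig", "natenvir")], 3)

# dput() prints R code that rebuilds the object; paste that output into a post
dput(small)

# A helper can recreate the sample by evaluating the pasted text
pasted  <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
str(rebuilt)
```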


#5

Thanks for finding the bug; knitting works for me now. I also appreciate the advice about periodically clearing my workspace. Indeed, the workspace gets very cluttered as I make different attempts to make my code work, and I am overall not very efficient using R. Thanks also for the link about best practices for sharing code. I will adhere to that in the future!