My code doesn't work if I save my work and then open it again, but it worked great when I first wrote it!

I wrote this code and it worked and gave the required output. I saved it to go get a coffee, and when I opened it again afterwards it showed an error on the last 2 lines every time. Can someone help? I'm uploading the code and the error picture below.


# Netflix Data Kaggle

# Importing Libraries

library(tidyverse)
library(lubridate)
library(naniar)
library(skimr)

# loading the dataset

netflix <- read_csv("c://users/kumarappan M/documents/netflix_titles.csv")

tibble(netflix)


# Summary of the dataset 

skim_without_charts(netflix)

# Missing values

vis_miss(netflix)

gg_miss_fct( x=netflix , fct = type ) + labs ( title = " Missing Variables by type" , x = "Type")

## As you can see, there are far too many missing values in the director, cast and country fields. I will create 2 data frames: one to use later for analysis on directors, and the other with the director and cast columns dropped, as they contain too many missing values and it is impossible to proxy the data. I will also replace the missing values in the country, date_added and rating columns with their respective modes.



# Create a mode function, as R doesn't have a built-in function to calculate the statistical mode, then use `ifelse()` to replace the NAs with the mode.

getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

netflix$country <- ifelse( is.na(netflix$country),
                        getmode(netflix$country),
                        netflix$country)
                        
netflix$date_added <- ifelse( is.na(netflix$date_added) , getmode(netflix$date_added) , netflix$date_added )

netflix$rating <- ifelse( is.na(netflix$rating) , getmode(netflix$rating) , netflix$rating)
                        
netflix <- netflix %>%  select( -c(description,cast,director,listed_in))



skim_without_charts(netflix)

# All Na's have been cleaned or replaced in the data set ^^^


#### Data cleaning and manipulation for analysis


# Some rows list multiple countries and need to be reduced to just the first country, the main country of production.

netflix <- separate( netflix , country , into = c( "production_country") , sep = ",")

skim_without_charts(netflix)


# create a month and year added column

netflix <- netflix %>% separate ( date_added , into = c("month_added" , "year_added" ) , sep=", ")
netflix <- netflix %>% separate ( month_added , into = c("month_added" ) , sep= " ")


# The rating column can be grouped into age categories, to analyse the values by age category. I used this link <https://www.spectrum.net/support/tv/tv-and-movie-ratings-descriptions/> to get info on rating categories.

netflix$rating <- gsub ( "TV-MA" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "TV-PG" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "TV-14" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "UR" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "R" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "NR" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "NC-17" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "TV-Y" , "children" , netflix$rating)
netflix$rating <- gsub ( "TV-Y7" , "Older kids" , netflix$rating)
netflix$rating <- gsub ( "TV-Y7-FV" , "Older kids" , netflix$rating)
netflix$rating <- gsub ( "PG" , "Older kids" , netflix$rating)
netflix$rating <- gsub ( "PG-13" , "Teens" , netflix$rating)
netflix$rating <- gsub ( "G" , "General audience" , netflix$rating)
netflix$rating <- gsub ( "TV-G" , "General audience" , netflix$rating)
netflix$rating <- gsub ( "Older kids-13" , "Older kids" , netflix$rating)
netflix$rating <- gsub ( "General audienceeneral audience" , "General audience" , netflix$rating)
netflix$rating <- gsub ( "NAdult" , "Adult" , netflix$rating)
netflix$rating <- gsub ( "children7" , "children" , netflix$rating)

# split the data into 2 sets one with just movies and one with just tv shows 

netflix_shows <- netflix %>% filter(type == "TV Show")
netflix_movies <- netflix %>% filter(type == "Movie")

# Visualisations

# 1.Distribution by content

pie_1 <- netflix %>% group_by(type) %>% summarise(total=n()) %>% mutate(perc_pie=round( 100*pie_1$total/sum(pie_1$total))) 

pie(pie_1$total , paste0(pie_1$perc_pie,"%") , main = "Segregation by type" , col = rainbow(length(pie_1)))
legend("topright", c("Movie" , "TV show"), cex = 0.8,
   fill = rainbow(length(pie_1)))

Error:

In this part, you'll need to change how you reference the total column:

pie_1 <- netflix %>% 
  group_by(type) %>% 
  summarise(total=n()) %>% 
  mutate(perc_pie=round( 100*pie_1$total/sum(pie_1$total)))  # problem line

This is because pie_1 doesn't exist as an object until R gets all the way through your %>% pipe chain. However, you can refer to your summarised total column using just total instead of pie_1$total like so:

pie_1 <- netflix %>% 
  group_by(type) %>% 
  summarise(total=n()) %>% 
  mutate(perc_pie=round( 100*total/sum(total)))  # fixed: refers to the total column directly

If you ran the code the first time with two separate calls, first creating pie_1 with summarise, then adding the mutate in a second step, then this would work because pie_1 would already exist and R would be able to reference pie_1$total:

pie_1 <- netflix %>% 
  group_by(type) %>% 
  summarise(total=n()) 

pie_1 <- pie_1 %>% 
  mutate(perc_pie=round( 100*pie_1$total/sum(pie_1$total)))  # works here, because pie_1 already exists

Though again, you could drop pie_1 from your mutate call:

pie_1 <- netflix %>% 
  group_by(type) %>% 
  summarise(total=n()) 

pie_1 <- pie_1 %>% 
  mutate(perc_pie=round( 100*total/sum(total)))  # simpler: refers to the total column directly

The error in this line leads to the error in the pie() function call - because R didn't complete the pipe chain and create pie_1, you get the error:

object 'pie_1' not found
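
If it helps to see the whole thing in one place, here is a minimal sketch that reproduces the behaviour on a tiny made-up data set. The `toy` tibble and the `result` variable are just illustrative stand-ins (they are not from your script), and the `rm()` call is only there to imitate a freshly opened session where pie_1 has not been created yet.

```r
library(dplyr)

# Hypothetical stand-in for the netflix data: only the `type` column matters here.
toy <- tibble(type = c("Movie", "Movie", "Movie", "TV Show"))

# Imitate a fresh session, where pie_1 does not exist yet.
if (exists("pie_1", inherits = FALSE)) rm(pie_1)

# Self-referencing version: pie_1 is looked up before the assignment has happened,
# so mutate() fails. tryCatch() captures the error message instead of stopping.
result <- tryCatch(
  toy %>%
    group_by(type) %>%
    summarise(total = n()) %>%
    mutate(perc_pie = round(100 * pie_1$total / sum(pie_1$total))),
  error = function(e) conditionMessage(e)
)
print(result)  # a character string describing the failure

# Column-based version: `total` is found inside the data being piped,
# so it does not depend on anything already sitting in the environment.
pie_1 <- toy %>%
  group_by(type) %>%
  summarise(total = n()) %>%
  mutate(perc_pie = round(100 * total / sum(total)))

print(pie_1)
```

The second version gives the same result whether or not pie_1 was left over from an earlier run, which is why it keeps working after you close and reopen your script.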

That makes a lot of sense. I suspected that's what was happening after 3 hours :rofl:, and I tried what you told me and it worked. I just got on here, and it was nice to hear the explanation from you. I really appreciate the help! Thanks a ton.
