Multiple errors running code in R and installing packages

Error Information:

I get the following error:

Error in dd %>% select(VisitorID) %>% distinct() %>% mutate(seed = runif(nrow(.),  : 
  could not find function "%>%" 

So then I try to install the packages to ensure I have the function but I only get more errors:

trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/sys_2.1.tgz'
Content type 'application/x-gzip' length 68633 bytes (67 KB)
==================================================
downloaded 67 KB

trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/askpass_1.1.tgz'
Content type 'application/x-gzip' length 21399 bytes (20 KB)
==================================================
downloaded 20 KB

System Information:

  • RStudio Edition: Desktop
  • OS Version: Version 1.1.463

Thanks for your help,
Daphne


The complete code I am trying to run is as follows:

#Welcome back to R! As usual, we'll start by loading in the packages that we need for this course by using the
# library function. Remember never to forget this step, it's very important!
#----------------------------------------------

set.seed(100)

n<- 50000 #First we decide how many visits do we want in our simulation
dd<- data.frame(
  VisitorID = round(runif(n = n, min = 1,max = n/10),0)
) #Next, we create a column called VisitorID made up of random numbers. These numbers are unique visitor identifiers!
#Some of these random numbers are repeated, in the same way we will have repeat and new visitors on our website


head(dd) #Now that we have our visitor IDs, we can start to make our simulation robust with more data!


un<- dd %>%
  select(VisitorID) %>% #Select that we want to work with our Visitor ID's
  distinct() %>% #The distinct command specifies that we only want to isolate the distinct visitors!
  mutate(seed = runif(nrow(.), 0, 2), #Then we create a dummy variable, that we can use...
         seed2 = runif(nrow(.), 0,2)) %>%
  mutate(Variant = case_when( #In order to assign the same value to each visitor through a join command!
    seed > 1 ~ "A",
    seed <= 1 ~ "B"
  ),
  Network = 
    case_when(
      seed2 > 1 ~ "3G",
      seed2 <= 1 ~ "4G")) %>%
  select(VisitorID, Variant, Network)

dd<- left_join(dd, un, by = c("VisitorID")) #Finally we perform the join to assign all non-distinct visitor ID's with the
#same Variant and Network values, so that they are consistent!

head(dd)


#----------------------------------------------
#Before we go on, it will be good to mention R's ability to create normal distributions. A normal distribution is a set
# of values randomly pulled from a distribution with a mean of x and a standard deviation of y, or to put another way-
# we're pulling values randomly from a "true" population! A binomial distribution is just like a normal distribution,
# except only with values of 1's and 0's.
#----------------------------------------------


rnorm(10, mean = 10, sd = 2)
rbinom(10,1,.5)


dd<- dd %>%
  mutate(PageViews = sqrt(round(rnorm(n, mean = 2, 3))^2)+1,
         PageDepth = rnorm(nrow(.), mean = 4, sd = 2),
         NewVisitor = rbinom(nrow(.), 1, .6),
         SearchMade = rbinom(nrow(.), 1, .15),
         Conversion = rbinom(n, 1, .05),
         VideoPlays = ifelse(
           Variant == "B" & Network == "4G", round(rnorm(length(Conversion[which(Variant == "B" & Network == "4G")]),5, 1),0),
           ifelse(Variant == "A" & Network == "4G", round(rnorm(length(Conversion[which(Variant == "A" & Network == "4G")]),3, 1),0),
                  round(rnorm(length(Conversion[which(Network == "3G")]),0, 1),0)
           )),
          Revenue = ifelse(Conversion == 1 & Variant == "B" & Network == "3G",rnorm(length(Conversion[which(Conversion == 1 & Variant == "B" & Network == "3G")]),16.5, 3),
                          ifelse(Conversion == 1 & Variant == "A" & Network == "3G",rnorm(length(Conversion[which(Conversion == 1 & Variant == "A" & Network == "3G")]),18, 3),
                                 ifelse(Conversion ==1 & Network == "4G",rnorm(length(Conversion[which(Conversion == 1)]),19,3),0)))) 


head(dd) 


#----------------------------------------------
#You can see we've created some interesting metrics to track, but we can actually go deeper than this! Even if
# our analysis tool outputs simple metrics like those below, we can create more interesting combinations of metrics
# to run experiments on

# For example, let's say we wanted to create a metric to represent the cumulative Engagement level of a visitor with
# our website. We can do that by creating a new function based on the metrics we have available.
#----------------------------------------------


dd %>%
  mutate(Engagement.Level = SearchMade*3 + (PageViews/5) + (VideoPlays/5) * PageDepth) %>% #Our creation is random, but you
  head() #can create whatever values you want! I'd strongly suggest basing them on solid business logic


#----------------------------------------------
#Quick tip! Conversion Rate metrics are not what they seem! Orders/Unique Visitors is a bad metric to use for conversion. 
# Most testing tools assume the data is binomial when in reality Orders is a "continuous" metric! Create a metric like
# the one below to safeguard yourself against troubles.
#----------------------------------------------

dd %>%
  group_by(VisitorID, Variant, Network) %>%
  summarise(RPV = sum(Revenue), AvgPV = mean(PageViews), Orders = sum(Conversion)) %>%
  mutate(Converted = case_when(
    Orders > 0 ~ 1,
    Orders == 0 ~ 0
  )) %>%
head()

As far as I can see, you are not loading the libraries you are trying to use before using them. You have to put this at the beginning of your code:

library(dplyr)

If you don't have dplyr installed, install it first with

install.packages("dplyr")

Also, you are not showing any installation error, just download messages.
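
The install-then-load advice above can be combined so that the package is only downloaded when it is actually missing. This is a minimal sketch; ensure_package is a hypothetical helper name, not part of base R or dplyr.

```r
# Hypothetical helper (not part of base R or dplyr): install a package only
# if it is not already available, then attach it to the search path.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs internet access and a configured CRAN mirror
  }
  library(pkg, character.only = TRUE)
  invisible(pkg %in% .packages())  # TRUE once the package is attached
}
```

Calling ensure_package("dplyr") at the top of the script should make %>%, select(), mutate() and the other dplyr functions available before the first pipeline runs.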

Hi Andresrcs,

Thanks for your help. I tried to install dplyr and got the following:

The downloaded binary packages are in
	/var/folders/db/cyszd47j6gj2488n6qt7550r0000gq/T//RtmpJ8OQpY/downloaded_packages
> install.packages("dplyr")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/dplyr_0.7.8.tgz'
Content type 'application/x-gzip' length 5720340 bytes (5.5 MB)
==================================================
downloaded 5.5 MB


The downloaded binary packages are in
	/var/folders/db/cyszd47j6gj2488n6qt7550r0000gq/T//RtmpJ8OQpY/downloaded_packages
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

I ran the code again and it seems to have solved the %>% issue but still shows multiple other errors:

> set.seed(100)
> 
> n<- 50000 #First we decide how many visits do we want in our simulation
> dd<- data.frame(
+   VisitorID = round(runif(n = n, min = 1,max = n/10),0)
+ ) #Next, we create a column called VisitorID made up of random numbers. These numbers are unique visitor identifiers!
> #Some of these random numbers are repated, in the same way we will have repeat and new visitors on our website
> 
> 
> head(dd) #Now that we have our visitor id's, we can start to make our simulaton robust with more data!
  VisitorID
1      1540
2      1289
3      2762
4       283
5      2343
6      2419
> 
> 
> un<- dd
>   select(VisitorID) %>% #Select that we want to work with our Visitor ID's
+   distinct() %>% #The distinct command specifies that we only want to isolate the distinct visitors!
+   mutate(seed = runif(nrow(.), 0, 2), #Then we create a dummy variable, that we can use...
+          seed2 = runif(nrow(.), 0,2)) %>%
+   mutate(Variant = case_when( #In order to assign the same value to each visitor through a join command!
+     seed > 1 ~ "A",
+     seed <= 1 ~ "B"
+   ),
+   Network = 
+     case_when(
+       seed2 > 1 ~ "3G",
+       seed2 <= 1 ~ "4G")) %>%
+   select(VisitorID, Variant, Network)
Error in select(VisitorID) : object 'VisitorID' not found
> 
> head(un) 
  VisitorID
1      1540
2      1289
3      2762
4       283
5      2343
6      2419
> 
> dd<- left_join(dd, un, by = c("VisitorID")) #Finally we perform the join to assign all non-distinct visitor ID's with the
> #same Variant and Network values, so that they are consistent!
> 
> head(dd)
  VisitorID
1      1540
2      1540
3      1540
4      1540
5      1540
6      1540
> 
> 
> #----------------------------------------------
> #Before we go on, it will be good to mention R's ability to create normal distributions. A normal distribution is a set
> # of values randomly pulled from a distribution with a mean of x and a standard deviation of y, or to put another way-
> # we're pulling values randomly from a "true" population! A binomial distribution is just like a normal distribution,
> # except only with values of 1's and 0's.
> #----------------------------------------------
> 
> 
> rnorm(10, mean = 10, sd = 2)
 [1] 10.471506 11.484392  9.321547 12.322975 12.908264  8.240405 10.330313 11.728771 13.774858 11.296582
> rbinom(10,1,.5)
 [1] 0 1 0 1 1 1 0 0 1 0
> 
> 
> dd<- dd %>%
+   mutate(PageViews = sqrt(round(rnorm(n, mean = 2, 3))^2)+1,
+          PageDepth = rnorm(nrow(.), mean = 4, sd = 2),
+          NewVisitor = rbinom(nrow(.), 1, .6),
+          SearchMade = rbinom(nrow(.), 1, .15),
+          Conversion = rbinom(n, 1, .05),
+          VideoPlays = ifelse(
+            Variant == "B" & Network == "4G", round(rnorm(length(Conversion[which(Variant == "B" & Network == "4G")]),5, 1),0),
+            ifelse(Variant == "A" & Network == "4G", round(rnorm(length(Conversion[which(Variant == "A" & Network == "4G")]),3, 1),0),
+                   round(rnorm(length(Conversion[which(Network == "3G")]),0, 1),0)
+            )),
+           Revenue = ifelse(Conversion == 1 & Variant == "B" & Network == "3G",rnorm(length(Conversion[which(Conversion == 1 & Variant == "B" & Network == "3G")]),16.5, 3),
+                           ifelse(Conversion == 1 & Variant == "A" & Network == "3G",rnorm(length(Conversion[which(Conversion == 1 & Variant == "A" & Network == "3G")]),18, 3),
+                                  ifelse(Conversion ==1 & Network == "4G",rnorm(length(Conversion[which(Conversion == 1)]),19,3),0)))) 
Error in mutate_impl(.data, dots) : 
  Column `PageViews` must be length 549694 (the number of rows) or one, not 50000
> 
> 
> head(dd) 
  VisitorID
1      1540
2      1540
3      1540
4      1540
5      1540
6      1540
> 
> 
> #----------------------------------------------
> #You can see we've created some interesting metrics to track, but we can actually do deeper than this! Even if
> # our analysis tool outputs simple metrics like those below, we can create more interesting combinations of metrics
> # to run experiments on
> 
> # For example, let's say we wanted to create a metric to represent the cumulative Engagement level of a visitor with 
> # our website. We can do that by creating a new function based on the metric we have a available. 
> #----------------------------------------------
> 
> 
> dd %>%
+   mutate(Engagement.Level = SearchMade*3 + (PageViews/5) + (VideoPlays/5) * PageDepth) %>% #Our creation is random, but you
+   head() #can create whatever values you want! I'd strongly suggest basing them on solid business logic
Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'SearchMade' not found.
> 
> 
> #----------------------------------------------
> #Quick tip! Conversion Rate metrics are not what they seem! Orders/Unique Visitors is a bad metric to use for conversion. 
> # Most testing tools assume the details is binomial when in reality Orders is a "continuous" metric! Create a metric like
> # the one below to safeguard yourself against troubles.
> #----------------------------------------------
> 
> dd %>%
+   group_by(VisitorID, Variant, Network) %>%
+   summarise(RPV = sum(Revenue), AvgPV = mean(PageViews), Orders = sum(Conversion)) %>%
+   mutate(Converted = case_when(
+     Orders > 0 ~ 1,
+     Orders == 0 ~ 0
+   )) %>%
+ head()
Error in grouped_df_impl(data, unname(vars), drop) : 
  Column `Variant` is unknown

Could this be caused by another library being missing?
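
One way to tell a missing package from a problem in the code itself is the wording of the error. A minimal base-R sketch, using made-up names (no_such_function, no_such_object) purely to trigger both kinds of error:

```r
Sys.setenv(LANGUAGE = "en")  # ask R for English messages so the wording below applies

# Calling a function R cannot find, e.g. because its package is not loaded
msg_function <- tryCatch(
  no_such_function(1),
  error = function(e) conditionMessage(e)
)

# Referring to a variable or column R cannot find
msg_object <- tryCatch(
  no_such_object + 1,
  error = function(e) conditionMessage(e)
)

print(msg_function)  # wording of the form: could not find function "..."
print(msg_object)    # wording of the form: object '...' not found
```

The errors in the transcript above are of the second kind ("object 'VisitorID' not found", "object 'SearchMade' not found"), which points at the code or the data rather than at a package that still needs loading.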

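You are missing one pipe operator (%>%) after un<- dd at the beginning of your first code chunk, so un<- dd runs as a complete statement and select(VisitorID) is then called without any data. The later errors all follow from this one. The chunk should be: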

un<- dd %>%
   select(VisitorID) %>% 
   distinct() %>% 
   mutate(seed = runif(nrow(.), 0, 2), 
          seed2 = runif(nrow(.), 0,2)) %>%
   mutate(Variant = case_when( 
     seed > 1 ~ "A",
     seed <= 1 ~ "B"
   ),
   Network = 
     case_when(
       seed2 > 1 ~ "3G",
       seed2 <= 1 ~ "4G")) %>%
   select(VisitorID, Variant, Network)
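
To see why that one missing pipe also produced the later "must be length 549694" error: without the pipe, un is just a copy of dd, so left_join matches every visit against every visit by the same visitor. A small sketch with made-up visitor IDs, assuming dplyr is installed:

```r
library(dplyr)

# Toy data: visitor 1 has 2 visits, visitor 2 has 3 visits
dd <- data.frame(VisitorID = c(1, 1, 2, 2, 2))

# What the console actually ran: `un <- dd` on its own, a plain copy
un_copy <- dd
inflated <- left_join(dd, un_copy, by = "VisitorID")
nrow(inflated)  # 2*2 + 3*3 = 13 rows instead of 5

# What was intended: one row per distinct visitor before joining
un_distinct <- dd %>% distinct(VisitorID)
joined <- left_join(dd, un_distinct, by = "VisitorID")
nrow(joined)    # 5 rows, same as dd
```

Newer dplyr releases may also print a many-to-many warning for the first join. In the original script the same effect turns 50000 rows into 549694 and leaves dd without the Variant and Network columns, so rnorm(n, ...) with n = 50000 no longer matches the row count, mutate() fails, and SearchMade is missing afterwards.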
