Error Information:
I get the following error:
Error in dd %>% select(VisitorID) %>% distinct() %>% mutate(seed = runif(nrow(.), :
could not find function "%>%"
So then I tried to install the packages to make sure I have the function, but I only get more errors:
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/sys_2.1.tgz'
Content type 'application/x-gzip' length 68633 bytes (67 KB)
==================================================
downloaded 67 KB
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/askpass_1.1.tgz'
Content type 'application/x-gzip' length 21399 bytes (20 KB)
==================================================
downloaded 20 KB
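For context, the verbs used in the script below (`select`, `distinct`, `mutate`, `case_when`, `left_join`) are all exported by dplyr, and dplyr also re-exports `%>%` from magrittr. A minimal sketch for checking whether dplyr is installed and attaching it, assuming dplyr is the package the course script expects (the script itself never names one):

```r
# Assumption: the script's %>% is meant to come from dplyr
# (dplyr re-exports the pipe from magrittr).
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)  # attaches select(), mutate(), %>% and the other verbs
} else {
  # install.packages("dplyr")  # run once, then call library(dplyr)
  message("dplyr is not installed in this library: ", .libPaths()[1])
}

# TRUE once a package that provides the pipe has been attached
pipe_found <- exists("%>%")
pipe_found
```

Installing a package only downloads it. `library()` still has to be called in each new R session before `%>%` can be found.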
System Information:
- RStudio Edition: Desktop
- OS Version: Version 1.1.463
Thanks for your help,
Daphne
The complete code I am trying to run is as follows:
#Welcome back to R! As usual, we'll start by loading in the packages that we need for this course by using the
# library function. Remember never to forget this step, it's very important!
#----------------------------------------------
set.seed(100)
n <- 50000 #First we decide how many visits we want in our simulation
dd <- data.frame(
  VisitorID = round(runif(n = n, min = 1, max = n/10), 0)
) #Next, we create a column called VisitorID made up of random numbers. These numbers are unique visitor identifiers!
#Some of these random numbers are repeated, in the same way we will have repeat and new visitors on our website
head(dd) #Now that we have our visitor ID's, we can start to make our simulation robust with more data!
un <- dd %>%
  select(VisitorID) %>% #Select that we want to work with our Visitor ID's
  distinct() %>% #The distinct command specifies that we only want to isolate the distinct visitors!
  mutate(seed = runif(nrow(.), 0, 2), #Then we create a dummy variable, that we can use...
         seed2 = runif(nrow(.), 0, 2)) %>%
  mutate(Variant = case_when( #In order to assign the same value to each visitor through a join command!
           seed > 1 ~ "A",
           seed <= 1 ~ "B"
         ),
         Network = case_when(
           seed2 > 1 ~ "3G",
           seed2 <= 1 ~ "4G"
         )) %>%
  select(VisitorID, Variant, Network)
dd <- left_join(dd, un, by = c("VisitorID")) #Finally we perform the join to assign all non-distinct visitor ID's with the
#same Variant and Network values, so that they are consistent!
head(dd)
#----------------------------------------------
#Before we go on, it will be good to mention R's ability to draw random values from a distribution. rnorm() pulls
# values at random from a normal distribution with a mean of x and a standard deviation of y, or to put it another way,
# we're pulling values randomly from a "true" population! rbinom() with size = 1 works the same way,
# except that the values it draws are only 1's and 0's.
#----------------------------------------------
rnorm(10, mean = 10, sd = 2)
rbinom(10,1,.5)
dd <- dd %>%
  mutate(PageViews = abs(round(rnorm(n, mean = 2, sd = 3))) + 1,
         PageDepth = rnorm(nrow(.), mean = 4, sd = 2),
         NewVisitor = rbinom(nrow(.), 1, .6),
         SearchMade = rbinom(nrow(.), 1, .15),
         Conversion = rbinom(n, 1, .05),
         VideoPlays = ifelse(
           Variant == "B" & Network == "4G",
           round(rnorm(length(Conversion[which(Variant == "B" & Network == "4G")]), 5, 1), 0),
           ifelse(
             Variant == "A" & Network == "4G",
             round(rnorm(length(Conversion[which(Variant == "A" & Network == "4G")]), 3, 1), 0),
             round(rnorm(length(Conversion[which(Network == "3G")]), 0, 1), 0)
           )
         ),
         Revenue = ifelse(
           Conversion == 1 & Variant == "B" & Network == "3G",
           rnorm(length(Conversion[which(Conversion == 1 & Variant == "B" & Network == "3G")]), 16.5, 3),
           ifelse(
             Conversion == 1 & Variant == "A" & Network == "3G",
             rnorm(length(Conversion[which(Conversion == 1 & Variant == "A" & Network == "3G")]), 18, 3),
             ifelse(Conversion == 1 & Network == "4G", rnorm(length(Conversion[which(Conversion == 1)]), 19, 3), 0)
           )
         ))
head(dd)
#----------------------------------------------
#You can see we've created some interesting metrics to track, but we can actually go deeper than this! Even if
# our analysis tool outputs simple metrics like those below, we can create more interesting combinations of metrics
# to run experiments on.
# For example, let's say we wanted to create a metric to represent the cumulative Engagement level of a visitor with
# our website. We can do that by creating a new function based on the metrics we have available.
#----------------------------------------------
#Our creation is random, but you can create whatever values you want! I'd strongly suggest basing them on solid business logic
dd %>%
  mutate(Engagement.Level = SearchMade*3 + (PageViews/5) + (VideoPlays/5) * PageDepth) %>%
  head()
#----------------------------------------------
#Quick tip! Conversion Rate metrics are not what they seem! Orders/Unique Visitors is a bad metric to use for conversion.
# Most testing tools assume the data is binomial when in reality Orders is a "continuous" metric! Create a metric like
# the one below to safeguard yourself against trouble.
#----------------------------------------------
dd %>%
  group_by(VisitorID, Variant, Network) %>%
  summarise(RPV = sum(Revenue), AvgPV = mean(PageViews), Orders = sum(Conversion)) %>%
  mutate(Converted = case_when(
    Orders > 0 ~ 1,
    Orders == 0 ~ 0
  )) %>%
  head()
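
The difference between the two conversion metrics in the last tip can be seen on a toy example. The sketch below uses base R only and a small made-up `visits` data frame (hypothetical values, not the simulated `dd`), so it runs without dplyr. One visitor can place more than one order, so Orders/Visitors is not a proportion of visitors, while the per-visitor `Converted` flag is a true 0/1 outcome:

```r
# Toy visit-level data (made up for illustration; not the simulated dd)
visits <- data.frame(
  VisitorID  = c(1, 1, 2, 3, 3, 3),
  Conversion = c(0, 1, 0, 1, 1, 0),
  Revenue    = c(0, 20, 0, 15, 18, 0)
)

# Collapse to one row per visitor: total orders and total revenue
per_visitor <- aggregate(
  visits[c("Conversion", "Revenue")],
  by = list(VisitorID = visits$VisitorID),
  FUN = sum
)
names(per_visitor) <- c("VisitorID", "Orders", "RPV")

# 1 if the visitor ordered at least once, 0 otherwise
per_visitor$Converted <- as.integer(per_visitor$Orders > 0)

# Orders per visitor counts repeat orders; the conversion rate does not
orders_per_visitor <- sum(per_visitor$Orders) / nrow(per_visitor)
visitor_conv_rate  <- mean(per_visitor$Converted)

per_visitor
```

Here visitor 3 orders twice, so the orders-per-visitor figure comes out higher than the share of visitors who converted.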