Hi Andresrcs,
Thanks for your help. I tried to install dplyr and got the following:
> install.packages("dplyr")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/dplyr_0.7.8.tgz'
Content type 'application/x-gzip' length 5720340 bytes (5.5 MB)
==================================================
downloaded 5.5 MB
The downloaded binary packages are in
/var/folders/db/cyszd47j6gj2488n6qt7550r0000gq/T//RtmpJ8OQpY/downloaded_packages
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
I ran the code again. That seems to have solved the %>% issue, but several other errors still show up:
> set.seed(100)
>
> n<- 50000 #First we decide how many visits do we want in our simulation
> dd<- data.frame(
+ VisitorID = round(runif(n = n, min = 1,max = n/10),0)
+ ) #Next, we create a column called VisitorID made up of random numbers. These numbers are unique visitor identifiers!
> #Some of these random numbers are repeated, in the same way we will have repeat and new visitors on our website
>
>
> head(dd) #Now that we have our visitor id's, we can start to make our simulation robust with more data!
VisitorID
1 1540
2 1289
3 2762
4 283
5 2343
6 2419
>
>
> un<- dd
> select(VisitorID) %>% #Select that we want to work with our Visitor ID's
+ distinct() %>% #The distinct command specifies that we only want to isolate the distinct visitors!
+ mutate(seed = runif(nrow(.), 0, 2), #Then we create a dummy variable, that we can use...
+ seed2 = runif(nrow(.), 0,2)) %>%
+ mutate(Variant = case_when( #In order to assign the same value to each visitor through a join command!
+ seed > 1 ~ "A",
+ seed <= 1 ~ "B"
+ ),
+ Network =
+ case_when(
+ seed2 > 1 ~ "3G",
+ seed2 <= 1 ~ "4G")) %>%
+ select(VisitorID, Variant, Network)
Error in select(VisitorID) : object 'VisitorID' not found
>
> head(un)
VisitorID
1 1540
2 1289
3 2762
4 283
5 2343
6 2419
>
> dd<- left_join(dd, un, by = c("VisitorID")) #Finally we perform the join to assign all non-distinct visitor ID's with the
> #same Variant and Network values, so that they are consistent!
>
> head(dd)
VisitorID
1 1540
2 1540
3 1540
4 1540
5 1540
6 1540
>
>
> #----------------------------------------------
> #Before we go on, it will be good to mention R's ability to create normal distributions. A normal distribution is a set
> # of values randomly pulled from a distribution with a mean of x and a standard deviation of y, or to put another way-
> # we're pulling values randomly from a "true" population! A binomial distribution is just like a normal distribution,
> # except only with values of 1's and 0's.
> #----------------------------------------------
>
>
> rnorm(10, mean = 10, sd = 2)
[1] 10.471506 11.484392 9.321547 12.322975 12.908264 8.240405 10.330313 11.728771 13.774858 11.296582
> rbinom(10,1,.5)
[1] 0 1 0 1 1 1 0 0 1 0
>
>
> dd<- dd %>%
+ mutate(PageViews = sqrt(round(rnorm(n, mean = 2, 3))^2)+1,
+ PageDepth = rnorm(nrow(.), mean = 4, sd = 2),
+ NewVisitor = rbinom(nrow(.), 1, .6),
+ SearchMade = rbinom(nrow(.), 1, .15),
+ Conversion = rbinom(n, 1, .05),
+ VideoPlays = ifelse(
+ Variant == "B" & Network == "4G", round(rnorm(length(Conversion[which(Variant == "B" & Network == "4G")]),5, 1),0),
+ ifelse(Variant == "A" & Network == "4G", round(rnorm(length(Conversion[which(Variant == "A" & Network == "4G")]),3, 1),0),
+ round(rnorm(length(Conversion[which(Network == "3G")]),0, 1),0)
+ )),
+ Revenue = ifelse(Conversion == 1 & Variant == "B" & Network == "3G",rnorm(length(Conversion[which(Conversion == 1 & Variant == "B" & Network == "3G")]),16.5, 3),
+ ifelse(Conversion == 1 & Variant == "A" & Network == "3G",rnorm(length(Conversion[which(Conversion == 1 & Variant == "A" & Network == "3G")]),18, 3),
+ ifelse(Conversion ==1 & Network == "4G",rnorm(length(Conversion[which(Conversion == 1)]),19,3),0))))
Error in mutate_impl(.data, dots) :
Column `PageViews` must be length 549694 (the number of rows) or one, not 50000
>
>
> head(dd)
VisitorID
1 1540
2 1540
3 1540
4 1540
5 1540
6 1540
>
>
> #----------------------------------------------
> #You can see we've created some interesting metrics to track, but we can actually go deeper than this! Even if
> # our analysis tool outputs simple metrics like those below, we can create more interesting combinations of metrics
> # to run experiments on
>
> # For example, let's say we wanted to create a metric to represent the cumulative Engagement level of a visitor with
> # our website. We can do that by creating a new function based on the metrics we have available.
> #----------------------------------------------
>
>
> dd %>%
+ mutate(Engagement.Level = SearchMade*3 + (PageViews/5) + (VideoPlays/5) * PageDepth) %>% #Our creation is random, but you
+ head() #can create whatever values you want! I'd strongly suggest basing them on solid business logic
Error in mutate_impl(.data, dots) :
Evaluation error: object 'SearchMade' not found.
>
>
> #----------------------------------------------
> #Quick tip! Conversion Rate metrics are not what they seem! Orders/Unique Visitors is a bad metric to use for conversion.
> # Most testing tools assume the data is binomial when in reality Orders is a "continuous" metric! Create a metric like
> # the one below to safeguard yourself against troubles.
> #----------------------------------------------
>
> dd %>%
+ group_by(VisitorID, Variant, Network) %>%
+ summarise(RPV = sum(Revenue), AvgPV = mean(PageViews), Orders = sum(Conversion)) %>%
+ mutate(Converted = case_when(
+ Orders > 0 ~ 1,
+ Orders == 0 ~ 0
+ )) %>%
+ head()
Error in grouped_df_impl(data, unname(vars), drop) :
Column `Variant` is unknown
Could this be caused by another missing library?
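
Reading the transcript again, one possible cause that does not involve a missing library: the line `un<- dd` ends without `%>%`, so R may be treating it as a complete statement and then running `select(VisitorID)` on its own with no data frame, which would match the first error. If that is right, `un` is just a copy of `dd`, the `left_join` matches every repeated VisitorID against all of its own repeats (which would explain the 549694 rows), and `Variant` and `Network` are never created, which would explain the later errors. This is a guess from the output, not something confirmed. A minimal sketch of the first steps with the pipe added (the use of `n()` instead of `nrow(.)` is my own substitution):

```r
library(dplyr)

set.seed(100)

n <- 50000
dd <- data.frame(
  VisitorID = round(runif(n = n, min = 1, max = n / 10), 0)
)

# Note the %>% after `dd`: without it, `un <- dd` is a complete statement
# and `select(VisitorID)` is evaluated separately with no data to work on.
un <- dd %>%
  select(VisitorID) %>%
  distinct() %>%
  mutate(seed  = runif(n(), 0, 2),   # n() is the row count inside mutate()
         seed2 = runif(n(), 0, 2)) %>%
  mutate(Variant = case_when(seed > 1  ~ "A",
                             seed <= 1 ~ "B"),
         Network = case_when(seed2 > 1  ~ "3G",
                             seed2 <= 1 ~ "4G")) %>%
  select(VisitorID, Variant, Network)

# `un` now has one row per visitor, so the join keeps dd at n rows
dd <- left_join(dd, un, by = "VisitorID")

nrow(dd) == n                          # should be TRUE if the join is one-to-many
c("Variant", "Network") %in% names(dd) # both columns should now exist
```

If the row count still equals `n` after the join, the later `mutate()` calls that use `rnorm(n, ...)` and `rbinom(n, ...)` should no longer hit the length mismatch.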