I've got a massive dataset with many different columns. I have been trying to analyse certain aspects of it using R (I've done most of the analysis in Minitab and Excel, but I want to learn how to do it in R as well), and I was hoping to get some help here.
Here is a snippet of my data:
chimp
Who Activity Visitors Duration Visitor_Density X
1 Ch Stationary 0 14 High NA
2 Ch Stationary 20 18 High NA
3 Ch Interaction 0 2 High NA
4 Ch Stationary 30 6 High NA
5 Ch Interaction 30 1 High NA
6 Ch Display 30 10 High NA
7 Ch Interaction 0 6 High NA
8 Ch Stationary 0 5 High NA
9 Ch Stationary 20 20 High NA
10 Ch Stationary 30 13 High NA
I am trying to create a boxplot comparing the distribution of Duration across the different Activities. So far my code looks like this:
chimp$Activity <- as.character(chimp$Activity)
chimp$Visitors <- as.numeric(chimp$Visitors)
chimp$Duration <- as.numeric(chimp$Duration)
boxplot(chimp, x= chimp$Activity, y = chimp$Duration, color = chimp$Visitor_Density)
However, I keep getting this error: "Error in x[floor(d)] + x[ceiling(d)] : non-numeric argument to binary operator"
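A minimal sketch of a likely fix, assuming Duration is numeric and Activity holds the group labels, as in the snippet. The error most likely comes from boxplot() computing quartiles with fivenum() on a non-numeric (character) value: the call hands it the whole chimp data frame plus a character vector instead of one numeric response, and the x[floor(d)] + x[ceiling(d)] expression in the message is the one inside fivenum(). Base R's boxplot() does not take y = or color = the way ggplot2 does; its formula interface says which column to summarise and which to group by. The data frame below only rebuilds the columns of the 10-row snippet that the plot needs.

```r
# Rebuild the relevant columns of the 10-row snippet from the question so this
# runs on its own (the real dataset has many more rows).
chimp <- data.frame(
  Activity = c("Stationary", "Stationary", "Interaction", "Stationary", "Interaction",
               "Display", "Interaction", "Stationary", "Stationary", "Stationary"),
  Duration = c(14, 18, 2, 6, 1, 10, 6, 5, 20, 13),
  Visitor_Density = rep("High", 10),
  stringsAsFactors = FALSE
)

# Formula interface: numeric response on the left, grouping column on the right.
# boxplot() also invisibly returns the statistics it drew.
bp <- boxplot(Duration ~ Activity, data = chimp,
              xlab = "Activity", ylab = "Duration")

# One column of bp$stats per Activity; row 3 is the median (a boxplot marks
# medians and quartiles, not means).
print(bp$names)
print(bp$stats[3, ])
```

To split each activity by crowd level as well, the formula can be extended to `Duration ~ Activity + Visitor_Density`, and base R uses `col =` (not `color =`) for fill colours. No extra package is needed for this. If ggplot2 is installed, the original x/y/colour idea maps onto `ggplot(chimp, aes(x = Activity, y = Duration, fill = Visitor_Density)) + geom_boxplot()`.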
I am extremely new to R and have been attempting this for a while. I'm assuming the problem is with how I am calling boxplot(), or am I missing a package that this code needs? Additionally, if anyone can show me how to get the mean Duration for each Activity without computing each one individually by hand, that would be much appreciated:
(I got these by using filter() on the data to create two datasets, one for each level of Visitor_Density. Again, Excel can do this in about three mouse clicks, so I am quite certain there is an easier way, but I don't know how.)
library(dplyr)  # filter() here is dplyr's; base R's stats::filter() does something else
st.busy <- filter(AlltimespentBusy, Activity == "Stationary")
st.busy
st.quiet <- filter(AlltimespentQuiet, Activity == "Stationary")
st.quiet
mean(st.busy$Duration)
# [1] 11.34448
mean(st.quiet$Duration)
# [1] 23.14706
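A sketch of getting every per-activity mean in one step with base R's aggregate(), assuming all observations sit in one data frame whose Visitor_Density column distinguishes the busy and quiet levels (the question says the two filtered datasets came from different Visitor_Density levels). Only the snippet's relevant columns are rebuilt here, so it runs on its own.

```r
# Relevant columns of the 10-row snippet from the question.
chimp <- data.frame(
  Activity = c("Stationary", "Stationary", "Interaction", "Stationary", "Interaction",
               "Display", "Interaction", "Stationary", "Stationary", "Stationary"),
  Duration = c(14, 18, 2, 6, 1, 10, 6, 5, 20, 13),
  Visitor_Density = rep("High", 10),
  stringsAsFactors = FALSE
)

# Mean Duration for every Activity in one call (base R, no packages needed).
means_by_activity <- aggregate(Duration ~ Activity, data = chimp, FUN = mean)
print(means_by_activity)

# Adding Visitor_Density to the formula gives one mean per Activity/density
# combination, replacing the separate filter() + mean() steps.
means_by_activity_density <- aggregate(Duration ~ Activity + Visitor_Density,
                                       data = chimp, FUN = mean)
print(means_by_activity_density)

# tapply() returns the same per-Activity means as a named vector.
means_vec <- tapply(chimp$Duration, chimp$Activity, mean)
print(means_vec)
```

On the 10-row snippet this gives 10 for Display, 3 for Interaction and about 12.67 for Stationary; on the full data the second table should contain the busy and quiet means side by side. With dplyr already loaded, `chimp %>% group_by(Activity, Visitor_Density) %>% summarise(mean_duration = mean(Duration))` is an equivalent alternative.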
Massive thanks from this R newbie!