# Help with boxplotting and summarising large dataset

I've got a massive dataset with many columns. I have been trying to analyse certain aspects of it using R (I've done most of it in Minitab and Excel but want to learn this way of doing it as well), and I was hoping to get some help here.

Here is a snippet of my data:

``````
chimp
    Who      Activity Visitors Duration Visitor_Density  X
1    Ch    Stationary        0       14            High NA
2    Ch    Stationary       20       18            High NA
3    Ch   Interaction        0        2            High NA
4    Ch    Stationary       30        6            High NA
5    Ch   Interaction       30        1            High NA
6    Ch       Display       30       10            High NA
7    Ch   Interaction        0        6            High NA
8    Ch    Stationary        0        5            High NA
9    Ch    Stationary       20       20            High NA
10   Ch    Stationary       30       13            High NA
``````

I am trying to create a boxplot showing the differences in the means of the Duration spent on each Activity. So far my code looks like this:

``````
chimp$Activity <- as.character(chimp$Activity)
chimp$Visitors <- as.numeric(chimp$Visitors)
chimp$Duration <- as.numeric(chimp$Duration)
boxplot(chimp, x= chimp$Activity, y = chimp$Duration, color = chimp$Visitor_Density)
``````

However, I keep getting this error: `Error in x[floor(d)] + x[ceiling(d)] : non-numeric argument to binary operator`

I am extremely new to R and have been attempting this for a while. I'm assuming the problem is with how I am using the boxplot code, or am I missing a package that this code needs? Additionally, if anyone can help me get the mean of the total duration for each activity without doing it individually by hand, that would be much appreciated. This is what I have so far:
(I got this by using `filter()` on the data to create two datasets, one for each level of Visitor_Density. Again, Excel can do this in about three mouse clicks, so I am quite certain there is an easier way, but I don't know how.)

``````
st.busy <- filter(AlltimespentBusy, AlltimespentBusy$Activity == "Stationary")
st.busy
st.quiet <- filter(AlltimespentQuiet, AlltimespentQuiet$Activity == "Stationary")
st.quiet
mean(st.busy$Duration)
# [1] 11.34448
mean(st.quiet$Duration)
# [1] 23.14706
``````

Massive thanks from this R newbie!
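The error most likely comes from how `boxplot()` is being called: base R's `boxplot()` does not take `x`, `y` and `color` mappings the way ggplot2 does, so it ends up trying to compute quartiles on non-numeric data (the character `Activity` values), which fails inside the quantile arithmetic. Note also that a boxplot shows medians and quartiles rather than means. Here is a minimal base R sketch using the formula interface, assuming the data frame is called `chimp` and has the columns shown in the snippet (the ten rows are retyped so the example runs on its own):

```r
# Rebuild the ten rows from the question (assumed to match `chimp`)
chimp <- data.frame(
  Activity = c("Stationary", "Stationary", "Interaction", "Stationary",
               "Interaction", "Display", "Interaction", "Stationary",
               "Stationary", "Stationary"),
  Duration = c(14, 18, 2, 6, 1, 10, 6, 5, 20, 13),
  stringsAsFactors = FALSE
)

# Formula interface: numeric response ~ grouping variable
bp <- boxplot(Duration ~ Activity, data = chimp,
              xlab = "Activity", ylab = "Duration")

# boxplot() invisibly returns the statistics it plotted:
# one column of five-number summaries per group
bp$names
bp$stats
```

No extra package is needed for this; `boxplot()` ships with base R.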

Consider using `ggplot2`; see this example:

``````
library(ggplot2)

# Sample data retyped from the question
df <- data.frame(
  stringsAsFactors = FALSE,
  Who = c("Ch", "Ch", "Ch", "Ch", "Ch", "Ch", "Ch", "Ch", "Ch",
          "Ch"),
  Activity = c("Stationary", "Stationary", "Interaction", "Stationary",
               "Interaction", "Display", "Interaction", "Stationary",
               "Stationary", "Stationary"),
  Visitors = c(0, 20, 0, 30, 30, 30, 0, 0, 20, 30),
  Duration = c(14, 18, 2, 6, 1, 10, 6, 5, 20, 13),
  Visitor_Density = c("High", "High", "High", "High", "High", "High", "High",
                      "High", "High", "High")
)

# One box per Activity, coloured by Activity
ggplot(df, aes(x = Activity, y = Duration, color = Activity)) +
  geom_boxplot()
``````

Created on 2019-10-23 by the reprex package (v0.3.0.9000)
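For the second part of the question, the per-group means can be computed in one call instead of filtering each subset by hand. Here is a minimal base R sketch with `aggregate()`, assuming the same column names as in the question. The ten-row snippet only contains the `High` density level, so the full dataset should give one row per combination of activity and density:

```r
# Sample rows retyped from the question's snippet
df <- data.frame(
  Activity = c("Stationary", "Stationary", "Interaction", "Stationary",
               "Interaction", "Display", "Interaction", "Stationary",
               "Stationary", "Stationary"),
  Duration = c(14, 18, 2, 6, 1, 10, 6, 5, 20, 13),
  Visitor_Density = rep("High", 10),
  stringsAsFactors = FALSE
)

# Mean Duration for every Activity within each Visitor_Density level
means <- aggregate(Duration ~ Activity + Visitor_Density, data = df, FUN = mean)
means
```

Since the question already uses dplyr's `filter()`, the equivalent there would be `group_by(Activity, Visitor_Density)` followed by `summarise(mean_duration = mean(Duration))`, where `mean_duration` is just an illustrative column name.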
