How to create a column with the sum ggplot

habana15 · March 4, 2020, 10:45am

Hello everyone,

I would like to know how to add a column with the sum of my data in a function which gives the mean and the sd (to create error bars).

Here is my function :

data_summary <- function(data = aerien1, varname = Nombre.de.contacts, groupnames = c(Odeur, Type.de.test)) {
  require(plyr)
  summary_func <- function(x, col) {
    c(mean = mean(x[[col]], na.rm = TRUE),
      sd = sd(x[[col]], na.rm = TRUE))
  }
  data_sum <- ddply(data, groupnames, .fun = summary_func,
                    varname)
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}

da1 <- data_summary(aerien1, varname = "Nombre.de.contacts",
                    groupnames = c("Odeur", "Type.de.test"))
head(da1)

Thanks for your help, and sorry if my English is not correct.

mara · March 4, 2020, 12:29pm

Are you doing this just in order to have error bars in a ggplot? If so, ggplot already has that functionality for several geoms:

For a boxplot, you can specify what functions you want used to display the ranges:
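As a minimal sketch of that idea (not necessarily the example mara had in mind), `stat_summary()` can compute both the bar height and the error-bar limits from the raw observations inside the plot itself. The data frame `raw` and the helper `mean_sd` are hypothetical names; the values are the ten "eau vs hareng" rows posted later in this thread. It assumes ggplot2 3.3.0 or later, where the argument is called `fun` (older versions call it `fun.y`).

```r
library(ggplot2)

# Hypothetical raw data: one row per observation ("eau vs hareng" test)
raw <- data.frame(
  Odeur = rep(c("eau", "hareng"), each = 5),
  Type.de.test = "eau vs hareng",
  Nombre.de.contacts = c(3, 1, 5, 6, 4, 10, 11, 12, 5, 3)
)

# Helper returning the columns stat_summary() expects: y, ymin, ymax (mean +/- 1 sd)
mean_sd <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(raw, aes(x = Type.de.test, y = Nombre.de.contacts, fill = Odeur)) +
  # Bars: mean per group, computed by ggplot
  stat_summary(fun = mean, geom = "bar", color = "black",
               position = position_dodge(0.9)) +
  # Error bars: mean +/- sd per group, computed by the helper
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2,
               position = position_dodge(0.9))
p
```

Replacing `fun = mean` with `fun = sum` in the first layer would draw the group totals instead of the means.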

habana15 · March 4, 2020, 12:35pm

Thanks for your answer.
Yes, I want error bars in a ggplot (geom_bar). With this function I get a new data frame containing the sd and the mean, both of which I want, but I would also like the sum in another column. Can I modify this function to add a column with the sum, so that the final barplot shows the sum of contacts together with the error bars?

mara · March 4, 2020, 12:41pm

Do you mean as a label? I guess I'm unclear on what you mean by column.

habana15 · March 4, 2020, 12:45pm

Here is my code with the ggplot :

data_summary <- function(data = aerien1, varname = Nombre.de.contacts, groupnames = c(Odeur, Type.de.test)) {
  require(plyr)
  summary_func <- function(x, col) {
    c(mean = mean(x[[col]], na.rm = TRUE),
      sd = sd(x[[col]], na.rm = TRUE))
  }
  data_sum <- ddply(data, groupnames, .fun = summary_func,
                    varname)
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}

da1 <- data_summary(aerien1, varname = "Nombre.de.contacts",
                    groupnames = c("Odeur", "Type.de.test"))
head(da1)

n <- ggplot(da1, aes(x = Type.de.test, y = Nombre.de.contacts, fill = Odeur)) +
  geom_bar(stat = "identity", color = "black",
           position = position_dodge()) +
  geom_errorbar(aes(ymin = Nombre.de.contacts - sd, ymax = Nombre.de.contacts + sd), width = .2,
                position = position_dodge(.9)) +
  geom_text(aes(label = Nombre.de.contacts), vjust = 1.6, color = "white", position = position_dodge(0.9), size = 3.5)

n + labs(title = "Nombre de contacts en fonction des odeurs dans les tests à 1 odeur en milieu aérien", x = "Type de test", y = "Nombre de contacts", fill = "Odeur") +
  theme_classic()

I don't know if it helps you, I'm not very good at R.

habana15 · March 4, 2020, 12:46pm

But instead of the mean of "Nombre.de.contacts", I would like to have the sum for each "Type.de.test".

mara · March 4, 2020, 12:51pm

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners, below:

FAQ: How to do a minimal reproducible example (reprex) for beginners (Guides & FAQs)

A minimal reproducible example consists of the following items:

- A minimal dataset, necessary to reproduce the issue
- The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.…

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

habana15 · March 4, 2020, 1:16pm

I'm really sorry, but I don't know how to use reprex... If it helps you, here is my head(da1):

       Odeur      Type.de.test Nombre.de.contacts        sd
1 affinité - affinité - vs eau                2.8 2.1679483
2    camphre    camphre vs eau                1.6 0.8944272
3  carnivore  carnivore vs eau                1.6 0.8944272
4        eau affinité - vs eau                2.0 1.7320508
5        eau    camphre vs eau                1.6 0.8944272
6        eau  carnivore vs eau                1.0 0.7071068

And I would like to have the sum of the "Nombre.de.contacts"

dromano · March 4, 2020, 3:34pm

Hi @habana15,

A first step on the way to creating a reprex, as @mara suggests, would be to paste the output from the dput() function between a pair of triple backticks (```), as below -- could you do that?

```
[paste output of dput(da1) here]
```

That would help us help you figure out what might be going wrong.
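To illustrate what `dput()` produces, here is a small sketch on a made-up data frame (`toy` is a hypothetical name, not the poster's data). The printed text is plain R code, so anyone can paste it into their own session to rebuild the same object.

```r
# Hypothetical two-row data frame standing in for da1
toy <- data.frame(
  Odeur = c("eau", "hareng"),
  Nombre.de.contacts = c(3.8, 8.2)
)

# dput() prints R code that recreates the object
dput(toy)

# Round trip: capture that text and evaluate it to rebuild a copy
txt <- paste(capture.output(dput(toy)), collapse = "\n")
toy_copy <- eval(parse(text = txt))

# Check that the rebuilt copy matches the original
all.equal(toy, toy_copy)
```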

habana15 · March 4, 2020, 3:48pm

Is this what you want?

structure(list(Odeur = structure(c(1L, 3L, 4L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("affinité -", 
"affinité +", "camphre", "carnivore", "eau", "Familier", "hareng", 
"merlan", "Non Familier", "poivre", "soigneur 1", "soigneur 2"
), class = "factor"), Type.de.test = structure(c(2L, 3L, 4L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L), .Label = c("affinité - vs affinité +", "affinité - vs eau", 
"camphre vs eau", "carnivore vs eau", "eau vs familier", "eau vs hareng", 
"eau vs merlan", "eau vs non familier", "eau vs poivre", "eau vs soigneur 1", 
"eau vs soigneur 2", "familier vs non familier", "hareng vs merlan", 
"soigneur 1 vs soigneur 2"), class = "factor"), Nombre.de.contacts = c(2.8, 
1.6, 1.6, 2, 1.6, 1, 1.4, 3.8, 2, 3, 1, 2.4, 1.8, 2.8, 8.2, 7, 
3, 1, 4, 3.4), sd = c(2.16794833886788, 0.894427190999916, 0.894427190999916, 
1.73205080756888, 0.894427190999916, 0.707106781186548, 0.547722557505166, 
1.92353840616713, 1.73205080756888, 0.707106781186548, 0, 1.14017542509914, 
1.30384048104053, 2.48997991959775, 3.96232255123179, 7.41619848709566, 
1.4142135623731, 0.707106781186548, 2.34520787991171, 3.20936130717624
)), class = "data.frame", row.names = c(NA, -20L))

dromano · March 4, 2020, 3:53pm

Exactly! This is very helpful. I'll return later, but maybe now someone else could help, too.

dromano · March 4, 2020, 7:06pm

OK, I made a reprex of your code with a smaller version of da1, and thought I'd write down the steps I took so you can make your own in the future:

First, I created a new file with this in it:

library(tidyverse)
da1 <- structure(list(
  Odeur = structure(c(1L, 3L, 4L, 5L, 5L), .Label = c(
    "affinité -", "affinité +", "camphre", "carnivore", "eau", "Familier",
    "hareng", "merlan", "Non Familier", "poivre", "soigneur 1", "soigneur 2"
  ), class = "factor"),
  Type.de.test = structure(c(2L, 3L, 4L, 2L, 3L), .Label = c(
    "affinité - vs affinité +", "affinité - vs eau", "camphre vs eau",
    "carnivore vs eau", "eau vs familier", "eau vs hareng", "eau vs merlan",
    "eau vs non familier", "eau vs poivre", "eau vs soigneur 1",
    "eau vs soigneur 2", "familier vs non familier", "hareng vs merlan",
    "soigneur 1 vs soigneur 2"
  ), class = "factor"),
  Nombre.de.contacts = c(2.8, 1.6, 1.6, 2, 1.6),
  sd = c(2.16794833886788, 0.894427190999916, 0.894427190999916,
         1.73205080756888, 0.894427190999916)
), row.names = c(NA, -5L), class = "data.frame")
### end of dput() output

# inspect da1
da1

I included the library() command because reproducible code must be self-contained: anyone should be able to copy and paste it onto their own machine and have it work. The dplyr functions we'll need are loaded by the tidyverse package.

Second, I copied (but did not yet paste) all of this code, and immediately ran the command reprex() from the reprex package, which you would need to install first.

Last, before copying anything else, I pasted the result here to get my reprex version:

library(tidyverse)
da1 <- structure(list(
  Odeur = structure(c(1L, 3L, 4L, 5L, 5L), .Label = c(
    "affinité -", "affinité +", "camphre", "carnivore", "eau", "Familier",
    "hareng", "merlan", "Non Familier", "poivre", "soigneur 1", "soigneur 2"
  ), class = "factor"),
  Type.de.test = structure(c(2L, 3L, 4L, 2L, 3L), .Label = c(
    "affinité - vs affinité +", "affinité - vs eau", "camphre vs eau",
    "carnivore vs eau", "eau vs familier", "eau vs hareng", "eau vs merlan",
    "eau vs non familier", "eau vs poivre", "eau vs soigneur 1",
    "eau vs soigneur 2", "familier vs non familier", "hareng vs merlan",
    "soigneur 1 vs soigneur 2"
  ), class = "factor"),
  Nombre.de.contacts = c(2.8, 1.6, 1.6, 2, 1.6),
  sd = c(2.16794833886788, 0.894427190999916, 0.894427190999916,
         1.73205080756888, 0.894427190999916)
), row.names = c(NA, -5L), class = "data.frame")
### end of dput() output

# inspect da1
da1
#>        Odeur      Type.de.test Nombre.de.contacts        sd
#> 1 affinité - affinité - vs eau                2.8 2.1679483
#> 2    camphre    camphre vs eau                1.6 0.8944272
#> 3  carnivore  carnivore vs eau                1.6 0.8944272
#> 4        eau affinité - vs eau                2.0 1.7320508
#> 5        eau    camphre vs eau                1.6 0.8944272

Created on 2020-03-04 by the reprex package (v0.3.0)

Now, for your question, here's a suggested solution:

library(tidyverse)
da1 <- structure(list(
  Odeur = structure(c(1L, 3L, 4L, 5L, 5L), .Label = c(
    "affinité -", "affinité +", "camphre", "carnivore", "eau", "Familier",
    "hareng", "merlan", "Non Familier", "poivre", "soigneur 1", "soigneur 2"
  ), class = "factor"),
  Type.de.test = structure(c(2L, 3L, 4L, 2L, 3L), .Label = c(
    "affinité - vs affinité +", "affinité - vs eau", "camphre vs eau",
    "carnivore vs eau", "eau vs familier", "eau vs hareng", "eau vs merlan",
    "eau vs non familier", "eau vs poivre", "eau vs soigneur 1",
    "eau vs soigneur 2", "familier vs non familier", "hareng vs merlan",
    "soigneur 1 vs soigneur 2"
  ), class = "factor"),
  Nombre.de.contacts = c(2.8, 1.6, 1.6, 2, 1.6),
  sd = c(2.16794833886788, 0.894427190999916, 0.894427190999916,
         1.73205080756888, 0.894427190999916)
), row.names = c(NA, -5L), class = "data.frame")
### end of dput() output

# store sum of 'Nombre.de.contacts' in a new column I called 'total'
da1 %>% 
  summarise(total = sum(Nombre.de.contacts))
#>   total
#> 1   9.6

Created on 2020-03-04 by the reprex package (v0.3.0)

habana15 · March 4, 2020, 7:54pm

Thanks a lot for these explanations. But I think I didn't make myself clear. I want to add a new column to da1 which will show the sum of "Nombre.de.contacts" for each Odeur. The mean comes from my data "aerien1", and I would like to end up with 5 columns in da1: "Odeur", "Type.de.test", "Nombre.de.contacts" (mean), "sd", and the sum of "Nombre.de.contacts".

Here is the function I use to calculate the mean and sd:

function (data = aerien1, varname = Nombre.de.contacts, groupnames = c(Odeur, 
    Type.de.test)) 
{
    require(plyr)
    summary_func <- function(x, col) {
        c(mean = mean(x[[col]], na.rm = TRUE), sd = sd(x[[col]], 
            na.rm = TRUE))
    }
    data_sum <- ddply(data, groupnames, .fun = summary_func, 
        varname)
    data_sum <- rename(data_sum, c(mean = varname))
    return(data_sum)
}

dromano · March 4, 2020, 8:04pm

Oh, in that case:

da1 %>% 
  group_by(Odeur) %>% 
  mutate(totalOC = sum(Nombre.de.contacts))

Does that do what you're looking for?

dromano · March 4, 2020, 8:05pm

Although it's probably better to ungroup at the end, so you don't have unexpected results later:

da1 %>% 
  group_by(Odeur) %>% 
  mutate(totalOC = sum(Nombre.de.contacts)) %>% 
  ungroup()
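A small sketch of what `group_by()` plus `mutate()` does, on a hypothetical three-row table (`small` is made up for illustration). Every row keeps its place and receives the total of its own group. Note that when the input already holds per-group means, as `da1` does, the result is a sum of those means rather than a sum of the raw counts.

```r
library(dplyr)

# Hypothetical table of already-summarised values (two "eau" rows, one "hareng" row)
small <- data.frame(
  Odeur = c("eau", "eau", "hareng"),
  Nombre.de.contacts = c(2.0, 1.6, 8.2)
)

res <- small %>%
  group_by(Odeur) %>%                                # one group per odour
  mutate(totalOC = sum(Nombre.de.contacts)) %>%      # group total repeated on each row
  ungroup()                                          # drop the grouping again

res
is_grouped_df(res)  # FALSE after ungroup()
```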

habana15 · March 4, 2020, 8:13pm

I think there is a problem because the sum can be less than the mean... and it does not correspond to the true sum of the data

   Odeur        Type.de.test        Nombre.de.contacts    sd totalOC
   <fct>        <fct>                            <dbl> <dbl>   <dbl>
 1 affinité -   affinité - vs eau                  2.8 2.17      2.8
 2 camphre      camphre vs eau                     1.6 0.894     1.6
 3 carnivore    carnivore vs eau                   1.6 0.894     1.6
 4 eau          affinité - vs eau                  2   1.73     20  
 5 eau          camphre vs eau                     1.6 0.894    20  
 6 eau          carnivore vs eau                   1   0.707    20  
 7 eau          eau vs familier                    1.4 0.548    20  
 8 eau          eau vs hareng                      3.8 1.92     20  
 9 eau          eau vs merlan                      2   1.73     20  
10 eau          eau vs non familier                3   0.707    20  
11 eau          eau vs poivre                      1   0        20  
12 eau          eau vs soigneur 1                  2.4 1.14     20  
13 eau          eau vs soigneur 2                  1.8 1.30     20  
14 Familier     eau vs familier                    2.8 2.49      2.8
15 hareng       eau vs hareng                      8.2 3.96      8.2
16 merlan       eau vs merlan                      7   7.42      7  
17 Non Familier eau vs non familier                3   1.41      3  
18 poivre       eau vs poivre                      1   0.707     1  
19 soigneur 1   eau vs soigneur 1                  4   2.35      4  
20 soigneur 2   eau vs soigneur 2                  3.4 3.21      3.4

dromano · March 4, 2020, 8:52pm

You could try to post the original aerien1 table (or the first 50 rows, if it's big); then we could try to mimic your original calculations.

habana15 · March 4, 2020, 9:23pm

          Odeur        Type.de.test Nombre.de.contacts
1           eau       eau vs hareng                  3
2        hareng       eau vs hareng                 10
3           eau       eau vs hareng                  1
4        hareng       eau vs hareng                 11
5           eau       eau vs hareng                  5
6        hareng       eau vs hareng                 12
7           eau       eau vs hareng                  6
8        hareng       eau vs hareng                  5
9        hareng       eau vs hareng                  3
10          eau       eau vs hareng                  4
11          eau eau vs non familier                  4
12          eau eau vs non familier                  3
13 Non Familier eau vs non familier                  5
14 Non Familier eau vs non familier                  2
15          eau eau vs non familier                  2
16          eau eau vs non familier                  3
17          eau eau vs non familier                  3
18 Non Familier eau vs non familier                  2
19 Non Familier eau vs non familier                  4
20 Non Familier eau vs non familier                  2
21          eau   eau vs soigneur 1                  3
22   soigneur 1   eau vs soigneur 1                  6
23          eau   eau vs soigneur 1                  1
24          eau   eau vs soigneur 1                  2
25          eau   eau vs soigneur 1                  2
26          eau   eau vs soigneur 1                  4
27   soigneur 1   eau vs soigneur 1                  1
28   soigneur 1   eau vs soigneur 1                  6
29   soigneur 1   eau vs soigneur 1                  5
30   soigneur 1   eau vs soigneur 1                  2
31          eau   eau vs soigneur 2                  4
32          eau   eau vs soigneur 2                  1
33          eau   eau vs soigneur 2                  1
34          eau   eau vs soigneur 2                  2
35          eau   eau vs soigneur 2                  1
36   soigneur 2   eau vs soigneur 2                  2
37   soigneur 2   eau vs soigneur 2                  2
38   soigneur 2   eau vs soigneur 2                  3
39   soigneur 2   eau vs soigneur 2                  9
40   soigneur 2   eau vs soigneur 2                  1

It's just a very short version of my data

dromano · March 4, 2020, 10:52pm

Thanks, and what are you hoping to calculate from this table, ultimately?

habana15 · March 5, 2020, 7:45am

From this table, I would like to know if there is a difference in "Nombre de contacts" between two odors from the same test (for instance, between "eau" and "hareng" for the "eau vs hareng" test).
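For the sum column requested earlier in the thread, one option is to compute the sum in the same place as the mean and sd, from the raw data rather than from `da1`. This is only a sketch, assuming the raw table has the three columns shown above; `aerien_small` is a hypothetical name holding the ten "eau vs hareng" rows, and the `sum` entry is the only addition to the original `summary_func`.

```r
library(plyr)

# Hypothetical subset: the ten "eau vs hareng" rows from the table above
aerien_small <- data.frame(
  Odeur = c("eau", "hareng", "eau", "hareng", "eau",
            "hareng", "eau", "hareng", "hareng", "eau"),
  Type.de.test = "eau vs hareng",
  Nombre.de.contacts = c(3, 10, 1, 11, 5, 12, 6, 5, 3, 4)
)

data_summary <- function(data, varname, groupnames) {
  # Summarise one group: mean, sd and (new) sum of the chosen column
  summary_func <- function(x, col) {
    c(mean = mean(x[[col]], na.rm = TRUE),
      sd   = sd(x[[col]], na.rm = TRUE),
      sum  = sum(x[[col]], na.rm = TRUE))
  }
  data_sum <- ddply(data, groupnames, .fun = summary_func, varname)
  # Rename "mean" to the variable name, as in the original function
  rename(data_sum, c("mean" = varname))
}

da1 <- data_summary(aerien_small,
                    varname = "Nombre.de.contacts",
                    groupnames = c("Odeur", "Type.de.test"))
da1
```

The result has five columns: "Odeur", "Type.de.test", "Nombre.de.contacts" (the mean), "sd" and "sum". To plot totals, `aes(y = sum)` could be used in `geom_bar()`; keep in mind that mean ± sd error bars describe the mean, so they are not on the same scale as the sum.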