Determine Statistical Difference Between Two Groups

Ah, thanks but you still have one Site name in there

site
Culyer and Dewe Outflow                 Ref1_Down                  Ref1_In 
                       1                       25                       14 
                Ref1_In                   Ref1_Up                 Ref1_Up  
                      11                       24                        1 
                Site1_BC               Site1_Down                 Site1_FC 
                       7                       25                       25 
               Site1_Out                 Site1_Up                 Site2_In 
                      25                       25                       11 
               Site2_In                 Site2_Out               Site2_Out  
                       1                        9                        2 
                Site3_In                Site3_Out               Site3_Out  
                      14                        8                        6 

Am I correct to assume that Culyer and Dewe Outflow corresponds to Site3_Out?

I'll recode it as that and see what happens.

I suddenly started looking at the entire table (See below) and realased we have a serious problem. After a ferocious struggle I managed to recode Culyer and Dewe Outflow , the problem being that it had a trailing space.

However if you look at the table below we are getting repeats

site
 Ref1_Down    Ref1_In   Ref1_In     Ref1_Up   Ref1_Up    Site1_BC Site1_Down 
        25         14         11         24          1          7         25 
  Site1_FC  Site1_Out   Site1_Up   Site2_In  Site2_In   Site2_Out Site2_Out  
        25         25         25         11          1          9          2 
  Site3_In  Site3_Out Site3_Out  
        14          9          6 

I have not had a chance to poke around but we may be getting the same thing in a number of instances.

From my point of view just correcting the spelling and getting rid of those trailing _ is all that is needed.

Edit: I spoke too soon. Trying a recode in the earlier dataset did not work so rather than wastin an hour or more trying to find the problem I just opened the file it a spreadsheet and manually edited it.

So I am now getting

sts
                          site      mean        sd   min       max
                        <char>     <num>     <num> <num>     <num>
 1:            Hinton_Upstream  638.6346 1509.7581 4.193  7650.000
 2:   Hinton_Inflow_Front_Cell  903.3801  965.3959 4.866  4250.000
 3:    Hinton_Inflow_Back_Cell  311.3610  321.5984 8.660   760.000
 4:             Hinton_Outflow  410.1724  424.3327 4.280  1456.000
 5:          Hinton_Downstream 1100.0959 2275.0393 4.440  9180.000
 6:         Blanchard_Upstream  772.8774 2174.2293 4.290 11020.000
 7:           Blanchard_Inflow  802.4028 1042.7360 4.240  4718.750
 8:       Blanchard_Downstream  730.7176 1914.8860 4.320  9723.214
 9:  Cuyler_and_Current_Inflow  743.1009 1209.8047 4.210  4449.040
10: Cuyler_and_Current_Outflow  493.7300  657.5103 4.230  2487.910
11:     Cuyler_and_Dewe_Inflow  829.7311  780.2407 4.380  2657.143
12:    Cuyler_and_Dewe_Outflow  341.0762  290.9289 4.210   872.549

And

DT[, table(site)]
site
      Blanchard_Downstream           Blanchard_Inflow         Blanchard_Upstream 
                        25                         25                         25 
 Cuyler_and_Current_Inflow Cuyler_and_Current_Outflow     Cuyler_and_Dewe_Inflow 
                        14                         14                         12 
   Cuyler_and_Dewe_Outflow          Hinton_Downstream    Hinton_Inflow_Back_Cell 
                        12                         25                          7 
  Hinton_Inflow_Front_Cell             Hinton_Outflow            Hinton_Upstream 
                        25                         25                         25

which I think is what you intended.

Current data.frame

structure(list(rainfall_number = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 14L, 14L, 
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 
15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 
17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 
19L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 
21L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L, 
23L, 23L, 23L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 
24L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 
25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 
26L, 26L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 
28L, 28L, 28L, 28L, 28L, 28L, 28L), date = structure(c(19222, 
19222, 19222, 19222, 19222, 19222, 19222, 19222, 19222, 19222, 
19227, 19227, 19244, 19244, 19244, 19244, 19244, 19244, 19244, 
19244, 19244, 19244, 19244, 19260, 19260, 19260, 19260, 19260, 
19260, 19260, 19260, 19260, 19260, 19277, 19277, 19277, 19277, 
19277, 19277, 19277, 19277, 19277, 19277, 19277, 19302, 19302, 
19302, 19302, 19302, 19302, 19302, 19476, 19476, 19476, 19476, 
19476, 19476, 19476, 19476, 19476, 19476, 19476, 19476, 19496, 
19496, 19496, 19496, 19496, 19496, 19496, 19496, 19496, 19532, 
19532, 19532, 19532, 19532, 19532, 19532, 19532, 19532, 19537, 
19537, 19537, 19537, 19537, 19537, 19537, 19547, 19547, 19547, 
19547, 19547, 19547, 19547, 19564, 19564, 19564, 19564, 19564, 
19564, 19564, 19564, 19564, 19564, 19564, 19577, 19577, 19577, 
19577, 19577, 19577, 19577, 19585, 19585, 19585, 19585, 19585, 
19585, 19585, 19636, 19636, 19636, 19636, 19636, 19636, 19636, 
19658, 19658, 19658, 19658, 19658, 19658, 19658, 19667, 19667, 
19667, 19667, 19667, 19667, 19667, 19667, 19667, 19821, 19821, 
19821, 19821, 19821, 19821, 19821, 19821, 19830, 19830, 19830, 
19830, 19830, 19830, 19830, 19830, 19830, 19830, 19830, 19830, 
19840, 19840, 19840, 19840, 19840, 19840, 19840, 19840, 19840, 
19840, 19840, 19861, 19861, 19861, 19861, 19861, 19861, 19861, 
19864, 19864, 19864, 19864, 19864, 19864, 19864, 19864, 19864, 
19864, 19864, 19864, 19877, 19877, 19877, 19877, 19877, 19877, 
19877, 19877, 19877, 19877, 19877, 19892, 19892, 19892, 19892, 
19892, 19892, 19892, 19892, 19892, 19892, 19892, 19892, 19902, 
19902, 19902, 19902, 19902, 19902, 19902, 19902, 19902, 19902, 
19902, 19942, 19942, 19942, 19942, 19942, 19942, 19942), class = "Date"), 
    site = c("Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Inflow_Back_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Blanchard_Upstream", 
    "Blanchard_Inflow", "Blanchard_Downstream", "Cuyler_and_Current_Inflow", 
    "Cuyler_and_Current_Outflow", "Cuyler_and_Current_Inflow", 
    "Cuyler_and_Current_Outflow", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Blanchard_Upstream", 
    "Blanchard_Inflow", "Blanchard_Downstream", "Cuyler_and_Dewe_Inflow", 
    "Cuyler_and_Dewe_Outflow", "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Inflow_Back_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Blanchard_Upstream", 
    "Blanchard_Inflow", "Blanchard_Downstream", "Cuyler_and_Current_Inflow", 
    "Cuyler_and_Current_Outflow", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Blanchard_Upstream", 
    "Blanchard_Inflow", "Blanchard_Downstream", "Cuyler_and_Dewe_Inflow", 
    "Cuyler_and_Dewe_Outflow", "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Inflow_Back_Cell", "Hinton_Outflow", "Hinton_Downstream", 
    "Cuyler_and_Dewe_Inflow", "Cuyler_and_Dewe_Outflow", "Cuyler_and_Current_Inflow", 
    "Cuyler_and_Current_Outflow", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Cuyler_and_Current_Inflow", 
    "Cuyler_and_Current_Outflow", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", 
    "Cuyler_and_Dewe_Outflow", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Blanchard_Upstream", 
    "Blanchard_Inflow", "Blanchard_Downstream", "Hinton_Upstream", 
    "Hinton_Inflow_Front_Cell", "Hinton_Outflow", "Hinton_Downstream", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", "Cuyler_and_Dewe_Outflow", 
    "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Blanchard_Upstream", 
    "Blanchard_Inflow", "Blanchard_Downstream", "Hinton_Upstream", 
    "Hinton_Inflow_Front_Cell", "Hinton_Outflow", "Hinton_Downstream", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", 
    "Cuyler_and_Dewe_Outflow", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Inflow_Back_Cell", "Hinton_Outflow", "Hinton_Downstream", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Inflow_Back_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", 
    "Cuyler_and_Dewe_Outflow", "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", "Cuyler_and_Dewe_Outflow", 
    "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Inflow_Back_Cell", "Hinton_Outflow", "Hinton_Downstream", 
    "Cuyler_and_Dewe_Inflow", "Cuyler_and_Dewe_Outflow", "Cuyler_and_Current_Inflow", 
    "Cuyler_and_Current_Outflow", "Blanchard_Upstream", "Blanchard_Inflow", 
    "Blanchard_Downstream", "Hinton_Upstream", "Hinton_Inflow_Front_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", 
    "Cuyler_and_Dewe_Outflow", "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Inflow_Back_Cell", 
    "Hinton_Outflow", "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", 
    "Cuyler_and_Dewe_Outflow", "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream", "Cuyler_and_Dewe_Inflow", "Cuyler_and_Dewe_Outflow", 
    "Cuyler_and_Current_Inflow", "Cuyler_and_Current_Outflow", 
    "Blanchard_Upstream", "Blanchard_Inflow", "Blanchard_Downstream", 
    "Hinton_Upstream", "Hinton_Inflow_Front_Cell", "Hinton_Outflow", 
    "Hinton_Downstream"), tss = c(8.408, 8.794, 8.794, 8.898, 
    8.97, 9.67, 8.646, 4.32, 8.52, 8.96, 9.1, 8.85, 8.92, 9.38, 
    8.9, 8.27, 8.19, 8.59, 8.4, 8.64, 8.5, 8.5, 4.23, 8.59, 8.86, 
    8.66, 7.98, 8.2, 8.67, 8.55, 8.68, 4.21, 4.48, 4.193, 4.866, 
    4.28, 4.44, 4.29, 4.24, 4.57, 4.38, 4.21, 4.46, 4.54, 6.2, 
    5.124, 4.88, 6, 6, 7.2, 6.02, 123, 105, 95, 109, 250, 70.59, 
    91.95, 117.65, 313.87, 237.7, 98.16, 57.43, 66.797, 248.7, 
    47.964, 73.333, 484.63, 197.909, 851.33, 4449.04, 2487.91, 
    127.17, 431.37, 241.76, 177.78, 1107.53, 196.26, 172.73, 
    233.01, 162.16, 282.21, 482.35, 401.96, 120.97, 821.92, 367.92, 
    2365.52, 192.1, 299.1, 82, 41.9, 367.5, 317.3, 88, 120.8, 
    851.6, 159.1, 63.3, 1265.6, 433.3, 67.8, 788.6, 117.2, 606.6, 
    447.6, 248.2, 407.71, 455.6, 466.4, 45.2, 71.4, 98.7, 147.2, 
    778.4, 132.7, 156.9, 756.2, 94.12, 161.3, 54.9, 187.2, 64.2, 
    54.2, 153.2, 153.5, 49.3, 274.5, 600, 514.3, 700, 916.7, 
    291.7, 625.8, 185.7, 113.3, 114.3, 109.5, 245.8, 102, 161.7, 
    524.4, 73.3, 514.2857143, 700, 600, 612.9032258, 1500, 164.1791045, 
    670, 954.5454545, 920, 4718.75, 660, 638.0952381, 1290, 519.6850394, 
    644.6280992, 1066.666667, 2657.142857, 603.0534351, 712.8712871, 
    646.0176991, 523.8095238, 2480, 486.2385321, 590, 843.1372549, 
    490.9090909, 592.2330097, 630, 333.3333333, 512.8205128, 
    344.5378151, 560, 690, 540, 800, 900, 590, 7618.644068, 11020, 
    649.5726496, 9723.214286, 7650, 1490, 760, 890, 9180, 850, 
    600, 74, 752.3809524, 713.0434783, 1480, 694.4444444, 603.7735849, 
    4250, 1456, 631.0679612, 723.8095238, 390.9090909, 1311.926606, 
    745.2830189, 1861.788618, 1740.384615, 1790.47619, 1564.356436, 
    2073.394495, 647.6190476, 1170, 1200, 1537.735849, 872.5490196, 
    1893.203883, 670, 670, 1514.851485, 750, 740, 1620, 704.7619048, 
    640, 1685.185185, 690, 710, 730, 679.6116505, 1544.554455, 
    682.6923077, 657.1428571, 2166.666667, 1285.714286, 823.5294118
    )), row.names = c(NA, -234L), class = "data.frame")

@jrkrideau Thank you so much for this. I really appreciate it. This is what I intended. Does anyone have any recommendations for testing statistical differences (and calculating p-values) between sites?

Unless some one comes up with a better idea, I'd stay with your original idea unless your organization has a handy consulting statistician on staff.

BTW, in case you want to eschew "dynamite" plots, here is some code I put together in ggplot2 that does a simple mean & CI plot.

There is a ggplot extension {GitHub - const-ae/ggsignif: Easily add significance brackets to your ggplots} that I belive will do the p-value and lines for you

# load packages -----------------------------------------------------------
suppressMessages(library(data.table))
suppressMessages(library(tidyverse))
library(plotrix)
library(tinytable)
library(janitor)

# Create custom palate ----------------------------------------------------

cols <- c("blue", "blue", "red", "red", "green", "green")

# Load data ---------------------------------------------------------------
DT <- fread("rain_2.csv") |> clean_names()
DT[ , date := ymd(date)]
DT <- na.omit(DT)

# Clean up data -----------------------------------------------------------

DT[ , site := str_remove(site, "\\_+$")]

# Summaries ---------------------------------------------------------------

sts <- DT[, .(mean = mean(tss), sd = sd(tss), 
           min = min(tss), max = max(tss)), by = site]


# Recode "site"  ----------------------------------------------------------

DT[, site := fcase(
   site ==   "Hinton_Inflow_Front_Cell", "inflow1",
   site ==  "Cuyler_and_Dewe_Inflow",  "inflow2",
   site == "Cuyler_and_Current_Inflow", "inflow3",
   site == "Hinton_Outflow", "outflow1",
   site == "Cuyler_and_Dewe_Outflow", "outflow2",
   site == "Cuyler_and_Current_Outflow","outflow3"
)]

DT <- na.omit(DT)


PT1 <- DT[, .(mean_tss = mean(tss),
       se_tss  = std.error(tss)), by = site][, cl  := (mean_tss)-1.98 * se_tss][, cu  := mean_tss + 1.96 * se_tss]
PT1[, site := ordered( site, levels = c("inflow1", "outflow1", "inflow2", "outflow2", "inflow3", "outflow3"))]
       
ggplot(PT1, aes(site, mean_tss, colour = site))+ geom_point() +
  geom_errorbar(aes(ymin = cl, ymax = cu)) + theme(legend.position="none") +
scale_colour_manual(values = cols) + xlab("Sites") + ylab("Suspended Solids (mg/L)")

Continuing on my looking at your data I did a couple of plots of the distributions and they are rather interesting.

I don't imagine you will want them for the presentation but they may be useful sometime.

# load packages -----------------------------------------------------------
suppressMessages(library(data.table))
suppressMessages(library(tidyverse))
library(gtsummary)
library(tinytable)
library(janitor); 
library(here)


# Create custom palate ----------------------------------------------------

cols <- c("blue", "blue", "red", "red", "green", "green")

# Load data ---------------------------------------------------------------
DT <- fread("rain_2.csv") |> clean_names()
DT[ , date := ymd(date)]
DT <- na.omit(DT)

# Clean up data -----------------------------------------------------------

DT[ , site := str_remove(site, "\\_+$")]


# Recode "site"  ----------------------------------------------------------

DT[, site := fcase(
   site ==   "Hinton_Inflow_Front_Cell", "inflow1",
   site ==  "Cuyler_and_Dewe_Inflow",  "inflow2",
   site == "Cuyler_and_Current_Inflow", "inflow3",
   site == "Hinton_Outflow", "outflow1",
   site == "Cuyler_and_Dewe_Outflow", "outflow2",
   site == "Cuyler_and_Current_Outflow","outflow3"
)]

DT <- na.omit(DT)


# Outlier function --------------------------------------------------------
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}


# Clone BT as we may not want DT with ordered factors ---------------------

BT <- DT
BT[, site := ordered( site, levels = c("inflow1", "outflow1", "inflow2", "outflow2", "inflow3", "outflow3"))]
       
BT[, outlier := is_outlier(tss)]
BT[, dd := ifelse(outlier == TRUE, as.character(date), NA)]


# Boxplot -----------------------------------------------------------------

ggplot(BT, aes(site, tss, colour = site, )) + geom_boxplot() +
  theme(legend.position = "none") +
scale_colour_manual(values = cols) + xlab("Sites") + ylab("Suspended Solids (mg/L)") + 
 geom_text(aes(label = dd), na.rm = TRUE, hjust = -0.3)

# Beeswarm plot -----------------------------------------------------------
ggplot(BT, aes(site, tss, colour = site )) +  geom_beeswarm() + theme(legend.position="none") +
scale_colour_manual(values = cols) + xlab("Sites") + ylab("Suspended Solids (mg/L)")