set.seed(1)
s_agal <- tibble(
id = 1:600,
year = sample(2000:2010, 600, replace = T),
antibiotic = sample(c("gentamicin","tetracycline","amoxicillin","penicillin"), 600, replace = T),
result = sample(c("sensitive", "resistant"), 600, replace = T)
)

I have been asked whether there has been a trend of increasing resistance of the tested antibiotics over the years and whether this is statistically significant.
How can I achieve the goal? Thank you Filippo

With random data, we can expect any trend to be the result of randomness, because randomness is full of patterns, one of which might be an upward or downward trend of pairs of numbers. Here's one approach.

library(tsibble)
#>
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, union
set.seed(1)
s_agal <- tibble::tibble(
id = 1:600,
year = sample(2000:2010, 600, replace = T),
antibiotic = sample(c("gentamicin","tetracycline","amoxicillin","penicillin"), 600, replace = T),
result = sample(c("sensitive", "resistant"), 600, replace = T)
)
r <- s_agal |>
dplyr::select(year,result) |>
dplyr::arrange(year) |>
dplyr::filter(result == "resistant") |>
dplyr::group_by(year) |>
dplyr::count()
fit <- lm(n ~ year, data = r)
summary(fit)
#>
#> Call:
#> lm(formula = n ~ year, data = r)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -8.8545 -2.9500 -1.3364 0.6455 18.0818
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1320.4091 1494.9222 0.883 0.400
#> year -0.6455 0.7456 -0.866 0.409
#>
#> Residual standard error: 7.82 on 9 degrees of freedom
#> Multiple R-squared: 0.07687, Adjusted R-squared: -0.0257
#> F-statistic: 0.7494 on 1 and 9 DF, p-value: 0.4091
s <- s_agal |>
dplyr::select(year,result) |>
dplyr::arrange(year) |>
dplyr::filter(result == "sensitive") |>
dplyr::group_by(year) |>
dplyr::count()
fit <- lm(n ~ year, data = s)
summary(fit)
#>
#> Call:
#> lm(formula = n ~ year, data = s)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.327 -2.718 -1.218 2.836 7.618
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 137.63636 830.49287 0.166 0.872
#> year -0.05455 0.41421 -0.132 0.898
#>
#> Residual standard error: 4.344 on 9 degrees of freedom
#> Multiple R-squared: 0.001923, Adjusted R-squared: -0.109
#> F-statistic: 0.01734 on 1 and 9 DF, p-value: 0.8981

I have a question about:
using the code you provided, for example for resistant ampicillin, I get this value.

# ampicillina - res
##
## Call:
## lm(formula = n ~ anno, data = r)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8415 -2.9692 -0.2306 1.5141 9.1250
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1369.5577 450.8490 3.038 0.00952 **
## anno -0.6778 0.2242 -3.023 0.00980 **
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 4.445 on 13 degrees of freedom
## Multiple R-squared: 0.4128, Adjusted R-squared: 0.3676
## F-statistic: 9.138 on 1 and 13 DF, p-value: 0.009797

Since the number of analyzes carried out over the years is not constant and therefore 100 analyzes could have been carried out one year and 20 the following year, would it make sense to carry out the analysis on the percentage of resistance obtained annually instead of on the count?

In this case using this code, I get a similar but not the same result.

b <- agal %>%
select(year, antibiotic, result) %>%
arrange(year) %>%
filter(antibiotic == "ampicillina")
b1 <- b %>%
summarise(tot = n()) %>%
pull()
b2 <- b %>%
filter(result == "resistant") %>%
group_by(year) %>%
summarise(n = n(),
perc = round(n/b1*100, 2))
fit <- lm(perc ~ year, data = b2)
summary(fit)
## Call:
## lm(formula = perc ~ anno, data = b2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.95845 -0.48850 -0.03553 0.24938 1.49730
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 225.12118 74.09108 3.038 0.00951 **
## anno -0.11141 0.03685 -3.024 0.00978 **
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.7304 on 13 degrees of freedom
## Multiple R-squared: 0.4129, Adjusted R-squared: 0.3677
## F-statistic: 9.143 on 1 and 13 DF, p-value: 0.009783

The question is: is this approach incorrect?
Thank you, Filippo

The effect of this in s_agal is that every year has an equal opportunity to selected in the sample. Likewise with antibiotic and result. Is agal the same?

You have correctly noticed, these are laboratory investigations carried out on the basis of the samples received. In the event that milk from the tested farms is positive for S. agalactiae, then the susceptibility to some antibiotics is tested (you can see it from the real data).