# statistical analysis of antibiotic resistance

Hi everybody,
I have a data frame like this

``````set.seed(1)
s_agal <- tibble(
id = 1:600,
year = sample(2000:2010, 600, replace = T),
antibiotic = sample(c("gentamicin","tetracycline","amoxicillin","penicillin"), 600, replace = T),
result = sample(c("sensitive", "resistant"), 600, replace = T)
)
``````

I have been asked whether there has been a trend of increasing resistance of the tested antibiotics over the years and whether this is statistically significant.
How can I achieve the goal? Thank you Filippo

With random data, we can expect any trend to be the result of randomness, because randomness is full of patterns, one of which might be an upward or downward trend of pairs of numbers. Here's one approach.

``````library(tsibble)
#>
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, union
set.seed(1)
s_agal <- tibble::tibble(
id = 1:600,
year = sample(2000:2010, 600, replace = T),
antibiotic = sample(c("gentamicin","tetracycline","amoxicillin","penicillin"), 600, replace = T),
result = sample(c("sensitive", "resistant"), 600, replace = T)
)
r <- s_agal                             |>
dplyr::select(year,result)            |>
dplyr::arrange(year)                  |>
dplyr::filter(result == "resistant")  |>
dplyr::group_by(year)                 |>
dplyr::count()

fit <- lm(n ~ year, data = r)
summary(fit)
#>
#> Call:
#> lm(formula = n ~ year, data = r)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -8.8545 -2.9500 -1.3364  0.6455 18.0818
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1320.4091  1494.9222   0.883    0.400
#> year          -0.6455     0.7456  -0.866    0.409
#>
#> Residual standard error: 7.82 on 9 degrees of freedom
#> Multiple R-squared:  0.07687,    Adjusted R-squared:  -0.0257
#> F-statistic: 0.7494 on 1 and 9 DF,  p-value: 0.4091

s <- s_agal                             |>
dplyr::select(year,result)            |>
dplyr::arrange(year)                  |>
dplyr::filter(result == "sensitive")  |>
dplyr::group_by(year)                 |>
dplyr::count()

fit <- lm(n ~ year, data = s)
summary(fit)
#>
#> Call:
#> lm(formula = n ~ year, data = s)
#>
#> Residuals:
#>    Min     1Q Median     3Q    Max
#> -5.327 -2.718 -1.218  2.836  7.618
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 137.63636  830.49287   0.166    0.872
#> year         -0.05455    0.41421  -0.132    0.898
#>
#> Residual standard error: 4.344 on 9 degrees of freedom
#> Multiple R-squared:  0.001923,   Adjusted R-squared:  -0.109
#> F-statistic: 0.01734 on 1 and 9 DF,  p-value: 0.8981
``````

Created on 2023-02-17 with reprex v2.0.2

1 Like

[I edited the question]
Thank you so much,

using the code you provided, for example for resistant ampicillin, I get this value.

``````# ampicillina - res
##
## Call:
## lm(formula = n ~ anno, data = r)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.8415 -2.9692 -0.2306  1.5141  9.1250
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1369.5577   450.8490   3.038  0.00952 **
## anno          -0.6778     0.2242  -3.023  0.00980 **
## ---
## Signif. codes:  0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 4.445 on 13 degrees of freedom
## Multiple R-squared:  0.4128, Adjusted R-squared:  0.3676
## F-statistic: 9.138 on 1 and 13 DF,  p-value: 0.009797
``````

Since the number of analyzes carried out over the years is not constant and therefore 100 analyzes could have been carried out one year and 20 the following year, would it make sense to carry out the analysis on the percentage of resistance obtained annually instead of on the count?

In this case using this code, I get a similar but not the same result.

``````
b <- agal %>%
select(year, antibiotic, result) %>%
arrange(year) %>%
filter(antibiotic == "ampicillina")

b1 <- b %>%
summarise(tot = n()) %>%
pull()

b2 <- b %>%
filter(result == "resistant") %>%
group_by(year) %>%
summarise(n = n(),
perc = round(n/b1*100, 2))

fit <- lm(perc ~ year, data = b2)
summary(fit)

## Call:
## lm(formula = perc ~ anno, data = b2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.95845 -0.48850 -0.03553  0.24938  1.49730
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 225.12118   74.09108   3.038  0.00951 **
## anno         -0.11141    0.03685  -3.024  0.00978 **
## ---
## Signif. codes:  0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.7304 on 13 degrees of freedom
## Multiple R-squared:  0.4129, Adjusted R-squared:  0.3677
## F-statistic: 9.143 on 1 and 13 DF,  p-value: 0.009783
``````

The question is: is this approach incorrect?
Thank you, Filippo

The effect of this in `s_agal` is that every `year` has an equal opportunity to selected in the sample. Likewise with `antibiotic` and `result`. Is `agal` the same?

You have correctly noticed, these are laboratory investigations carried out on the basis of the samples received. In the event that milk from the tested farms is positive for S. agalactiae, then the susceptibility to some antibiotics is tested (you can see it from the real data).

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.