[I edited the question]
Thank you so much.
I have a follow-up question.
Using the code you provided, for example for ampicillin-resistant results, I get this output:
# ampicillina - res
##
## Call:
## lm(formula = n ~ anno, data = r)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.8415 -2.9692 -0.2306  1.5141  9.1250
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1369.5577   450.8490   3.038  0.00952 **
## anno          -0.6778     0.2242  -3.023  0.00980 **
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 4.445 on 13 degrees of freedom
## Multiple R-squared: 0.4128, Adjusted R-squared: 0.3676
## F-statistic: 9.138 on 1 and 13 DF, p-value: 0.009797
The number of tests performed is not constant across years (for example, 100 tests could have been run in one year and only 20 in the next). Would it therefore make sense to run the regression on the annual percentage of resistant results rather than on the raw count?
Using the code below for that, I get a similar, but not identical, result.
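library(dplyr)

# Keep only the ampicillin rows
b <- agal %>%
  select(year, antibiotic, result) %>%
  arrange(year) %>%
  filter(antibiotic == "ampicillina")

# Total number of ampicillin tests over all years (a single number)
b1 <- b %>%
  summarise(tot = n()) %>%
  pull()

# Resistant count per year, expressed as a percentage of b1
b2 <- b %>%
  filter(result == "resistant") %>%
  group_by(year) %>%
  summarise(n = n(),
            perc = round(n / b1 * 100, 2))

fit <- lm(perc ~ year, data = b2)
summary(fit)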
## Call:
## lm(formula = perc ~ anno, data = b2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.95845 -0.48850 -0.03553  0.24938  1.49730
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 225.12118   74.09108   3.038  0.00951 **
## anno         -0.11141    0.03685  -3.024  0.00978 **
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 0.7304 on 13 degrees of freedom
## Multiple R-squared: 0.4129, Adjusted R-squared: 0.3677
## F-statistic: 9.143 on 1 and 13 DF, p-value: 0.009783
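A possible reason the two summaries are so close: the estimates and the residual standard error in the two outputs differ by a roughly constant factor of about 6.08, while the t values, p-values and R-squared agree to about three digits. That is what one would expect if `perc` is just `n` divided by a single constant, with the small remaining differences presumably coming from rounding `perc` to two decimals. The sketch below illustrates this with made-up counts (not the data from this post); the variable `total` is a hypothetical constant denominator.

```r
# Made-up yearly counts, for illustration only
set.seed(1)
d <- data.frame(year = 2005:2019)
d$n <- rpois(nrow(d), lambda = 10)

total <- sum(d$n) * 4          # hypothetical constant denominator
d$perc <- d$n / total * 100    # the same constant is used for every year

fit_n    <- lm(n ~ year, data = d)
fit_perc <- lm(perc ~ year, data = d)

# The slopes differ by the constant factor 100 / total ...
coef(fit_perc)["year"] / coef(fit_n)["year"]
100 / total

# ... while the t value and p-value of the slope are the same in both fits
summary(fit_n)$coefficients["year", c("t value", "Pr(>|t|)")]
summary(fit_perc)$coefficients["year", c("t value", "Pr(>|t|)")]
```

Rescaling the response by a constant changes the units of the coefficients but not the test of the trend, so a fit on such a percentage carries the same information as the fit on the counts.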
The question is: is this approach incorrect?
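For comparison, a percentage that accounts for the varying number of tests would use the number of tests in each year as the denominator, not the overall total. The following is only a minimal sketch under assumptions: the data frame is simulated, the column names follow the code above, and the label `"susceptible"` for non-resistant results is hypothetical.

```r
library(dplyr)

# Simulated data with the same column names as in the post (illustration only)
set.seed(42)
agal <- data.frame(
  year       = sample(2005:2019, 600, replace = TRUE),
  antibiotic = "ampicillina",
  result     = sample(c("resistant", "susceptible"), 600, replace = TRUE)
)

yearly <- agal %>%
  filter(antibiotic == "ampicillina") %>%
  group_by(year) %>%
  summarise(
    tot  = n(),                          # tests performed in that year
    res  = sum(result == "resistant"),   # resistant results in that year
    perc = res / tot * 100               # percentage within that year
  )

fit <- lm(perc ~ year, data = yearly)
summary(fit)
```

Counting resistant results with `sum(result == "resistant")` inside the grouped summary also keeps years with zero resistant results, which a `filter(result == "resistant")` before grouping would drop.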
Thank you, Filippo