There is no minimum sample size for a t-test. You should opt for a Wilcoxon test if your data is non-normal.
I'll apologise in advance for the long post, but hopefully it'll help you get more robust results than you would from the standard testing repertoire that a lot of people rely on, which is often not fit for purpose (and I believe that is the case here, based on what you've said about your data).
Based on the type of data you have, I'd suggest fitting a Poisson model (which is usually used for count data) with the group as a covariate. The output of the Poisson model will give you a coefficient, standard error and p-value for being in group B compared to group A.
Here's an example (which also demonstrates that your data should be in long format). I'll generate group A to have an average of 20 and group B to have an average of 24 and use your sample sizes as above (I'll set the seed so you can replicate the random number generation):
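```r
set.seed(101)                                          # reproducible random draws
df_A <- data.frame(grp = "A", conv = rpois(14, 20))    # 14 conversations, mean 20
df_B <- data.frame(grp = "B", conv = rpois(18, 24))    # 18 conversations, mean 24
df <- rbind(df_A, df_B)                                # stack into long format
mod <- glm(conv ~ grp, data = df, family = "poisson")  # Poisson regression on group
```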
So the `df` has a column indicating the number of words in the conversation, `conv`, and a column indicating which group the conversation came from, `grp`. When we create the model we use the formula `conv ~ grp`, which means we want to regress `conv` against `grp` when creating our model.
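If it helps to see what "long format" means in practice, here is a small sketch that rebuilds the same simulated data and looks at its shape. The names `group_sizes` and `group_means` are just ones I've picked for illustration.

```r
# Rebuild the simulated data from the example (same seed, same sample sizes).
set.seed(101)
df_A <- data.frame(grp = "A", conv = rpois(14, 20))
df_B <- data.frame(grp = "B", conv = rpois(18, 24))
df <- rbind(df_A, df_B)

# Long format: one row per conversation, one column per variable.
str(df)
head(df, 3)

# Group sizes and group means, computed straight from the long table.
group_sizes <- table(df$grp)
group_means <- tapply(df$conv, df$grp, mean)
print(group_sizes)
print(group_means)
```

The point is that the group label lives in its own column rather than each group having its own column of counts, which is the layout the formula interface expects.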
Outputting this model doesn't give an awful lot of information:
```
> mod

Call:  glm(formula = conv ~ grp, family = "poisson", data = df)

Coefficients:
(Intercept)         grpB
      2.992        0.202

Degrees of Freedom: 31 Total (i.e. Null);  30 Residual
Null Deviance:      35.31
Residual Deviance:  28.26    AIC: 190.1
```
But the value here, `grpB = 0.202`, indicates that being in group B increases the log-mean by 0.202 relative to group A (I'll get back to that in a minute).
We can get more information from `mod` by running it through the `summary()` function:
```
> summary(mod)

Call:
glm(formula = conv ~ grp, family = "poisson", data = df)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5336  -0.7031   0.1806   0.5101   1.8357

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.99215    0.05987  49.979  < 2e-16 ***
grpB         0.20197    0.07656   2.638  0.00834 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 35.310  on 31  degrees of freedom
Residual deviance: 28.257  on 30  degrees of freedom
AIC: 190.06

Number of Fisher Scoring iterations: 4
```
Here we see a table of coefficients. The first column matches what was originally output for `(Intercept)` and `grpB`, the second gives the standard error (which can be used to calculate confidence intervals), the third is the z value (the estimate divided by its standard error) and the last one is the p-value. The p-value for the intercept just means that the log-mean in group A isn't 0, i.e. that the average in group A isn't 1 (duh), but the p-value for `grpB` is only 0.00834, so the difference between the groups is statistically significant at the usual 5% level. Therefore, we can reject $H_0$ in favour of $H_1$.
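Since I mentioned that the standard error can be used for confidence intervals, here is a minimal sketch of a 95% Wald interval for the `grpB` coefficient. The variable names (`est`, `se`, `ci_log`, `ci_ratio`, `wald_ci`) are my own, and the normal approximation is an assumption that may be a bit rough with samples this small.

```r
# Rebuild the data and model from the example.
set.seed(101)
df_A <- data.frame(grp = "A", conv = rpois(14, 20))
df_B <- data.frame(grp = "B", conv = rpois(18, 24))
df <- rbind(df_A, df_B)
mod <- glm(conv ~ grp, data = df, family = "poisson")

# Pull the estimate and standard error for grpB out of the coefficient table.
coefs <- summary(mod)$coefficients
est <- coefs["grpB", "Estimate"]
se  <- coefs["grpB", "Std. Error"]

# Approximate 95% Wald interval on the log scale: estimate +/- 1.96 standard errors.
ci_log <- est + c(-1, 1) * qnorm(0.975) * se

# Exponentiate to get an interval for the ratio of means (group B / group A).
ci_ratio <- exp(ci_log)

# Cross-check against the Wald interval that confint.default() computes.
wald_ci <- confint.default(mod)["grpB", ]

print(ci_log)
print(ci_ratio)
print(wald_ci)
```

Plugging in the numbers from the table (0.20197 with a standard error of 0.07656) gives roughly 0.05 to 0.35 on the log scale, or a ratio of means of roughly 1.05 to 1.42. The interval on the log scale excludes 0, which agrees with the p-value being below 0.05.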
What do I mean by "log-mean"? Poisson regression works by finding a function $\theta$, which acts on our covariates $z$ (in this case just the group), to give us an estimate of the average, $\lambda$, of our Poisson distribution. So we're trying to solve:
$$\lambda = \exp\left(\theta(z)\right)$$
The results from the model are the coefficients that make up the $\theta$ function, so in our example:
$$\theta(z) = 2.99215 + 0.20197 \times (\textrm{group}=B)$$
So if the group is A, $\theta(z) = 2.99215$, since the second term resolves to 0, and taking the exponential of that gives $\lambda = 19.92857$. We can find this in R by pulling out the first coefficient from the model and taking its exponential:
```
> exp(mod$coefficients[1])
(Intercept)
   19.92857
```
OR by noting that this is the average of group A:
```
> mean(df_A$conv)
[1] 19.92857
```
If we move over to group B, we set $(\textrm{group}=B)$ to be 1 and get $\theta(z) = 2.99215 + 0.20197 = 3.19412$ and therefore $\lambda = 24.38889$. Again, we can find this in R by summing both coefficients and taking the exponential:
```
> exp(sum(mod$coefficients))
[1] 24.38889
```
OR by noting that this is the average of group B:
```
> mean(df_B$conv)
[1] 24.38889
```
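Another way to read the `grpB` coefficient is as a ratio of means, because exponentiating a difference of logs gives a ratio. Here is a short sketch of that, with `rate_ratio`, `ratio_of_means` and `fitted_means` being names I've made up for the example.

```r
# Rebuild the data and model from the example.
set.seed(101)
df_A <- data.frame(grp = "A", conv = rpois(14, 20))
df_B <- data.frame(grp = "B", conv = rpois(18, 24))
df <- rbind(df_A, df_B)
mod <- glm(conv ~ grp, data = df, family = "poisson")

# exp(grpB) is the multiplicative effect of being in group B on the mean.
rate_ratio <- exp(coef(mod)[["grpB"]])

# With a single two-level covariate this equals the ratio of the group averages.
ratio_of_means <- mean(df_B$conv) / mean(df_A$conv)

# predict() with type = "response" applies the exp() step for us.
fitted_means <- predict(mod,
                        newdata = data.frame(grp = c("A", "B")),
                        type = "response")

print(rate_ratio)
print(ratio_of_means)
print(fitted_means)
```

Using the numbers above, 24.38889 / 19.92857 is about 1.22, so the model is saying that conversations in group B are roughly 22% longer on average than those in group A.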
By using a Poisson regression rather than a t-test or a Wilcoxon test, we are assuming that the data are Poisson distributed, and since this is count data, that is a fair assumption to make. A model whose assumptions match the way the data arise generally makes better use of them, which strengthens the conclusion that there is a difference (or not, if that be the case).
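If you want a rough sanity check on the Poisson assumption, one common rule of thumb (my suggestion, not something from your question) is to compare the Pearson chi-square statistic with the residual degrees of freedom. A Poisson distribution has variance equal to its mean, so the ratio should be somewhere near 1. The name `dispersion` is just mine.

```r
# Rebuild the data and model from the example.
set.seed(101)
df_A <- data.frame(grp = "A", conv = rpois(14, 20))
df_B <- data.frame(grp = "B", conv = rpois(18, 24))
df <- rbind(df_A, df_B)
mod <- glm(conv ~ grp, data = df, family = "poisson")

# Pearson dispersion estimate: sum of squared Pearson residuals
# divided by the residual degrees of freedom.
dispersion <- sum(residuals(mod, type = "pearson")^2) / df.residual(mod)

# Values well above 1 would suggest the counts vary more than a Poisson
# model expects, in which case the standard errors would be too small.
print(dispersion)
```

A similar quick look is available from the summary output itself: the residual deviance of 28.257 on 30 degrees of freedom gives a ratio of about 0.94, which is close to 1, as you'd hope for data that were simulated from a Poisson distribution. With your real data the ratio could well be larger.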
A t-test essentially does the same as the above, but it assumes that the difference between group A and group B is a straightforward additive shift in the mean with normally distributed errors; if your data is count data, this may not hold. You can see that the t-test is just a linear regression by running the `glm()` function without specifying that we want a Poisson regression (i.e. without the `family = "poisson"` argument), which will run a regular linear regression, and comparing the results with those of a t-test with equal variances:
```
> mod_2 <- glm(conv ~ grp, data = df)
> t_test <- t.test(conv ~ grp, data = df, var.equal = TRUE)
> summary(mod_2)$coefficients["grpB", "Pr(>|t|)"]
[1] 0.01086051
> t_test$p.value
[1] 0.01086051
```