# Is the data in this column skewed or normal?

``````
data <- data.frame(
  AGE = c(18, 18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 22, 23, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34)
)

View(data)
``````

The `AGE` vector is very slightly right-skewed according to `moments::skewness()`, but not by much, and `shapiro.test()` (the Shapiro-Wilk test) does not reject the null hypothesis of normality at the 5% level.
On the other hand, a normal distribution isn't a fully convincing model, in part because ages cannot be negative or extend to infinity as a normal variable can, and because they are integers rather than truly continuous values. All I'd feel comfortable saying is that the data look more normal than skewed.
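To put a number on "very slightly right-skewed" without installing `moments`, here is a minimal base-R sketch. It assumes the moment-based definition of skewness (third central moment divided by the 1.5 power of the second), which is my understanding of what `moments::skewness()` computes. The helper name `skew_moment` is made up for illustration.

```r
AGE <- c(18, 18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 22, 23, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34)

# Hypothetical helper: moment-based sample skewness (assumed definition)
skew_moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew <- skew_moment(AGE)
skew  # > 0 suggests a longer right tail; values near 0 suggest near-symmetry

# Shapiro-Wilk normality test from the base stats package
sw <- shapiro.test(AGE)
sw$p.value
```

A skewness this close to zero, together with a Shapiro-Wilk p-value just above 0.05, is consistent with "more normal than skewed" rather than proof of normality.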

``````
AGE <- c(18, 18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 22, 23, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34)
# Create a histogram with density scale
hist(AGE, probability = TRUE, col = "lightblue", main = "Histogram with Density Plot", xlab = "Values")

# Overlay density plot
lines(density(AGE), col = "red", lwd = 2)
``````

``````
qqnorm(AGE)
``````

``````
library(ggplot2)  # ggplot() comes from the ggplot2 package

age <- data.frame(AGE = AGE)
ggplot(age, aes(AGE)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5, fill = "lightblue", color = "black") +
  geom_density(col = "red", lwd = 2) +
  theme_minimal() +
  labs(title = "Histogram with Density Plot", x = "Values", y = "Density")

shapiro.test(AGE)
#>
#>  Shapiro-Wilk normality test
#>
#> data:  AGE
#> W = 0.95742, p-value = 0.05657

# For comparison: one million draws, uniform over the integers 18 to 34
faux <- sample(18:34, 1e6, replace = TRUE)
# Create a histogram with density scale
hist(faux, probability = TRUE, col = "lightblue", main = "Histogram with Density Plot", xlab = "Values")

# Overlay density plot
lines(density(faux), col = "red", lwd = 2)
``````

``````
qqnorm(faux)
``````

``````
library(ggplot2)  # ggplot() comes from the ggplot2 package

FAUX <- data.frame(FAUX = faux)
ggplot(FAUX, aes(FAUX)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5, fill = "lightblue", color = "black") +
  geom_density(col = "red", lwd = 2) +
  theme_minimal() +
  labs(title = "Histogram with Density Plot", x = "Values", y = "Density")

plot(ecdf(AGE), main = "Empirical Cumulative Distribution Function", xlab = "Data", ylab = "Cumulative Probability")
``````

``````
data_mean <- mean(AGE)
data_sd <- sd(AGE)
# CDF of a normal distribution with the sample's mean and SD
normal_ecdf <- function(x) {
  pnorm(x, mean = data_mean, sd = data_sd)
}
normal_ecdf(AGE)
#>  [1] 0.04086717 0.04086717 0.06412373 0.06412373 0.09656279 0.09656279
#>  [7] 0.09656279 0.13968688 0.13968688 0.13968688 0.13968688 0.13968688
#> [13] 0.19432557 0.19432557 0.19432557 0.19432557 0.26030511 0.33624104
#> [19] 0.33624104 0.33624104 0.41953513 0.41953513 0.41953513 0.41953513
#> [25] 0.50661344 0.50661344 0.50661344 0.50661344 0.50661344 0.59337650
#> [31] 0.59337650 0.59337650 0.59337650 0.59337650 0.67576918 0.67576918
#> [37] 0.67576918 0.75034043 0.75034043 0.75034043 0.81466592 0.81466592
#> [43] 0.86754984 0.86754984 0.90898729 0.90898729 0.90898729 0.90898729
#> [49] 0.93993233 0.93993233 0.93993233 0.93993233 0.96195741
plot(ecdf(AGE), main = "ECDF vs. Normal Distribution", xlab = "Data", ylab = "Cumulative Probability", col = "blue")
curve(normal_ecdf, add = TRUE, col = "red", lty = 2)
legend("bottomright", legend = c("ECDF", "Normal Distribution"), col = c("blue", "red"), lty = c(1, 2))
``````
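One way to attach a number to the ECDF-versus-normal comparison is the largest vertical gap between the two curves, evaluated at the observed ages. This is only a rough sketch: it resembles a Kolmogorov-Smirnov distance but is not a formal test, since the mean and SD are estimated from the same data and the ages contain many ties. The names `emp`, `fit`, and `max_gap` are made up here.

```r
AGE <- c(18, 18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 22, 23, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34)

emp <- ecdf(AGE)  # step function: share of observations <= x
fit <- function(x) pnorm(x, mean = mean(AGE), sd = sd(AGE))  # fitted normal CDF

x <- sort(unique(AGE))          # distinct observed ages
gaps <- abs(emp(x) - fit(x))    # vertical distance at each distinct age
max_gap <- max(gaps)

x[which.max(gaps)]  # age at which the two curves differ most
max_gap
```

A small `max_gap` says the step function hugs the normal curve closely; it does not by itself justify treating the data as normal.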

Created on 2023-07-17 with reprex v2.0.2

Okay, so I should treat it as a normal distribution. I'm attempting to perform an ANOVA.
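For an ANOVA, the normality assumption applies to the residuals of the model rather than to the raw column, so it may be more informative to check those. A minimal sketch, assuming `AGE` is the response: the `group` factor below is entirely made up so that the example runs, and should be replaced with the real grouping variable.

```r
AGE <- c(18, 18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 22, 23, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34)

# HYPOTHETICAL grouping factor, invented only for illustration
group <- factor(rep(c("A", "B", "C"), length.out = length(AGE)))

fit <- aov(AGE ~ group)   # one-way ANOVA
res <- residuals(fit)     # what the normality assumption is about

summary(fit)
shapiro.test(res)         # Shapiro-Wilk test on the residuals
```

With real groups, `qqnorm(res)` on the residuals gives the same kind of visual check as the Q-Q plots earlier in the thread.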


If the data is skewed to the right, the long tail is on the right: most data points sit at the lower end, a few large values stretch the distribution out, and the mean will usually be greater than the median. If the data is skewed to the left, the long tail is on the left: most data points sit at the upper end, and the mean will usually be less than the median.
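As a quick illustration of the mean-versus-median rule of thumb, here is a small sketch with two made-up vectors (they are not from this thread's data). The rule is a heuristic that can fail for nearly symmetric data, so treat it as a first look only.

```r
# Made-up toy data, for illustration only
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 20)        # long tail to the right
left_skewed  <- c(1, 18, 19, 19, 20, 20, 20, 21)  # long tail to the left

c(mean = mean(right_skewed), median = median(right_skewed))  # mean 4.75, median 3
c(mean = mean(left_skewed),  median = median(left_skewed))   # mean 17.25, median 19.5
```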
