Problems with ggqqplot and ggdensity functions from ggpubr package

Hi all,

I am relatively new to R, and I am having problems understanding how it works.
In my work I am trying to do some statistical analyses on how the child population (aged 6-10yo) from a given country is distributed among different municipalities. I then want to understand if children with a given condition are distributed unequally according to multiple variables.

For this, among other things, I installed the ggpubr package to use the ggqqplot and ggdensity functions. However, neither of them is working for me.

I uploaded a txt file to RStudio, like this one:

V1 V2 V3 V4 V5 V6
1 1 1282 37 0 0 0
2 2 1727 443 2 0 0
3 3 156 49 0 0 0
4 4 990 87 7 7 4
5 5 494 0 0 0 0
6 6 2046 715 11 3 1
7 7 1673 325 2 0 0
8 8 133 0 0 0 0
9 9 184 13 0 0 0
10 10 1032 346 1 1 1
11 11 469 0 0 0 0
12 12 317 144 0 0 0
13 13 726 76 5 3 3
14 14 2902 411 5

Numbers in column v2 represent each municipality. v3-v6 are for different levels of my target population. I then try to run each of the functions:

For ggqqplot:
ggqqplot(x, x$v1, x$v2, combine = FALSE, merge = FALSE, color = "BLACK")

This appears:
Warning messages:
1: Computation failed in stat_qq_line():
factors are not allowed
2: Computation failed in stat_qq_line():
factors are not allowed

And I obtain a rubbish plot...

For ggdensity:
ggdensity(x, x$v1, x$v2)

And I obtain this:
Error in .check_data(data, x, y, combine = combine | merge != "none") :
x and y are missing. In this case data should be a numeric vector.

I obtain no plot.

I tried to do some stuff like this:

str(x)
'data.frame': 100 obs. of 6 variables:
 $ V1: int 1 2 3 4 5 6 7 8 9 10 ...
 $ V2: int 1282 1727 156 990 494 2046 1673 133 184 1032 ...
 $ V3: int 37 443 49 87 0 715 325 0 13 346 ...
 $ V4: int 0 2 0 7 0 11 2 0 0 1 ...
 $ V5: int 0 0 0 7 0 3 0 0 0 1 ...
 $ V6: int 0 0 0 4 0 1 0 0 0 1 ...

And it doesn't say anything about "factors".
I suspect that, at least for ggqqplot, the problem is in how R is reading my data. So I probably have to change how I am presenting my data?
Any help would be great.

thanks

I see a couple things going on here, which may help you get started.

The first issue is that R is case sensitive. This means it differentiates between lowercase and uppercase letters so, e.g., "v1" is not the same as "V1". This is something that trips lots of folks up when starting with R.
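
A quick way to see this on your own data is to compare the column names R reports with the ones you typed. This is only a sketch: the small data frame below is a stand-in built from the first three rows and first three columns of the sample in the question, not the real file.

```r
# Stand-in for the poster's data (assumption: the real data frame has
# 100 rows and six columns, V1 to V6).
x <- data.frame(
  V1 = 1:3,
  V2 = c(1282L, 1727L, 156L),
  V3 = c(37L, 443L, 49L)
)

names(x)        # column names exactly as R stores them

lower <- x$v2   # lowercase name: there is no such column, so this is NULL
upper <- x$V2   # uppercase name: the actual integer column

is.null(lower)
is.numeric(upper)
```

Because `x$v2` quietly returns NULL rather than raising an error, a misspelled column name can go unnoticed until a later function complains about something that looks unrelated.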

Second, it looks like functions in the ggpubr package expect variable names to be given as strings. A string means that the variable name is in quotes, rather than a bare name (without quotes) like we may be used to using in the ggplot2 package. So you'd give "V2" to ggqqplot() (a string) instead of V2 (a bare name).

Given those two things, I think code like

ggqqplot(x, x = "V2", combine = TRUE, merge = FALSE, color = "BLACK")

is closer to what you want.
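
The same idea should carry over to ggdensity(). Here is a minimal sketch, assuming ggpubr is installed and using only the 13 complete V2 values from the sample in the question as stand-in data. Naming the `data` argument explicitly is my own choice here; it keeps the data frame called x from being confused with the `x` argument.

```r
library(ggpubr)

# Stand-in data: V2 values from the 13 complete rows shown in the question
# (assumption: the real data frame has 100 rows).
x <- data.frame(
  V1 = 1:13,
  V2 = c(1282, 1727, 156, 990, 494, 2046, 1673, 133, 184, 1032, 469, 317, 726)
)

# Column names are passed as quoted strings, with the capital "V".
p_qq      <- ggqqplot(data = x, x = "V2")
p_density <- ggdensity(data = x, x = "V2")

# Printing a plot object draws it.
print(p_qq)
print(p_density)
```

If you want both plots for several columns, you would repeat the calls with "V3", "V4", and so on.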

One thing that can be super helpful when using a function you've never used before (or, for me, even one I've used bunches of times :rofl:), is to run the code given in the "Examples" section at the bottom of the help page as practice. I had to go to ?ggqqplot and scroll all the way to the bottom of the help page to see that the variable should be a string.
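
If scrolling through the help page is a nuisance, base R can run that "Examples" section for you. A small sketch, assuming ggpubr is installed:

```r
library(ggpubr)

# Open the help page for the function.
?ggqqplot

# Run the code from the "Examples" section of that help page, echoing
# each line so you can see how the arguments are written.
example(ggqqplot)
```

Watching the echoed calls is a fast way to spot conventions such as quoted column names.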
