 Create graph and compare

Greetings!
I am looking to create two graphs, one for women and another for men. Within each graph, I would like to compare Whites and African Americans.

I searched for solutions already available but couldn't get one that fits the data.

My data contains:

gender <- dat$gender               # gender column has values female and male
race <- dat$race                   # race column has values white and non-white
callback <- dat$received_callback  # callback column has values 0 and 1

I tried the following code to get the graphs:

par(mfrow = c(1, 2))
plot(x=gender, y=callback, race, xlim=2, ylab="callback", xlab= "gender")

I get the following error
Error in plot.window(...) : invalid 'xlim' value
In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion

Thank you
Junaid

xlim should be a lower and upper pair, as in

xlim = c(0,2)

Thank you @startz

I get this error now:
Error in plot.xy(xy, type, ...) : invalid plot type
In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion

I tried using na.omit() to remove NAs but the error persists

Regards
Junaid

You need a comma between x and y.

plot(x=gender, y=callback, race, xlim=c(0,2), ylab="callback", xlab= "gender")

This is the code I am using to plot the graph; the error is the one given in my previous reply.

I am trying to achieve something like this

Please post your data. You can use the output of the dput() function. For example,

dput(gender)

will give output that can be posted here and other people can use to recreate your data.

If your data set is very small, you can simply post R code that defines each object, like

callback <- c(0.23, 0.12, 0.34, 0.22)

for a vector with four values.
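
As a small illustration of what dput() gives you, here is a sketch with a made-up vector (the values are invented for the example, not taken from your data). The printed text is itself valid R code, so anyone can paste it back to rebuild the object:

```r
# Hypothetical stand-in for a few values of the gender column
gender_sample <- c("f", "m", "f")

# dput() prints R code that recreates the object
dput(gender_sample)
#> c("f", "m", "f")

# Pasting that output back into R rebuilds the same vector
rebuilt <- c("f", "m", "f")
identical(gender_sample, rebuilt)
#> [1] TRUE
```

For a larger data frame, something like dput(head(dat, 20)) keeps the posted output short.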

One problem with

plot(x=gender, y=callback, race, xlim=c(0,2), ylab="callback", xlab= "gender")

is that race is not used correctly. It seems you want to have separate points for different races but that is not the correct way to accomplish that.
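
To see why race causes trouble there, it may help to look at the argument order of plot.default(), which is the method that handles a call like this when x is a character vector. A minimal sketch (the two-element vector is made up for illustration):

```r
# The first three arguments of plot.default() are x, y and type
names(formals(graphics::plot.default))[1:3]
#> [1] "x"    "y"    "type"

# With x and y named, an unnamed third argument (race) is matched to
# 'type', which is expected to be a single string such as "p" or "b".
# That mismatch is where the "invalid plot type" error comes from.

# The coercion warning comes from character values being turned into numbers
gender_sample <- c("f", "m")  # hypothetical values
suppressWarnings(as.numeric(gender_sample))
#> [1] NA NA
```

This is also why na.omit() did not help: the NAs are created inside plot() when the character values are converted, they are not in the data.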

The data is in a CSV file, and I am not able to upload it here.

However, the data is available here: Data Sets

Scroll down to the end of the webpage and you will find the CSV file to download.

I am interested in using the following columns of the data set for the graph:
gender, race, received_callback

Thank you for the help

The csv file you linked, as it stands, cannot be used to easily produce a plot like you posted above. Here is a quick examination of the data set.

#Pick the 3 columns of interest so that the following summary() is easy to read
DF_reduced <- DF[, c("received_callback", "race", "gender")]
summary(DF_reduced)
 received_callback     race              gender
 Min.   :0.00000   Length:4870        Length:4870
 1st Qu.:0.00000   Class :character   Class :character
 Median :0.00000   Mode  :character   Mode  :character
 Mean   :0.08049
 3rd Qu.:0.00000
 Max.   :1.00000

table(DF_reduced$received_callback)
   0    1
4478  392

table(DF_reduced$race)
black white
 2435  2435

table(DF_reduced$gender)
   f    m
3746 1124
The values of received_callback are 0 or 1, meaning, I suppose, No and Yes. The race column only has two values, white and black. If you plot received_callback versus race for one gender, you will see only four points: (black, 0), (black, 1), (white, 0), (white,1). Rather than simply show how I would handle the data, let me ask some questions:

1. Why are you doing this? Is it homework for a class? Is it self study?
2. How would you describe the number you want to see on the y axis? How would you calculate it?

It is not a regular assignment, but it is for extra credit.

The y-axis can have only two values (1/0 or Yes/No), or it can show fractional values from 0 to 1.

Since it is for a class, I'll avoid directly giving you the answer. How would you calculate the fraction of Yes (1) received_callback responses for each combination of gender and race? That is, how would you make a table like

gender  race   Frac
f       black  0.xxx
m       black  0.yyy
f       white  0.zzz
m       white  0.www

I was able to make a graph with the plot() function, but it seems I needed to add something to make the plot realistic and more representative of the data.
Now you can share the code, as the due date has passed.

Sorry, this slipped my mind yesterday.
Here are two plotting methods to make individual plots for females and males. The first uses the base plotting package and the second uses ggplot. I spent no time polishing the appearance of the plots.

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
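# The mean of the 0/1 received_callback column is the fraction of callbacks
Summary <- DF |> group_by(gender, race) |>
  summarise(Frac = mean(received_callback))
#> `summarise()` has grouped output by 'gender'. You can override using the `.groups` argument.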
Summary
#> # A tibble: 4 x 3
#> # Groups:   gender 
#>   gender race    Frac
#>   <chr>  <chr>  <dbl>
#> 1 f      black 0.0663
#> 2 f      white 0.0989
#> 3 m      black 0.0583
#> 4 m      white 0.0887
Summary$race <- factor(Summary$race)

#using the base plotting method
par(mfrow = c(1, 2))
tmp <- subset(Summary, gender == "f")
plot.default(x = tmp$race, y = tmp$Frac, ylab = "callback", xaxt = "n",
             xlab = "race", type = "b", main = "Female")
axis(side = 1, at = c(1, 2), labels = tmp$race)

tmp <- subset(Summary, gender == "m")
plot.default(x = tmp$race, y = tmp$Frac, ylab = "callback", xaxt = "n",
             xlab = "race", type = "b", main = "Male")
axis(side = 1, at = c(1, 2), labels = tmp$race)

#Using ggplot
library(ggplot2)
ggplot(data = Summary, aes(race, Frac, group = 1)) +
  geom_point() + geom_line() +
  facet_wrap(~gender,
             labeller = labeller(gender = c("f" = "Female", "m" = "Male")))

Created on 2021-12-07 by the reprex package (v2.0.1)
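
If you prefer not to load dplyr, the same kind of summary table can be sketched in base R with aggregate(). The toy data frame below is made up so that the example runs on its own; with the real data you would pass DF instead. Because received_callback is coded 0/1, its mean within each group is the fraction of callbacks.

```r
# Toy data standing in for DF (values invented for illustration)
toy <- data.frame(
  gender = c("f", "f", "f", "f", "m", "m", "m", "m"),
  race = c("black", "black", "white", "white",
           "black", "black", "white", "white"),
  received_callback = c(0, 1, 1, 1, 0, 0, 0, 1)
)

# Mean of a 0/1 column within each gender/race group = fraction of 1s
Summary_base <- aggregate(received_callback ~ gender + race,
                          data = toy, FUN = mean)
names(Summary_base)[3] <- "Frac"
Summary_base
```

The resulting data frame has the same three columns (gender, race, Frac) as Summary, so it should work with either plotting method above after converting race to a factor.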

Thank you @FJCC
Much appreciated.
