I'm new to R and to programming in general, and I'm struggling to create a line graph with multiple lines, each representing a large group of respondents. I work in development economics and am using a large data set with information on 18,000+ individuals in South Africa.

I want to plot income on the x axis and years of education achieved on the y axis. Education is a discrete variable and only takes values from 0 to 18 years. Income is continuous and ranges from 652 (ZAR per month) to 20,000+. I want to do this for 3 groups of 2,000 to 4,000 people each, so the graph should have 3 lines.

I have tried several graphs and haven't gotten anywhere near what I need. Further down I link to a graph in a paper that I basically want to re-create, but with income instead of age. Do I need to re-code the income data and put it in buckets? What am I doing wrong?

Here is the code I have so far. It is fairly useless, as it creates a graph nothing like what I want (and yes, I know my data set name and variable names are huge, but that is for a reason):
```r
e <- ggplot(data = Cross_w5_e_h_i_g_a_RStudio_25Sept23, aes(x = w5_hhinc_perm_CSGPool_CSM, y = w5_eduyrs_CSGPool_CSM_T1A)) +
  geom_line() +
  xlim(652, 20000) +
  ylim(0, 18)
```
The graph I want is very similar to the one linked here, except that this one has age instead of income and only 2 lines. See page 27, the graph at the top right of the page (I tried to cut and paste it in, but it didn't work):
https://opensaldru.uct.ac.za/bitstream/handle/11090/689/2018_125_Saldruwp.pdf?sequence=3
Could someone advise me on how to solve this?
Btw, I tried to create a reprex but kept getting error messages. Thanks.
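
To make the bucket idea concrete, here is a rough sketch of what I imagine it could look like. It runs on simulated data, not my real data set: the column names `income`, `eduyrs`, and `group`, the group labels, and the 1,000 ZAR bucket width are all placeholders I made up for illustration. I am not sure this is the right approach, so treat it as a guess.

```r
library(ggplot2)

# Simulated stand-in data. Column names and group labels are hypothetical.
set.seed(1)
n <- 3000
df <- data.frame(
  income = runif(n, min = 652, max = 20000),
  group  = sample(c("A", "B", "C"), n, replace = TRUE)
)
# Education in whole years, clamped to 0-18 and loosely rising with income.
df$eduyrs <- pmin(18, pmax(0, round(6 + df$income / 2500 + rnorm(n, sd = 3))))

# Put income into buckets of 1,000 ZAR and label each by its midpoint.
breaks <- seq(0, 20000, by = 1000)
df$income_bin <- cut(df$income, breaks = breaks, include.lowest = TRUE)
mids <- head(breaks, -1) + 500
df$income_mid <- mids[as.integer(df$income_bin)]

# Mean years of education per income bucket and group.
binned <- aggregate(eduyrs ~ income_mid + group, data = df, FUN = mean)

# One line per group: colour maps the grouping variable to separate lines.
p <- ggplot(binned, aes(x = income_mid, y = eduyrs, colour = group)) +
  geom_line() +
  labs(x = "Income (ZAR per month)",
       y = "Mean years of education",
       colour = "Group")
print(p)
```

With my real data I assume I would swap in my own income, education, and group columns and skip the simulation step.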