Exclude NA - help needed from someone just starting

Hi, I'm having trouble with my code. Everything I run returns the following message:

agregated_brfss2013 <- brfss2013%>%

  • group_by(genhlth, sex)%>%
  • ggplot(agregated_brfss2013, mapping = aes(x= genhlth)) + geom_bar(aes(fill = "sex"), position = "dodge")
    Warning messages:
    1: Factor genhlth contains implicit NA, consider using forcats::fct_explicit_na
    2: Factor sex contains implicit NA, consider using forcats::fct_explicit_na

All the code I write returns this same message: (variable) contains implicit NA, consider using forcats::fct_explicit_na, which doesn't make sense to me. I just started with this language; can someone please help?
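
For context, here is a minimal base R sketch of what "implicit NA" means. The factor `f` is made up for illustration and is not taken from brfss2013. A factor can contain missing values without NA being one of its levels, which is the situation the warning describes. The message is a warning rather than an error, so the code still runs.

```r
# Hypothetical factor with one missing value
f <- factor(c("Male", "Female", NA, "Female"))

levels(f)   # NA is not listed among the levels: the NA is "implicit"
anyNA(f)    # but the values themselves do contain a missing entry

# Option 1: turn the missing values into a level of their own (base R)
f_explicit <- addNA(f)
levels(f_explicit)

# Option 2: drop the missing observations
f_complete <- f[!is.na(f)]
length(f_complete)
```

Either option is only a sketch; which one fits depends on whether missing answers should appear in the plot.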

There are some evident issues with your code:

Grouping variables without making any summary or modification doesn't have any effect.

By doing this, you are passing the same data frame twice to the ggplot() function, which is not correct.
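
A small sketch of the first point, using the six rows of sample data that appear later in this thread (the object names `df`, `grouped` and `summarised` are made up here): group_by() on its own leaves the rows untouched, and only a following summary changes the result.

```r
library(dplyr)

# Six sample rows standing in for brfss2013
df <- data.frame(
  genhlth = factor(c("Fair", "Good", "Good", "Very good", "Good", "Very good")),
  sex     = factor(c("Female", "Female", "Female", "Female", "Male", "Female"))
)

# Grouping alone only attaches grouping metadata; the rows are unchanged
grouped <- df %>% group_by(genhlth, sex)
nrow(grouped) == nrow(df)

# Grouping has a visible effect once a summary follows it
summarised <- grouped %>%
  summarise(n = n()) %>%
  ungroup()
summarised
```

On the second point: when a pipe feeds ggplot(), the piped data frame already fills the first argument, so it should not be named again inside the call.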

It is not clear what you are trying to accomplish with this code. To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:


Andres, thanks again. Is the following a reproducible example?

sample(brfss2013)
X_rfdrwm4 rremtsm2 X_rfchol X_psu X_rfhlth feetchk stopsmk2
1 No No Yes 2013000580 Fair or Poor Health NA
2 No No No 2013000593 Good or Better Health NA
3 Yes No No 2013000600 Good or Better Health NA Yes
X_minac11 diabage2 cvdasprn renthom1 casthno2 smoke100 prediab1
1 NA No Own Yes Yes
2 100 No Own No No
3 NA No Own Yes No
aspunsaf pdiabtst longwtch htin4 cellfon3
1 Yes, stomach problems Yes 408 67 Not a cellular phone
2 No Yes NA 70 Not a cellular phone
3 No Yes NA 64 Not a cellular phone
X_veglt1 bphigh4 mishopls
1 Consumed vegetables less than one time per day Yes Some
2 Consumed vegetables one or more times per day No None
3 Consumed vegetables one or more times per day No None
rrclass2 eyeexam X_rfseat3
1 Black or African American Always wear seat belt
2 White Always wear seat belt
3 White Always wear seat belt
X_parec1 pavig11_ X_fruitex
1 Did not meet either guideline NA No missing values and in accepted range
2 Did not meet either guideline 0 No missing values and in accepted range
3 Did not meet either guideline NA No missing values and in accepted range
pvtresd1 X_cllcpwt X_paindx1 pcpsare1 pctcell padur2_
1 Yes NA Did not meet aerobic recommendations NA NA
2 Yes 954.0782 Did not meet aerobic recommendations NA 10
3 Yes NA Did not meet aerobic recommendations NA NA
cpdemo4 bldsugar X_rfsmok3 genhlth cvdcrhd4 misphlpf X_asthms1 ssbfrut2
1 10 NA No Fair Disagree slightly Current 305
2 70 NA No Good No Disagree slightly Never 0
3 70 NA Yes Good No Agree strongly Never 308
padur1_ X_impmrtl misnervs pafreq2_ medscost addepev2 scntlpad chccopd1
1 NA NA Some NA No Yes Paid by salary Yes
2 20 NA A little 1000 No Yes No
3 NA NA None NA No Yes No
nocov121 asymptom X_impcrac X_dualuse
1 No Every day, but not all the time No Dual Phone Use
2 No White, non-Hispanic No Dual Phone Use
3 No No Dual Phone Use
X_rfdrhv4 actin11_ X_rfseat2 mistmnt vegetab1
1 No Always or almost always wear seat belt Yes NA
2 No Moderate Always or almost always wear seat belt No 203
3 Yes Always or almost always wear seat belt No 330
delaymed exeroft2 insulin
1 You didn't have transportation NA
2 No, I did not delay getting medical care 101
3 NA
tetanus X_misvegn
1 No, did not receive any tetanus since 2005 1 missing response
2 Yes, received Tdap No missing vegetable responses
3 Yes, received Tdap No missing vegetable responses
qstver X_ltasth1
1 Only Version Landline Yes
2 Only Version Landline No
3 Only Version Landline No
educa dlyother
1 College 4 years or more (College graduate)
2 College 1 year to 3 years (Some college or technical school)
3 College 4 years or more (College graduate)
lstblds3 ctelenum bloodcho cvdinfr4 imonth rratwrk2
1 Yes Yes No January
2 Yes Yes No January The same as other races
3 5 or more years ago Yes Yes No January The same as other races
asrchkup avedrnk2 X_drnkmo4 scntlwk1 iyear iday sex rrhcare3
1 3 2 2 NA 2013 9 Female The same as other races
2 NA NA 0 NA 2013 19 Female The same as other races
3 NA 4 80 NA 2013 19 Female The same as other races
qstlang lmtjoin3 medicare height3 chckidny fvbeans X_race
1 English Yes Yes 507 Yes 303 Black only, non-Hispanic
2 English No 510 No 310 White only, non-Hispanic
3 English Yes No 504 No 202 White only, non-Hispanic
pa1min_ bpmeds X_state flushot6 pvtresd2 qlmentl2 misnowrk X_pa30021 numwomen
1 NA Yes Alabama No 30 30 0-300 minutes 1
2 110 Alabama Yes 2 0 0-300 minutes 1
3 NA Alabama Yes 2 0 0-300 minutes 1
scntvot1 asthnow metvl11_ X_aidtst3 psatime X_incomg rcsrltn2 cvdstrk3
1 Yes Yes NA No $50,000 or more No
2 Yes 35 Yes $50,000 or more Parent No
3 Yes NA Yes $50,000 or more No
profexam X_casthm1 menthlth fvgreen maxvo2_ X_bmi5 children diffalon
1 Yes Yes 29 310 2580 3916 0 Yes
2 Yes No 0 203 2950 1822 2 No
3 Yes No 2 202 2765 2746 0 No
X_educag X_strwt asthma3 vegeda1_
1 Graduated from college or technical school 40.19767 Yes NA
2 Attended college or technical school 40.19767 No 43
3 Graduated from college or technical school 40.19767 No 100
X_mrace1 X_raceg21 X_impnph
1 Black or African American Non-White or Hispanic 2
2 White Non-Hispanic White 1
3 White Non-Hispanic White 1
X_smoker3 exerhmm2 diffdres X_llcpwt2 X_rfhype5
1 Former smoker NA No 331.4934 Yes
2 Never smoked 10 No 662.9867 No
3 Current smoker - now smokes some days NA No 994.4801 No
X_misfrtn
1 No missing fruit responses
2 No missing fruit responses
3 No missing fruit responses
exract21 htm4 X_impcsex hadmam
1 170 Yes
2 Household Activities (vacuuming, dusting, home repair, etc.) 178 Female Yes
3 163 Yes
X_race_g1 X_llcpwt metvl21_ employ1 X_age80 sleptim1
1 Black - Non-Hispanic 238.0161 NA Retired 60 NA
2 White - Non-Hispanic 737.6942 33 Employed for wages 50 6
3 White - Non-Hispanic 568.5274 NA Employed for wages 55 9
chcscncr poorhlth marital casthdx2
1 No 30 Divorced
2 No NA Married No
3 No 0 Married
X_totinda internet fruitju1 doctdiab
1 No physical activity or exercise in last 30 days Yes 304
2 Had physical activity or exercise Yes 305
3 No physical activity or exercise in last 30 days Yes 301
X_drnkdy4 cstate cpdemo1 diabete3 mscode useequip
1 7 Yes No Inside a suburban county of the MSA Yes
2 0 Yes No Inside a suburban county of the MSA No
3 267 Yes No Inside a suburban county of the MSA No
X_lmtscl1 flshtmy2 X_crace1
1 Told have arthritis and social activities limited a lot
2 Not told they have arthritis October 2012 White
3 Told have arthritis and social activities limited a little January 2013
usenow3 cellfon2 scntpaid X_vegresp
1 Not at all Not Included - Missing Fruit Responses
2 Not at all Paid by the hour Included - Missing Fruit Responses
3 Not at all Paid by the hour Included - Missing Fruit Responses
qlactlm2 hadpap2 chcocncr asattack X_pa300r2 carercvd cclghous
1 Yes Yes No Yes 0 minutes Very satisfied
2 No Yes No 1-300 minutes Very satisfied
3 Yes Yes No 0 minutes Very satisfied
lastsmk2 hpvadsht ctelnum1 whrtst10 X_impeduc
1 10 years or more NA
2 Private doctor or HMO NA
3 At home NA
lastsig3 X_impcage medbills actin21_ pcpsars1 ladult
1 Within past 2 years No
2 10-14 Years old No Moderate
3 Within past 3 years No
X_pa150r2 drvisits exract11 frutda1_ X_frtlt1
1 0 minutes 5 400 Consumed fruit one or more times per day
2 1-149 minutes 3 Walking 3 Consumed fruit less than one time per day
3 0 minutes 6 43 Consumed fruit less than one time per day
idate seatbelt alcday5 pa1vigm_ hadsigm3 lastpap2 X_age65yr
1 1092013 Always 201 Yes 5 or more years ago Age 18 to 64
2 1192013 Always 0 0 No Within past 3 years Age 18 to 64
3 1192013 Always 220 Yes Within past year Age 18 to 64
X_cholchk aservist numhhol2 lstcovrg
1 Had cholesterol checked in past 5 years 0 Yes
2 Had cholesterol checked in past 5 years NA No
3 Did not have cholesterol checked in past 5 years NA No
grenday_ asinhalr X_frt16 rrcognt2 exerhmm1
1 33 5 to 14 times Included - values are in accepted range Never NA
2 43 Included - values are in accepted range Never 20
3 29 Included - values are in accepted range Never NA
dispcode emtsuprt mistrhlp miseffrt
1 Completed interview Always Agree strongly All
2 Completed interview Sometimes Agree slightly None
3 Completed interview Usually Agree strongly None
X_hispanc ftjuda1_ X_imprace X_rfbmi5
1 Not of Hispanic, Latino/a, or Spanish origin 13 Black, Non-Hispanic Yes
2 Not of Hispanic, Latino/a, or Spanish origin 17 White, Non-Hispanic No
3 Not of Hispanic, Latino/a, or Spanish origin 3 White, Non-Hispanic Yes
fvorang X_racegr3 diffwalk physhlth bldstool
1 303 Black only, Non-Hispanic Yes 30 No
2 202 White only, Non-Hispanic No 0 No
3 310 White only, Non-Hispanic Yes 3 Yes
X_vegetex pcdmdecn fruit1 miswtles drocdy3_
1 Missing vegetables responses NA 104 None 3
2 No missing values and in accepted range NA 301 None 0
3 No missing values and in accepted range NA 203 None 67
X_imphome X_lmtwrk1 harehab1 scntmony diabeye
1 NA Told have arthritis and have limited work Never
2 NA Not told they have arthritis Never
3 NA Told have arthritis and have limited work Never
rducstrk hpvadvc2 asnoslep blind smokday2 wtkg3 asthmed3 misdeprd qlhlth2
1 None No Not at all 11340 1 to 14 days None 0
2 No 5761 None 25
3 No Some days 7257 None 2
beanday_ rlivpain medcost cadult decide pneuvac3 dradvise joinpain pcpsade1
1 10 No No Yes Yes 7
2 33 No No No No NA
3 29 No No No No 5
X_prace1 wtchsalt feetchk2 X_age_g scntmeal toldhi2
1 Black or African American Yes NA Age 55 to 64 Never Yes
2 White No NA Age 45 to 54 Never No
3 White No NA Age 55 to 64 Never No
X_pneumo2 maxdrnks hivtstd3 diabedu X_pacat1 seqno orngday_
1 2 NA Inactive 2013000580 10
2 NA NA Insufficiently active 2013000593 29
3 10 NA Inactive 2013000600 33
X_frutsum weight2 X_flshot6 drnk3ge5 hadhyst2 veteran3 lengexam
1 413 250 0 Yes No Within past 2 years
2 20 127 NA No No Within past 3 years
3 46 160 20 Yes No Within past year
strfreq_ landline rcsgendr rduchart strength pamin11_
1 0 0 NA
2 0 Girl 0 100
3 0 0 NA
X_chispnc X_rfdrmn4 fc60_ scntwrk1
1 442 NA
2 Child not of Hispanic, Latino/a, or Spanish origin 506 35
3 474 40
asthmage pafreq1_ asactlim X_bmi5cat pamin21_ lsatisfy X_dualcor drnkany5
1 56 NA NA Obese NA Very satisfied NA No
2 NA 5000 NA Underweight 10 Satisfied NA Yes
3 NA NA NA Overweight NA Very satisfied NA No
X_frtresp arthedu persdoc2 numadult X_minac21
1 Included - Missing Fruit Responses No Yes, only one 2 NA
2 Included - Missing Fruit Responses Yes, only one 2 10
3 Included - Missing Fruit Responses No Yes, only one 3 NA
fmonth arthsocl havarth3 checkup1 colghous pamiss1_ howlong
1 January A lot Yes Within past year 0 Within past 2 years
2 January No Within past year 0 Within past 3 years
3 January A little Yes Within past year 0 Within past year
X_wt2rake arthexer cholchk arttoday
1 40.19767 Yes Within past year I can do some things I would like to do
2 80.39535 Within past year
3 120.59302 Yes 5 or more years ago I can do some things I would like to do
X_pastae1 X_rfbing5 pcpsaad2 rrphysm2 hlthcvrg nummen X_vegesum
1 Did not meet both guidelines No No 3 7 1 53
2 Did not meet both guidelines No No 2 1 148
3 Did not meet both guidelines Yes No 3 2 191
hadsgco1 exeroft1 X_chldcnt arthwgt hlthpln1 X_ststr stateres
1 Colonoscopy NA No children in household Yes Yes 11081 Yes
2 105 Two children in household Yes 11081 Yes
3 Colonoscopy NA No children in household No Yes 11081 Yes
pcpsadi1 strehab1 arthdis2 X_hcvu651 asdrvist pavig21_
1 Yes Have health care coverage 0 NA
2 Have health care coverage NA 0
3 Yes Have health care coverage NA NA
X_ageg5yr qlstres2 exerany2 X_drdxar1 income2
1 Age 60 to 64 30 No Diagnosed with arthritis Less than $75,000
2 Age 50 to 54 3 Yes Not diagnosed with arthritis $75,000 or more
3 Age 55 to 59 5 No Diagnosed with arthritis $75,000 or more
X_pastrng misrstls chkhemo3
1 Did not meet muscle strengthening recommendations Some NA
2 Did not meet muscle strengthening recommendations None NA
3 Did not meet muscle strengthening recommendations None NA
X_lmtact1
1 Told have arthritis and have limited usual activities
2 Not told they have arthritis
3 Told have arthritis and have limited usual activities
numphon2
1 2 residential telephone numbers
2
3
imfvplac psatest1 X_rawrake
1 1
2 Workplace 2
3 A doctor´s office or health maintenance organization (HMO) 3
X_veg23 painact2 hivtst6 ssbsugar pregnant
1 Included - values are in accepted range 5 No 305
2 Included - values are in accepted range 0 Yes 203
3 Included - values are in accepted range 20 Yes 202
[ reached 'max' / getOption("max.print") -- omitted 491772 rows ]

Not really, please read the guide I gave you.

If you type this

dput(agregated_brfss2013)

into your RStudio console and run it, it will print a result that you can copy and paste into this thread. That dput() output lets others recreate your agregated_brfss2013 data frame.

After you include that here, then include the rest of your code. After all the code is pasted below, highlight it and click the </> button in the ribbon to format it as code.

I think it would be reproducible then.
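
A minimal sketch of that workflow, assuming a small made-up data frame `df` in place of the real one. With roughly 490,000 rows, the full dput() output would be far too long to paste, so it helps to keep only the needed columns and a few rows first.

```r
# Made-up stand-in for the real data frame
df <- data.frame(
  genhlth = factor(c("Fair", "Good", "Good", "Very good", "Good", "Very good")),
  sex     = factor(c("Female", "Female", "Female", "Female", "Male", "Female")),
  other   = 1:6
)

# Keep only the columns the plot needs and a handful of rows
small <- head(df[, c("genhlth", "sex")], 3)

# dput() prints R code that rebuilds the object; this is what gets pasted
dput(small)

# Round trip: evaluating the printed code gives back an equal data frame
rebuilt <- eval(parse(text = capture.output(dput(small))))
all.equal(small, rebuilt)
```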

OK, this time I read your post. Is this it, then?

dput(brfss2013[1:6, c(19, 59 )])
structure(list(genhlth = structure(c(4L, 3L, 3L, 2L, 3L, 2L), .Label = c("Excellent",
"Very good", "Good", "Fair", "Poor"), class = "factor"), sex = structure(c(2L,
2L, 2L, 2L, 1L, 2L), .Label = c("Male", "Female"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")

I think that helps to produce the data frame named brfss2013 but you need to include the rest of the code you are trying to run and then highlight it and click the </> button above the reply window.

I think we should keep in mind that the data come from a survey conducted with people from 55 different states (the X_state variable). I don't know how decisive the reproducible example is for the final plot, but the plot shouldn't represent Alabama residents only.
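
One way to get example rows from more than one state is to sample rows at random instead of taking the first six. This is a sketch on made-up data; the data frame `df`, the seed and the sample size are all assumptions. Note that sample() applied directly to a data frame shuffles its columns, not its rows, so the rows are sampled by index.

```r
set.seed(42)  # arbitrary seed, only to make the sample repeatable

# Made-up stand-in for brfss2013 with a state column
df <- data.frame(
  X_state = rep(c("Alabama", "Alaska", "Arizona"), each = 4),
  genhlth = factor(rep(c("Fair", "Good", "Very good", NA), times = 3)),
  sex     = factor(rep(c("Female", "Male"), times = 6))
)

# Pick six random row numbers, then keep only the relevant columns
rows <- sample(nrow(df), 6)
reprex_data <- df[rows, c("X_state", "genhlth", "sex")]

# Output suitable for pasting into a post
dput(reprex_data)
```

The reproducible example only needs to show the problem. The final plot would still be drawn from the full data set.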

brfss2013_semNA <- brfss2013%>%

  • group_by(genhlth, sex)%>%
  • count(na.omit(genhlth, sex))%>%
  • ggplot(aggregated_brfss2013, mapping = aes(x = genhlth,weight=n)) + geom_bar(aes(fill = sex), position = "dodge")

And then I need some code to adjust for frequencies...
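
If "adjust for frequencies" means showing proportions within each sex instead of raw counts (an assumption, since the thread does not say), a sketch with dplyr could look like the following. The data frame `df` and the column name `prop` are made up for illustration.

```r
library(dplyr)

# Six sample rows standing in for brfss2013
df <- data.frame(
  genhlth = factor(c("Fair", "Good", "Good", "Very good", "Good", "Very good")),
  sex     = factor(c("Female", "Female", "Female", "Female", "Male", "Female"))
)

props <- df %>%
  filter(!is.na(genhlth), !is.na(sex)) %>%  # leave out missing answers
  count(genhlth, sex) %>%                   # rows per combination
  group_by(sex) %>%
  mutate(prop = n / sum(n)) %>%             # share within each sex
  ungroup()

props
```

Mapping `prop` instead of `n` to the y aesthetic would then plot relative frequencies.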

Those bullet points should not be there in the reprex - not sure why those are there.

Brian, what are bullet points?

Is this what you want to do?

library(tidyverse)

# Sample data
brfss2013 <- data.frame(
     genhlth = as.factor(c("Fair","Good","Good",
                           "Very good","Good","Very good")),
         sex = as.factor(c("Female","Female",
                           "Female","Female","Male","Female"))
)

brfss2013 %>%
    drop_na() %>% 
    count(genhlth, sex) %>%
    ggplot(mapping = aes(x = genhlth, y = n)) +
    geom_col(aes(fill = sex), position = "dodge")

Created on 2020-03-13 by the reprex package (v0.3.0.9001)


sorry, now I get it..

brfss2013_semNA <- brfss2013%>%
group_by(genhlth, sex)%>%
count(na.omit(genhlth, sex))%>%
ggplot(aggregated_brfss2013, mapping = aes(x = genhlth,weight=n)) + geom_bar(aes(fill = sex), position = "dodge")

Andres, yes, more or less so...

One thing I don't understand is the line count(genhlth, sex) %>%

What does it do?

I think the name of the function is self-explanatory: it counts things. Maybe this will make it more evident.

library(tidyverse)

# Sample data
brfss2013 <- data.frame(
    genhlth = as.factor(c("Fair","Good","Good",
                          "Very good","Good","Very good")),
    sex = as.factor(c("Female","Female",
                      "Female","Female","Male","Female"))
)

brfss2013 %>%
    drop_na() %>% 
    count(genhlth, sex)
#> # A tibble: 4 x 3
#>   genhlth   sex        n
#>   <fct>     <fct>  <int>
#> 1 Fair      Female     1
#> 2 Good      Female     2
#> 3 Good      Male       1
#> 4 Very good Female     2
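
As a further illustration, count() can be read as shorthand for grouping and then summarising with n(). The sketch below builds the same table both ways; the object names `counted` and `by_hand` are made up.

```r
library(dplyr)

# Same sample data as above
df <- data.frame(
  genhlth = as.factor(c("Fair", "Good", "Good",
                        "Very good", "Good", "Very good")),
  sex     = as.factor(c("Female", "Female",
                        "Female", "Female", "Male", "Female"))
)

# count() in one step
counted <- df %>% count(genhlth, sex)

# The same table built by hand: group, then count the rows in each group
by_hand <- df %>%
  group_by(genhlth, sex) %>%
  summarise(n = n()) %>%
  ungroup()

# Compare the two results
all.equal(as.data.frame(counted), as.data.frame(by_hand),
          check.attributes = FALSE)
```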

Sure...and about y = n?
Does n stand for number?

Yes. What other result would you expect from counting things? y = n maps the n ("number") variable to the y aesthetic. If you want to learn how to use these tools, I recommend reading this free book.
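
To make the role of y = n concrete, here is a sketch comparing two routes to the same bars: counting first and drawing the n column with geom_col(), or handing the raw rows to geom_bar(), which does the counting itself. The plot names `p_col` and `p_bar` are made up.

```r
library(dplyr)
library(ggplot2)

# Same sample data as above
df <- data.frame(
  genhlth = as.factor(c("Fair", "Good", "Good",
                        "Very good", "Good", "Very good")),
  sex     = as.factor(c("Female", "Female",
                        "Female", "Female", "Male", "Female"))
)

# Route 1: count first, then map the n column to y and draw it as-is
p_col <- df %>%
  count(genhlth, sex) %>%
  ggplot(aes(x = genhlth, y = n, fill = sex)) +
  geom_col(position = "dodge")

# Route 2: let geom_bar() count the rows, so no y is mapped
p_bar <- ggplot(df, aes(x = genhlth, fill = sex)) +
  geom_bar(position = "dodge")

# Bar heights as computed by each plot
sort(layer_data(p_col)$y)
sort(layer_data(p_bar)$y)
```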


Hi Andres,

Thanks a lot for the reading recommendation. I'm enjoying the book. Also, your code seems correct, but both my ggplot2 and tidyverse libraries appear to be corrupted.

library("ggplot2")
Error: package or namespace load failed for ‘ggplot2’ in get(method, envir = home):
lazy-load database 'C:/Users/Ismael/Documents/R/win-library/3.6/ggplot2/R/ggplot2.rdb' is corrupt
In addition: Warning messages:
1: package ‘ggplot2’ was built under R version 3.6.3
2: In .registerS3method(fin[i, 1], fin[i, 2], fin[i, 3], fin[i, 4], :
restarting interrupted promise evaluation
3: In get(method, envir = home) :
restarting interrupted promise evaluation
4: In get(method, envir = home) : internal error -3 in R_decompress1
library("tidyverse")
Error: package or namespace load failed for ‘tidyverse’ in get(method, envir = home):
lazy-load database 'C:/Users/Ismael/Documents/R/win-library/3.6/ggplot2/R/ggplot2.rdb' is corrupt
In addition: Warning messages:
1: package ‘tidyverse’ was built under R version 3.6.3
2: In .registerS3method(fin[i, 1], fin[i, 2], fin[i, 3], fin[i, 4], :
restarting interrupted promise evaluation
3: In get(method, envir = home) :
restarting interrupted promise evaluation
4: In get(method, envir = home) : internal error -3 in R_decompress1

Try restarting your R session (Ctrl+Shift+F10) and reinstalling ggplot2.

Hello, I fixed this problem...

Now I get the following error from rlang::last_error():
<error/rlang_error>
Aesthetics must be either length 1 or the same as the data (491775)

For the code: ggplot(brfss2013, mapping = aes(x = genhlth)) + geom_col(aes(fill = sex), position = "dodge") + scale_fill_manual(values = c("Female" = "springgreen", "Male" = "chocolate"))

Can this be because, when R sets the name-value pairs aesthetic = variable, one value ("Female") occurs more often than the other ("Male"), so the program returns an error? Or is it because there are NAs, which don't count as female or male and therefore don't add up to the total length of the data?

If so, how could it be fixed?
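
One possible reading, not confirmed anywhere in this thread: geom_col() expects a y aesthetic, and the call above maps only x and fill. The groups do not need to be the same size, so unequal numbers of "Female" and "Male" rows are not a problem in themselves. The sketch below works under that assumption, on made-up data (`df`, `other` and `plot_data` are hypothetical names). It drops NAs in the two plotted columns only and lets geom_bar() do the counting.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for brfss2013, including missing values
df <- data.frame(
  genhlth = factor(c("Fair", "Good", NA, "Very good", "Good", "Very good")),
  sex     = factor(c("Female", "Female", "Female", NA, "Male", "Female")),
  other   = c(NA, 1, 2, 3, NA, 5)
)

# Drop rows with NA in the plotted columns only.
# drop_na() with no arguments would also discard rows where any other
# column is NA, which on a wide survey table can remove most of the data.
plot_data <- df %>% drop_na(genhlth, sex)
nrow(plot_data)
nrow(drop_na(df))

p <- ggplot(plot_data, aes(x = genhlth, fill = sex)) +
  geom_bar(position = "dodge") +  # geom_bar() counts the rows itself
  scale_fill_manual(values = c("Female" = "springgreen", "Male" = "chocolate"))
p
```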