Removing "NAs" but keeping variable labels

I have four similar datasets that I need to analyse, and I am trying to remove "NA" values from three of my 160+ variables. I have tried two popular methods, "is.na" and "drop_na", and both work, but they change a few things in my data.

  1. The variable labels disappear from the new dataset as shown in the picture. Is there a way of removing the 'NA's without removing the labels?
  2. The 'new' dataset in the global environment no longer appears with the blue arrow. I am not sure whether this matters, but it worries me that I may be doing something wrong. What is the difference between these two types of datasets (e.g. 'men' and 'pre') with respect to the blue arrow, as shown in the pictures?

If I need to clarify more, please let me know.

Posting screenshots is not very useful since it makes it impossible to copy-paste and follow along on what you are doing. So to help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

The reprex is attached below. I am not sure why the drop_na function is giving an error here; I tested it on this data frame and it works.

tibble::tribble(
    ~Fer, ~sTfR, ~RBP,  ~CRP, ~AGP, ~Zn_gdl, ~MCLUSTER, ~MNUMBER, ~M01, ~MBARCODE,
      NA,    NA,   NA,    NA,   NA,      NA,       829,      167,    1,      1599,
      NA,    NA,   NA,    NA,   NA,      NA,       227,       63,    9,       827,
      NA,    NA,   NA,    NA,   NA,      NA,       822,       73,    2,      1075,
      NA,    NA,   NA,    NA,   NA,      NA,       145,      123,    1,      4037,
      NA,    NA,   NA,    NA,   NA,      NA,       777,      256,    1,      1451,
      NA,    NA,   NA,    NA,   NA,      75,       777,      143,    2,      1465,
      NA,    NA,   NA,    NA,   NA,      NA,       566,      237,    1,      1891,
      NA,    NA,   NA,    NA,   NA,      NA,       566,      237,    3,      1891,
      NA,    NA,   NA,    NA,   NA,      NA,       321,       85,    1,      2087,
  124.17,  4.83, 1.45,  4.97, 0.65,  59.375,       601,      289,    2,      1647,
  169.42,  5.26, 1.36, 19.23, 1.56,   56.25,       601,      151,    2,      1649,
   43.06,  8.47, 2.01,  1.19, 0.61,      50,       601,        9,    1,      1650,
  250.44,  5.19,  0.9, 13.29, 1.07,  53.125,       511,       94,    2,      1607,
  224.39,  5.78, 2.37,  5.03,  0.8,   93.75,       511,       95,    1,      1601,
  199.83,  3.61, 1.69,  0.84, 0.44,   68.75,       829,      234,    1,      1584,
   85.42,  3.56, 1.51,  0.76, 0.34,   56.25,       270,      148,    1,       627,
  166.57,  4.86, 2.31,  0.03, 0.66,   68.75,       240,       82,    1,      1492,
   80.63,  7.91, 2.17,  1.08, 0.87,    62.5,       240,      241,    1,      1496,
  215.53,  5.25, 2.01,  0.38,  0.7,  65.625,       240,      295,    6,      1483,
  136.57,  5.57, 1.89,  0.74, 0.62,     200,       240,      295,    1,      1483
  )
#> # A tibble: 20 x 10
#>      Fer  sTfR   RBP   CRP   AGP Zn_gdl MCLUSTER MNUMBER   M01 MBARCODE
#>    <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>   <dbl> <dbl>    <dbl>
#>  1  NA   NA    NA    NA    NA      NA        829     167     1     1599
#>  2  NA   NA    NA    NA    NA      NA        227      63     9      827
#>  3  NA   NA    NA    NA    NA      NA        822      73     2     1075
#>  4  NA   NA    NA    NA    NA      NA        145     123     1     4037
#>  5  NA   NA    NA    NA    NA      NA        777     256     1     1451
#>  6  NA   NA    NA    NA    NA      75        777     143     2     1465
#>  7  NA   NA    NA    NA    NA      NA        566     237     1     1891
#>  8  NA   NA    NA    NA    NA      NA        566     237     3     1891
#>  9  NA   NA    NA    NA    NA      NA        321      85     1     2087
#> 10 124.   4.83  1.45  4.97  0.65   59.4      601     289     2     1647
#> 11 169.   5.26  1.36 19.2   1.56   56.2      601     151     2     1649
#> 12  43.1  8.47  2.01  1.19  0.61   50        601       9     1     1650
#> 13 250.   5.19  0.9  13.3   1.07   53.1      511      94     2     1607
#> 14 224.   5.78  2.37  5.03  0.8    93.8      511      95     1     1601
#> 15 200.   3.61  1.69  0.84  0.44   68.8      829     234     1     1584
#> 16  85.4  3.56  1.51  0.76  0.34   56.2      270     148     1      627
#> 17 167.   4.86  2.31  0.03  0.66   68.8      240      82     1     1492
#> 18  80.6  7.91  2.17  1.08  0.87   62.5      240     241     1     1496
#> 19 216.   5.25  2.01  0.38  0.7    65.6      240     295     6     1483
#> 20 137.   5.57  1.89  0.74  0.62  200        240     295     1     1483

install.packages("tidyverse")
#> Installing package into 'C:/Users/SPECTRE/Documents/R/win-library/3.6'
#> (as 'lib' is unspecified)
#> package 'tidyverse' successfully unpacked and MD5 sums checked
#> 
#> The downloaded binary packages are in
#>  C:\Users\SPECTRE\AppData\Local\Temp\RtmpsvccUD\downloaded_packages

library(tidyr)

try<-try%>% drop_na(Zn_gdl,CRP,AGP)
#> Error in UseMethod("drop_na_"): no applicable method for 'drop_na_' applied to an object of class "function"
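
A likely cause of the error above: the tibble in the reprex is printed but never assigned to a name, so `try` still refers to base R's `try()` function. That is what "an object of class "function"" is pointing at. A minimal sketch of the fix, using a cut-down version of the data and the made-up names `dat` and `dat_complete` (both are assumptions, not names from the original post):

```r
library(tibble)
library(tidyr)

# Assign the tibble to a name that does not clash with a base R function.
dat <- tribble(
  ~CRP, ~AGP, ~Zn_gdl,
    NA,   NA,      NA,
    NA,   NA,      75,
  4.97, 0.65,  59.375
)

# Keep only the rows where none of the three variables is NA.
dat_complete <- drop_na(dat, Zn_gdl, CRP, AGP)
```

Writing the result to a new name also leaves the original data untouched, which makes it easier to compare the two afterwards.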

I'm not able to recreate your particular issue. When I run drop_na, the result I get back still has the labels. But I am admittedly using some archaic versions of R and I won't tell you how old my package library is.

But I am sympathetic to the loss of labels. There are definitely things you can do to a data frame that will cause loss of the label attribute, and I doubt it's practical to name them all. Personally, I find it easier to restore labels than to fight to retain them. To do this, I use a package I wrote called labelVector (sorry for the shameless plug). It's a work in progress, and for the code below you'll need to install the development version with devtools::install_github("nutterb/labelVector", ref = "devel")

library(tibble)
library(tidyr)
library(labelVector)

DFrame <- tibble::tribble(
  ~Fer, ~sTfR, ~RBP,  ~CRP, ~AGP, ~Zn_gdl, ~MCLUSTER, ~MNUMBER, ~M01, ~MBARCODE,
  NA,    NA,   NA,    NA,   NA,      NA,       829,      167,    1,      1599,
  NA,    NA,   NA,    NA,   NA,      NA,       227,       63,    9,       827,
  NA,    NA,   NA,    NA,   NA,      NA,       822,       73,    2,      1075,
  NA,    NA,   NA,    NA,   NA,      NA,       145,      123,    1,      4037,
  NA,    NA,   NA,    NA,   NA,      NA,       777,      256,    1,      1451,
  NA,    NA,   NA,    NA,   NA,      75,       777,      143,    2,      1465,
  NA,    NA,   NA,    NA,   NA,      NA,       566,      237,    1,      1891,
  NA,    NA,   NA,    NA,   NA,      NA,       566,      237,    3,      1891,
  NA,    NA,   NA,    NA,   NA,      NA,       321,       85,    1,      2087,
  124.17,  4.83, 1.45,  4.97, 0.65,  59.375,       601,      289,    2,      1647
) %>% 
  set_label(Fer = "Serum ferritin",
            sTfR = "Soluble Transferrin Receptor",
            RBP = "Retinol Binding Protein",
            CRP = "C-Reactive Protein",
            AGP = "Alpha-1-Acid Glycoprotein",
            Zn_gdl = "Serum zinc",
            MCLUSTER  = "Cluster number",
            MNUMBER = "I don't know what this is",
            M01 = "This wasn't visible in your screen shot",
            MBARCODE = "Probably a bar code of some sort")

From here you can pull out the labels and store them separately in case you need to restore them.

orig_label <- get_label(DFrame, return_vector = FALSE)

Then, any time you need to restore the labels, it's as simple as another call to set_label

DFrame %>% 
  drop_na(Zn_gdl,CRP,AGP) %>% 
  set_label(.dots = orig_label)

Thanks for the response. I will try your method of saving the labels, but as far as I can see it would still require typing them all into the code. That would be a lengthy task for 160 variables. I will try it on a smaller dataset first, but if you have any other suggestions for saving labels without having to input them all manually at some point, please share.

I don't think it would. I typed in the labels because the example data you gave didn't have them. (Maybe they would have if you had used dput to share your example data.)
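
To illustrate the dput point: dput writes out an object together with its attributes, so a label travels with the data when the output is pasted into a post. A small sketch with a made-up vector `x`, assuming the label is stored in the "label" attribute:

```r
# A plain numeric vector carrying a variable label as an attribute.
x <- c(59.375, 56.25)
attr(x, "label") <- "Serum zinc"

# dput() prints R code that recreates the object, attributes included.
dput(x)
```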

The image in your initial posting indicates you brought the data in using read_sav, and I'm guessing that read in the labels. If that is true, you should be able to skip to the get_label(data, return_vector = FALSE) part and proceed from there.
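
If installing a development package is not an option, the same save-and-restore idea can be sketched in base R. This assumes the labels live in each column's "label" attribute, which is where haven's read_sav puts variable labels. The small data frame and the names `saved_labels` and `kept` are made up for illustration.

```r
# Toy data standing in for the imported file; labels are set by hand here
# only so that the example is self-contained.
df <- data.frame(
  Zn_gdl = c(NA, 75, 59.375),
  CRP    = c(NA, NA, 4.97),
  AGP    = c(NA, NA, 0.65)
)
attr(df$Zn_gdl, "label") <- "Serum zinc"
attr(df$CRP, "label")    <- "C-Reactive Protein"
attr(df$AGP, "label")    <- "Alpha-1-Acid Glycoprotein"

# 1. Save every column's label (NULL for columns without one).
saved_labels <- lapply(df, attr, which = "label", exact = TRUE)

# 2. Drop rows with an NA in any of the three variables.
kept <- df[complete.cases(df[, c("Zn_gdl", "CRP", "AGP")]), ]

# 3. Put the saved labels back, skipping columns that never had one.
for (nm in names(saved_labels)) {
  if (!is.null(saved_labels[[nm]])) {
    attr(kept[[nm]], "label") <- saved_labels[[nm]]
  }
}
```

Nothing is typed in by hand in steps 1 to 3, so the approach should scale to 160+ variables as long as the labels were present when the data was read in.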
