Converting values from Yes/No to 1/0 in Tidyverse


#1

Hi,

I am new here - hope I get the format of my question right! I have some experience in R (but not so much) and am new to the Tidyverse, the benefits of which I get - and want to use.

I have a tibble with a number of columns that I have been converting (e.g. splitting with separate() etc.), all of which has gone well.

> str(cleaned_df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	1000 obs. of  21 variables:
 $ checking_status       : int  0 1 1 0 3 1 0 3 0 3 ...
 $ duration              : num  15.8 20 17.8 14.9 23.8 ...
 $ credit_status         : int  2 4 2 2 2 2 2 4 2 2 ...
 $ purpose               : int  2 6 3 9 3 3 9 3 1 6 ...
 $ credit_amount         : num  968 1707 2803 2303 2276 ...
 $ savings_status        : int  4 0 0 0 0 4 4 0 0 2 ...
 $ employment            : int  0 4 2 0 2 4 3 1 2 4 ...
 $ installment_commitment: int  3 4 1 2 2 3 2 4 1 4 ...
 $ personal_status       : int  2 2 2 0 1 2 1 2 2 2 ...
 $ other_parties         : int  0 0 0 0 2 0 0 2 0 0 ...
 $ residence_since       : int  4 4 2 4 4 4 2 3 4 4 ...
 $ property_magnitude    : int  0 2 2 0 0 2 0 1 3 1 ...
 $ age                   : num  69.3 39.1 33.1 33.3 22.6 ...
 $ other_payment_plans   : chr  "stores" "bank" "none" "none" ...
 $ housing               : int  1 0 1 1 1 1 1 1 2 1 ...
 $ existing_credits      : int  1 2 1 1 1 1 1 2 1 1 ...
 $ job                   : int  3 3 3 2 3 1 3 2 2 2 ...
 $ num_dependents        : int  1 1 2 1 1 1 1 1 1 1 ...
 $ own_telephone         : chr  "yes" "none" "none" "none" ...
 $ foreign_worker        : chr  "yes" "no" "yes" "yes" ...
 $ class                 : chr  "bad" "bad" "good" "bad" ...
I would like to convert columns $own_telephone and $foreign_worker from Yes / No (or Yes / None) to integer values of 1 or 0 for each.

What is the ‘best’ method to do this in dplyr (or related package), please?

I have edited the chunk above, trying to follow Mara’s hint. However, having installed the reprex library I now get errors (which seem to be common), so I will have to fix those too.

Thanks

Ian


#2

One option is to use forcats::fct_recode(). (forcats is another tidyverse package, which is also attached if you run `library(tidyverse)`.)
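A minimal sketch of what the fct_recode() route might look like, assuming the "yes"/"none" and "yes"/"no" values shown in the str() output above. The three-row tibble and the name `recoded` are illustrative only. Note that fct_recode() returns a factor, so the sketch goes through as.character() before as.integer(); calling as.integer() directly on a factor would return the level codes rather than 1/0.

```r
library(dplyr)
library(forcats)

# Illustrative stand-in for the two columns of interest
tbl <- tibble::tribble(
  ~own_telephone, ~foreign_worker,
  "yes",  "yes",
  "none", "no",
  "none", "yes"
)

recoded <- tbl %>%
  mutate(
    # fct_recode() takes new_level = "old_level" pairs and returns a factor
    own_telephone  = fct_recode(own_telephone,  "1" = "yes", "0" = "none"),
    foreign_worker = fct_recode(foreign_worker, "1" = "yes", "0" = "no"),
    # factor -> character -> integer, so the labels "1"/"0" become 1L/0L
    own_telephone  = as.integer(as.character(own_telephone)),
    foreign_worker = as.integer(as.character(foreign_worker))
  )

recoded
```

If the columns are only ever needed as integers, a plain comparison inside mutate() may be simpler than going through a factor at all.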

Also, could you please turn this into a self-contained reprex (short for minimal reproducible example)? It will help us help you if we can be sure we’re all working with/looking at the same stuff. And it saves responders the hassle of having to reformat the data by hand.

Thanks


#3

Hi Mara

Thank you.

I have updated the example above

Ian


#4

@mara was referring to a number of things about your code. You should prune down your example to just what is needed to show what you need to do… not the whole table. Also you should show the input (reduced to just what is needed to explain your problem) and the output you expect.

The reprex @mara referred you to should build the table… your updated question is not a reprex and just shows us the data in the input table, and doesn’t show the output you expect.

Keep in mind that just about everyone in this community is using their own spare time to answer your question, so you should do what you can to make it as easy as possible for this community to help you.

In any case here is a reprex of a pruned down example that builds your input data and shows how to use mutate to get the results it seems like you are looking for.

suppressPackageStartupMessages(library(tidyverse))
tbl <- tibble::tribble(
  ~num_dependents, ~own_telephone, ~foreign_worker,
  1, "yes", "yes",
  1, "none", "no",
  2, "none", "yes")

tbl
#> # A tibble: 3 x 3
#>   num_dependents own_telephone foreign_worker
#>            <dbl> <chr>         <chr>         
#> 1           1.00 yes           yes           
#> 2           1.00 none          no            
#> 3           2.00 none          yes


mutate(tbl, 
            own_telephone = if_else(own_telephone == "yes", 1L, 0L),
            foreign_worker = if_else(foreign_worker == "yes", 1L , 0L))
#> # A tibble: 3 x 3
#>   num_dependents own_telephone foreign_worker
#>            <dbl>         <int>          <int>
#> 1           1.00             1              1
#> 2           1.00             0              0
#> 3           2.00             0              1

Created on 2018-03-04 by the reprex package (v0.2.0).
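If more columns need the same yes-to-1 treatment, the mutate() call above can be sketched once for all of them with across(). This assumes dplyr 1.0.0 or later, which is newer than the version used in the reprex above, and that "yes" is the only value that should map to 1. The name `result` is illustrative.

```r
library(dplyr)

# Same pruned-down input as in the reprex above
tbl <- tibble::tribble(
  ~num_dependents, ~own_telephone, ~foreign_worker,
  1, "yes",  "yes",
  1, "none", "no",
  2, "none", "yes"
)

# Apply the same if_else() to every listed column;
# anything other than "yes" (here "none" or "no") becomes 0L
result <- tbl %>%
  mutate(across(c(own_telephone, foreign_worker),
                ~ if_else(.x == "yes", 1L, 0L)))

result
```

One thing to keep in mind with this approach: if_else() propagates missing values, so an NA in the input stays NA rather than becoming 0.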


#5

You could also consider the sjmisc package:

http://www.strengejacke.de/sjPlot/sjmisc-cheatsheet.pdf

More specifically, this should be straightforward to do with the rec() function.


#6

Dan,

Thank you - that is immensely helpful.

As I said in my edited message, I have installed reprex() and attempted to use it for the first time, but it runs with an error:

Unable to put result on the clipboard. How to get it:

This is a known issue which I am attempting to overcome.

I really appreciate that people give up their own time to help others here.

Please also bear in mind that this is my first time here, and I will learn the standards for posing questions, and get better, with your help.

Thanks again

Ian


#7

We are all waiting for CRAN to catch up to the latest version. You can install the latest version from GitHub with

devtools::install_github("tidyverse/reprex")

Sorry for the roundabout way to get reprex installed. Eventually CRAN will catch up.