I am working with a wide dataset and want to convert it to a long dataset with Pivot_Longer. The original dataset has 1000 plus columns with different ICD10 codes, I want to conver them into a long format and call the column "diag". The value then will be 1 if the original field has value, else 0. I was successful till converting the table to long, but it still has the actual ICD10 codes rather than 1 or 0 value. How can I create a new field dummy = 0 or 1, depending on the column "sum":
if sum is blank, then 0, else 1??
longdata<- pivot_longer(iris4, cols = starts_with("ICD"), names_to = "Diag", values_to = "sum")
longdata$dummy <- 1
str(longdata)
replace(longdata$dummy, longdata$sum == "", 0 )
X2nd.MD.ID Gender Age Zip Date.of.Service Diag sum
1 196602 M 0.0583 60301 12/10/2019 ICD_1 "P034"
2 196602 M 0.0583 60301 12/10/2019 ICD_2 "Q5310"
3 196602 M 0.0583 60301 12/10/2019 ICD_3 "Z23"
4 196602 M 0.0583 60301 12/10/2019 ICD_4 "Z3801"
5 196602 M 0.0583 60301 12/10/2019 ICD_5 ""
6 196602 M 0.0583 60301 12/10/2019 ICD_6 ""
7 196602 M 0.0583 60301 12/10/2019 ICD_7 ""
8 196602 M 0.0583 60301 12/10/2019 ICD_8 ""
9 196602 M 0.0583 60301 12/10/2019 ICD_9 ""
10 196602 M 0.0583 60301 12/10/2019 ICD_10 ""
I would use nchar
:
library(tibble)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(magrittr)
#>
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:tidyr':
#>
#> extract
df1 <- tibble::tribble(
~X2nd.MD.ID, ~Gender, ~Age, ~Zip, ~Date.of.Service, ~ICD_1, ~ICD_2, ~ICD_3,
196602, "M", 0.0583, 60301, 12/10/2019, "P034" ,"Q5310" , "",
196603, "F", 0.0583, 60301, 12/10/2019, "" ,"Q5310" , "P035"
)
print(df1)
#> # A tibble: 2 x 8
#> X2nd.MD.ID Gender Age Zip Date.of.Service ICD_1 ICD_2 ICD_3
#> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 196602 M 0.0583 60301 0.000594 "P034" Q5310 ""
#> 2 196603 F 0.0583 60301 0.000594 "" Q5310 "P035"
longdata <- tidyr::pivot_longer(df1, cols = starts_with("ICD"),
names_to = "Diag", values_to = "sum") %T>%
print()
#> # A tibble: 6 x 7
#> X2nd.MD.ID Gender Age Zip Date.of.Service Diag sum
#> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 196602 M 0.0583 60301 0.000594 ICD_1 "P034"
#> 2 196602 M 0.0583 60301 0.000594 ICD_2 "Q5310"
#> 3 196602 M 0.0583 60301 0.000594 ICD_3 ""
#> 4 196603 F 0.0583 60301 0.000594 ICD_1 ""
#> 5 196603 F 0.0583 60301 0.000594 ICD_2 "Q5310"
#> 6 196603 F 0.0583 60301 0.000594 ICD_3 "P035"
longdata %>%
mutate(sum=ifelse(nchar(sum)>0,1,0))
#> # A tibble: 6 x 7
#> X2nd.MD.ID Gender Age Zip Date.of.Service Diag sum
#> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 196602 M 0.0583 60301 0.000594 ICD_1 1
#> 2 196602 M 0.0583 60301 0.000594 ICD_2 1
#> 3 196602 M 0.0583 60301 0.000594 ICD_3 0
#> 4 196603 F 0.0583 60301 0.000594 ICD_1 0
#> 5 196603 F 0.0583 60301 0.000594 ICD_2 1
#> 6 196603 F 0.0583 60301 0.000594 ICD_3 1
longdata %>%
filter(nchar(sum)>0) %>%
rename(type=sum)
#> # A tibble: 4 x 7
#> X2nd.MD.ID Gender Age Zip Date.of.Service Diag type
#> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 196602 M 0.0583 60301 0.000594 ICD_1 P034
#> 2 196602 M 0.0583 60301 0.000594 ICD_2 Q5310
#> 3 196603 F 0.0583 60301 0.000594 ICD_2 Q5310
#> 4 196603 F 0.0583 60301 0.000594 ICD_3 P035
Created on 2020-08-01 by the reprex package (v0.3.0)
1 Like
Thank you! So I see you are using slightly different code, this works the same!
system
Closed
August 22, 2020, 10:55pm
4
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.