Find the dtype STRING columns, and then change their nulls to 'Data Not Available'

Jonathan_Shmulovich · January 25, 2023, 5:49am

Hello,

I am quite new to R. However, I am doing an assignment and am wondering what the most efficient way is to carry out the following task:

Like the title of this post says, I need to find the columns in a dataframe that are string values, and then change all the values with NULL in those columns to 'Data Not Available'.

I could do this manually by first using sapply(df, class) and then identifying which columns are strings, but I see this as an opportunity to find a scalable solution in case I ever have a dataframe with dozens of columns (I've seen it!).

Thanks so much all!

technocrat · January 25, 2023, 10:33am

NULL is nothing; it won't be in a data frame.

mtcars[1,1] <- NULL
#> Error in x[[jj]][iseq] <- vjj: replacement has length zero

Created on 2023-01-25 with reprex v2.0.2

A data frame would have NA to indicate missingness:

mtcars[1,1] <- NA
mtcars[1,1:5]
#>           mpg cyl disp  hp drat
#> Mazda RX4  NA   6  160 110  3.9
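
To make the NULL versus NA distinction concrete, here is a minimal base R sketch. The variable names `with_null` and `with_na` are only illustrative and are not part of the original reply.

```r
# NULL has length zero, so it cannot occupy a cell;
# NA is a length-one placeholder for a missing value.
length(NULL)   # 0
length(NA)     # 1

with_null <- c(1, NULL, 3)  # NULL is silently dropped
with_na   <- c(1, NA, 3)    # NA keeps its position

length(with_null)  # 2
length(with_na)    # 3
is.na(with_na)     # FALSE  TRUE FALSE
```

This is why the assignment of NULL above fails with "replacement has length zero", while the assignment of NA succeeds.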

Here's a way:

d <- data.frame(postal = state.abb, 
                state = state.name,
                id = 1:50)
d[4,] <- NA
head(d)
#>   postal      state id
#> 1     AL    Alabama  1
#> 2     AK     Alaska  2
#> 3     AZ    Arizona  3
#> 4   <NA>       <NA> NA
#> 5     CA California  5
#> 6     CO   Colorado  6

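# TRUE if column y of data frame x is a character vector
find_char_var <- function(x, y) is.character(x[[y]])
# Return column y of x with each NA replaced by a placeholder string
ruin_data <- function(x, y) ifelse(is.na(x[[y]]), "Data Not Available", x[[y]])
the_chars <- logical(ncol(d))
for (i in seq_along(d)) the_chars[i] <- find_char_var(d, i)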
the_chars
#> [1]  TRUE  TRUE FALSE
for (i in which(the_chars)) d[[i]] <- ruin_data(d, i)
d |> head()
#>               postal              state id
#> 1                 AL            Alabama  1
#> 2                 AK             Alaska  2
#> 3                 AZ            Arizona  3
#> 4 Data Not Available Data Not Available NA
#> 5                 CA         California  5
#> 6                 CO           Colorado  6

Created on 2023-01-25 with reprex v2.0.2
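
For a data frame with dozens of columns, the same idea can be written without explicit loops. The following is a sketch of one possible base R alternative, not the method used above. It assumes R 4.0 or later defaults (strings stay character rather than becoming factors, made explicit here with `stringsAsFactors = FALSE`), and the name `is_chr` is just illustrative.

```r
d <- data.frame(postal = state.abb,
                state = state.name,
                id = 1:50,
                stringsAsFactors = FALSE)
d[4, ] <- NA

# One logical flag per column: TRUE where the column is character
is_chr <- vapply(d, is.character, logical(1))

# Replace NA only inside the character columns; other columns are untouched
d[is_chr] <- lapply(d[is_chr], function(col) {
  replace(col, is.na(col), "Data Not Available")
})

head(d)
```

`vapply()` is used instead of `sapply()` because it guarantees a logical vector even for a data frame with zero columns.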

But changing the data frame itself is not the right way to go: replacing NA with a string throws away information that functions such as is.na() and arguments such as na.rm can put to use. Instead, wait until it is time for output, and then apply the transformation there, along with descriptive column headers and other features intended for human readers.
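
As a rough illustration of that advice, the sketch below keeps NA in the working data frame and builds a separate copy only for presentation. The name `display` and the column labels are assumptions made for this example.

```r
d <- data.frame(postal = state.abb,
                state = state.name,
                id = 1:50,
                stringsAsFactors = FALSE)
d[4, ] <- NA

# Analysis still works on the untouched data, where NA is meaningful
n_missing <- colSums(is.na(d))
mean_id   <- mean(d$id, na.rm = TRUE)

# Presentation copy: fill NA in character columns and relabel the headers
display  <- d
chr_cols <- vapply(display, is.character, logical(1))
chr_part <- display[chr_cols]
chr_part[is.na(chr_part)] <- "Data Not Available"
display[chr_cols] <- chr_part
names(display) <- c("Postal code", "State", "ID")  # illustrative labels

head(display)
```

Here `d` keeps its NA values for later computation, and only `display` carries the human-readable placeholder.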

Jonathan_Shmulovich · January 28, 2023, 1:40am

Thanks a ton! Wow there's a lot to learn here. I will study for sure.
