I am quite new to R. I am working on an assignment and am wondering what the most efficient way is to carry out the following task:
As the title of this post says, I need to find the columns in a data frame that hold string values, and then change all the NULL values in those columns to 'Data Not Available'.
I could do this manually by first running sapply(df, class) and then identifying which columns are strings, but I see this as an opportunity to find a scalable solution in case I ever have a data frame with dozens of columns (I've seen it!).
A data frame uses NA, not NULL, to indicate missing values:
mtcars[1,1] <- NA
mtcars[1,1:5]
#> mpg cyl disp hp drat
#> Mazda RX4 NA 6 160 110 3.9
Here's one way:
d <- data.frame(postal = state.abb,
state = state.name,
id = 1:50)
d[4,] <- NA
head(d)
#> postal state id
#> 1 AL Alabama 1
#> 2 AK Alaska 2
#> 3 AZ Arizona 3
#> 4 <NA> <NA> NA
#> 5 CA California 5
#> 6 CO Colorado 6
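# TRUE if column y of data frame x is a character vector
find_char_var <- function(x, y) is.character(x[[y]])
# Replace NA in column y of x with a placeholder string
ruin_data <- function(x, y) ifelse(is.na(x[[y]]), "Data Not Available", x[[y]])
the_chars <- logical(length(d))
for (i in seq_along(d)) the_chars[i] <- find_char_var(d, i)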
the_chars
#> [1] TRUE TRUE FALSE
for (i in which(the_chars)) d[[i]] <- ruin_data(d, i)
d |> head()
#> postal state id
#> 1 AL Alabama 1
#> 2 AK Alaska 2
#> 3 AZ Arizona 3
#> 4 Data Not Available Data Not Available NA
#> 5 CA California 5
#> 6 CO Colorado 6
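The same replacement can also be written without explicit loops. This is a minimal sketch, assuming base R only; it rebuilds the example data so it runs on its own, and the names d2 and is_chr are illustrative rather than part of the approach above.

```r
# Rebuild the example data so this block is self-contained
d2 <- data.frame(postal = state.abb,
                 state = state.name,
                 id = 1:50,
                 stringsAsFactors = FALSE)
d2[4, ] <- NA

# Logical vector marking which columns are character
is_chr <- vapply(d2, is.character, logical(1))

# Replace NA with the placeholder in the character columns only
d2[is_chr] <- lapply(d2[is_chr], function(col) {
  replace(col, is.na(col), "Data Not Available")
})

head(d2)
```

Because the column test is computed once over the whole data frame, this should scale to any number of columns without changes.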
But changing the data frame is not the right way to go, because replacing NA with a string loses information that can be used to advantage: functions such as is.na() no longer recognize the value as missing. Instead, wait until it is time for output, and then pipe in the transformation along with descriptive column headers and other features intended to help human readers.
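One way to defer the replacement until output is sketched below. It assumes base R 4.1 or later for the native pipe; show_missing is a hypothetical helper name, and the column labels are made up for illustration.

```r
# Hypothetical helper: returns a display copy and leaves its input untouched
show_missing <- function(df, label = "Data Not Available") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    replace(col, is.na(col), label)
  })
  df
}

d3 <- data.frame(postal = state.abb,
                 state = state.name,
                 id = 1:50,
                 stringsAsFactors = FALSE)
d3[4, ] <- NA

# Apply the placeholder and friendlier headers only when producing output
out <- d3 |>
  show_missing() |>
  setNames(c("Postal code", "State", "ID"))
head(out)

# The stored data still carries real NA values for later analysis
sum(is.na(d3$state))
```

Keeping d3 unchanged means is.na(), complete.cases(), and similar tools continue to work on the underlying data.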