Changing data frame values using mapping table

Hi,

I want to use a mapping table, that would help to change cell values for each variable in a structured and systematic way for a whole data frame.

My mapping table looks like this:

mapping_df <- data.frame(
  VarName = c("Var1", "Var1", "Var1")
  , RangeFrom  = c("A", "B", "b")
  , RangeTo    = c("A", "B", "b")
  , Label   = c("A", "B", "B"))

A column from the main data frame

df_initial <- data.frame( Var1 = c("A", "B", "b", "A", "B", "b"))

The end result should look like this

df_final <- data.frame( Var1 = c("A", "B", "B", "A", "B", "B"))

The actual dataframe could be up to 15mil rows and 100 columns. Hence the mapping should be an efficient process.

I created the tables above with character columns in mind, but there will be ranges for numeric values. Perhaps keep the numeric mapping on a side for now.

Thanks

How are you thinking about Range from and to when you arent processing numbers but character strings ?

i want to make the code work with character values for now, but later adapt the code to work with numeric values. If a value falls between the range, assign the value

I would start with this

(mapping_df <- data.frame(
  VarName = c("Var1", "Var1", "Var1")
  , RangeFrom  = c("A", "B", "b")
  , RangeTo    = c("A", "B", "b")
  , Label   = c("A", "B", "B")))
nrow_mapping_df <- nrow(mapping_df)

(df_initial <- data.frame( Var1 = c("A", "B", "b", "A", "B", "b")))
library(tidyverse)

df_initial_longer <- map_dfr(1:1000000,~df_initial)
# make some bigger initial example data

myfunc_inner <- function(x){
  if(length(x)>1)
    stop("1 at a time please")
  xr <- rep(x,nrow_mapping_df)
  hit <- which(between(x = xr,
                              left = mapping_df$RangeFrom,
                              right= mapping_df$RangeTo))
  
  result <- x
  if(length(hit)>1){
    stop("Hit two rules; what am I expected to do ? ")
  }
  if(length(hit)>0)
    result <- mapping_df$Label[[hit]]
  result
}

final <- distinct(df_initial_longer) |>
  rowwise() |>
  mutate(Var1x=myfunc_inner(Var1)) |> 
  right_join(mutate(df_initial_longer,
                    rn=row_number()),
             multiple="all") |> 
  arrange(rn) |> 
  select(-rn,-Var1,Var1=Var1x)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.