Issue with a function

Hello everyone,

I have an issue with my code (see below), and I really can't understand where it comes from and how to solve it.

library(tcltk)  # provides tk_choose.dir()
library(dplyr)  # provides bind_rows()

## Select the parent directory of the distribution directories (as the input folder) and the output folder
input_folder <- list.files(tk_choose.dir(), full.names = TRUE)
output_folder <- tk_choose.dir()

## FOR LDF FILES: Define a function to process each file and return the processed data frame
regression_file <- function(LDF_file, folder_name) {
  LDF_contents <- read.csv(file = LDF_file, header = FALSE)[-1,]
  
  basename <- basename(LDF_file)
  name_parts <- strsplit(basename, "-")[[1]]
  assign("name_parts", name_parts, envir = .GlobalEnv)
  
  Count_Amount <- paste(name_parts[3])
  Rectangle_LDF <- name_parts[ifelse(is.numeric(as.numeric(name_parts[1])), 1, 2)]
  year <- name_parts[ifelse(is.numeric(as.numeric(name_parts[1])), 2, 1)]
  
  assign("count/amount", Count_Amount, envir = .GlobalEnv)
  assign("Rectangle/LDF", Rectangle_LDF, envir = .GlobalEnv)
  assign("year", year, envir = .GlobalEnv)

  test1 <- ifelse(is.numeric(as.numeric(name_parts[1])), "Hello", "Bye")
  assign("test1", test1, envir = .GlobalEnv)

  #Create the columns of the regression file
  Accident_Year <- rep(LDF_contents[,1], 39)
  f <- rep(1:40, 39)
  Period <- rep(seq(from = 12, to = 240, by = 6), each = 40)
  LDF <- as.numeric(unlist(LDF_contents[,-1]))
  Coverage_name <- paste(name_parts[4])
  Type <- paste(name_parts[5])
  Type <- gsub(".csv", "", Type)
  
  Regression <- data.frame(Accident_Year, f, Period, LDF, Coverage_name, Type)
  
  # Add a new column for distribution
  Regression$Distribution <- folder_name
  
  return(Regression)
}

## Create an empty list to store the merged data frames for each folder (to improve performance)
Merged_data <- vector("list", length(input_folder))

## Iterate over each input_folder, merge the files within each folder, and store the merged data frames
Merged_data <- lapply(input_folder, function(folder) {
  folder_name <- basename(folder)
  
  files <- list.files(folder, full.names = TRUE)
  data_frames <- lapply(files, regression_file, folder_name = folder_name)
  merged_data <- bind_rows(data_frames)
})

All the code above works perfectly except for one single part:

  test1 <- ifelse(is.numeric(as.numeric(name_parts[1])), "Hello", "Bye")
  assign("test1", test1, envir = .GlobalEnv)

The name_parts variable is simply the base name of a file that's located in the input_folder that the user can choose. Depending on the input folder chosen, the file name will either start with a number or a character string. Whenever it starts with a number, I would like the variable test1 to take the value "Hello", and "Bye" otherwise. This is very similar to the two lines below (which work perfectly):

  Rectangle_LDF <- name_parts[ifelse(is.numeric(as.numeric(name_parts[1])), 1, 2)]
  year <- name_parts[ifelse(is.numeric(as.numeric(name_parts[1])), 2, 1)]

Except that here, I do not want to assign a value that's already present in the 'name_parts' variable but just the values I stipulated above. But for some reason, whenever I try to assign the values for test1, the program keeps assigning "Hello", even if the file name starts with a character string.

Does anyone have a solution? I can provide more information if needed.

Thank you!

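To detect whether a string starts with a number, as.numeric is not appropriate. as.numeric() always returns a numeric vector (NA, with a warning, when the string cannot be parsed), so is.numeric(as.numeric(x)) is TRUE for any input and the ifelse() always takes its first branch. The same applies to the Rectangle_LDF and year lines, which always pick elements 1 and 2 respectively.

grepl("^\\d", name_parts[1])

will give you TRUE or FALSE according to whether name_parts[1] starts with a digit, unlike is.numeric(as.numeric(...)). The ^ anchor matters: without it, a pattern such as "\\d+" matches a digit anywhere in the string.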
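
As a minimal sketch of the difference, the two file names and the starts_with_digit() helper below are invented for illustration and are not part of the original code. It shows the is.numeric(as.numeric(...)) check returning TRUE for both inputs, while the anchored grepl() distinguishes them:

```r
# Hypothetical file names, invented for illustration only
parts_num  <- strsplit("2020-Rectangle-Count-Auto-Paid.csv", "-")[[1]]
parts_text <- strsplit("Rectangle-2020-Count-Auto-Paid.csv", "-")[[1]]

# as.numeric() returns a numeric vector even when parsing fails (it yields NA),
# so this check cannot tell the two cases apart
old_check_num  <- is.numeric(suppressWarnings(as.numeric(parts_num[1])))
old_check_text <- is.numeric(suppressWarnings(as.numeric(parts_text[1])))

# Hypothetical helper: TRUE when the string begins with a digit
starts_with_digit <- function(x) grepl("^[0-9]", x)

# A plain if/else is enough here because the condition has length one
test1_num  <- if (starts_with_digit(parts_num[1]))  "Hello" else "Bye"
test1_text <- if (starts_with_digit(parts_text[1])) "Hello" else "Bye"
```

If the intent is instead that the whole first part must be a number, rather than merely start with a digit, one possible alternative is !is.na(suppressWarnings(as.numeric(x))).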

Perfect! I didn't know that. Thanks a lot!
