Column values are changed if conditions in another column are met

nnguyen · March 22, 2023, 8:30am

I want to set conditions in specific columns that correspond to changed values in other columns
Example:

Duration = c("2h 50m", "7h 25m", "19h", "5h 25m", "4h 45m", "2h 25m", "15h 30m", "21h 5m", "25h 30m", "7h 50m")
Price = c(3897, 7662, 13882, 6218, 13302, 3873, 11087, 22270, 11087, 8625))

What I want is:
Price = c(134, 7662, 13882, 6218, 13302, 134, 11087, 22270, 11087, 8625))

The code I tried is:

# Change value if duration is less than 3 hours
data$Price[data$Duration <= "3h"] <- ceiling(((data$Price) / 114) + 100)

It works, but R changes value regardless of the condition
Here's what I got:
Price = c(135, 7662, 168, 6218, 13302, 222, 155, 217, 134, 8625))
Hope that makes sense

nirgrahamuk · March 22, 2023, 11:29am

you have several issues to address.
issue #1
firstly I dont see how the code you tried could work without throwing an error unless theres an odd circumstance of the data you have; certainly it simply does not work for the example you provided due to the fact that on the right side of the assignment you calculate as many values as there are prices and on the left side you say to put them in a subset. assignment should have a one to one relationship between what is being assigned from to what being assigned to. a structure like this would be expected to 'work'

data$Price[data$Duration <= "3h"] <- ceiling(((data$Price[data$Duration <= "3h"] ) / 114) + 100)

notice that there is subsetting on both sides.
That code would be equivalent to

data$Price <- ifelse(data$Duration<="3h",
                     ceiling(data$Price / 114 + 100),
                     data$Price
                     )

which might be preferred (or not)

I would prefer to use tidyverse approach which reduces the repetion of repeating data$ everywhere

issue #2
you are computing on character strings. unless this c("2h 50m", "7h 25m", stuff is a representation of an actual time class that you really have but you provided us the appearance as string, you cannot rely on accurate order comparisons based on on these sorts of strings, you should use an actual time datatype.

"1h" <= "19h"
FALSE

because comparing strings is done letter by letter. 'h' is considered 'more' than '9' in the second letter

library(hms)
hms(hours = 1) <= hms(hours=19)
#TRUE

here is some example code that takes strings of the form you have and makes them hms types

# Load the hms package
library(hms)
library(readr)
library(stringr)
# Create a list of string times
times_str <- c("2h 50m", "7h 25m", "19h")

# Define a function to convert a string time to an hms time
str_to_hms <- function(time_str) {
  # Split the string into hours and minutes components
  time_components <- strsplit(time_str, "h")[[1]]
  # Extract the hours and minutes components as integers
  hours <- parse_integer(
    str_remove_all(time_components[1], "[^\\d]"))
  minutes <- parse_integer(
    str_remove_all(time_components[2], "[^\\d]"))
  if(is.na(minutes)){
    minutes<-0
  }
  # Create an hms object from the hours and minutes
  hms(hours=hours,minutes=minutes)
}

# Apply the function to the list of string times
times_hms <- lapply(times_str, str_to_hms)

# Print the resulting hms time objects
times_hms

nnguyen · March 22, 2023, 8:50pm

I'll tell you something. It would be better for me to munge the data before I run the code with conditions. I'm still working on conversion times. My codes are used the way I like

c("2h 50m", "7h 25m", "19h", "5h 25m", "4h 45m", "2h 25m", "15h 30m", 
"21h 5m", "25h 30m", "7h 50m", "13h 15m", "2h 35m", "2h 15m", 
"12h 10m", "2h 35m", "26h 35m", "4h 30m", "22h 35m", "23h", "20h 35m", 
"5h 10m", "15h 20m", "2h 50m", "2h 55m", "13h 20m", "15h 10m", 
"5h 45m", "5h 55m", "2h 50m", "2h 15m", "2h 15m", "13h 25m", 
"2h 50m", "22h", "5h 30m", "10h 25m", "5h 15m", "2h 30m", "6h 15m", 
"11h 55m", "11h 5m", "8h 30m", "22h 5m", "2h 45m", "12h", "2h 50m", 
"2h 50m", "2h 15m", "16h 5m", "19h 55m")

The output should look like this: 02:50:00
This is my code:

data$Duration <- str_replace_all(data$Duration, "[^\\d]", ":")
data$Duration <- str_replace_all(data$Duration, "::", ":")
data$Duration <- paste(data$Duration, "00:00")
data$Duration <- str_replace_all(data$Duration, " ", "")
data$Duration <- format(as.POSIXct(data$Duration, format = "%H:%M:%S"), "%H:%M:%S")

c("02:50:00", "07:25:00", "19:00:00", "05:25:00", "04:45:00",
"02:25:00", "15:30:00", "21:05:00", NA, "07:50:00", "13:15:00",
"02:35:00", "02:15:00", "12:10:00", "02:35:00", NA, "04:30:00",
"22:35:00", "23:00:00", "20:35:00", "05:10:00", "15:20:00", "02:50:00",
"02:55:00", "13:20:00", "15:10:00", "05:45:00", "05:55:00", "02:50:00",
"02:15:00", "02:15:00", "13:25:00", "02:50:00", "22:00:00", "05:30:00",
"10:25:00", "05:15:00", "02:30:00", "06:15:00", "11:55:00", "11:05:00",
"08:30:00", "22:05:00", "02:45:00", "12:00:00", "02:50:00", "02:50:00",
"02:15:00", "16:05:00", "19:55:00")

The output is as I would like it, but there are NAs even though I don't see any errors with my codes.

nirgrahamuk · March 22, 2023, 9:32pm

The first NA corresponds to this. Its out of bounds

nnguyen · March 22, 2023, 9:37pm

Okay, so how can I prevent or fix this? Are you suggesting I remove that value?

nirgrahamuk · March 23, 2023, 9:27am

hms() can accept a 25hour time, so go with that approach.

hms(hours = 25)
# 25:00:00

system · April 13, 2023, 9:28am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.