Problem with mutate using referencing value of another data.frame

I have 2 data.frames, one is a base (test_base ) in which I am working and the other is the changes to be made in the base (obs_changes).

Here are the codes of the 2 bases so you can look at it:

test_base<- tibble(id= 1:25,

name= as.factor(sample(c("Celeste", "Fernando", "Juan", "Lucia", "Maria", "Pablo"), 25, replace= T)),

class= as.factor(sample(c("A", "B", "C", "D"), 25, replace= T)),

note= rnorm(25, mean= 6, sd=1),

country= as.factor(sample(c("Argentina", "France", "USA"), 25, replace= T)))

head(test_base)

id name class note country

1 1 Fernando D 6.63 Argentina

2 2 Pablo D 6.79 Argentina

3 3 Juan B 4.61 Argentina

4 4 Celeste B 6.30 France

5 5 Celeste D 7.25 France

6 6 Celeste D 7.27 Argentina


obs_changes

structure(list(date = structure(1647820800, tzone = "UTC", class = c("POSIXct",

"POSIXt")), base_archive = "test_base ", n_rows = 25, n_cols = 5,

id = 5, variable = "name", previous_value = "Fernando", new_value = "Lucas"), row.names = c(NA, -1L), class = c("tbl_df",

"tbl", "data.frame")) 


view(obs_changes)

date   base_archive   n_rows   n_cols id variable previous_value new_value

<dttm> <chr>    <dbl>      <dbl>   <dbl>  <chr>          <chr>          <chr>

2022-03-21 "test_base " 25      5      5   name     Fernando        Lucas

obs_changes$variable is the name of the column of base_test that I need to modify.

> obs_changes$variable

[1] "name"

base_test has a column named "name" and i need to modify one of the values

I need to make changes to the base but following the values ​​that appear in obs_changes. The problem appears when I want to apply mutate() on the base and it doesn't let me, I don't understand why.

For example, when I do

test_base %>%

mutate(obs_changes$variable= 2)

throws me an error

Error: unexpected '=' in: " test_base %>%

mutate(obs_changes$variable="

But the strange thing is that in other syntaxes there is no problem, for example:

base_test[, obs_changes$variable]

brings me without problems the column

Is your goal to replace every instance of Fernando with Lucas or to just make the change at the row where id == 5?

Only where id==5. Ideally, if the mutate worked for me, I would do this:

test_base %>%
    mutate(obs_changes$variable = if_else(id == obs_changes$id & obs_changes$variable == obs_changes$previous_value,
                                          obs_changes$new_value,
                                          obs_changes$variable))

I worked at this for a while but I can't get the mutate function to work using names from obs_changes. I do not understand the non-standard evaluation used in mutate well enough to work around it. Perhaps someone more skill than I am can find a solution.

So bad. But thanks a lot for trying!

mutate() is not meant for this kind of operation and you are mixing base R with dplyr syntax so it is not clear to me exactly what is what you are trying to accomplish but maybe this points you in the right direction.
This code updates the name field based on the obs_changes data frame

library(tidyverse)

test_base <- tibble(id= 1:25,
                   
                   name= as.factor(sample(c("Celeste", "Fernando", "Juan", "Lucia", "Maria", "Pablo"), 25, replace= T)),
                   
                   class= as.factor(sample(c("A", "B", "C", "D"), 25, replace= T)),
                   
                   note= rnorm(25, mean= 6, sd=1),
                   
                   country= as.factor(sample(c("Argentina", "France", "USA"), 25, replace= T)))

obs_changes <- structure(list(date = structure(1647820800, tzone = "UTC", class = c("POSIXct",
                                                                     
                                                                     "POSIXt")), base_archive = "test_base ", n_rows = 25, n_cols = 5,
               
               id = 5, variable = "name", previous_value = "Fernando", new_value = "Lucas"), row.names = c(NA, -1L), class = c("tbl_df",
                                                                                                                               
                                                                                                                               "tbl", "data.frame")) 
test_base %>%
    mutate(name = as.character(name)) %>% 
    rows_update(obs_changes %>% 
                    select(id, variable, new_value) %>%
                    pivot_wider(id_cols = id,
                                names_from = variable,
                                values_from = new_value) %>% 
                    mutate(id = as.integer(id)),
                by = c("id"))
#> # A tibble: 25 × 5
#>       id name     class  note country  
#>    <int> <chr>    <fct> <dbl> <fct>    
#>  1     1 Lucia    C      8.09 USA      
#>  2     2 Celeste  D      6.03 France   
#>  3     3 Fernando D      5.84 USA      
#>  4     4 Celeste  B      5.32 France   
#>  5     5 Lucas    C      6.11 USA      
#>  6     6 Pablo    A      6.44 Argentina
#>  7     7 Fernando A      5.07 France   
#>  8     8 Fernando B      5.91 France   
#>  9     9 Pablo    C      5.68 USA      
#> 10    10 Celeste  C      5.07 USA      
#> # … with 15 more rows

Created on 2022-03-24 by the reprex package (v2.0.1)

Hello, thank you for the answer!
The problem is that in the row mutate(nombre = as.character(nombre)) %>% I cannot specify the name of the column "name" in any other way than referencing it to obs_changes$variable. The reason why this is is because I'm inside a loop, so I won't always have to change the "name" columnbut might have to change another column, so I always need to reference it to obs_changes.
If I don't run that line but if the others, being like this:

base_prueba %>%
     rows_update(obs_cambios %>% 
                   select(id, variable, valor_nuevo) %>%
                   pivot_wider(id_cols = id,
                               names_from = variable,
                               values_from = valor_nuevo) %>% 
                   mutate(id = as.integer(id)),
                 by = c("id"))

It throws me this error:

Error: Assigned data `y` must be compatible with existing data.
i Error occurred for column `nombre`.
x Can't convert from <character> to <factor<52220>> due to loss of generality.
* Locations: 1.
Run `rlang::last_error()` to see where the error occurred.

Which is the same error that I got when I wanted to solve this problem with my first option:

test_base[obs_changes$id, obs_changes$variable]<- obs_changes$new_value

You get that error because the factor levels do not match. I think as long as you define variables in the target data frame (test_base) as 'factors' this problem is unavoidable unless you know all possible levels in advance which will defeat the purpose of making the field updatable in the first place. I think you need to check your database design to account for update ability.

In SQL terms this would be the equivalent of trying to add values to an enum class variable using an INSERT operation, it is simply not possible, you have to enumerate all possible values in advance and modifications are not seamless.

How do you think it can be solved? The idea is to create a loop where the changes that appear in obs_changes are applied to the base test_base. That is why I cannot specify, because tomorrow the data will be different, and instead of changing the name Fernando to Lucas in id=5 I will have to change the country Argentina to Usa in id=8

As I said, I would modify the definition of test_base to allow for a seamless update, I would definitely avoid the use of factors, if you want to standardize data input to avoid typos consider using relational tables linked by an index. I'm basing my advice more on database design considerations rather than programming-language specific ones.

Also, the loop approach seems unnecessary, using rows_update() you can update any existing field based on the new data available.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.