Create new variables depend specific columns and add prefix.

Hi community,

Im have a data set with many variables (50 variables). Im put a short example.

Im try to find a way to create a news variables depend some other, but not all variables and add the prefix NEW_ Because is a process for clean up and update data.

New variables:

*New_PAIS_ORIGEN
*New_LATITUD
*New_ALTITUD

and in this way for many others variables

Maybe by loop or other way.

datos<- structure(list(ACCESION = c("G123", "G345", "G567", "G342"), 
    ESTADO = c("AV", "NAV", "AV", "NAV"), PAIS_ORIGEN = c("USA", 
    "COL", "BRA", "PRU"), LATITUD = c("42.02 ", "40.02", "41.05", 
    "39.45"), ALTITUD = c("20.34", "21.64", "40.84", "50.34")), class = "data.frame", row.names = c(NA, 
-4L))

Tnks!

Is this what you mean?

library(dplyr)

datos <- structure(list(ACCESION = c("G123", "G345", "G567", "G342"), 
                       ESTADO = c("AV", "NAV", "AV", "NAV"), PAIS_ORIGEN = c("USA", 
                                                                             "COL", "BRA", "PRU"), LATITUD = c("42.02 ", "40.02", "41.05", 
                                                                                                               "39.45"), ALTITUD = c("20.34", "21.64", "40.84", "50.34")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                               -4L))

datos %>% 
    mutate(across(.cols = c(PAIS_ORIGEN, LATITUD, ALTITUD),
                  .fns = ~ ., # Any data processing should go here
                  .names = "NEW_{.col}")
           )
#>   ACCESION ESTADO PAIS_ORIGEN LATITUD ALTITUD NEW_PAIS_ORIGEN NEW_LATITUD
#> 1     G123     AV         USA  42.02    20.34             USA      42.02 
#> 2     G345    NAV         COL   40.02   21.64             COL       40.02
#> 3     G567     AV         BRA   41.05   40.84             BRA       41.05
#> 4     G342    NAV         PRU   39.45   50.34             PRU       39.45
#>   NEW_ALTITUD
#> 1       20.34
#> 2       21.64
#> 3       40.84
#> 4       50.34

Created on 2022-08-22 by the reprex package (v2.0.1)

2 Likes

Is a good advance.

But I have many variables, and for make more efficient, Im find this order in the new variables.

Like this picture:

When the new variable appear next to the original variable:

PAIS_ORIGEN ... NEW_PAIS_ORIGEN ... DEPARTAMENTO ... NEW_DEPARTAMENTO

Im run the code but the new variables appear with data, the idea is create a new variables in white spaces for make the fixes in this space.

  ACCESION ESTADO PAIS_ORIGEN LATITUD ALTITUD NEW_PAIS_ORIGEN NEW_LATITUD NEW_ALTITUD
1     G123     AV         USA  42.02    20.34             USA      42.02        20.34
2     G345    NAV         COL   40.02   21.64             COL       40.02       21.64
3     G567     AV         BRA   41.05   40.84             BRA       41.05       40.84
4     G342    NAV         PRU   39.45   50.34             PRU       39.45       50.34

So you are not trying to make calculations programmatically but to put together a printable table that you can manually fill? If that is the case I don't see a pattern that could allow to specify the column order more efficiently, I think you are going to need to manually specify your desired order or maybe use the word "new" as a sufixe instead of a prefixe, then if you reorder alphabetically the columns will remain together.

Thanks for your appreciation.

The idea is to be able to build the new variables blank and next to the original to facilitate corrections. It would be just a virtual format to enter the data.

To avoid doing that order of columns manually when there are many.

On the other hand, using your code the new columns do appear, but they keep the data, the idea is that these new columns are blank.

I am trying to make a general process to apply it on different databases.

I was just giving a general example, hence the clarification comment

If you want a blank cell for let's say exporting an excel file, you can simply do this

library(dplyr)

datos <- structure(list(ACCESION = c("G123", "G345", "G567", "G342"), 
                        ESTADO = c("AV", "NAV", "AV", "NAV"), PAIS_ORIGEN = c("USA", 
                                                                              "COL", "BRA", "PRU"), LATITUD = c("42.02 ", "40.02", "41.05", 
                                                                                                                "39.45"), ALTITUD = c("20.34", "21.64", "40.84", "50.34")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                -4L))

datos %>% 
    mutate(across(.cols = c(PAIS_ORIGEN, LATITUD, ALTITUD),
                  .fns = ~ "",
                  .names = "{.col}_NEW")
    ) %>% 
    select(sort(tidyselect::peek_vars()))
#>   ACCESION ALTITUD ALTITUD_NEW ESTADO LATITUD LATITUD_NEW PAIS_ORIGEN
#> 1     G123   20.34                 AV  42.02                      USA
#> 2     G345   21.64                NAV   40.02                     COL
#> 3     G567   40.84                 AV   41.05                     BRA
#> 4     G342   50.34                NAV   39.45                     PRU
#>   PAIS_ORIGEN_NEW
#> 1                
#> 2                
#> 3                
#> 4

Created on 2022-08-23 with reprex v2.0.2

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.