Can we use R to automatize the manual excel process ?

AbhishekHP · September 20, 2019, 7:39am

Hey Community,

There are so many macros that we do via excel but
can we automatize using R ?.

Am not sure what it takes as am not an R expert yet.
Functions like gather, spread or reshape wouldn't make much sense...
probably stringr and regrex might plan a role but not proficient enough to try.

Basically,

We got to remove the ID's such as beet_root, apple from the "list" column (usually, starts with alphabet's except few known exceptions such as ot_p). Ofcourse, we could delete that row completely as well.
Create new column "names" and then append corresponding ID's
for e.g. orange ID for list value: 9734,75R4 and not 123/R90 which is corresponding to grapes ID

Please find the reprex (simplified excel process)

library(tidyverse)

# Actual data 
input = data.frame("list" = c("beet_root","123/92R","123/92R","10.1233","9.485","ot_p",
                            "apple","12X0893","123fg","038q4",
                            "orange","9734","75R4",
                           "grapes", "123/R90","90X83.6"),
                "other_col1" = c("a","b","c","d","e","f","g","z",
                                "h","i","j","k","l","m","n","o"),
                "other_col2" = c("u","b","q","p","e","f","b","z",
                                "8","i","j","9","l","m","5","o"))

> input
        list other_col1 other_col2
1  beet_root          a          u
2    123/92R          b          b
3    123/92R          c          q
4    10.1233          d          p
5      9.485          e          e
6       ot_p          f          f
7      apple          g          b
8    12X0893          z          z
9      123fg          h          8
10     038q4          i          i
11    orange          j          j
12      9734          k          9
13      75R4          l          l
14    grapes          m          m
15   123/R90          n          5
16   90X83.6          o          o

expected_output = data.frame("list" = c("123/92R","123/92R","10.1233","9.485","ot_p",
                                      "12X0893","123fg","038q4",
                                      "9734","75R4",
                                      "123/R90","90X83.6"),
                           "other_col1" = c("b","c","d","e",
                                            "f","z","h","i",
                                            "k","l",
                                            "n","o"
                                            ),
                           "other_col2" = c("b","q","p","e",
                                            "f","z","8","i",
                                            "9","l",
                                            "5","o"),
                           "names" = c("beet_root","beet_root","beet_root","beet_root","beet_root",
                                       "apple","apple","apple",
                                       "orange","orange",
                                       "grapes","grapes"))

> expected_output
      list other_col1 other_col2     names
1  123/92R          b          b beet_root
2  123/92R          c          q beet_root
3  10.1233          d          p beet_root
4    9.485          e          e beet_root
5     ot_p          f          f beet_root
6  12X0893          z          z     apple
7    123fg          h          8     apple
8    038q4          i          i     apple
9     9734          k          9    orange
10    75R4          l          l    orange
11 123/R90          n          5    grapes
12 90X83.6          o          o    grapes

mishabalyasin · September 20, 2019, 8:53am

Your reprex is fine as it is, but can you actually explain in words what are you trying to do? I don't think it's very productive if everyone who wants to help need to decipher the relation between input and output. It'll be faster for you to explain it since you know it already .

AbhishekHP · September 20, 2019, 8:56am

ok, will update question

andresrcs · September 20, 2019, 1:32pm

This produces the expected output, although you may want to refine the regex filters for your actual application.

input = data.frame(stringsAsFactors = FALSE,
                   "list" = c("beet_root","123/92R","123/92R","10.1233","9.485","ot_p",
                              "apple","12X0893","123fg","038q4",
                              "orange","9734","75R4",
                              "grapes", "123/R90","90X83.6"),
                   "other_col1" = c("a","b","c","d","e","f","g","z",
                                    "h","i","j","k","l","m","n","o"),
                   "other_col2" = c("u","b","q","p","e","f","b","z",
                                    "8","i","j","9","l","m","5","o"))
library(tidyverse)

input %>% 
    mutate(name = if_else(str_detect(list, "\\d|ot_p"), NA_character_, list)) %>% 
    fill(name, .direction = "down") %>% 
    filter(str_detect(list, "\\d|ot_p"))
#>       list other_col1 other_col2      name
#> 1  123/92R          b          b beet_root
#> 2  123/92R          c          q beet_root
#> 3  10.1233          d          p beet_root
#> 4    9.485          e          e beet_root
#> 5     ot_p          f          f beet_root
#> 6  12X0893          z          z     apple
#> 7    123fg          h          8     apple
#> 8    038q4          i          i     apple
#> 9     9734          k          9    orange
#> 10    75R4          l          l    orange
#> 11 123/R90          n          5    grapes
#> 12 90X83.6          o          o    grapes

^{Created on 2019-09-20 by the reprex package (v0.3.0.9000)}

system · September 27, 2019, 1:32pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.