Classification to new column

I would need help to add a new column to my data, which would include the category text based on the identification number.

I have a column in my current data that includes four digits SIC-numbers for each row. I have criterias e.g. that codes between 0100-0999 and 2770-2749 should be = Consumer goods, and similarly e.g. 2500-2519, 2590-2599 should be = Manufacturing.

Thank you for help!

For this sort of task, you can use the dplyr::case_when() function

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.


Thanks for the help. However, I'd still need more specific help:
I created a vector for the Acquirer SIC column. Then I named the sequences that I want to be in industry1 as i1 and same for industry 2 (i2).
And then I applied the following formula, but it gives me following error.

AcSIC_Vector <- Data$'Acquirer SIC'
#> Error in eval(expr, envir, enclos): object 'Data' not found
i1 <- 0100:0999
i1 <- 2000:2399
i2 <- 6000-6999
AcSIC_Vector <- dataset$`Acquirer SIC`
#> Error in eval(expr, envir, enclos): object 'dataset' not found

df %>%
  mutate(group = case_when(i1 %in% AcSIC_Vector ~ "industry1",TRUE ~ "GO"))
#> Error in df %>% mutate(group = case_when(i1 %in% AcSIC_Vector ~ "industry1", : could not find function "%>%"

Created on 2020-10-26 by the reprex package (v0.3.0)

image

Here's just a picture of the original data that I have names as "Data", to provide some clarification to my code.

We can't copy from a screenshot, please post the sample data as formatted code, as explained on the reprex guide I linked for you.

Data <- tibble::tribble(
                 ~Acquiror.Full.Name, ~Acquirer.SIC, ~Acquiror.Datastream,
"Premier Asset Management Group PLC",         6282L,             "26287A",
                       "Imperial dd",         7011L,             "682812",
     "Primary Health Properties PLC",         6512L,             "870775",
                            "Dsv As",         4731L,             "505651",
              "Eurobank Ergasias SA",         6029L,             "308696",
                  "Opus Global Nyrt",         1611L,             "682856",
      "Societe de la Tour Eiffel SA",         6531L,             "936843",
              "Hojgaard Holding A/S",         1541L,             "539810",
                         "Poolia AB",         7361L,             "698982",
                          "CYBG PLC",         6029L,             "8898PW",
    "Inypsa Informes Y Proyectos SA",         3679L,             "685499",
                          "RELX PLC",         2721L,             "901080",
                  "Vistula Group SA",         2311L,             "142070",
             "Endeavour Mining Corp",         1041L,             "548378",
                           "YIT Oyj",         1531L,             "142836",
                      "IP Group PLC",         6282L,             "27886R",
                        "Midsona AB",         2099L,             "698716",
                    "Avingtrans PLC",         3593L,             "904710",
               "John Wood Group PLC",         8711L,             "258098",
                 "Standard Life PLC",         6282L,             "36228U"
)
head(Data)
#> # A tibble: 6 x 3
#>   Acquiror.Full.Name                 Acquirer.SIC Acquiror.Datastream
#>   <chr>                                     <int> <chr>              
#> 1 Premier Asset Management Group PLC         6282 26287A             
#> 2 Imperial dd                                7011 682812             
#> 3 Primary Health Properties PLC              6512 870775             
#> 4 Dsv As                                     4731 505651             
#> 5 Eurobank Ergasias SA                       6029 308696             
#> 6 Opus Global Nyrt                           1611 682856

Created on 2020-10-26 by the reprex package (v0.3.0)

Sorry for the mistake, hope this works.

Here a sample solution you can modify

library(dplyr)

Data <- tibble::tribble(
    ~Acquiror.Full.Name, ~Acquirer.SIC, ~Acquiror.Datastream,
    "Premier Asset Management Group PLC",         6282L,             "26287A",
    "Imperial dd",         7011L,             "682812",
    "Primary Health Properties PLC",         6512L,             "870775",
    "Dsv As",         4731L,             "505651",
    "Eurobank Ergasias SA",         6029L,             "308696",
    "Opus Global Nyrt",         1611L,             "682856",
    "Societe de la Tour Eiffel SA",         6531L,             "936843",
    "Hojgaard Holding A/S",         1541L,             "539810",
    "Poolia AB",         7361L,             "698982",
    "CYBG PLC",         6029L,             "8898PW",
    "Inypsa Informes Y Proyectos SA",         3679L,             "685499",
    "RELX PLC",         2721L,             "901080",
    "Vistula Group SA",         2311L,             "142070",
    "Endeavour Mining Corp",         1041L,             "548378",
    "YIT Oyj",         1531L,             "142836",
    "IP Group PLC",         6282L,             "27886R",
    "Midsona AB",         2099L,             "698716",
    "Avingtrans PLC",         3593L,             "904710",
    "John Wood Group PLC",         8711L,             "258098",
    "Standard Life PLC",         6282L,             "36228U"
)

Data %>% 
    mutate(group = case_when(
        Acquirer.SIC >= 100L & Acquirer.SIC <= 999L ~ "industry1",
        Acquirer.SIC >= 2000L & Acquirer.SIC <= 2399L ~ "industry2",
        TRUE ~ "GO"
    ))
#> # A tibble: 20 x 4
#>    Acquiror.Full.Name                 Acquirer.SIC Acquiror.Datastream group    
#>    <chr>                                     <int> <chr>               <chr>    
#>  1 Premier Asset Management Group PLC         6282 26287A              GO       
#>  2 Imperial dd                                7011 682812              GO       
#>  3 Primary Health Properties PLC              6512 870775              GO       
#>  4 Dsv As                                     4731 505651              GO       
#>  5 Eurobank Ergasias SA                       6029 308696              GO       
#>  6 Opus Global Nyrt                           1611 682856              GO       
#>  7 Societe de la Tour Eiffel SA               6531 936843              GO       
#>  8 Hojgaard Holding A/S                       1541 539810              GO       
#>  9 Poolia AB                                  7361 698982              GO       
#> 10 CYBG PLC                                   6029 8898PW              GO       
#> 11 Inypsa Informes Y Proyectos SA             3679 685499              GO       
#> 12 RELX PLC                                   2721 901080              GO       
#> 13 Vistula Group SA                           2311 142070              industry2
#> 14 Endeavour Mining Corp                      1041 548378              GO       
#> 15 YIT Oyj                                    1531 142836              GO       
#> 16 IP Group PLC                               6282 27886R              GO       
#> 17 Midsona AB                                 2099 698716              industry2
#> 18 Avingtrans PLC                             3593 904710              GO       
#> 19 John Wood Group PLC                        8711 258098              GO       
#> 20 Standard Life PLC                          6282 36228U              GO

Created on 2020-10-25 by the reprex package (v0.3.0.9001)

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.