Split a column into multiple

Hi, I am quite inexperienced in R and I'm having a problem resolving an issue that I have. I'm not quite where to start resolving it in R, but I have not managed to do so in openrefine.

I have a spreadsheet with multiple columns. Column G contains a

number of codes that all have different meanings. For example, 02 means X, 0207 means y, but is a subset of X, 03 means something entirely different. What I want to do is split column G into multiple columns, where every "02" is contained in its own column called "02" for example, and the same for all the other codes. What's been happening when I tried this in openrefine (which i am more familiar with as I am so new to R) is I end up with 100's of columns and nothing is ordered. Ie not all the "02" in one column and "0105" in their own column and so on and so forth. How would I approach this? I will add a photo of column G as an example. Please advise?

The first step to solving this in R is to have the data in R.
Can you export the data to a csv format ?
R has built in read.csv() support.
also readr package has a more sophisticated read_csv() function that can be very helpful.

1 Like

It looks like you'll be parsing strings (e.g. detecting the "02" pattern), so if you haven't already started down the (great, life-changing, amazingly-useful) path of learning how to use regular expressions, that's going to be your best first step.

Once you have some practice regex-ing, you'll find that nearly every (practical) programming language under the sun (including R/Python/Julia and even some spreadsheets) has a regex implementation or library that follows roughly the same syntax. Regex standardization is truly one of the most (if not the most) stable de-facto standards in all of programming :slight_smile: !

(And once you have some regex patterns in mind for your data, you can simply read in the file lines then use those patterns to split up your data appropriately.)

1 Like

In the future, try to provide a reproducible example FAQ: How to do a minimal reproducible example ( reprex ) for beginners. I've tried to make one based on the screenshot but you should do this when posting a question. Here's an example which I think does what you want assuming you've imported the data into R already:

library(tidyverse)

Products <- tibble(
   G=c("HS codes: 0105 0106 0207 0407",
              "HS codes: 0105",
              "HS codes: 0105 0207 0407",
              "ICS codes: 67.120.20"
              )
)

Products_split <- Products %>%
   mutate(
      Code_01=str_extract_all(G, "01[0-9]{2}"),
      Code_02=str_extract_all(G, "02[0-9]{2}")
      )

Products_split
#> # A tibble: 4 x 3
#>   G                             Code_01   Code_02  
#>   <chr>                         <list>    <list>   
#> 1 HS codes: 0105 0106 0207 0407 <chr [2]> <chr [1]>
#> 2 HS codes: 0105                <chr [1]> <chr [0]>
#> 3 HS codes: 0105 0207 0407      <chr [1]> <chr [1]>
#> 4 ICS codes: 67.120.20          <chr [0]> <chr [0]>

Products_split %>% 
   pull(Code_01)
#> [[1]]
#> [1] "0105" "0106"
#> 
#> [[2]]
#> [1] "0105"
#> 
#> [[3]]
#> [1] "0105"
#> 
#> [[4]]
#> character(0)
Products_split %>% 
   pull(Code_02)
#> [[1]]
#> [1] "0207"
#> 
#> [[2]]
#> character(0)
#> 
#> [[3]]
#> [1] "0207"
#> 
#> [[4]]
#> character(0)

Created on 2021-02-24 by the reprex package (v1.0.0)

Thank you! I'll follow the FAQ and try again with your guidance that you pasted in the response

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.