tidyr::separate() : One character var and many numerical vars

# A tibble: 3 x 1
  `Instr Code    Max Price    Min Price    Trades    Quantity    Value(In Mn)`
  <chr>                                                                       
1 EGEN              51.10        51.10         2       85000           4.343  
2 EMERALDOIL        22.50        22.50         2       52090           1.172  
3 FIRSTSBANK        11.00        11.00         1       50000           0.550 

How do I apply tidyr::separate() in the above case? I don't know what to use in the sep= argument. I want to keep all the variables.

# Data
df <- structure(list(`Instr Code    Max Price    Min Price    Trades    Quantity    Value(In Mn)` = c("EGEN              51.10        51.10         2       85000           4.343", 
                                                                                                "EMERALDOIL        22.50        22.50         2       52090           1.172", 
                                                                                                "FIRSTSBANK        11.00        11.00         1       50000           0.550"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                           -3L), spec = structure(list(cols = list(`Instr Code    Max Price    Min Price    Trades    Quantity    Value(In Mn)` = structure(list(), class = c("collector_character", 
   

Hi, can you provide an example of what you would expect the data to look like after you run separate() successfully?

Here is the first row of my desired data frame

> df_desired
# A tibble: 1 x 6
  `Instr Code` `Max Price` `Min Price` Trades Quantity `Value(In Mn)`
  <chr>              <dbl>       <dbl>  <dbl>    <dbl>          <dbl>
1 EGEN                51.1        51.1      2    85000           4.34

Ok, thank you, so is the main purpose to reduce the precision from 3 digits to 2 digits, so from: 4.343 to 4.34? That's the only change I noticed.

Unfortunately, no. Note that the problematic data frame has only one column. I want six columns.

Hi, for the specific example, the issue is that the entire set of variables is read in as a single character variable. It looks like fixed-width data, rather than comma- or blank-separated data. If so, you can specify the sep = argument as a numeric vector of character positions to split at, for example:

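library(tidyr)

# Split column 1 at fixed character positions; convert = TRUE guesses column types
separate(data = df, col = 1,
         into = c("Instr Code", "Max Price", "Min Price", "Trades", "Quantity", "Value(In Mn)"),
         sep = c(13, 28, 41, 51, 63),
         convert = TRUE)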

One drawback to this is that I think the "Instr Code" field will have trailing spaces.
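If the trailing spaces are a concern, one way to handle them is to trim every piece after the split and convert the numeric columns explicitly. This is only a sketch: the column name raw is made up for the example (the real column has the long header as its name), and it assumes dplyr 1.0 or later for across().

```r
library(tidyr)
library(dplyr)

# Same three rows as in the question; `raw` is a hypothetical column name
df <- tibble::tibble(
  raw = c(
    "EGEN              51.10        51.10         2       85000           4.343",
    "EMERALDOIL        22.50        22.50         2       52090           1.172",
    "FIRSTSBANK        11.00        11.00         1       50000           0.550"
  )
)

out <- df %>%
  # Split at fixed character positions; every piece is still a padded string
  separate(col = raw,
           into = c("Instr Code", "Max Price", "Min Price",
                    "Trades", "Quantity", "Value(In Mn)"),
           sep = c(13, 28, 41, 51, 63)) %>%
  # Strip leading and trailing whitespace from every piece
  mutate(across(everything(), trimws)) %>%
  # Convert everything except the code column to numeric
  mutate(across(-`Instr Code`, as.numeric))

out
```

Doing the conversion by hand instead of with convert = TRUE keeps the trimming and the type change visible as separate steps.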

Alternatively, and probably superior, you can use a simple regex (regular expression) to identify the clumps of spaces in between the data fields:

# Split column 1 wherever there is a run of one or more whitespace characters
separate(data = df, col = 1,
         into = c("Instr Code", "Max Price", "Min Price", "Trades", "Quantity", "Value(In Mn)"),
         sep = "\\s+",
         convert = TRUE)

\s matches any whitespace character, and + means one or more of the preceding item. The extra backslash escapes the backslash, so R keeps it in the string that is passed on to the regular expression engine. This will work unless the "Instr Code" field sometimes contains spaces. In that case you might be forced to use the numeric vector of positions.
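To see both points in isolation, here is a small base R sketch. The second line, with the code "FIRST BANK", is invented for illustration and does not come from the data in the question. The last call uses a different pattern, "\\s{2,}" (two or more whitespace characters), which is not part of the answer above; it only helps if fields are always separated by at least two spaces and a code never contains a double space.

```r
# The R string "\\s+" holds three characters: a backslash, "s", and "+"
pattern <- "\\s+"
n_chars <- nchar(pattern)

# A row from the question splits cleanly into six fields
line <- "EMERALDOIL        22.50        22.50         2       52090           1.172"
fields <- strsplit(line, pattern)[[1]]

# Hypothetical row whose code contains a single space
line_with_space <- "FIRST BANK        11.00        11.00         1       50000           0.550"

# "\\s+" also splits inside the code, giving one piece too many
loose <- strsplit(line_with_space, "\\s+")[[1]]

# Requiring at least two whitespace characters keeps "FIRST BANK" together
strict <- strsplit(line_with_space, "\\s{2,}")[[1]]

length(fields)
length(loose)
length(strict)
```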

If it were me, I would simplify the subsequent code by replacing the spaces in the variable names with _ or a period. Also, you might be able to avoid this step entirely by using readr::read_fwf(), which splits the columns by position as the file is read in.
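As a rough sketch of the read_fwf() route: the original file is not shown in this thread, so the example below writes the three rows to a temporary file and assumes the real file has the same layout. The field widths are derived from the split positions used earlier (13, 28, 41, 51, 63), and the underscore column names are just one possible choice.

```r
library(readr)

# Stand-in for the real file, which is not shown in the thread
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Instr Code    Max Price    Min Price    Trades    Quantity    Value(In Mn)",
  "EGEN              51.10        51.10         2       85000           4.343",
  "EMERALDOIL        22.50        22.50         2       52090           1.172",
  "FIRSTSBANK        11.00        11.00         1       50000           0.550"
), path)

fwf <- read_fwf(
  path,
  col_positions = fwf_widths(
    # Widths implied by split positions 13, 28, 41, 51, 63; NA = rest of the line
    c(13, 15, 13, 10, 12, NA),
    col_names = c("instr_code", "max_price", "min_price",
                  "trades", "quantity", "value_mn")
  ),
  # The header row is not aligned with the data columns, so skip it
  skip = 1
)

fwf
```

read_fwf() trims surrounding whitespace by default, so the trailing-space issue in the code column should not come up here.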


@atiretoo Many thanks! Just what I wanted; the problem is solved!
