Troubles separating character columns

bryanrt · February 28, 2021, 6:50pm

Goal

I have a few thousand text files that I am reading through with purrr::pmap() and trying to turn into dataframes. I am struggling to separate the columns based on white space. Any and all help is greatly appreciated. Here is a small example of what I am working with and my attempt so far.

REPREX

library(tidyr)
library(dplyr)

data <- c(" 0 DAG  1 31 93.59",
  " 1 DAG  1 31 107.7",
  "31 DAG  0  1 39.00",
  " 0 SRH  1 31 90.38",
  " 1 SRH  1 31 100.0",
  "31 SRH  0  1 29.00",
  " 0 PECS 1 31 79.54",
  " 1 PECS 1 31 84.27",
  "31 PECS 0  1 28.00") 

data %>% 
  as_tibble() %>% 
  separate(col = c("type", "station", "hour", "minute", "sec"))

nirgrahamuk · February 28, 2021, 7:03pm

I like this way, which uses the tidyverse's readr package:

data %>%
  paste0(collapse = "\n") %>%
  readr::read_delim(delim = " ",
                    col_names = c("type", "station", "hour", "minute", "sec")) %>%
  mutate(across(c(type, hour, minute), as.integer))
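For comparison, the separate() attempt from the question can probably be made to work as well. The sketch below is not from the thread: it assumes the strings are first put into a named tibble column (called `value` here), trimmed of leading whitespace, and then split on runs of whitespace with a regular expression. `convert = TRUE` asks tidyr to guess the column types.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Two of the example rows from the question
data <- c(" 0 DAG  1 31 93.59",
          "31 PECS 0  1 28.00")

# Assumption: trimming first avoids an empty leading field,
# and "\\s+" treats any run of spaces as a single separator.
parsed <- tibble(value = trimws(data)) %>%
  separate(
    col = value,
    into = c("type", "station", "hour", "minute", "sec"),
    sep = "\\s+",
    convert = TRUE
  )

parsed
```

The key differences from the original attempt are that `col` names the column to split, while the new column names go in `into`.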

bryanrt · February 28, 2021, 7:30pm

That worked beautifully, thank you kindly for the teaching.

bryanrt · February 28, 2021, 11:36pm

Thank you so much for the help earlier. I have a follow-up question. That method works great except in instances where there is only a single character string. When that happens, I get an error saying that the object isn't found in my directory. Have you dealt with this before?

REPREX

library(purrr)  # map2()
library(dplyr)  # %>%, mutate(), across()

json_data <- jsonlite::read_json(path = "https://gitlab.com/Bryanrt-geophys/ml-seismic-application/-/raw/master/spyder/EQTransformer_test/project/json/station_list.json", simplifyVector = TRUE)

stations <- names(json_data)

network_data <- map2(json_data, stations, function(x, y){
  x[-2] %>%
    unlist() %>%
    paste0(collapse = " ") %>% 
    readr::read_delim(delim=" ", 
                      col_names = c("network_code",
                                    "reciever_latitude",
                                    "reciever_longitude",
                                    "reciever_elevation_m")) %>% 
    mutate(
      across(c(reciever_latitude, reciever_longitude, reciever_elevation_m),as.integer),
      station = y)
})

nirgrahamuk · March 5, 2021, 10:05am

From the documentation of read_delim:

Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path) or be a vector of greater than length 1.

So a single string with no newline is treated as a file path. But you can always add a newline:

library(tidyr)
library(dplyr)

data <- " 0 DAG  1 31 93.59" %>% 
  paste0(collapse = "\n") %>%
  paste0("\n") %>% 
  readr::read_delim(delim=" ", 
                    col_names = c("type", "station", "hour", "minute", "sec")) %>% 
  mutate(across(c(type,hour,minute),as.integer))
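When the same parsing step runs over many inputs, some with one line and some with several, it may be convenient to wrap the newline trick in a small function. The sketch below is only an illustration: `read_literal_lines()` is a hypothetical helper name, not part of readr, and it additionally assumes that collapsing runs of spaces to a single space is acceptable for the data.

```r
library(readr)
library(dplyr)

# Hypothetical helper (not from readr or the thread): parse space-separated
# lines held in a character vector, even if there is only one line.
read_literal_lines <- function(lines, col_names) {
  # Trim the ends and collapse repeated spaces so every gap is one delimiter
  cleaned <- gsub("\\s+", " ", trimws(lines))
  # Join the lines and always end with a newline, so readr treats the
  # string as literal data rather than as a file path
  text <- paste0(paste(cleaned, collapse = "\n"), "\n")
  readr::read_delim(text, delim = " ", col_names = col_names)
}

single <- read_literal_lines(
  " 0 DAG  1 31 93.59",
  col_names = c("type", "station", "hour", "minute", "sec")
) %>%
  mutate(across(c(type, hour, minute), as.integer))

single
```

With readr 2.0.0 or later, wrapping the string in I() is another way to mark it as literal data, which avoids relying on the newline.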

bryanrt · March 6, 2021, 9:55pm

Awesome, thank you kindly for the teaching.
