Locating columns by number in a CSV file before loading

I frequently read large .csv files with many columns (more than 300).
I only need about 50 of them. Because the column names come from a badly formatted Excel file, I need to locate the columns by number. Counting them from the beginning in a text-file viewer is awkward.
Any ideas how I can improve this process? Is there a viewer that shows the column numbers automatically?

I'm working with RStudio on a Mac.

Peter

Would this work for you?

# Load libraries
library('tidyverse')

# Create dummy data
d = tibble(v_1 = rnorm(10))
for( i in 2:50 ){
  d = d %>% mutate(!!str_c('v_', i) := rnorm(10))
}

# See column names and numbers
cbind(colnames(d), 1:ncol(d)) %>% View

Hope it helps 🙂

P.S. If this is a recurring task, I would look into writing a regex-based "column finder" function.
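A minimal sketch of such a column finder, in base R. The function name `find_cols()` and its arguments are hypothetical, not part of any package: it simply runs `grep()` over the column names and returns the matching positions together with the names.

```r
# Hypothetical helper: positions and names of columns whose names match
# a regular expression (case-insensitive by default)
find_cols <- function(df, pattern, ignore.case = TRUE) {
  hits <- grep(pattern, names(df), ignore.case = ignore.case)
  data.frame(number = hits, name = names(df)[hits],
             stringsAsFactors = FALSE)
}

# Dummy data with columns v_1 ... v_50, like the tibble above
d <- data.frame(setNames(replicate(50, rnorm(10), simplify = FALSE),
                         paste0("v_", 1:50)))

find_cols(d, "^v_1[0-2]$")
# returns one row each for v_10, v_11 and v_12 (positions 10, 11, 12)
```

Because it only looks at `names(df)`, the same helper works on a data frame that holds just the header row of the file.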


Hi Leon,
Great idea. I modified it a little for my purpose.

df.help <- read_delim("../data/fallexport.csv",
                      delim = ";",
                      escape_double = FALSE,
                      locale = locale(
                        date_names = "de",
                        decimal_mark = ",",
                        grouping_mark = ".",
                        encoding = "ISO-8859-1"
                      ),
                      trim_ws = TRUE
)
# One row per column: its name and its position in the file
c_names <- colnames(df.help)
numbers <- seq_along(c_names)
df.colnumbers <- tibble(c_names, numbers)

That gives me a data frame with the column names and their numbers, which is what I need when I am looking for the position of a specific column within the .csv file.
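Reading the whole export just to get its names can be slow when the file is large. A base-R sketch that reads only the header line, numbers the columns, and then loads only the wanted columns by position. The demo file, its `Spalte` column names and the positions in `wanted` are hypothetical stand-ins; the `;` separator, `,` decimal mark and ISO-8859-1 encoding mirror the `read_delim()` call above.

```r
# Hypothetical stand-in for the real export: ";" separator, "," decimals
path <- tempfile(fileext = ".csv")
demo <- data.frame(
  setNames(replicate(8, round(runif(3), 2), simplify = FALSE),
           paste("Spalte", 1:8)),
  check.names = FALSE
)
write.csv2(demo, path, row.names = FALSE)

# 1. Read only the header line (no data rows) and number the columns
header <- scan(path, what = "", sep = ";", quote = "\"", nlines = 1,
               quiet = TRUE, fileEncoding = "ISO-8859-1")
col_index <- data.frame(number = seq_along(header), name = header,
                        stringsAsFactors = FALSE)

# 2. Load only the wanted columns by position:
#    "NULL" skips a column, NA lets read.csv2 guess its type
wanted <- c(2, 5, 7)
classes <- rep("NULL", length(header))
classes[wanted] <- NA
subset_df <- read.csv2(path, colClasses = classes, check.names = FALSE,
                       fileEncoding = "ISO-8859-1")
```

`col_index` can be opened with `View()` just like `df.colnumbers`. If you prefer to stay with readr, passing `n_max = 0` to `read_delim()` should likewise return only the column names, and readr 2.0 and later also accept column positions in `col_select`.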

