Firstly, thank you for your assistance.
Every time I read your name, I think "damn cool nickname!".
Anyway, I don't think I explained very well what I was trying to do: I tried what you suggested and didn't get the expected result. After working with it a little longer, though, I figured it out.
```r
library(readr)  # read_csv()
library(dplyr)  # full_join(), glimpse(), select(), %>%

# Load csv:
df_eta <- read_csv('Shipment_Profile_Reports_ETA_01012017_02182020.csv', col_names = TRUE)
df_etd <- read_csv('Shipment_Profile_Reports_ETD_01012017_02182020.csv', col_names = TRUE)
# Decided to full_join the data
df_all_og <- full_join(df_eta, df_etd)
glimpse(df_all_og)
```
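When `full_join()` is called without a `by` argument, dplyr joins on every column name the two tables have in common and prints a message naming them. Passing `by` makes the key explicit. A minimal sketch with two made-up tables standing in for `df_eta` and `df_etd` (the column names and values are hypothetical):

```r
library(dplyr)

# Hypothetical stand-ins for df_eta and df_etd; the real reports have many more columns
eta <- data.frame(`Shipment ID` = c("S1", "S2"),
                  ETA = c("2020-01-05", "2020-01-09"),
                  check.names = FALSE, stringsAsFactors = FALSE)
etd <- data.frame(`Shipment ID` = c("S2", "S3"),
                  ETD = c("2019-12-20", "2019-12-28"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Name the join key explicitly instead of relying on every shared column name
all_rows <- full_join(eta, etd, by = "Shipment ID")
all_rows
```

Because `full_join()` keeps unmatched rows from both sides, S1 ends up with `NA` for ETD and S3 with `NA` for ETA.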
I looked through colnames(df_all_og) to decide which columns are most pertinent to what I'm trying to achieve.
The full colnames(df_all_og) output has 149 entries, one per column. Too many, so let's clean out the "trash".
The listing below shows only the columns I want, and each "..." stands for the rest of a range I want. Hence, I want columns 1, 2, 4, 5, 6:13, 17:19, 31:101 and 120:127.
```
> colnames(df_all_og)
[1] "Shipment ID"
[2] "Trans"
[4] "Mode"
[5] "Origin"
[6] "Origin Ctry"
...
[13] "House Ref"
[17] "Goods Description"
...
[19] "Destination ETA"
[31] "Added"
...
[101] "Direction"
[120] "Total Accrual (Recognized+Unrecognized)"
...
[127] "Total WIP (Recognized+Unrecognized)"
```
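Rather than counting positions by eye in the `colnames()` output, base R's `match()` can look up where a given name sits. A small sketch on a hypothetical vector of the first six names ("Col 3" is a made-up placeholder, since column 3 isn't shown above); with the real data the second argument would be `colnames(df_all_og)`:

```r
# Hypothetical first six column names; "Col 3" is a placeholder for the unlisted column 3
cols <- c("Shipment ID", "Trans", "Col 3", "Mode", "Origin", "Origin Ctry")

# match() returns the position of each name in cols (NA if a name is absent)
pos <- match(c("Mode", "Origin Ctry"), cols)
pos

# The endpoints can then be turned into a range of positions
range_cols <- pos[1]:pos[2]
range_cols
```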
I could just do this:
```r
# Create a vector with the desired column positions:
df_col_num <- c(1, 2, 4, 5, 6:13, 17:19, 31:101, 120:127)
# Reduce the dataset to the variables of interest before starting EDA.
df_all <- df_all_og %>%
  select(df_col_num)
```
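Passing a vector stored in a variable straight to `select()` works, but wrapping it in tidyselect's `all_of()` makes it unambiguous that `df_col_num` is an outside variable rather than a column name; `all_of()` accepts positions as well as names. A sketch on a hypothetical six-column data frame:

```r
library(dplyr)

# Hypothetical six-column data frame standing in for df_all_og
df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)
keep <- c(1, 2, 4:6)

# all_of() evaluates `keep` as an outside vector of positions (or names),
# so it can't be mistaken for a column called "keep"
df_small <- df %>% select(all_of(keep))
names(df_small)
```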
What I was actually trying to do was select the columns by name instead of by position:
```r
df_all <- df_all_og %>%
  select(
    `Shipment ID`,
    Trans,
    Mode:`House Ref`,
    `Goods Description`:`Destination ETA`,
    Added:Direction,
    starts_with("Total")
  )
```
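One caveat: `starts_with("Total")` picks up every column whose name begins with "Total" (case-insensitively by default), wherever it sits, so the name-based version matches columns 120:127 exactly only if no other column starts with that prefix. Comparing the names each version returns is a quick check. A sketch with hypothetical columns, including a made-up stray "Total Cost" column:

```r
library(dplyr)

# Hypothetical columns; "Total Cost" is a stray prefix match outside the wanted range
df <- data.frame(ID = 1, Mode = 2, Origin = 3, `Total Cost` = 4, Notes = 5,
                 `Total Accrual` = 6, `Total WIP` = 7, check.names = FALSE)

by_name <- df %>% select(ID, Mode:Origin, starts_with("Total"))
by_pos  <- df %>% select(all_of(c(1:3, 6:7)))

# Columns the name-based selection picked up that the positional one did not
setdiff(names(by_name), names(by_pos))
```

With the real data, the same `setdiff()` on the two versions of `df_all` would show whether they agree.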
From here I can begin the EDA, and use dplyr's rename functions on later subsets of this still-large set of variables.
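Column names with spaces, like `Shipment ID`, need backticks everywhere in dplyr code, so renaming them early can make the EDA less tedious. A sketch using `rename_with()` (dplyr 1.0.0 or later) for a bulk change and `rename()` for a one-off, on a hypothetical three-column subset; the underscore convention is just one possible choice:

```r
library(dplyr)

# Hypothetical subset with spaces in the column names, like the shipment report
df <- data.frame(`Shipment ID` = "S1", `House Ref` = "H1", Mode = "SEA",
                 check.names = FALSE, stringsAsFactors = FALSE)

# rename_with() applies a function to every column name; here spaces become underscores
df_clean <- df %>% rename_with(~ gsub(" ", "_", .x))

# rename() handles one-off renames using new_name = old_name
df_clean <- df_clean %>% rename(shipment_id = Shipment_ID)
names(df_clean)
```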
Hope this helps any other newbies.
Cheers.