Converting multiple columns into a single column

Hi I am trying to convert data from wide to long format but without any success.
I have the following data set
http://web.mta.info/developers/data/nyct/turnstile/turnstile_120908.txt

This data has following columns: C/A,UNIT,SCP,DATE1,TIME1,DESC1,ENTRIES1,EXITS1,DATE2,TIME2,DESC2,ENTRIES2,EXITS2,DATE3,TIME3,DESC3,ENTRIES3,EXITS3,DATE4,TIME4,DESC4,ENTRIES4,EXITS4,DATE5,TIME5,DESC5,ENTRIES5,EXITS5,DATE6,TIME6,DESC6,ENTRIES6,EXITS6,DATE7,TIME7,DESC7,ENTRIES7,EXITS7,DATE8,TIME8,DESC8,ENTRIES8,EXITS8

I would like to create a data frame with just DATE, DESC, ENTRIES, EXITS along with other unique columns.
Here DATE column will contain values of DATE1, DATE2, DATE3,..., DATE8. similarly for other columns DESC, ENTRIES and EXITS.

So far, I am subsetting the main data frame 8 times and performing the required data manipulation.
Please see the code below

weekly_data <- read.csv("http://web.mta.info/developers/data/nyct/turnstile/turnstile_120908.txt", sep = ",")
weekly_data <- weekly_data[ , -which(names(weekly_data) %in% c("TIME1", "TIME2", "TIME3", "TIME4", "TIME5", "TIME6", "TIME7", "TIME7", "TIME8"))]
data_mta1 <- data_mta[which(data_mta$DESC1 == "REGULAR" | data_mta$DESC1 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE1", "DESC1", "ENTRIES1", "EXITS1")]
data_mta2 <- data_mta[which(data_mta$DESC2 == "REGULAR" | data_mta$DESC2 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE2", "DESC2", "ENTRIES2", "EXITS2")]
data_mta3 <- data_mta[which(data_mta$DESC3 == "REGULAR" | data_mta$DESC3 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE3", "DESC3", "ENTRIES3", "EXITS3")]
data_mta4 <- data_mta[which(data_mta$DESC4 == "REGULAR" | data_mta$DESC4 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE4", "DESC4", "ENTRIES4", "EXITS4")]
data_mta5 <- data_mta[which(data_mta$DESC5 == "REGULAR" | data_mta$DESC5 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE5", "DESC5", "ENTRIES5", "EXITS5")]
data_mta6 <- data_mta[which(data_mta$DESC6 == "REGULAR" | data_mta$DESC6 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE6", "DESC6", "ENTRIES6", "EXITS6")]
data_mta7 <- data_mta[which(data_mta$DESC7 == "REGULAR" | data_mta$DESC7 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE7", "DESC7", "ENTRIES7", "EXITS7")]
data_mta8 <- data_mta[which(data_mta$DESC8 == "REGULAR" | data_mta$DESC8 == "RECOVR AUD"), c("C/A","UNIT","SCP", "Station", "DATE8", "DESC8", "ENTRIES8", "EXITS8")]

Any help would be greatly appreciated.
Thank you and have a good day.

You could use pivot_longer() from the tidyr package. The documentation here has some examples that are very similar to your situation.

Make an attempt using that function and let us know if you still have trouble.

2 Likes

thank you Siddharth.

Following is the code I tried which is giving an error.

data_mta %>% pivot_longer(
    cols = -c("C/A", "UNIT", "SCP"),
    names_to = c(".value", "number"),
    names_sep = "")

Error: Can't subset elements that don't exist.
x Locations 1, 2, 3, 4, 5, etc. don't exist.
i There are only 0 elements.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In stringi::stri_split_regex(value, sep, n_max) :
  empty search patterns are not supported
Calls: %>% ... build_longer_spec -> str_separate -> str_split_fixed -> <Anonymous>

Any suggestions?

Figured this out, Thanks much.

data_mta %>% pivot_longer(
    cols = -c("C/A", "UNIT", "SCP"),
    names_to = c(".value", "number"),
    names_pattern = "(\\w+).*?(\\d)")
1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.