 # Changing wide format to long format data interactively

Hey everybody,
I am working with the R Shiny currently and having a problem of converting the data from the wide format into long format.I have tried to use the "reshape2" package with the "melt" function just in the R Console and it was working fine to convert one variable type ("Weight_V1","Weight_V2","Weight_V3") into like what shown in the picture below with the new variable "Weight" and "Visit" (this one was converted and exported before using SAS program):

But my question is, Is it actually possible to convert more than one variable type for example not only for Weight (Weight_V1,Weight_V2,Weight_V3) but for BodyFat (BodyFat_V1,BodyFat_V2,BodyFat_V3) as well but still assigned to the same column for "Visit" and create other new column for the variable "BodyFat"???

This is how it looks like using the SAS code:

Results in SAS Program:

I was trying to find the solution in R Shiny and how it could be interactive for the user which mean that they could just write or choose from the Input which variables and convert it into the long format type of data like i did in SAS, but i couldn't find any solution for this problem. Do you have any idea for this?

I am quite new with R Shiny and I hope someone can help me with this one. Thanks a lot!!

You can do this sort of thing with the pivot_longer function of tidyr. A simple example is shown below.

If you are just learning to use R, I suggest you get the code working in a normal script and then move on to trying shiny. Using shiny adds another level and type of difficulty to learning the code.

``````library(tidyr)
DF <- data.frame(Sno = 1:3, Weight_V1 = c(55, 62, 51), Weight_V2 = c(65, 67, 61),
BodyFat_V1 = c(40, 41, 42), BodyFat_V2 = c(38, 43, 40))
DF
#>   Sno Weight_V1 Weight_V2 BodyFat_V1 BodyFat_V2
#> 1   1        55        65         40         38
#> 2   2        62        67         41         43
#> 3   3        51        61         42         40
pivot_longer(data = DF, cols = Weight_V1:BodyFat_V2, names_to = c("MeasType", "Visit"),
names_pattern = "(.+)_(.+)", values_to = "Values")
#> # A tibble: 12 x 4
#>      Sno MeasType Visit Values
#>    <int> <chr>    <chr>  <dbl>
#>  1     1 Weight   V1        55
#>  2     1 Weight   V2        65
#>  3     1 BodyFat  V1        40
#>  4     1 BodyFat  V2        38
#>  5     2 Weight   V1        62
#>  6     2 Weight   V2        67
#>  7     2 BodyFat  V1        41
#>  8     2 BodyFat  V2        43
#>  9     3 Weight   V1        51
#> 10     3 Weight   V2        61
#> 11     3 BodyFat  V1        42
#> 12     3 BodyFat  V2        40
``````

Created on 2020-10-09 by the reprex package (v0.3.0)

By the way, here are two other ways to get similar results with pivot_longer. The function has many options.

``````pivot_longer(data = DF, cols = Weight_V1:BodyFat_V2, names_to = c("MeasType", "Visit"),
names_sep = "_", values_to = "Values")

pivot_longer(data = DF, cols = Weight_V1:BodyFat_V2,
names_to = c("MeasType", NA, "Visit"),
names_pattern = "(.+)_(V)(\\d)", values_to = "Values")
``````

Thank you so much for the info! that's really helpful! I will try it first in a normal script and move it to shiny.

Is it possible to make the Weight and BodyFat into a seperate column instead of putting it in one variable "MeasType" for each like the one below?

This code results in separate columns for the BodyFat and Weight values. The .value label in the names_to argument tells the function to use that part of the name to distinguish values.

``````DF <- data.frame(Sno = 1:3, Weight_V1 = c(55, 62, 51), Weight_V2 = c(65, 67, 61),
BodyFat_V1 = c(40, 41, 42), BodyFat_V2 = c(38, 43, 40))

pivot_longer(data = DF, cols = Weight_V1:BodyFat_V2, names_to = c(".value", "Visit"),
names_sep = "_")
``````

Thank you so much! I think I understand the function for "names_to" argument and "names_sep" argument. But what I am not actually quite understand the options for names_pattern and how you define it "(.+)_(V)(\d)". What does that actually mean? Sorry for asking a lot of questions! ``````pivot_longer(data = DF, cols = Weight_V1:BodyFat_V2,
names_to = c("MeasType", NA, "Visit"),
names_pattern = "(.+)_(V)(\\d)", values_to = "Values")
``````

`(.+)_(V)(\\d)` is a regular expression, a kind of advance wildcard language for handling text. It is widely used in programming and is well worth learning. There are many tutorials available and I will only explain this particular one.
Each pair of parentheses defines a group, so this expression has three groups and the underscore character is not in any of them. The . symbol represents any character and the + means "one or more", so .+ means "one or more of any character". The _ represents a literal underscore and the V represents that letter. The \\d represents any digit. The regular expression as a whole matches "one or more of any character followed by an underscore followed by V followed by a digit" and these are collected into three groups. These groups are matched with the column names in names_to. The first group goes to the MeasType column, the second group (the V) does not get a column because it is assigned to NA in names_to, and the third group goes to the column named Visit.

1 Like

Now I understand it better after your explanation! Thank you so much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.