Advice on reformatting several variables into one column

ineedhelpwithr · April 23, 2023, 8:50pm

I have a variable that is split into multiple categories (eight columns in total, one per category). I am trying to combine these eight variables into one column and then compare the lowest values to the highest values in a regression. I apologize for the bad formatting. In other words, the new exam type column would hold 1 if the person had exam type 1, 2 if they had exam type 2, and so on.

Here is an example of what the data looks like (only three columns are listed for simplicity; they are coded as 0/1):

examtype_1 | examtype_2 | examtype_3
------------------------------------
    0      |     0      |     1
    1      |     0      |     0

What I expect my final column to look like:

exam type
--------------
1
2
3
4
5
6
7
8
2
4
8
4
2

Is there a way to do this without dummy coding it?

I am thinking of something like the following, but I am not sure how to proceed with the 0/1 coding above:

clean_df <- df %>% mutate(exam_type = c("examtype_1","examtype_2","examtype_3","examtype_4","examtype_5","examtype_6","examtype_7","examtype_8")) 
#I know that this will not work

Any advice is greatly appreciated.

FJCC · April 23, 2023, 10:56pm

I don't completely understand your goal. Here is a transformation of a data frame similar to the one you posted. Do you want to filter out all the rows in the final data frame where the value column is zero?

DF <- data.frame(examtype_1 = c(0,1,0),
                 examtype_2 = c(0,0,1),
                 examtype_3 = c(1,0,0))
DF
#>   examtype_1 examtype_2 examtype_3
#> 1          0          0          1
#> 2          1          0          0
#> 3          0          1          0
library(tidyr)
library(dplyr)

# Tag each original row, then split names like "examtype_1" into two columns
DF |> mutate(ROW = row_number()) |> 
  pivot_longer(cols = -ROW, names_to = c("Exam", "ExamNumber"),
               names_pattern = "(.+)_(.+)")
#> # A tibble: 9 × 4
#>     ROW Exam     ExamNumber value
#>   <int> <chr>    <chr>      <dbl>
#> 1     1 examtype 1              0
#> 2     1 examtype 2              0
#> 3     1 examtype 3              1
#> 4     2 examtype 1              1
#> 5     2 examtype 2              0
#> 6     2 examtype 3              0
#> 7     3 examtype 1              0
#> 8     3 examtype 2              1
#> 9     3 examtype 3              0

Created on 2023-04-23 with reprex v2.0.2
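
If the answer to the question above is yes, one possible continuation is to keep only the rows where value is 1, which leaves a single exam type per original row. This is only a sketch: the column name exam_type and the object name exam_long are made up for illustration, and it assumes every row has exactly one column coded 1.

```r
library(tidyr)
library(dplyr)

# Same toy data as in the reprex above
DF <- data.frame(examtype_1 = c(0, 1, 0),
                 examtype_2 = c(0, 0, 1),
                 examtype_3 = c(1, 0, 0))

exam_long <- DF |>
  mutate(ROW = row_number()) |>
  pivot_longer(cols = -ROW,
               names_to = "exam_type",          # hypothetical column name
               names_prefix = "examtype_",      # drop the prefix, keep the number
               names_transform = list(exam_type = as.integer)) |>
  filter(value == 1) |>                         # keep only the exam each row had
  select(ROW, exam_type)

exam_long$exam_type
#> [1] 3 1 2
```

A row with all zeros would be dropped by the filter, and a row with more than one 1 would appear more than once, so it may be worth checking that the result has as many rows as DF.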

EconProf · April 23, 2023, 11:32pm

This is a simple, but not elegant, method to convert from dummies to the exam each person had:

library(tidyverse)

DF <- data.frame(examtype_1 = c(0,1,0,0,1),
                 examtype_2 = c(0,0,1,0,0),
                 examtype_3 = c(1,0,0,1,0))

DF |> mutate(ExamType = examtype_1*1 + examtype_2*2 + examtype_3*3)
#>   examtype_1 examtype_2 examtype_3 ExamType
#> 1          0          0          1        3
#> 2          1          0          0        1
#> 3          0          1          0        2
#> 4          0          0          1        3
#> 5          1          0          0        1

Created on 2023-04-23 with reprex v2.0.2
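
With eight columns, typing out the weighted sum gets tedious. A rough way to generalize it, assuming the columns all start with "examtype_" and each row has exactly one 1, is to read the exam numbers from the column names and use a matrix product. The helper names exam_cols and exam_numbers are made up for this sketch.

```r
# Same toy data as above; each row has exactly one column coded 1
DF <- data.frame(examtype_1 = c(0, 1, 0, 0, 1),
                 examtype_2 = c(0, 0, 1, 0, 0),
                 examtype_3 = c(1, 0, 0, 1, 0))

# Pick up every indicator column by name, however many there are
exam_cols <- grep("^examtype_", names(DF), value = TRUE)

# Read the exam number from each column name rather than from its position
exam_numbers <- as.integer(sub("^examtype_", "", exam_cols))

# Rows with no exam or more than one exam would give misleading sums
stopifnot(rowSums(DF[exam_cols]) == 1)

# Matrix product = examtype_1*1 + examtype_2*2 + ... for each row
DF$ExamType <- as.vector(as.matrix(DF[exam_cols]) %*% exam_numbers)

DF$ExamType
#> [1] 3 1 2 3 1
```

The same code should work unchanged for examtype_1 through examtype_8, since nothing in it depends on the number of columns.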

I also do not understand how you will compare the highest and lowest values (in regression??). Do you want a count for each exam type? Finally, what do you mean by "without dummy coding"?
