How to convert binary (0/1) columns into a single factor?

Hello, I'm a university student trying to teach myself to code, and I need some help with an exercise.
I want to turn all the columns of this data frame into one single factor variable.

For example, to make it clear: if the variable University 1 equals 1, that means the first observation studied at university 1. I want to create a new variable with four levels, University 1 to University 4, that summarizes this whole data frame.


Hi,

Before I try to help you out, can you clarify whether this exercise is part of coursework or homework? If so, our forum policy is that we can give you pointers but not full solutions.

If it is not homework, just tell us and we can provide a solution rather than tips. We rely on your honesty to tell us what it's for 🙂

Kind regards,
PJ

Hello, it's not homework. I'm doing exercises on a dataset to relearn R: I took R lessons two years ago and have forgotten a lot.

Hi,

Here is my implementation:

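```r
# Generate example data: five one-hot columns, one per university.
# rep(...)[1:25] filled column-wise into a 5 x 5 matrix gives the identity pattern.
myData <- data.frame(matrix(rep(c(1, 0, 0, 0, 0, 0), 5)[1:25], ncol = 5))
colnames(myData) <- paste0("University", 1:5)

# Convert the one-hot columns back into a single factor:
# for each row, keep the name of the column that holds the 1
# (assumes every row contains exactly one 1)
allUniversities <- colnames(myData)

myData$University <- as.factor(apply(myData, 1, function(x) {
  allUniversities[as.logical(x)]
}))

myData
#>   University1 University2 University3 University4 University5  University
#> 1           1           0           0           0           0 University1
#> 2           0           1           0           0           0 University2
#> 3           0           0           1           0           0 University3
#> 4           0           0           0           1           0 University4
#> 5           0           0           0           0           1 University5
```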
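A possible vectorised variant of the same idea uses base R's `max.col()`, which returns, for each row of a matrix, the index of the column holding the largest value. This is a sketch that assumes every row has exactly one 1 (the `stopifnot()` check enforces that); `oneHot` and `idx` are names introduced here for illustration.

```r
# One-hot columns, same layout as the example above
myData <- as.data.frame(diag(5))
colnames(myData) <- paste0("University", 1:5)

oneHot <- as.matrix(myData)

# Sanity check: every row should contain exactly one 1
stopifnot(all(rowSums(oneHot) == 1))

# For each row, the index of the column that holds the 1
idx <- max.col(oneHot, ties.method = "first")

# Passing the levels explicitly keeps every university as a level,
# even one that has no observations
myData$University <- factor(colnames(oneHot)[idx], levels = colnames(oneHot))
```

Once the factor exists, `table(myData$University)` or `summary(myData$University)` gives the number of observations per university, which is the kind of summary the question asks for.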

Thank you so much!

Here is a tidyverse-based alternative that works by reshaping your data into long format:

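```r
library(tidyverse)

# Example data: five one-hot columns, one per university
sample_data <- data.frame(
    University1 = c(1, 0, 0, 0, 0),
    University2 = c(0, 1, 0, 0, 0),
    University3 = c(0, 0, 1, 0, 0),
    University4 = c(0, 0, 0, 1, 0),
    University5 = c(0, 0, 0, 0, 1)
)

sample_data %>%
    # Long format: one row per (observation, university) pair, with the
    # column name in `University` and the 0/1 entry in `value`
    pivot_longer(cols = starts_with("University"),
                 names_to = "University") %>% 
    # Keep only the university each observation belongs to
    filter(value == 1) %>% 
    select(University)
#> # A tibble: 5 x 1
#>   University 
#>   <chr>      
#> 1 University1
#> 2 University2
#> 3 University3
#> 4 University4
#> 5 University5
```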
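The long-format result above is a character column. If you need an actual factor, as in the original question, and want to keep track of which row each value came from, one possible extension is sketched below. It assumes each row has exactly one 1 (rows with no 1 would be dropped by `filter()`); the `id` column and the `result` object are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data: one one-hot column per university
sample_data <- data.frame(
  University1 = c(1, 0, 0, 0, 0),
  University2 = c(0, 1, 0, 0, 0),
  University3 = c(0, 0, 1, 0, 0),
  University4 = c(0, 0, 0, 1, 0),
  University5 = c(0, 0, 0, 0, 1)
)

result <- sample_data %>%
  # Remember which original row each value came from
  mutate(id = row_number()) %>%
  pivot_longer(cols = starts_with("University"), names_to = "University") %>%
  filter(value == 1) %>%
  # Convert the names to a factor; listing the levels keeps universities
  # that have no observations
  mutate(University = factor(University, levels = names(sample_data))) %>%
  select(id, University)

result
```

Setting `levels` explicitly also fixes the level order to the column order of the original data frame, rather than relying on alphabetical sorting.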