Converting columns of a data frame to factors

str_guru · October 1, 2020, 8:27am

Hi, I am assigning labels to my data frame manually, as shown below. I have 800 columns to label. After that I create subsets of the data frame (there are many of them) and pass them to a function for calculation.

The labels can be different for each chunk, and creating them one by one for every chunk is very time-consuming.

data<-data.frame( col1=c(1,1,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,NA,NA,NA,NA,1,NA,NA,NA,NA,1,NA,1),
    col2=c(1,1,1,1,1,NA,NA,NA,NA,1,1,1,1,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,NA,NA,1,1,1,1,1,1,1,NA,NA,NA),
    col3=c(1,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,1,1,1,NA,NA,NA,1,NA,NA,1,1,1,1,1,NA,NA,1),
    col4=c(1,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
    col5=c(1,2,1,1,1,2,1,2,2,1,2,NA,1,1,2,2,2,1,1,1,2,NA,2,1,1,1,2,2,2,NA,1,2,2,1,1,1,2,2,2)
  )  
  
  data$col1<-factor(data$col1, levels=1, labels="Sales")
  data$col2<-factor(data$col2, levels=1, labels="OPS")
  data$col3<-factor(data$col3, levels=1, labels="Management")
  data$col4<-factor(data$col4, levels=1, labels="HR")
  data$col5<-factor(data$col5, levels=c(1,2), labels=c("Local","Overseas"))
  
  df1<- data
  df1$cc1<-1
  df2<- subset(df, col5 == 'Local')
  df$cc2<-ifelse(df$col5 == 'Local',1,NA)
  lst<-list(df$cc1, df$cc2)
  ldat<-list("ALL" = df, "Local" =df2)

Now I am looking for a function where I can give a list of labels, e.g.:

colnames=c("col1","col2"...."col4")
col_labels =c("sales","OPS"...."HR")
# so here I will be just needed to update the list of columns and their labels

conv_frac <- function(dataset,var_bject){
for(i in 1:ldat)
lapply(factor,ldat(i))  # may be lapply or any thing else

}
# then  will apply factor_list
conv_frac(dataset = ldat, col =colnames  , labels = col_labels)

Is there any solution for this?

AlexisW · October 2, 2020, 7:25pm

Could you provide a reproducible example? It is unclear how you intend to encode the labels for col5 in col_labels. How do you get the data initially? Also, the code for df2 doesn't work as is. Do you need to do this conversion from data to ldat, or would you work directly with data if you could?

Right now I think a simple solution might be something like:

col_labels =list("sales","OPS","Management","HR",c("Local","Overseas"))
names(col_labels) <- paste0("col", 1:5)

for(i in names(col_labels)){
  data[[i]] <- factor(data[[i]], labels = col_labels[[i]])
}

nirgrahamuk · October 4, 2020, 10:15am

library(tidyverse)
library(rlang)

data<-data.frame(
  gender = c(1,2,1,2,1,2,1,2,2,2,2,1,1,2,2,2,2,1,1,1,1,1,2,1,2,1,2,2,2,1,2,1,2,1,2,1,2,2,2),
  sector = c(3,3,1,2,5,4,4,4,4,3,3,4,3,4,2,1,4,2,3,4,4,4,3,1,2,1,5,5,4,3,1,4,5,2,3,4,5,1,4),
  col1=c(1,1,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,NA,NA,NA,NA,1,NA,NA,NA,NA,1,NA,1),
  col2=c(1,1,1,1,1,NA,NA,NA,NA,1,1,1,1,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,NA,NA,1,1,1,1,1,1,1,NA,NA,NA),
  col3=c(1,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,1,1,1,NA,NA,NA,1,NA,NA,1,1,1,1,1,NA,NA,1),
  col4=c(1,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
  col5=c(1,2,1,1,1,2,1,2,2,1,2,NA,1,1,2,2,2,1,1,1,2,NA,2,1,1,1,2,2,2,NA,1,2,2,1,1,1,2,2,2)
)  

faclist <- list(
  gender = c("Male", "female"),
  sector = c("TX", "CA", "NY", "LA", "WA"),
  col1 = "Sales",
  col2 = "OPS",
  col3 = "Management",
  col4 = "HR",
  col5 = c("Local", "Overseas")
)

make_mutator <- function(x) {
  paste0(
    "factor(", names(faclist)[[x]],
    ",labels=c('",
    paste0(faclist[[x]],
      collapse = "','"
    ), "'))"
  )
}


(list_of_mutators <- purrr::map_chr(seq_len(length(faclist)),
                make_mutator))

names(list_of_mutators) <- names(faclist)

mutate(data,
       !!!parse_exprs(list_of_mutators))

str_guru · October 5, 2020, 9:29am

Hi, this works perfectly on the data frame, but I want to pass ldat in place of data.
I want to run it for ldat:
mutate(ldat,
!!!parse_exprs(list_of_mutators))

nirgrahamuk · October 5, 2020, 9:48am

ldat is completely derived from data (although there seem to be typos in the code you shared for its construction).
Given that ldat is built from data, why not process data first and then make ldat?

It is possible to do it on ldat, but it's more complicated, and I don't know why that effort would be justified.

str_guru · October 5, 2020, 10:10am

Actually, the requirement is only for ldat, because I have built all my functions around ldat. That's why I am looking for a function that works on ldat.
Also, please explain it so that I can understand.

nirgrahamuk · October 5, 2020, 10:18am

ldat is not a data frame; it's a list of two data frames.
mutate will not work on it directly.

# a representative ldat
(ldat <- list(slice(data,1:20),
             slice(data,20:39)))

purrr::map(ldat,
           ~mutate(.,
                  !!!parse_exprs(list_of_mutators)))

Note that such an approach will fail if a data frame within ldat doesn't contain every level corresponding to every possible label of each factor.
Again, this is why it's a better idea to convert to factors on data first, and then build ldat from that.
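
To make the failure mode above concrete, here is a minimal base R sketch. The vector `col5` and the subset `local_only` are made up for illustration, and the coding 1 = Local, 2 = Overseas is assumed from the example data. It shows that `factor()` without `levels` infers the levels from the values actually present, and that passing `levels` explicitly is one way to avoid the mismatch.

```r
# Illustrative data: col5 is coded 1 = Local, 2 = Overseas
col5 <- c(1, 2, 1, 1, 2, NA)

# A subset that happens to contain only code 1
local_only <- col5[col5 %in% 1]

# Without `levels`, factor() sees a single level (1) but gets two labels,
# which is an error
res <- try(factor(local_only, labels = c("Local", "Overseas")), silent = TRUE)
inherits(res, "try-error")

# With `levels` given explicitly, the labels line up even though
# code 2 never occurs in this subset
f <- factor(local_only, levels = c(1, 2), labels = c("Local", "Overseas"))
levels(f)
```

The same idea should carry over to the generated mutator strings if they include a `levels` argument, at the cost of having to know the codes for each column up front.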

str_guru · October 8, 2020, 12:25pm

How can I give the parameters separately in this function?
var <- c("col1", "col2"....)
labels<-c("Sales","OPS",...)

Also, please explain why we are using !!! (three exclamation marks) here:
mutate(data,
!!!parse_exprs(list_of_mutators))

nirgrahamuk · October 8, 2020, 12:34pm

var <- c("col1", "col2"....)
labels<-c("Sales","OPS",...)

This could only work if you never had more than one label per var, which is not the case here...

Look at the documentation for parse_exprs, or search for rlang !!!
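
As a rough illustration of what `!!!` does, the sketch below builds the call without running it, so neither dplyr nor a data frame is needed. The vector `mutators` is a made-up stand-in for `list_of_mutators`, and `expr()` is used here only to display the result of splicing; the names are set explicitly after parsing as a precaution.

```r
library(rlang)

# Named strings of R code, shaped like list_of_mutators in this thread
mutators <- c(
  col1 = "factor(col1, labels = 'Sales')",
  col2 = "factor(col2, labels = 'OPS')"
)

# parse_exprs() turns each string into an unevaluated expression
exprs_list <- parse_exprs(mutators)
names(exprs_list) <- names(mutators)  # set explicitly so the names are certain

# expr() captures a call without evaluating it; !!! splices every element
# of the list in as a separate (named) argument of mutate()
spliced <- expr(mutate(data, !!!exprs_list))
print(spliced)
```

The printed call should be equivalent to typing each `col = factor(...)` argument into `mutate()` by hand, which is why a single `!!!` can stand in for any number of columns.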

str_guru · October 8, 2020, 12:52pm

Yes, in my original data I have one label per variable.

jmcvw · October 8, 2020, 12:57pm

rlang can be confusing for a while. Fortunately most data analysis can be carried out without ever having to go there.

Here's a solution that might be easier to understand, and I think it does the job correctly. (Though I admit I haven't read the thread very closely.) This approach uses base R, as you do in your original question, and avoids rlang (if you haven't already looked at the tidyverse, I'd recommend it).

# start from a copy of ldat so that the result has somewhere to go
ldat2 <- ldat
for (i in seq_along(ldat)) {
  # convert every column of the i-th data frame, looking up its labels by column name
  ldat2[[i]][] <- lapply(names(ldat[[i]]), function(x) factor(ldat[[i]][[x]], labels = faclist[[x]]))
}

nirgrahamuk · October 8, 2020, 1:04pm

Gender
Sector
Col5 ?

str_guru · October 8, 2020, 1:06pm

Disregard those columns; I mean the actual data would be like this:

data<-data.frame(
col1=c(1,1,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,NA,NA,NA,NA,1,NA,NA,NA,NA,1,NA,1),
col2=c(1,1,1,1,1,NA,NA,NA,NA,1,1,1,1,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,NA,NA,1,1,1,1,1,1,1,NA,NA,NA),
col3=c(1,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,1,1,1,NA,NA,NA,1,NA,NA,1,1,1,1,1,NA,NA,1),
col4=c(1,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
)
data$col1<-factor(data$col1, levels=1, labels="Sales")
data$col2<-factor(data$col2, levels=1, labels="OPS")
data$col3<-factor(data$col3, levels=1, labels="Management")
data$col4<-factor(data$col4, levels=1, labels="HR")

jmcvw · October 8, 2020, 1:18pm

Can I ask please if this is a homework task?

There was another question posted recently with an identical dataset, which also had a (failed?) attempt very close to what I posted above.

str_guru · October 8, 2020, 1:30pm

No idea about that. Which question?

jmcvw · October 8, 2020, 1:35pm

I don't wish to cast aspersions, I only want to know if this might be a homework assignment.

I have no issue helping people with their studies, but I would like to be aware - it may influence the way in which I provide help.

str_guru · October 8, 2020, 1:37pm

Yes, this is a kind of assignment.

jmcvw · October 8, 2020, 1:40pm

Excellent. Please do continue to seek guidance here, but please identify your posts as homework. Most people will probably try to guide your understanding a bit more in their answers.

Also please consult this

nirgrahamuk · October 8, 2020, 2:03pm

#how to make this faclist 
faclist <- list(
  col1 = "Sales",
  col2 = "OPS",
  col3 = "Management",
  col4 = "HR"
)

#from
var <- c("col1","col2","col3","col4")
labels <- c("Sales","OPS","Management","HR")

#do 
faclist2 <- setNames(purrr::map(labels,identity),
                     var)

#check
all.equal(faclist,faclist2)
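
Once a named list like faclist exists, applying it across a list of data frames could look roughly like the sketch below. `label_list` is a hypothetical helper, not something from this thread, and it assumes every listed column is coded 1..k in the order of its labels (1 / NA for the single-label columns here). The small `data` and `ldat` objects are made up to keep the example self-contained.

```r
# Hypothetical helper: label the columns named in `faclist` in every
# data frame of `ldat`. Assumes codes 1..k match the order of the labels.
label_list <- function(ldat, faclist) {
  lapply(ldat, function(df) {
    # only touch columns that exist in both the data frame and faclist
    for (nm in intersect(names(faclist), names(df))) {
      df[[nm]] <- factor(df[[nm]],
                         levels = seq_along(faclist[[nm]]),
                         labels = faclist[[nm]])
    }
    df
  })
}

var <- c("col1", "col2")
labels <- c("Sales", "OPS")
faclist <- setNames(as.list(labels), var)

# Made-up data: the second list element has no non-missing col2 values
data <- data.frame(col1 = c(1, NA, 1, NA), col2 = c(NA, NA, 1, 1))
ldat <- list(ALL = data, first_half = data[1:2, ])

out <- label_list(ldat, faclist)
str(out$first_half)
```

Because the levels are passed explicitly, a subset whose column is entirely NA still ends up with the expected factor level rather than raising an error.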

str_guru · October 8, 2020, 2:05pm

The approach should be this only, i.e. the input data will be ldat (a list of data frames).