Best practices when converting between numeric and character for Likert data

gxm204 · May 16, 2020, 2:09pm

Hi all, here is something I sometimes encounter in my workflow that I haven't found a comfortable solution to. Let's say I am working on Likert-based data like the following:

# Create Likert dataset
df <- data.frame(q1 = c("Strongly Disagree", "Agree", "Disagree", "Strongly Agree"),
                 q2 = c("Strongly Agree", "Agree", "Strongly Disagree", "Neither Agree nor Disagree"),
                 q3 = c("Neither Agree nor Disagree", "Strongly Agree", "Agree", "Disagree"),
                 q4 = c("M", "F", "M", "F"))

I frequently find myself wanting to go between the character-based Likert data and the numeric-based Likert data depending on the need. Here is what I mean:

If I want to calculate any statistics on these Likert data, such as the correlation, I need to convert them into numeric.

> library(dplyr)
> 
> # h/t https://stackoverflow.com/questions/38724850/converting-likert-data-to-numeric-across-a-data-frame
> factorise <- function(x) {
+   case_when(x %in% c("Strongly Disagree") ~ 1,
+             x %in% c("Disagree") ~ 2,
+             x %in% c("Neither Agree nor Disagree") ~ 3,
+             x %in% c("Agree") ~ 4,
+             x %in% c("Strongly Agree") ~ 5)
+ }
> 
> 
> df2 <- mutate_at(df, c("q1", "q2", "q3"), factorise)
> 
> # To calculate statistics on the 1-5 data 
> # I need the numeric-based Likert data
> cor(select(df2, -q4))
           q1          q2          q3
q1  1.0000000 -0.10690450 -0.14142136
q2 -0.1069045  1.00000000 -0.07559289
q3 -0.1414214 -0.07559289  1.00000000

However, I often want to return to the character-based data for any tables or graphs so that they are clearly labeled (i.e. "Strongly Agree" instead of "5").

> # But for clearly labeled tables and graphs,
> # I need the character-based Likert data
> 
> table(df$q1, df$q4)
                   
                    F M
  Agree             1 0
  Disagree          0 1
  Strongly Agree    1 0
  Strongly Disagree 0 1

It's like I need some way to name or label those numeric values for the Likert variables. Any ideas on this? Thanks very much all.

phil_hummel · May 16, 2020, 2:54pm

I would consider creating both a Likert_score and a Likert_label feature in the raw data and using whichever one is appropriate for the task at hand, rather than converting for each analysis or report. Make both factors.

Question  Respondent  Likert_score  Likert_label
Q1        1           5             "Strongly Agree"

Every factor has an integer value associated with each level, and since Likert is a 1-5 scale you could use that underlying representation, but I feel like it wouldn't make for very maintainable or transparent code.
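
For reference, here is a minimal sketch of the "underlying integer" approach mentioned above, using base R only. It assumes the five labels from the original post are the complete scale and that their order runs from 1 (Strongly Disagree) to 5 (Strongly Agree). The names `likert_levels`, `likert_cols`, `df_labelled`, and `df_scores` are made up for illustration. The levels are spelled out explicitly because `factor()` otherwise sorts them alphabetically, which would put "Agree" at 1.

```r
# Assumed scale order, lowest to highest (taken from the labels in the original post)
likert_levels <- c("Strongly Disagree", "Disagree",
                   "Neither Agree nor Disagree", "Agree", "Strongly Agree")

df <- data.frame(
  q1 = c("Strongly Disagree", "Agree", "Disagree", "Strongly Agree"),
  q2 = c("Strongly Agree", "Agree", "Strongly Disagree", "Neither Agree nor Disagree"),
  q3 = c("Neither Agree nor Disagree", "Strongly Agree", "Agree", "Disagree"),
  q4 = c("M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

likert_cols <- c("q1", "q2", "q3")  # hypothetical name for the Likert columns

# Store the Likert columns as ordered factors: labels stay visible,
# and the integer codes follow the order given in likert_levels
df_labelled <- df
df_labelled[likert_cols] <- lapply(df[likert_cols], factor,
                                   levels = likert_levels, ordered = TRUE)

# Labelled output for tables and graphs (rows appear in scale order)
table(df_labelled$q1, df_labelled$q4)

# Integer codes (1-5) for statistics, derived only when needed
df_scores <- as.data.frame(lapply(df_labelled[likert_cols], as.integer))
cor(df_scores)
```

With this layout the labelled factor is the single stored version and the numeric codes are derived on demand. Whether that is transparent enough is a judgment call, as noted above.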

gxm204 · May 16, 2020, 7:31pm

That makes sense. Thanks Phil!

phil_hummel · May 16, 2020, 10:23pm

I shouldn't have suggested making both features factors, duh... You want to do math with the score, so it would be best left as an integer. Here is a nice article on analysis with Likert scores:

https://rpubs.com/dgolicher/Limert_scale_analysis
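
To illustrate the layout suggested in this thread (an integer score plus a text label, one row per question and respondent), here is a rough sketch in base R. The data frame `responses` and the lookup vector `likert_labels` are hypothetical names. The four rows are the q1 and q2 answers of the first two respondents from the original post, scored on the assumption that 1 is "Strongly Disagree" and 5 is "Strongly Agree".

```r
# Assumed label for each score, in position order 1..5
likert_labels <- c("Strongly Disagree", "Disagree",
                   "Neither Agree nor Disagree", "Agree", "Strongly Agree")

# Long-format data: the score is kept as an integer so it can be used in arithmetic
responses <- data.frame(
  question     = c("q1", "q1", "q2", "q2"),
  respondent   = c(1L, 2L, 1L, 2L),
  likert_score = c(1L, 4L, 5L, 4L),
  stringsAsFactors = FALSE
)

# Derive the label column by indexing the lookup vector with the score
responses$likert_label <- likert_labels[responses$likert_score]

# Use the integer column for statistics ...
mean(responses$likert_score)

# ... and the label column for tables and graphs
table(responses$question, responses$likert_label)
```

Because the label is always derived from the score through one lookup vector, the two columns cannot drift apart, which is one way to keep both representations available without converting back and forth.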

system · June 6, 2020, 10:23pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.