Difficulties analyzing a dataset with character variables


I have a dataset that I am having a hard time dealing with. The dataset structure is as follows:

Have you watched TV from 5 to 5.30 in the morning? (This question is asked for all 24 hours of the day.) Potential answers are Yes and No.
Which TV channel did you watch during this time? Potential answers are Discovery, Fox, BBC, Animal Planet.
What kind of TV program did you watch during this time? Potential answers are Music, Movies, News, Sports.
So these three questions are asked about every half-hour slot of the day, starting from 5am to 4:30am. At the end, there are some variables about gender, income, age, and education.

I need to find out how many times each TV channel and TV program is watched during the day, and also see how gender, income, education, or age relate to the time, the channel, or the program watched.

What I have been able to do is to find out the number of times a tv channel or program was watched by using the table function:

tvchannel <- data %>% select(starts_with("Which tv channel")) and then: table(tvchannel).

However, I do not know how to find relations between these variables.
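
For the counting step, one thing to watch is that table() on a data frame with many columns tries to cross-tabulate all of them at once. Below is a minimal sketch of pooling the answers into one vector first. The data frame, its column names (channel_0500 and so on) and its values are made up for illustration and are not the real survey.

```r
# Hypothetical stand-in for the survey: one channel column per time slot.
survey <- data.frame(
  channel_0500 = c("BBC", "Fox", NA),
  channel_0530 = c("BBC", NA, "Discovery"),
  channel_0600 = c("Fox", "BBC", "BBC"),
  stringsAsFactors = FALSE
)

# Pick the channel columns by prefix, then pool every answer into one vector.
channel_cols <- grep("^channel_", names(survey), value = TRUE)
channel_counts <- table(unlist(survey[channel_cols]))

# NA (nothing watched in that slot) is dropped by table() by default.
print(channel_counts)
```

The same pattern should work for the program columns by changing the prefix passed to grep().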

Hi, you could try creating dummy variables for each of those 3 questions with multiple answers and then analyze them using correlation. Here is some example code using the iris dataset:

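library(fastDummies)
# Create one 0/1 column per level of Species and drop the original column.
temp <- dummy_cols(
  iris,
  select_columns = 'Species',
  remove_selected_columns = TRUE,
  ignore_na = TRUE)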
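
To show what "analyze using correlation" could look like once the dummy columns exist, here is a rough sketch on iris. It swaps in base R's model.matrix() to build the indicator columns so that it runs without extra packages. Correlating against Sepal.Length is only an illustrative choice; in the survey it would be something like age or income.

```r
# Base R alternative to fastDummies: one 0/1 indicator column per Species level.
dummies <- model.matrix(~ Species - 1, data = iris)

# Correlate each indicator column with a numeric variable.
cors <- cor(dummies, iris$Sepal.Length)
print(round(cors, 2))
```

Each row of the result is the point-biserial correlation between membership in one category and the numeric variable.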

Thank you for your answer. The thing is that these three questions are repeated for every half hour (e.g. 5-5.30, 5.30-6), so they are repeated 48 times. Is there a faster way to summarize and find correlations between these variables?
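
One possible way to avoid handling 48 sets of columns separately is to stack them into a long format, with one row per respondent per time slot, and then cross-tabulate. The sketch below uses base R on a tiny invented data frame. The column naming scheme (channel_0500, program_0500, ...) is an assumption, so the paste0() calls would need adapting to the real column names. tidyr::pivot_longer() can do the same reshaping if you prefer the tidyverse.

```r
# Hypothetical wide survey: one row per respondent, one column per question and slot.
survey <- data.frame(
  id = 1:4,
  gender = c("F", "M", "F", "M"),
  channel_0500 = c("BBC", "Fox", NA, "Fox"),
  program_0500 = c("News", "Sports", NA, "Sports"),
  channel_0530 = c("BBC", NA, "Discovery", "Fox"),
  program_0530 = c("News", NA, "Movies", "Movies"),
  stringsAsFactors = FALSE
)

slots <- c("0500", "0530")  # the real data would list all 48 slots here

# Stack the slots: one row per respondent per slot.
long <- do.call(rbind, lapply(slots, function(s) {
  data.frame(
    id = survey$id,
    gender = survey$gender,
    slot = s,
    channel = survey[[paste0("channel_", s)]],
    program = survey[[paste0("program_", s)]],
    stringsAsFactors = FALSE
  )
}))

# Drop slots where nothing was watched.
long <- long[!is.na(long$channel), ]

# Channel by gender, pooled over all slots.
channel_by_gender <- table(long$gender, long$channel)
print(channel_by_gender)

# With realistic sample sizes, a chi-squared test could follow:
# chisq.test(channel_by_gender)
```

Keep in mind that rows from the same respondent are not independent, so a chi-squared test on the pooled table is best treated as exploratory.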

Hi @andrewjess!

This reminds me of questions from an old stats course I took. One method we used in situations like this (not saying it's the best one, but could work for you maybe) is an ANOVA.

I'm not entirely sure what summary statistics and correlations you are looking for here, but the F-statistic produced by an ANOVA may provide the answer you're looking for.

This article covers some basics of the ANOVA in R, in case you're not familiar with it.

Hope this helps in some way!
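
For reference, a one-way ANOVA in base R might look like the sketch below. The data are simulated, and the outcome (number of half-hour slots watched per respondent) and the education levels are assumptions made up for the example, not taken from the real survey.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical per-respondent summary: half-hour slots watched, by education level.
viewers <- data.frame(
  education = factor(rep(c("primary", "secondary", "tertiary"), each = 20)),
  slots_watched = c(rpois(20, 8), rpois(20, 6), rpois(20, 4))
)

# One-way ANOVA: does mean viewing time differ between education groups?
fit <- aov(slots_watched ~ education, data = viewers)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Pull out the F-statistic and its p-value for the education term.
f_stat <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
```

Note that ANOVA needs a numeric outcome, so the Yes/No and channel answers would first have to be summarized per respondent, for example as a count of slots watched.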

Yes, now I understand. It must be a longitudinal dataset.
A dataset is longitudinal if it tracks the same type of information on the same subjects at multiple points in time.
Kindly search for correlation analysis in longitudinal data. I will also check, and if I find something, I will post it.
Good day

Hi, I did some research, and it looks like a multivariate longitudinal dataset. I have seen three videos on this subject on YouTube. You may search for Longitudinal Multilevel Modeling in R Studio (PART 1), Part 2 and Part 3 by Jon B Stats and Psych.
In addition, you may also refer to Longitudinal data analysis -- Advanced Statistics using R
