Histogram doesn't work

Hello,

I'm trying to create a histogram and have run into some problems.

This is the code I wrote:
Gruppe_LM$Bilder <- subset(Gruppe_LM, select = c(T001, T002, T003, T004, T005, T006, T007, T008, T009, T010, T011, T012, T013, T014, T015, T016, T017, T018, T019))
Gruppe_DM$Bilder <- subset(Gruppe_DM, select = c(T001, T002, T003, T004, T005, T006, T007, T008, T009, T010, T011, T012, T013, T014, T015, T016, T017, T018, T019))

describe(Gruppe_LM$Bilder)
table(Gruppe_LM$Bilder)
hist(Gruppe_LM$Bilder, main=" Stimmung Light Mode", ylab = "Häufigkeit", xlab="Skalenwert", breaks = 5, xlim = c(1,6))

This is the message I get from R after calling hist():
Fehler in hist.default(Gruppe_LM$Bilder, main = " ", ylab = "Häufigkeit", :
'x' muss numerisch sein

(In English: 'x' must be numeric.)

I also can't run table() on Gruppe_LM$Bilder, because R reports an error in table(Gruppe_LM$Bilder): Versuch eine Tabelle mit mehr als 2^31 Elementen zu erstellen (in English: attempt to create a table with more than 2^31 elements).

How can I fix this for my data?

Thanks in advance for any answers
Lea

Hi @justlea,

I think you're close with how you have it set up, assuming you have a numeric variable in your Gruppe_LM dataframe.

The error is stating that the variable that you have given the hist() function is not numeric. So R is reading the Gruppe_LM$Bilder variable as something other than a numeric.
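To make the error concrete, here is a minimal sketch on made-up data. It assumes the cause is that subset(..., select = ...) returns a whole data frame rather than a numeric vector. The toy values, the three-column layout, and the standalone name Bilder are illustrative assumptions, not the real study data. The same reading would also explain the table() error, since table() on a data frame cross-tabulates every column, and 19 columns with up to 6 levels each means up to 6^19 cells.

```r
# Toy stand-in for Gruppe_LM (hypothetical values, only three of the T columns)
Gruppe_LM <- data.frame(
  T001 = c(3, 4, 5, 2),
  T002 = c(4, 4, 6, 1),
  T003 = c(2, 5, 5, 3)
)

# subset(..., select = ...) keeps columns but still returns a data frame
Bilder <- subset(Gruppe_LM, select = c(T001, T002, T003))
class(Bilder)        # "data.frame"
is.numeric(Bilder)   # FALSE, which is what hist() complains about

# Capture the error message instead of stopping the script
msg <- tryCatch(hist(Bilder), error = function(e) conditionMessage(e))
msg

# A single column is a plain numeric vector, so hist() accepts it.
# plot = FALSE only computes the bins; omit it to draw the plot.
is.numeric(Gruppe_LM$T001)   # TRUE
h <- hist(Gruppe_LM$T001, plot = FALSE)
```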

I believe the problem is in your subset() call: you seem to be missing the subset expression, which goes right after the dataframe Gruppe_LM and defines which rows to keep. For example, let's say your Gruppe_LM dataframe has a variable called value that ranges from 1 to 20, but you only want the rows where value is greater than 10.

Gruppe_LM$Bilder <- subset(Gruppe_LM, value > 10, select = c(T001, T002, T003, T004, T005, T006, T007, T008, T009, T010, T011, T012, T013, T014, T015, T016, T017, T018, T019))
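A small sketch of what the row condition does, using made-up data. The column value is the hypothetical filter variable from the example, and df and kept are illustrative names. Note that in base R the condition argument of subset() is optional (leaving it out keeps all rows), and the result is a data frame either way.

```r
# Hypothetical data: `value` is the made-up filter column from the example
df <- data.frame(
  value = c(5, 12, 18, 9),
  T001  = c(3, 4, 5, 2),
  T002  = c(4, 4, 6, 1)
)

# Row condition and column selection together
kept <- subset(df, value > 10, select = c(T001, T002))
nrow(kept)            # 2 rows have value > 10
is.data.frame(kept)   # TRUE

# Without a condition, all rows are kept
all_rows <- subset(df, select = c(T001, T002))
nrow(all_rows)        # 4
```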

That should get your subset function working. The other piece I'm curious about is your Gruppe_LM$Bilder variable: are you passing those 19 columns (T001 through T019) into one column called Bilder?

An example of your dataframe would be helpful to troubleshoot this further. Here's a great post by user milesmcbain with info from EconomiCurtis and jessemaegan.

If you could please include a small example of your dataframe, this would help illustrate what data you are trying to get to show up in your histogram.

Cheers!

Thank you so much for your answer!

T001-T019 are the questions the participants were asked in the study. They had to rate pictures on a 6-point Likert scale.
So T001, for example, contains the ratings collected from all 134 participants.
The participants are split into two conditions: Light Mode (n=67) and Dark Mode (n=67).

I will attach a picture of the describe function with the data from Light Mode.

I have now somehow managed to create a histogram of the mean values for the Light Mode condition, and I will attach it as well. Is this a valid way to show the data in a histogram? To get there, I wrapped the subset function in rowMeans().

Gruppe_LM$Bilder <- rowMeans(subset(Gruppe_LM, select = c(T001, T002, T003, T004, T005, T006, T007, T008, T009, T010, T011, T012, T013, T014, T015, T016, T017, T018, T019 )))
Gruppe_LM$Bilder
describe(Gruppe_LM$Bilder)
hist(Gruppe_LM$Bilder, main=" ", ylab = "Häufigkeit", xlab="Skalenwert", breaks = 5, xlim = c(1,6))
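As a sketch of why this version works: rowMeans() collapses the selected columns into one numeric value per participant, and a numeric vector is what hist() expects. The toy values and the name mean_rating are illustrative assumptions. The sketch also shows that subset() accepts a column range in select, which avoids typing every column name.

```r
# Toy stand-in for Gruppe_LM (hypothetical values, three of the T columns)
Gruppe_LM <- data.frame(
  T001 = c(3, 4, 5, 2),
  T002 = c(4, 4, 6, 1),
  T003 = c(2, 5, 5, 3)
)

# One mean rating per row (participant)
mean_rating <- rowMeans(subset(Gruppe_LM, select = c(T001, T002, T003)))
is.numeric(mean_rating)   # TRUE
length(mean_rating)       # 4, one value per participant

# select also accepts a range of column names
mean_rating_range <- rowMeans(subset(Gruppe_LM, select = T001:T003))
identical(mean_rating, mean_rating_range)   # TRUE
```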

I'm curious, but I don't have much experience with RStudio or with analyzing study data. This is the first time I have conducted a study, as part of a university seminar.

[Image: Screenshot 2022-09-10 at 09.20.22]
[Image: Screenshot 2022-09-10 at 09.26.08]

Thank you so much for taking your time! :slight_smile:

Ahh this is interesting! Nice work adding the rowMeans function and using describe() to evaluate some of the output. You can certainly use those functions to facilitate a "gut check" of your data. :+1:

However, with rowMeans in the mix I'm now unsure what you are trying to accomplish with your histogram.

Would you kindly provide an explanation of what question/hypothesis/prompt you are addressing with your histogram, as well as an example dataframe of your Gruppe_LM data?

A great way to show that might be something like

head(Gruppe_LM)
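If pasting the printed output is awkward, one option is dput(), which prints R code that recreates the object so others can copy it straight into their session. This is only a sketch on made-up values. The column subset shown here is an assumption about what the real data frame contains.

```r
# Toy stand-in for Gruppe_LM (hypothetical values and columns)
Gruppe_LM <- data.frame(
  SD01 = c(1, 1, 1),
  T001 = c(3, 4, 5),
  T002 = c(4, 4, 6)
)

# Print the first rows as copy-pasteable R code
dput(head(Gruppe_LM, 3))
```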

Cheers!

Actually, I'm not sure either what I want to show with the histogram I created. Perhaps the frequency of the mean values that occurred in the LM group? I don't know if this is something important to show in a histogram. I feel like it's not.

I'm too unsure about it to use it in my term paper. Maybe I will put it in the appendix.

This is the output of head(Gruppe_LM):


Sociodemographic data:
SD01: 1 = LM; 2 = DM
SD02_01: Age
SD03: Which mode people prefer (LM, DM)
SD04: The device on which the study was completed

T001-T019: ratings collected on a scale from 1 (very negative) to 6 (very positive)

The main hypothesis of the study is: User interfaces (Dark and Light Mode) have an impact on the mood of users.

:slight_smile:

This is so neat!

As a consistent user of dark modes I'm interested in your findings!

I think from an analysis standpoint the histogram will do well to display the distribution of the participants' answers. However, a histogram is designed to take a single numeric variable, and your dataframe is set up in what is called a "wide" format. What I would recommend here is to reshape it into a "long" dataframe. This will allow you to plot all of the ratings selected by participants for all photos in a single histogram.

What we want to do here is collect the column names T001 all the way through T019 into their own column called "Bilder" or something similar. Those are the photo names in your dataframe, right?

The rating values from each of those columns then go alongside them into a new column called something like "Bewertung".

Then all you would need to do is plot the following histogram:

hist(Gruppe_DM$Bewertung)

For information on how to convert between wide and long dataframes, check out this cookbook article here
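As a rough sketch of the wide-to-long idea in base R, stack() puts all ratings into one column and the originating column names into another. The toy values and the separate data frame name long are illustrative assumptions (the long table has more rows than the wide one, so here it is kept as its own object). tidyr::pivot_longer() is a common alternative if you use the tidyverse.

```r
# Toy stand-in for Gruppe_DM (hypothetical values, three of the T columns)
Gruppe_DM <- data.frame(
  T001 = c(3, 4, 5),
  T002 = c(4, 4, 6),
  T003 = c(2, 5, 5)
)

# Wide to long: one row per participant-picture rating
long <- stack(Gruppe_DM[c("T001", "T002", "T003")])
names(long) <- c("Bewertung", "Bilder")   # values first, source column second
nrow(long)                                # 9 = 3 participants x 3 pictures

# One bin per scale point (1 to 6).
# plot = FALSE only computes the bins; omit it to draw the histogram.
h <- hist(long$Bewertung, breaks = seq(0.5, 6.5, by = 1), plot = FALSE)
h$counts   # how often each scale value 1..6 was chosen
```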

To answer the main hypothesis, you'll probably want to fit your data to a model and then maybe pass that to an ANOVA or something similar! However, these statistical methods rely on the assumption that your data follows a normal distribution, so the histogram will help you judge visually whether your data is normally distributed. After that I would recommend a more formal check using something like a Shapiro-Wilk test, which gives you a quantitative measure of the normality of your data.
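A minimal sketch of the Shapiro-Wilk check with base R's shapiro.test(). The data here is simulated purely for illustration (the vector x and its parameters are assumptions, not the study's ratings), so the resulting p-value says nothing about the real data.

```r
set.seed(1)

# Simulated scores, NOT the study data: 67 values from a normal distribution
x <- rnorm(67, mean = 3.9, sd = 0.4)

# Shapiro-Wilk test; the null hypothesis is that x comes from a normal distribution
res <- shapiro.test(x)
res$statistic   # W, close to 1 for normal-looking data
res$p.value     # a small p-value (e.g. < .05) would suggest a departure from normality
```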

Good luck!

I have already done the inferential statistical analysis. The assumptions were checked with a Shapiro-Wilk test and a Levene test. After that, I performed a two-tailed t-test. The result is that there is no statistically significant difference between the means of the two conditions, t(132) = -0.55, p = .58. So in this case the user interface mode doesn't seem to have an impact. But there are actually a few studies on dark and light mode that found interesting results.

LM (M = 3.88; SD = 0.42).
DM (M = 3.91; SD = 0.38).
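For reference, a two-sample t-test like this can be sketched in base R as below. The data is simulated from the reported group means and standard deviations, so it will not reproduce the reported t or p values. var.equal = TRUE is an assumption, chosen because the reported 132 degrees of freedom match a pooled-variance test with 67 + 67 - 2.

```r
set.seed(42)

# Simulated per-participant mean ratings, NOT the study data
LM <- rnorm(67, mean = 3.88, sd = 0.42)
DM <- rnorm(67, mean = 3.91, sd = 0.38)

# Two-sided Student's t-test with pooled variance (assumed, see above)
res <- t.test(LM, DM, var.equal = TRUE)
unname(res$parameter)   # degrees of freedom: 132
res$statistic           # t value for the simulated data
res$p.value
```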

I find your ideas about the transformation (wide to long) very interesting, but I think I have no more time to include them in my term paper. Unfortunately, I have to hand it in tomorrow.

But I will definitely try your suggestions out because it sounds really interesting! Just not in the next few weeks, because more term papers are waiting for me :sweat_smile:

I can't say it enough: thank you for taking all that time to answer my questions :slight_smile:
