frequency table of a variable

cfmr · October 31, 2022, 4:37pm

Hi!

I have created a new column (CATEGORIA) in a data frame (df), with the values NO ACONSEJABLE, ESTANDAR, and TOP.

I'm trying to get a frequency table of the column CATEGORIA, and I used: table(df$CATEGORIA)

but the output is: < table of extent 0 >

How can I solve the problem?

Thank you.

FJCC · October 31, 2022, 4:44pm

Please post the output of

dput(head(df, 20))

That will give us code to replicate the first 20 rows of your data set. Paste the output between lines with three back ticks, like this
```
Output of dput()
```
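
For readers unfamiliar with `dput()`, here is a minimal sketch of why it helps when sharing data. The data frame `small` is made up for illustration and is not from this thread. The point is that the text `dput()` prints is itself R code that rebuilds the object.

```r
# A tiny made-up data frame (hypothetical data, not from this thread)
small <- data.frame(id = 1:3, label = c("a", "b", "c"))

# dput() prints R code that recreates the object
dput(small)

# Round trip: capture that code as text, then evaluate it again
code_text <- capture.output(dput(small))
rebuilt <- eval(parse(text = code_text))

# Compare the rebuilt object with the original
all.equal(small, rebuilt)
```

Anyone who pastes the printed `structure(...)` call into their own session gets the same rows, without needing the original file.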

cfmr · October 31, 2022, 5:36pm

[quote="FJCC, post:2, topic:151570"]
dput(head(df, 20))
[/quote]

> dput(head(df, 20))
structure(list(host_is_superhost = c(0L, 0L, 0L, 0L, 1L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), host_identity_verified = c(1L, 
0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L), bathrooms = c(3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L, 3L, 
3L, 3L, 3L, 5L, 17L, 5L, 3L, 3L, 3L, 3L, 3L), bedrooms = c(1L, 
1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L), daily_price = c(94L, 125L, 100L, 120L, 70L, 200L, 
700L, 250L, 100L, 280L, 320L, 240L, 290L, 290L, 220L, 84L, 60L, 
99L, 110L, 110L), security_deposit = c(1L, 31L, 1L, 48L, 13L, 
48L, 1L, 73L, 13L, 1L, 1L, 1L, 1L, 1L, 1L, 38L, 1L, 56L, 1L, 
1L), minimum_nights = c(2L, 2L, 2L, 2L, 30L, 15L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 30L, 3L, 2L, 2L), number_of_reviews = c(84L, 
3L, 70L, 57L, 44L, 79L, 72L, 126L, 377L, 22L, 31L, 1L, 3L, 48L, 
10L, 56L, 7L, 7L, 103L, 107L), review_scores_rating = c(94L, 
100L, 97L, 97L, 90L, 98L, 96L, 98L, 94L, 95L, 95L, 40L, 60L, 
94L, 86L, 91L, 100L, 94L, 95L, 94L)), row.names = c(NA, 20L), class = "data.frame")
>

thanks

FJCC · October 31, 2022, 5:54pm

The data you posted does not have a column named CATEGORIA and all of the variables are numeric, which is not useful for a frequency table. I invented a column named Categoria and made a table from it. Does this code work for you?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- structure(list(host_is_superhost = c(0L, 0L, 0L, 0L, 1L, 0L, 
                                     1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), 
               host_identity_verified = c(1L, 
                                          0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
                                          0L, 0L, 0L), 
               bathrooms = c(3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L, 3L, 
                             3L, 3L, 3L, 5L, 17L, 5L, 3L, 3L, 3L, 3L, 3L), 
               bedrooms = c(1L, 
                            1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
                            1L, 1L, 1L), 
               daily_price = c(94L, 125L, 100L, 120L, 70L, 200L, 
                               700L, 250L, 100L, 280L, 320L, 240L, 290L, 290L, 220L, 84L, 60L, 
                               99L, 110L, 110L), 
               security_deposit = c(1L, 31L, 1L, 48L, 13L, 
                                    48L, 1L, 73L, 13L, 1L, 1L, 1L, 1L, 1L, 1L, 38L, 1L, 56L, 1L, 
                                    1L), 
               minimum_nights = c(2L, 2L, 2L, 2L, 30L, 15L, 1L, 1L, 1L, 
                                  1L, 1L, 1L, 1L, 1L, 1L, 2L, 30L, 3L, 2L, 2L), 
               number_of_reviews = c(84L, 
                                     3L, 70L, 57L, 44L, 79L, 72L, 126L, 377L, 22L, 31L, 1L, 3L, 48L, 
                                     10L, 56L, 7L, 7L, 103L, 107L), 
               review_scores_rating = c(94L, 
                                        100L, 97L, 97L, 90L, 98L, 96L, 98L, 94L, 95L, 95L, 40L, 60L, 
                                        94L, 86L, 91L, 100L, 94L, 95L, 94L)), 
          row.names = c(NA, 20L), class = "data.frame")
df <- df |> mutate(Categoria = case_when(
  daily_price < 100 ~ "NO ACONSEJABLE", 
  daily_price >=100 & daily_price < 200 ~ "ESTANDAR",
  daily_price >= 200 ~ "TOP"
))
table(df$Categoria)
#> 
#>       ESTANDAR NO ACONSEJABLE            TOP 
#>              6              5              9

Created on 2022-10-31 with reprex v2.0.2

cfmr · October 31, 2022, 7:53pm

Hi!

I could not solve it...

Here is my code, so maybe you can find what I did wrong:

The file is at World Happiness Report | Kaggle

library(tidyr)
library(dplyr)
library(ggplot2)

Read the file into a data frame.

df<-read.csv("C:/Airbnb_Milan.csv")

##create a new data frame with just the columns: “host_is_superhost”, “host_identity_verified”, “bathrooms”, “bedrooms”, “daily_price”, “security_deposit”, “minimum_nights”, “number_of_reviews”, “review_scores_rating”

df <- subset(df, select = c(host_is_superhost , host_identity_verified ,bathrooms ,bedrooms,daily_price ,security_deposit,minimum_nights ,number_of_reviews ,review_scores_rating))

change the values in “host_is_superhost” from 0, 1 to “SI” and “NO” (using recode).

df %>% mutate(host_is_superhost=recode(host_is_superhost, "0"="SI", "1"="NO"))

change the values in “host_identity_verified” from 0, 1 to “VERIFICA” and “NO VERIFICA”.

df %>% mutate(host_identity_verified=recode(host_identity_verified, "0"="VERIFICA", "1"="NO VERIFICA"))

##filter the dataset to apartments with minimum nights <= 7.

ApNochesMenorIgual7<-df[(df$minimum_nights<=7),]
ApNochesMenorIgual7

##create a categorical vector “CATEGORÍA”: if review_scores_rating <= 49 ~ 'NO ACONSEJABLE',
review_scores_rating >= 50 & review_scores_rating <= 75 ~ 'ESTANDAR',
review_scores_rating >= 76 & review_scores_rating <=100 ~ 'TOP'

df %>% mutate(df, CATEGORIA = case_when(
review_scores_rating <= 49 ~ 'NO ACONSEJABLE',
review_scores_rating >= 50 & review_scores_rating <= 75 ~ 'ESTANDAR',
review_scores_rating >= 76 & review_scores_rating <=100 ~ 'TOP'
))

##Show a frequency table for CATEGORÍA.

table(df$CATEGORIA)

Histogram of the price per day.

library(ggplot2)
library(plotly)

ggplot(df)+geom_histogram(mapping=aes(daily_price))

##plot the relationship between bathrooms and bedrooms

ggplot(df)+geom_point(mapping=aes(x=bedrooms, y=bathrooms))+geom_smooth(aes(x=bedrooms, y=bathrooms),method=lm)

##histogram of number_of_reviews according to whether host_is_superhost or not.

ggplot(df)+geom_histogram(mapping=aes(x=host_is_superhost, fill=number_of_reviews))

##histogram for each value of “CATEGORÍA” showing security_deposit depending on whether host_is_superhost or not.

FJCC · October 31, 2022, 9:08pm

Your code is correct except that you need to store the results of functions. For example, if you run

df %>% mutate(host_identity_verified=recode(host_identity_verified, "0"="VERIFICA", "1"="NO VERIFICA"))

the result of the function is printed to the console, but it is not stored anywhere. Run this instead

df <- df %>% mutate(host_identity_verified=recode(host_identity_verified, "0"="VERIFICA", "1"="NO VERIFICA"))

and the result is stored in df.
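
As a minimal sketch of why the unassigned version leads to `< table of extent 0 >`, assuming a small made-up data frame rather than the real file: without assignment the column never gets added to `df`, so `df$CATEGORIA` is `NULL`, and `table(NULL)` is an empty table.

```r
library(dplyr)

# Made-up scores for illustration (hypothetical data, not the Airbnb file)
df <- data.frame(review_scores_rating = c(94L, 40L, 60L, 100L))

# Not assigned: the result is only printed, df itself is unchanged
df %>% mutate(CATEGORIA = case_when(
  review_scores_rating <= 49 ~ "NO ACONSEJABLE",
  review_scores_rating >= 50 & review_scores_rating <= 75 ~ "ESTANDAR",
  review_scores_rating >= 76 & review_scores_rating <= 100 ~ "TOP"
))

# A column that does not exist is NULL, and table(NULL) has extent 0
is.null(df$CATEGORIA)
no_table <- table(df$CATEGORIA)
length(no_table)

# Assigned: now the column exists in df and can be tabulated
df <- df %>% mutate(CATEGORIA = case_when(
  review_scores_rating <= 49 ~ "NO ACONSEJABLE",
  review_scores_rating >= 50 & review_scores_rating <= 75 ~ "ESTANDAR",
  review_scores_rating >= 76 & review_scores_rating <= 100 ~ "TOP"
))
with_table <- table(df$CATEGORIA)
with_table
```

A quick check such as `"CATEGORIA" %in% names(df)` before calling `table()` shows whether the column was actually stored.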

library(dplyr)
df <- structure(list(host_is_superhost = c(0L, 0L, 0L, 0L, 1L, 0L, 
                                     1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), 
               host_identity_verified = c(1L, 
                                          0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
                                          0L, 0L, 0L), 
               bathrooms = c(3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L, 3L, 
                             3L, 3L, 3L, 5L, 17L, 5L, 3L, 3L, 3L, 3L, 3L), 
               bedrooms = c(1L, 
                            1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
                            1L, 1L, 1L), 
               daily_price = c(94L, 125L, 100L, 120L, 70L, 200L, 
                               700L, 250L, 100L, 280L, 320L, 240L, 290L, 290L, 220L, 84L, 60L, 
                               99L, 110L, 110L), 
               security_deposit = c(1L, 31L, 1L, 48L, 13L, 
                                    48L, 1L, 73L, 13L, 1L, 1L, 1L, 1L, 1L, 1L, 38L, 1L, 56L, 1L, 
                                    1L), 
               minimum_nights = c(2L, 2L, 2L, 2L, 30L, 15L, 1L, 1L, 1L, 
                                  1L, 1L, 1L, 1L, 1L, 1L, 2L, 30L, 3L, 2L, 2L), 
               number_of_reviews = c(84L, 
                                     3L, 70L, 57L, 44L, 79L, 72L, 126L, 377L, 22L, 31L, 1L, 3L, 48L, 
                                     10L, 56L, 7L, 7L, 103L, 107L), 
               review_scores_rating = c(94L, 
                                        100L, 97L, 97L, 90L, 98L, 96L, 98L, 94L, 95L, 95L, 40L, 60L, 
                                        94L, 86L, 91L, 100L, 94L, 95L, 94L)), 
          row.names = c(NA, 20L), class = "data.frame")

#change the values in “host_is_superhost” from 0, 1 to “SI” and “NO” (using recode).

df <- df %>% mutate(host_is_superhost=recode(host_is_superhost, "0"="SI", "1"="NO"))
#change the values in “host_identity_verified” from 0, 1 to “VERIFICA” and “NO VERIFICA”.

df <- df %>% mutate(host_identity_verified=recode(host_identity_verified, "0"="VERIFICA", "1"="NO VERIFICA"))

##filter the dataset to apartments with minimum nights <= 7.

ApNochesMenorIgual7<-df[(df$minimum_nights<=7),]

##create a categorical vector “CATEGORÍA”: if review_scores_rating <= 49 ~ 'NO ACONSEJABLE',
#review_scores_rating >= 50 & review_scores_rating <= 75 ~ 'ESTANDAR',
#review_scores_rating >= 76 & review_scores_rating <=100 ~ 'TOP'

df <- df %>% mutate(CATEGORIA = case_when(
  review_scores_rating <= 49 ~ 'NO ACONSEJABLE',
  review_scores_rating >= 50 & review_scores_rating <= 75 ~ 'ESTANDAR',
  review_scores_rating >= 76 & review_scores_rating <=100 ~ 'TOP'
))

##Show a frequency table for CATEGORÍA.

table(df$CATEGORIA)
#> 
#>       ESTANDAR NO ACONSEJABLE            TOP 
#>              1              1             18

Created on 2022-10-31 with reprex v2.0.2
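
As an alternative sketch (not the approach used in the posts above), the same three bins can be built with base R's `cut()`, which returns a factor so the table keeps the category order instead of sorting alphabetically. The scores are the 20 `review_scores_rating` values posted in this thread. For integer scores the bins match the `case_when()` rules, but a non-integer score such as 49.5 would fall into ESTANDAR here, whereas the `case_when()` version would give `NA`.

```r
# review_scores_rating values from the 20 rows posted above
scores <- c(94L, 100L, 97L, 97L, 90L, 98L, 96L, 98L, 94L, 95L,
            95L, 40L, 60L, 94L, 86L, 91L, 100L, 94L, 95L, 94L)

# Intervals are right-closed by default: (-Inf, 49], (49, 75], (75, 100]
categoria <- cut(scores,
                 breaks = c(-Inf, 49, 75, 100),
                 labels = c("NO ACONSEJABLE", "ESTANDAR", "TOP"))

# Counts appear in the order of the factor levels
freq <- table(categoria)
freq

# Relative frequencies, if proportions are wanted
prop.table(freq)
```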

cfmr · November 1, 2022, 5:30pm

Hi!

Thank you, I have solved the problem.

For the same exercise, can you help me with the last two points, which ask for a histogram of:

##histogram of number_of_reviews according to whether host_is_superhost or not.

##histogram for each value of “CATEGORÍA” showing security_deposit depending on whether host_is_superhost or not.

I think the problem is that both variables are discrete, but I don't know how to do it.

Thank you.
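
One possible reading of these two points, offered as a sketch rather than the exercise's intended answer: put the numeric variable on the x axis and use `host_is_superhost` only for grouping (fill colour), with one panel per CATEGORIA for the second plot. The data are the 20 rows posted earlier in the thread. The bin count, transparency, and use of `cut()` to build CATEGORIA are assumptions made for illustration.

```r
library(ggplot2)

# Columns taken from the 20 rows posted earlier in this thread
df <- data.frame(
  host_is_superhost = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
                        0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
  security_deposit = c(1L, 31L, 1L, 48L, 13L, 48L, 1L, 73L, 13L, 1L,
                       1L, 1L, 1L, 1L, 1L, 38L, 1L, 56L, 1L, 1L),
  number_of_reviews = c(84L, 3L, 70L, 57L, 44L, 79L, 72L, 126L, 377L, 22L,
                        31L, 1L, 3L, 48L, 10L, 56L, 7L, 7L, 103L, 107L),
  review_scores_rating = c(94L, 100L, 97L, 97L, 90L, 98L, 96L, 98L, 94L, 95L,
                           95L, 40L, 60L, 94L, 86L, 91L, 100L, 94L, 95L, 94L)
)

# Same three bins as in the thread, built with cut() (an assumption)
df$CATEGORIA <- cut(df$review_scores_rating,
                    breaks = c(-Inf, 49, 75, 100),
                    labels = c("NO ACONSEJABLE", "ESTANDAR", "TOP"))

# The grouping variable must be discrete to be used as a fill colour
df$host_is_superhost <- factor(df$host_is_superhost)

# Histogram of number_of_reviews, coloured by superhost status
p_reviews <- ggplot(df, aes(x = number_of_reviews, fill = host_is_superhost)) +
  geom_histogram(bins = 10, position = "identity", alpha = 0.5)

# Histogram of security_deposit by superhost status, one panel per CATEGORIA
p_deposit <- ggplot(df, aes(x = security_deposit, fill = host_is_superhost)) +
  geom_histogram(bins = 10, position = "identity", alpha = 0.5) +
  facet_wrap(~ CATEGORIA)

p_reviews
p_deposit
```

If `host_is_superhost` has already been recoded to text labels, the `factor()` step is optional because character columns are treated as discrete.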
