Ordering Factors after summarise

Hi. I have a seemingly simple question. I have scoured the boards and could not find a specific answer; usually the questions are more complex.

My data are composed of 17 sites with 80 land use categories and the percent of land use in each. I simply want to get the mean land use and SD for each category. I made categories a factor. My code ran fine, but the output put the categories in alphabetical order. I tried multiple solutions (arrange(categories), or using levels=categories.in.order), but neither gave me the order I wanted.

I just want to get the output for the categories in the original order, as described in the data.

Thanks!

Code:

#### Summary statistics for categories

landuse_summary <- landuse.test %>%
  group_by(categories) %>%
  summarise(
    Observations = n(),
    Means = mean(percent),
    St.dev = sd(percent)) %>%
  arrange(desc(categories))
#> Error in landuse.test %>% group_by(categories) %>% summarise(Observations = n(), : could not find function "%>%"

landuse_summary$categories <- landuse_summary$categories %>% factor(levels=categories.in.order, ordered=TRUE)
#> Error in landuse_summary$categories %>% factor(levels = categories.in.order, : could not find function "%>%"

And my data; this should be the first 20 rows (there are 920 rows in the full set):

data.frame(
stringsAsFactors = FALSE,
Name = c("Street1","Street1",
"Street1","Street1","Street1","Street1","Street1",
"Street1","Street1","Street1","Street1","Street1",
"Street1","Street1","Street1","Street1","Street1",
"Street1","Street1","Street1"),
categories = c("OSTDS",
"Residential.Low.Density","Residential.Medium.Density",
"Residential.High.Density","Total.residential",
"Commercial.and.Services","Industrial","Institutional","Total.commercial",
"Recreational","Open.Land",
"Cropland.and.Pastureland","Tree.Crops","Nurseries.and.Vineyards",
"Specialty.Farms","Total.Agriculture","Other.Open.Lands..Rural.",
"Total.Rural","Herbaceous","Shrub.and.Brushland"),
percent = c(372,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,17.65653533)
)

I think this produces the arrangement you want.

library(dplyr, warn.conflicts = FALSE)
landuse.test <- data.frame(
  stringsAsFactors = FALSE,
  Name = c("Street1","Street1",
           "Street1","Street1","Street1","Street1","Street1",
           "Street1","Street1","Street1","Street1","Street1",
           "Street1","Street1","Street1","Street1","Street1",
           "Street1","Street1","Street1"),
  categories = c("OSTDS",
                 "Residential.Low.Density","Residential.Medium.Density",
                 "Residential.High.Density","Total.residential",
                 "Commercial.and.Services","Industrial","Institutional","Total.commercial",
                 "Recreational","Open.Land",
                 "Cropland.and.Pastureland","Tree.Crops","Nurseries.and.Vineyards",
                 "Specialty.Farms","Total.Agriculture","Other.Open.Lands..Rural.",
                 "Total.Rural","Herbaceous","Shrub.and.Brushland"),
  percent = c(372,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,17.65653533)
)
LVLS <- c("OSTDS",
          "Residential.Low.Density","Residential.Medium.Density",
          "Residential.High.Density","Total.residential",
          "Commercial.and.Services","Industrial","Institutional","Total.commercial",
          "Recreational","Open.Land",
          "Cropland.and.Pastureland","Tree.Crops","Nurseries.and.Vineyards",
          "Specialty.Farms","Total.Agriculture","Other.Open.Lands..Rural.",
          "Total.Rural","Herbaceous","Shrub.and.Brushland")

landuse_summary <- landuse.test %>%
  group_by(categories) %>%
  summarise(
    Observations = n(),
    Means = mean(percent),
    St.dev =sd(percent))
#> `summarise()` ungrouping output (override with `.groups` argument)
landuse_summary <- landuse_summary %>% 
  mutate(categories = factor(categories, levels = LVLS)) %>% 
  arrange(categories)
landuse_summary
#> # A tibble: 20 x 4
#>    categories                 Observations Means St.dev
#>    <fct>                             <int> <dbl>  <dbl>
#>  1 OSTDS                                 1 372       NA
#>  2 Residential.Low.Density               1   0       NA
#>  3 Residential.Medium.Density            1   0       NA
#>  4 Residential.High.Density              1   0       NA
#>  5 Total.residential                     1   0       NA
#>  6 Commercial.and.Services               1   0       NA
#>  7 Industrial                            1   0       NA
#>  8 Institutional                         1   0       NA
#>  9 Total.commercial                      1   0       NA
#> 10 Recreational                          1   0       NA
#> 11 Open.Land                             1   0       NA
#> 12 Cropland.and.Pastureland              1   0       NA
#> 13 Tree.Crops                            1   0       NA
#> 14 Nurseries.and.Vineyards               1   0       NA
#> 15 Specialty.Farms                       1   0       NA
#> 16 Total.Agriculture                     1   0       NA
#> 17 Other.Open.Lands..Rural.              1   0       NA
#> 18 Total.Rural                           1   0       NA
#> 19 Herbaceous                            1   0       NA
#> 20 Shrub.and.Brushland                   1  17.7     NA

Created on 2020-12-17 by the reprex package (v0.3.0)
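For anyone wondering why the categories came out alphabetical in the first place: factor() sorts the unique values to build its levels unless levels are supplied explicitly. A minimal base R sketch of that behaviour, using a few of the category names from the question (the object names here are only illustrative):

```r
# A handful of category names, in the order they appear in the data
x <- c("OSTDS", "Residential.Low.Density", "Industrial", "Herbaceous")

# Default: levels are the sorted unique values, so the data order is lost
default_levels <- levels(factor(x))

# Explicit levels: the order given in `levels` is kept as-is
explicit_levels <- levels(factor(x, levels = x))

print(default_levels)
print(explicit_levels)
```

This is why passing levels = LVLS and then calling arrange() restores the original order.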

@FJCC thanks for this. Worked perfectly!

I have 40 variables. Is there a more efficient way to get the names of the variables, in the correct order, other than using datapasta?

When I use levels(landuse.long$categories), it lists the variables in alphabetic order.

If there is no logic to the category order, in terms of string content or the value of a neighboring column, then I think you just have to manually sort them by typing or copy/paste.

Have you gone through forcats?
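In case it helps, forcats has fct_inorder(), which reorders the levels of a factor by first appearance in the data. A small sketch, assuming forcats is installed; the vector below is made up from category names in the question:

```r
library(forcats)

# Category names with repeats, in the order they occur in the data
categories <- c("OSTDS", "Residential.Low.Density", "Industrial",
                "OSTDS", "Industrial")

# factor() alone would sort the levels alphabetically;
# fct_inorder() reorders them by first appearance instead
f <- fct_inorder(factor(categories))

print(levels(f))
```

Whether this fits depends on the rows already being in the order you want.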

Building on @FJCC, yes you can avoid typing everything again. Assuming your variables are in the order you want in the dataframe, you can extract them in that order with:

LVLS <- unique(landuse.test$categories) 
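As a rough sketch of how that fits into the summary, the factor can also be set before grouping, since summarise() returns groups in factor level order and no arrange() is then needed. The data frame below is a hypothetical two-site sample with invented values, and landuse.demo and landuse_summary_demo are just names for this illustration:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical two-site sample; names and values are illustrative only
landuse.demo <- data.frame(
  Name = rep(c("Street1", "Street2"), each = 3),
  categories = rep(c("OSTDS", "Residential.Low.Density", "Industrial"),
                   times = 2),
  percent = c(10, 20, 30, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Levels in order of first appearance in the data
LVLS <- unique(landuse.demo$categories)

landuse_summary_demo <- landuse.demo %>%
  # Convert before grouping so the groups follow the level order
  mutate(categories = factor(categories, levels = LVLS)) %>%
  group_by(categories) %>%
  summarise(
    Observations = n(),
    Means = mean(percent),
    St.dev = sd(percent),
    .groups = "drop"
  )

print(landuse_summary_demo)
```

Note that unique() only gives the right order if the rows are already sorted the way you want the categories to appear.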

@FJCC that was what I thought

@awprc I used it once, and will check it out later tonight. I assume there are routines to work with factors?

@David_Siddons--thanks--this is what I need.

Thanks
