Styling advice on layout for tables and graphs, which package is the best?

christinelly · November 16, 2017, 10:04pm

Hello,

I was kindly given a link by @martin.R for layouts. I downloaded the stargazer package (the name sounded coolest).

But it turns out I cannot figure out how to use it, so I was wondering which of these packages (xtable, stargazer, pander, tables, ascii) you would recommend as easy to use.
I tried the knitr kable function, but it seemed a bit limiting (the layouts did not look very exciting), or maybe I am not using it correctly.

As an example I am trying to make my data below more presentable in markdown.

Thanks in advance!
Christine

reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2017-11-16

Jobsatisfaction %>%
  count(year, type) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(key = type, value = prop) %>%
  arrange(desc(year))
#> Error in Jobsatisfaction %>% count(year, type) %>% group_by(year) %>% : could not find function "%>%"

reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2017-11-16

Jobsatisfaction %>%
  group_by(year, type) %>%
  tally() %>%
  group_by(year) %>%
  mutate(x = n / sum(n)) %>%
  
  ggplot() +
    geom_col(aes(
      x = factor(year),
      y = x,
      fill = factor(type)
      ), position = "stack")
#> Error in Jobsatisfaction %>% group_by(year, type) %>% tally() %>% group_by(year) %>% : could not find function "%>%"

DavoWW · November 16, 2017, 10:38pm

Christine,
Check out the kableExtra package on CRAN; it adds loads of formatting options for a knitr::kable table.
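A minimal sketch of the kable() plus kableExtra workflow, using the built-in mtcars data so it runs anywhere. The particular bootstrap_options values are just one possible choice, and magrittr is loaded only to provide %>%:

```r
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(magrittr)    # %>%

# kable() builds a plain HTML table; kable_styling() adds Bootstrap table classes
styled <- kable(head(mtcars[, 1:4]), format = "html") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )

# In an R Markdown document rendered to HTML, printing shows the styled table
styled
```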

cderv · November 16, 2017, 11:07pm

Hi,

In your reprex you forgot to load the packages, so it did not work as expected. Hence the error saying the %>% function could not be found.
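To illustrate, here is a sketch of the first reprex made self-contained. The job_sat data frame is hypothetical (invented values standing in for Jobsatisfaction); dplyr supplies %>% and the data verbs, and tidyr supplies spread():

```r
library(dplyr)  # %>%, count(), group_by(), mutate(), select(), arrange()
library(tidyr)  # spread()

# Hypothetical stand-in for Jobsatisfaction: one row per respondent, invented values
job_sat <- data.frame(
  year = c(1982, 1982, 1982, 2012, 2012),
  type = c("Very Satisfied", "Other", "Other", "Very Satisfied", "Other"),
  stringsAsFactors = FALSE
)

# Share of each satisfaction type within each year, one column per type
job_props <- job_sat %>%
  count(year, type) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(key = type, value = prop) %>%
  arrange(desc(year))

job_props
```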

For formatting tables, you'll find this post useful.

Among others, it presents these:

christinelly · November 17, 2017, 7:29am

Hi Christophe, thanks! I was wondering about the errors in my reprex!!
I will check out the kableExtra package. Thank you so much for your help and links!!

christinelly · November 17, 2017, 7:35am

Hello again,

Do I follow the HTML code if I want to use it for markdown?

christinelly · November 17, 2017, 8:25am

I am unable to make it work. Could you kindly advise what I am doing wrong in my code?
I am consulting this page but cannot see how I should apply the code.

http://haozhu233.github.io/kableExtra/awesome_table_in_html.html

reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2017-11-17

library(knitr)
library(kableExtra)
options(knitr.table.format = "html") 

  
kable(Jobsatisfaction, "html") %>%
kablestyling(count(year, type) %>%
group_by(year) %>%
mutate(prop = n / sum(n)) %>%
select(-n) %>%
spread(key = type, value = prop) %>% 
arrange(desc(year)))
#> Error in inherits(x, "list"): object 'Jobsatisfaction' not found

cderv · November 17, 2017, 1:23pm

I think something is wrong with your reprex.
The kable() call should come at the end, as it is for printing and formatting. kable_styling() can't contain dplyr verbs; call these formatting functions at the end of the pipeline instead. The example you found and linked should help you understand how kable() and kable_styling() work.

It should be something like this:

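library(dplyr)       # %>%, count(), group_by(), mutate(), select(), arrange()
library(tidyr)       # spread()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

Jobsatisfaction %>%
    count(year, type) %>%
    group_by(year) %>%
    mutate(prop = n / sum(n)) %>%
    select(-n) %>%
    spread(key = type, value = prop) %>%
    arrange(desc(year)) %>%
    kable("html") %>%
    kable_styling()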

(I do not know if this works, as Jobsatisfaction is not known to me. A reprex should be reproducible, so that I can copy it and reproduce it on my side. Hence the error: Jobsatisfaction does not exist.)
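Here is a sketch of that ordering as a complete, runnable chunk: all the data steps first, then kable() and kable_styling() last. The job_sat data frame is hypothetical (invented values in place of Jobsatisfaction), and digits = 2 is just one choice for rounding the proportions:

```r
library(dplyr)       # %>% and the data verbs
library(tidyr)       # spread()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# Hypothetical stand-in for Jobsatisfaction (invented values)
job_sat <- data.frame(
  year = c(1982, 1982, 1982, 2012, 2012),
  type = c("Very Satisfied", "Other", "Other", "Very Satisfied", "Other"),
  stringsAsFactors = FALSE
)

job_table <- job_sat %>%
  count(year, type) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(key = type, value = prop) %>%
  arrange(desc(year)) %>%
  # Formatting comes last: kable() renders the finished data frame
  kable("html", digits = 2) %>%
  kable_styling(full_width = FALSE, position = "left")

job_table
```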

You can customize further with kable_styling(). Here is an example with mtcars.

reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2017-11-17

library(knitr)
library(kableExtra)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

mtcars %>%
  tibble::rownames_to_column() %>%
  select(1:6) %>%
  slice(1:5) %>%
  kable("html") %>%
  kable_styling(full_width = FALSE, position = "left")

rowname            mpg   cyl  disp  hp   drat
Mazda RX4          21.0  6    160   110  3.90
Mazda RX4 Wag      21.0  6    160   110  3.90
Datsun 710         22.8  4    108   93   3.85
Hornet 4 Drive     21.4  6    258   110  3.08
Hornet Sportabout  18.7  8    360   175  3.15
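kable_styling() is only one of kableExtra's formatting helpers. As a sketch (the highlight colour and the choice of highlighted rows are arbitrary), row_spec() and column_spec() can style individual rows and columns; row 0 refers to the header row:

```r
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), row_spec(), column_spec()
library(magrittr)    # %>%

# Same five cars as above, with the car names as an ordinary first column
cars5 <- data.frame(
  car = rownames(mtcars)[1:5],
  mtcars[1:5, 1:5],
  row.names = NULL,
  stringsAsFactors = FALSE
)

styled <- cars5 %>%
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left") %>%
  row_spec(0, bold = TRUE) %>%                             # header row in bold
  column_spec(1, bold = TRUE) %>%                          # car-name column in bold
  row_spec(which(cars5$cyl == 8), background = "#FFF3CD")  # highlight 8-cylinder cars

styled
```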

FlorianGD · November 17, 2017, 3:43pm

On the huxtable page, there is a nice image comparing the features of a number of table-formatting packages:

https://hughjonesd.github.io/huxtable/design-principles.html

mara · November 17, 2017, 6:05pm

See also this thread

christinelly · November 17, 2017, 6:42pm

Hello Mara,

Thank you so much. I will check it out.
Could you please advise on my reprex? It seems I am still not getting it right. I thought I just needed to copy the code to the clipboard and then enter reprex().
E.g., for the code below (which is wrong on purpose), what do I need to modify before copying it to the clipboard and entering reprex()?
Do I need to call something like tidyverse (template) as in Jenny's video?
like
library(tidyverse)
tidyverse (template) ?

library(knitr)
library(kableExtra)
library(ggplot2)
library(dplyr)
library(statsr)
library (tidyverse)
library (reprex)

Jobsatisfaction %>%
  count(year, type) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(key = type, value = prop) %>%
  kable_styling(arrange(desc(year)))

mara · November 17, 2017, 7:06pm

I think it's the space between library and the package name in the last two lines that is problematic!

Oh, and you don't have your data in there either! It has to be totally self-contained!

nick · November 17, 2017, 7:11pm

I'm not sure the space causes any problems. However, you would need to put the "Jobsatisfaction" data into your script directly. The common way of doing that is with dput(Jobsatisfaction), which generates code that can reproduce it. However, it's recommended that you only do that with a limited subset of the data. For example, the following code would produce code for the first 10 rows:

dput(head(Jobsatisfaction, 10))

You would then paste the code like this:

Jobsatisfactionsample <- [Pasted code goes here]

And change any references to Jobsatisfaction to Jobsatisfactionsample (probably in a different file). The idea is to include everything we need to reproduce your example directly in the code, which includes the data.
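A minimal sketch of that dput() round trip, using a small hypothetical job_sat data frame in place of the real data. Here capture.output() grabs the printed code only so the example can evaluate it again; in practice you would simply copy the printed structure(...) from the console into your reprex:

```r
# Hypothetical small data frame standing in for Jobsatisfaction
job_sat <- data.frame(
  year = c(1982, 1982, 2012),
  type = c("Very Satisfied", "Other", "Other"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; capture that text
code <- paste(capture.output(dput(head(job_sat, 10))), collapse = "\n")
cat(code)

# Evaluating the printed code rebuilds the same data, no original file needed
Jobsatisfactionsample <- eval(parse(text = code))
all.equal(Jobsatisfactionsample, head(job_sat, 10))
```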

mara · November 17, 2017, 8:47pm

Oh, I thought this was just a toy for reprexing!

christinelly · November 20, 2017, 8:36pm

Thank you so much Nick.

I made another reprex attempt, but it is still not working. Pardon my ignorance, I am totally lost.
My data frame is called jobsatisfaction and I want the output to look like the image below (jobsatisfaction example) but adding the kable code for the formatting.
Do I need to load my data file in the same chunk? I got an error when I did.

I tried copying the code below and then running reprex() . The code is not even running so it is clear it is wrong.
According to what I understand from Jenny Bryan's video, my data is self-contained: I am loading the necessary libraries and creating the objects I am looking for @mara

library(knitr)
library(kableExtra)
library(ggplot2)
library(dplyr)
library(statsr)
library (tidyverse)
library (reprex)
dput(head(Jobsatisfaction, 10)) %>% 
Jobsatisfactionsample <- Jobsatisfaction %>% 
count(year, type) %>%
group_by(year) %>%
mutate(prop = n / sum(n)) %>%
select(-n) %>%
spread(key = type, value = prop) %>% 
arrange(desc(year))

mara · November 20, 2017, 8:57pm

It's not self-contained in the sense that anyone trying to reproduce what you're doing doesn't have your data (so, the dput(...) bit only works if you have the data to begin with).

However, the question you're asking is about formatting a table in the output document (in whatever form), so you don't even really need a reprex.

The table output options are varied, and there's no single package/option that fits all needs. The screenshot you have there is from viewing your data inside of RStudio, and will take some work to reproduce, if that's the exact aesthetic you're looking for.

There have been a couple threads about table formatting:

And there are some resources listed in those threads as well as here. It's definitely an area people are interested in working on, because it's not as easy as it could be. But, it's worth your time to go through the basics, and try different options, especially if you have a very specific end-aesthetic in mind.

@cderv's answer above shows you the code for using knitr kable. Currently you're loading the library, but not actually knitting your data into a table.

mara · November 20, 2017, 9:02pm

It also just occurred to me that you might not be in an RMarkdown document. I'm not clear on that from your scripts above. You definitely need to be in order to use any of the table layout packages.

christinelly · November 21, 2017, 10:25am

Hello Mara,

Thank you! I have received lots of useful advice for my formatting.

It is the reprexing that I do not understand. I see what you mean about it not being self-contained, but I do not see what I am doing wrong. How do I include the data, then? Do I need to run a summary output on the raw data and copy it into the reprex?

I am using Rmarkdown.

Thank you again for your incredible patience..!!!

mara · November 21, 2017, 11:39am

You can copy and paste the first few rows of your csv then enframe them using tibble::tribble. I'm on my phone atm. But take a look at the vignette for @milesmcbain's datapasta.
https://github.com/MilesMcBain/datapasta/

You don't need to use the package, but (if I recall correctly) it shows how to do just this!
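Following that suggestion, a sketch of what a few pasted rows might look like when written with tibble::tribble(); the values are invented for illustration. Each ~name in the first row declares a column, and the remaining values fill the table row by row:

```r
library(tibble)  # tribble()

# Hypothetical rows, laid out the way they might be pasted from a CSV
job_sample <- tribble(
  ~year, ~type,
  1982,  "Very Satisfied",
  1982,  "Other",
  2012,  "Other",
  2012,  "Very Satisfied"
)

job_sample
```

The resulting job_sample can then be used in place of Jobsatisfaction anywhere in the reprex.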

nick · November 21, 2017, 5:47pm

Yes, the trick is that the data referenced in the reprex has to be available within the reprex. Since Jobsatisfaction is data that you created, R can't reference it unless we create it. The tribble function that @mara mentioned is a pretty straightforward way of doing it. When I previously mentioned dput, the idea with that is that you would use that to generate code that you can paste into a script.

Say I wanted the first 5 rows of the iris data (since it's also good to minimize how much data you include in a reprex). It's a built-in data set, but if it wasn't, I could do something like this:

> dput(head(iris, 5))
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5), Sepal.Width = c(3.5, 
3, 3.2, 3.1, 3.6), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4), 
    Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2), Species = structure(c(1L, 
    1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
    ), class = "factor")), .Names = c("Sepal.Length", "Sepal.Width", 
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 
5L), class = "data.frame")
>

I would then copy and paste the entire structure into the code for my reprex, assigning it to a variable.

my_iris <- structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5), Sepal.Width = c(3.5, 
3, 3.2, 3.1, 3.6), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4), 
    Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2), Species = structure(c(1L, 
    1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
    ), class = "factor")), .Names = c("Sepal.Length", "Sepal.Width", 
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 
5L), class = "data.frame")

I could then reference my_iris in the code, and someone else using the code would have the same data that I did.

christinelly · November 21, 2017, 7:46pm

Hello,
Thanks Nick and Mara! I will try again later and let you know when I get it :)

(This is the chunk I am copying before running reprex().)

reprex::reprex_info()
#> Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'default/
#> Europe/Paris'
#> Created by the reprex package v0.1.1.9000 on 2017-11-21

library(ggplot2)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(statsr)
library (tidyverse)
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
library (reprex)
library(knitr)
library(kableExtra)

dput(head(gss, 5)) 
#> Error in head(gss, 5): object 'gss' not found

Jobsatisfaction <- gss %>%
  filter(gss$year %in% c("1982", "2012")) %>%
  mutate(type = ifelse(satjob == "Very Satisfied", "Very Satisfied", "Other")) %>%
  select(type, year)
#> Error in eval(lhs, parent, parent): object 'gss' not found

dput(head(Jobsatisfaction, 5)) 
#> Error in head(Jobsatisfaction, 5): object 'Jobsatisfaction' not found
Jobsatisfaction %>%
  count(year, type) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(key = type, value = prop) %>%
  arrange(desc(year)) %>%
  kable("html") %>%
  kable_styling()
#> Error in eval(lhs, parent, parent): object 'Jobsatisfaction' not found

Jobsatisfactionsample <- Jobsatisfaction %>%
  count(year, type) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(key = type, value = prop) %>%
  arrange(desc(year)) %>%
  kable("html") %>%
  kable_styling()
#> Error in eval(lhs, parent, parent): object 'Jobsatisfaction' not found