@chris.prener I totally agree with your criticism, and it probably was a bit of a communication failure on my part... I did not mean he should make a general, reusable, CRAN-ready package out of his thesis. I like to abuse packages as project-folders on steroids. The effort for setting up a package is basically zero once you've done it a few times, and you get a nice place to specify your dependencies (DESCRIPTION), manage a NAMESPACE for your project, and you can work with all the awesome package development tools like testthat and usethis. As soon as you do something like sourcing a functions.R file, you are imho much better advised to just make a package. That does not mean it has to be something you would want to publish (or even let other people see ;)).
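
For anyone who has not seen the "project folder on steroids" layout, here is a minimal base R sketch of what it boils down to. The project name mythesis, the listed Imports, and the clean_speed helper are made up for illustration; in practice usethis::create_package() and usethis::use_package() write these files for you.

```r
# Minimal sketch of a "package as project folder" skeleton, base R only.
# The name "mythesis", the Imports, and clean_speed() are hypothetical.
proj <- file.path(tempdir(), "mythesis")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION is where the project's dependencies are declared.
desc <- data.frame(
  Package = "mythesis",
  Title   = "Analysis Code for My Thesis",
  Version = "0.0.1",
  Imports = "ggplot2, dplyr",
  stringsAsFactors = FALSE
)
desc_path <- file.path(proj, "DESCRIPTION")
write.dcf(desc, desc_path)

# Functions live in R/ instead of a sourced functions.R file.
writeLines(
  "clean_speed <- function(x) x[!is.na(x) & x >= 0]",
  file.path(proj, "R", "clean_speed.R")
)

# Read the declared dependencies back from DESCRIPTION.
deps <- strsplit(read.dcf(desc_path, fields = "Imports")[1, 1], ", *")[[1]]
print(deps)
```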

1 Like

got yah @hoelk - perhaps a misinterpretation on my part as well! I think you're right - these are all great tools by themselves. I think my earlier advice is still what I would tell a PhD student like @durraniu - if you already know what a NAMESPACE is and have used tools like devtools, usethis, etc - this could be an excellent way to keep your work organized. If, however, this is your first rodeo, better to leave packaging for a post-graduation exercise. Just my two cents here.

5 Likes

Wow! I wasn't expecting so many responses in a day. But this is all great advice, thank you, everyone.

One major (and unintended) side effect of my question was the LaTeX vs Rmarkdown debate. Personally, I have tried learning LaTeX a few times but have not pursued it, for two reasons:

  1. It is not used in the engineering faculty at my institute, University of Windsor. So, my supervisor and other students are not familiar with it. Everyone uses MS Word.
  2. I have no idea how others focus on writing with all the backslashes on the page. This issue alone was enough for me to not keep learning LaTeX.

So, I will go with the Rmarkdown option.

Considering all the advice in your great responses, here's my new plan:

  • I'll explore bookdown and the recommended variants e.g. thesisdown in detail

  • I'm going to keep all of the analysis steps in separate script files. Additionally, I'd create plots/tables in a separate script. So, if I re-run any analysis file, I'd also re-run the plots script to get the updated figures. And if only a little tweak is required (e.g. plot theme) I'd simply modify the plots script. This will be the only script that I'll source in the R chunk in my thesis chapter(s)

  • For my supervisor's review, I'd knit to .docx. One issue would be handling the changes that he makes with Track Changes turned on in Word. But I guess there is no automatic solution to that.

  • I'm definitely going to check out the rticles package documentation. I know that right now it does not have any templates for journals in my field (my research is in transportation engineering, a branch of civil engineering). So, I might need to create my own.

  • I've been using git and github in RStudio. Currently, I only know about commit, add, and push, which have been mostly what I needed. But it is definitely worthwhile to dig in more

Please let me know if there's anything you'd add onto the above list.

I have never created an R package. And at this point, I don't feel very comfortable doing that for my analysis. But I do try to follow Hadley's advice on creating a function if the same code is copied and pasted more than twice.

I use Mendeley for my bibliography, which works well. One thing that I like more in Word is being able to easily resize figures with the mouse. I know that I can provide fig.height and fig.width in global chunk options in Rmarkdown, but sometimes custom changes are required. Are there better options to handle the size of tables and figures? Please let me know.

I'd love to see a real thesis repository that used Rmarkdown. Once again, thank you everyone who has responded.

1 Like

I read the good advice given here and I don't want to repeat it, but my formula was:

  • Bitbucket*: a large repo that got so large at one point that I had to learn about reducing repo size. So one of the things I kept thinking about was whether it is better to have one repo per chapter or one big repo. Depending on the size of your files and thesis, that might be an issue to think about. [*update: GitHub or GitLab would do just fine, but at the time I wrote this only BB had private repos, and repo size is much less of a problem today, especially on GitHub. At the end of the day: git <3]

  • Rmarkdown, because I use R a lot. Each chapter was a subfolder with its own /data, /R, and /output sub-sub-folders. Remedy is a great RStudio addin to assist during writing.

  • One nice Word template, in my case @jhollis's (https://github.com/jhollist/rmd_word_manuscript and hereby I thank him for it), that I modified according to my needs. This is because coauthors, advisor, and school were expecting a Word document to revise.

A problem I found was with tables that needed better formatting, and with sections in Word that needed to be horizontal (landscape). I had no choice but to format them (and introduce page numbers) manually, but that's pretty much the only part of the Word document that I formatted manually.
I control figure size by saving the file and including it later, either with ![]() or with knitr::include_graphics().

  • References were directly formatted in rmarkdown - I prefer Zotero and love its new option of automatically saving a .bib file from a collection in the repo folder, and I cite using [@xx] (I guess that's pandoc + knitr) [update: Better BibTeX in Zotero to keep the .bib file updated and get cite keys. Unfortunately there is no good citing addin or package that I know of]
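
On the figure size point, a minimal sketch of the "save the file, include it later" approach. The file name is hypothetical, and the pdf() device is used only because it is available on every R installation; for a Word document the same idea would apply with png() and its width, height, units, and res arguments.

```r
# Save a figure at an explicit physical size, then include the saved file.
# The file name is a hypothetical placeholder.
fig_path <- file.path(tempdir(), "speed_vs_dist.pdf")

pdf(fig_path, width = 6, height = 4)  # size in inches is fixed here
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
invisible(dev.off())                  # close the device so the file is written

# In the chapter, the saved file would then be pulled in with either
#   ![Stopping distance vs. speed](speed_vs_dist.pdf)
# or, inside an R chunk,
#   knitr::include_graphics("speed_vs_dist.pdf")
```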

6 Likes

All sounds great, but I think I'd caution you against only sourcing your plots file - one of the great things about RMarkdown is not only integrating figures, but also summary variables, coefficient values, p-values, etc. via variables in your environment too. You want whatever scripts you're sourcing to give you access to those so that, whenever you have to update any aspect of your analysis, all your reported values get updated as well. I like the idea of having your scripts modular, but I wouldn't want to sacrifice access to other variables relevant to your analysis report. One option, to keep it modular, might be to source the analysis script within the plot script to maintain access to all your analysis variables. Others may have better ideas too.
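
A minimal sketch of that last option, assuming two hypothetical scripts named analysis.R and plots.R. They are written to a temp folder here only so the example is self-contained, and the built-in cars data stands in for real thesis data.

```r
# Hypothetical script names; written to a temp folder so this runs anywhere.
analysis_file <- file.path(tempdir(), "analysis.R")
plots_file    <- file.path(tempdir(), "plots.R")

# analysis.R: fits the model and keeps the numbers worth reporting.
writeLines(c(
  "fit     <- lm(dist ~ speed, data = cars)",
  "slope   <- unname(coef(fit)['speed'])",
  "p_value <- summary(fit)$coefficients['speed', 'Pr(>|t|)']"
), analysis_file)

# plots.R: sources the analysis first, then defines the figure.
writeLines(c(
  paste0("source(", deparse(analysis_file), ")"),
  "plot_fit <- function() {",
  "  plot(dist ~ speed, data = cars)",
  "  abline(fit)",
  "}"
), plots_file)

# In the thesis chapter, one source() call now exposes plots and values.
source(plots_file)

# The kind of string inline R code in the text could report.
report <- sprintf(
  "slope = %.2f, p %s", slope,
  if (p_value < 0.001) "< 0.001" else sprintf("= %.3f", p_value)
)
print(report)
```

With this layout, re-running the analysis and re-knitting updates the figure and every reported number together, while the chapter still sources only one file.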

EDIT: Also, as an afterthought, regarding better formatting for tables - I just had a discussion about this with a colleague, and she recommended looking into the flextable package in combination with captioner for better control over table formatting when compiling to .docx.

3 Likes

It's worth reading this thread on organizing documents in a data science project... I suspect there are concepts which are very helpful for you as well.

2 Likes

Oof, yeah.

FWIW, I learned LaTeX for my honours thesis and that was a Bad Idea. I'm already an okay coder, so I was comfortable learning new languages, but tearing your hair out three weeks before submission because your thesis won't compile sucks. LaTeX (or even, to some extent, RMarkdown/Markdown) is for people who don't puke when they hear the question, "What if writing was more like programming?"

On the other side, I've written manuscripts in Word because that's what my colleagues were using, and sending references from Zotero was a bit of a struggle, especially with revisions (and that's before you bring extremely variable journal submission systems into it). I'd say learning to use a good reference manager is a pretty universally important skill in academia.

I think pure Markdown based on Pandoc is a good compromise in terms of readability and "getting straight into writing"—once it's set up—but I think there's a lot of room for tools that abstract the learning curve away specifically in academic contexts (thesis, papers, etc.).

Services like Overleaf and Authorea are also trying to solve that problem, but they have their own limits and often ongoing pricing. I think Authorea's a solid choice for papers, but it had some limits around thesis structure when I last tried it (though that might've changed). It does allow 1 private document at a time for free and has git integration underneath, which is pretty rad.

EDIT: it looks like Authorea is thinking about how to better accommodate a thesis, but they're not quite there yet (and they want you to publish each chapter as a separate article, which either means coughing up for a sub or being able to make your chapters public as you finish each one).

3 Likes

I will never get into the debate of LaTeX vs Markdown. However, as I mentioned, you really should try LyX; it will make your academic life much, much easier.

1 Like

First things first: I wrote (& submitted/defended! :tada:) my own dissertation in RMarkdown in May 2017.

You can find the public version of that here, including a README:

I say the "public version" because the second chapter of my dissertation involved private student data that is under a restricted publishing agreement -- I can't publish the text of that chapter for a while (though I did include the filenames / a few references to that chapter, so it's not totally obscured).

Also note that there are lots of idiosyncrasies to how my institution requires a dissertation to be formatted. I bootstrapped heavily on an existing LaTeX template that was found "in the ether" among my peers -- caveat that I can't promise this will translate well to your own situation.

That said, a few broad takeaways:

  • Since my chapters came from really different places (one with one set of co-authors [in raw LaTeX], another of my own [Rmd], and a third with a different set of co-authors [Rmd]), a lot of the effort in converting things to Rmarkdown involved wrangling these disparate pieces together in as programmatic a way as possible. The aggregator.sh script is what pulls these from their own projects into the dissertation folder
  • Dissertation formatting is the worst. So many little things... the whole body_extractor.R script consisted of finding something that was compiling OK elsewhere and all-but manually tinkering with it until it was fixed & presentable in dissertation format. See the convert_sideways helper function for converting sidewaystable tables back to vertical
  • I managed references through the individual papers' .bib files, and just concatenated them, basically. This may not have been ideal... liberal use of \cite within the papers also went far.
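
For the reference handling in the last point, a minimal base R sketch of the concatenation step. This is not the code from the dissertation repo: the file names and entries are invented, and the duplicate-key check is an extra, added because repeated keys can trip up bibliography tools once papers that cite the same source are merged.

```r
# Hypothetical per-paper .bib files, created in a temp folder for the demo.
bib_dir <- file.path(tempdir(), "papers")
dir.create(bib_dir, showWarnings = FALSE)
writeLines(c("@article{smith2015,", "  title = {First paper}", "}",
             "@book{lee2016,", "  title = {A book}", "}"),
           file.path(bib_dir, "paper1.bib"))
writeLines(c("@article{smith2015,", "  title = {First paper}", "}"),
           file.path(bib_dir, "paper2.bib"))

# Concatenate every .bib file into one bibliography for the dissertation.
bib_files <- list.files(bib_dir, pattern = "\\.bib$", full.names = TRUE)
entries   <- unlist(lapply(bib_files, readLines))
combined  <- file.path(tempdir(), "dissertation.bib")
writeLines(entries, combined)

# Flag citation keys that appear more than once after concatenation.
headers <- grep("^@", entries, value = TRUE)
keys    <- sub("^@[A-Za-z]+\\{([^,]+),.*$", "\\1", headers)
dupes   <- unique(keys[duplicated(keys)])
print(dupes)
```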

I'm starting to rust a bit on all the details, but happy to follow up on any questions you have.

7 Likes

I see that thesisdown (the one that started it all) and huskydown (my contribution to this topic, and used by more than a few students on campus) have already been mentioned in this thread. Both are worth a close look: their beautiful built-in LaTeX templates take care of a lot of the formatting fuss.

Here are a few other posts and packages relevant to this topic that might help you decide how to proceed:

My vote is for R Markdown and RStudio for thesis writing, plus another text editor that has good on-the-fly spell-checking (spacemacs is my choice), and Zotero for free and open source reference management (and because of its great integration with the citr addin). I also agree with making an R package as a thesis, as noted above.

10 Likes

I have been writing in LaTeX for about thirty years. I like it. I don't really know markdown well enough to say.

I have written my thesis in LaTeX and R. It's in food science (honey wine to be specific) and my PI requires me to send them new content in Word first, because they don't really want to deal with LaTeX. But I really, really like LaTeX. It helps that there's a document class created specifically for my school's thesis style.

Use what you know, what you want to know, and what you want to get to know very very well. Good luck!

1 Like

Just wanna say thanks for all the awesome contributions on this thread. I've been using rmarkdown recently (a bit of org in the past with emacs) and I can see there are terrific options to work with text, code, and output within the R framework
cheers
Fer

2 Likes

It comes down to a few issues:

  1. Will the graduate school accept the format?
  2. Can you use a template that the graduate school provides? If not, you might find yourself reprinting the dissertation several times for small issues, like margins that are too small or too large, or something shifted 1/2 mm out of place. Yes, the graduate school will reject the dissertation for something so "minor."
  3. Will your advisor and committee accept the document and be able to edit it and provide comments?
  4. How will you deal with citations? Bibliography programs like EndNote will save you time. They track citations and remove/add them as you edit your documents. Otherwise you are responsible for reading through your document to make sure that all citations are present in the literature cited and that all elements in the literature cited are actually cited. There is also the issue of formatting for the dissertation, then reformatting for a journal, and, if the manuscript is rejected, formatting again for a new journal. If you have many citations, or expect to be publishing more articles, then learning and using a bibliography program is essential.
  5. Will the journal accept the file format?

2 Likes

You'll want to check with your supervisor and the Faculty of Graduate Studies. Most universities have strict rules on formatting. This includes things like font types and sizes for different heading levels, the format to be used for the table of contents, a list of tables, and a list of figures, etc. Maybe you ought to ask FGS if they have Rmarkdown or latex formatting templates, and use whichever system will mean less work for you.

I think most universities will have latex style sheets for you to use.

If your work is at all technical, you'll be using latex within the Rmarkdown for equations and symbols within the text, so you ought not to find a transition from Rmarkdown to latex to be especially bothersome. Where you once wrote # blah you will now write \section{blah}, etc. You could write a one-page trial in Rmarkdown, and convert it to latex to see how it works, and then all you'd need to do is to insert a bunch of invocations like \degreetype{PhD} and \convocationDate{blah} so that the first few pages are formatted properly. Any university that provides a latex style sheet will probably also provide a sample file, and that makes it a lot easier; where it says \author{insert author name here}, just ... do what it suggests.

1 Like

Yet another resource... a series of blog posts on exactly this, but 2 years old now:

I personally write mine in Emacs org mode (with babel for code chunks). Because it involves Emacs, it is not nearly as popular as RMarkdown, which really took off in the past few years, but it is a more developed markup format and allows you to insert raw LaTeX chunks whenever you need to do something really specific that only LaTeX allows. If you were at all interested in hearing more about this, I could share some info on it.

5 Likes

Not sure how the new Radix will fit into this, but it might be worth having a look:

https://blog.rstudio.com/2018/09/19/radix-for-r-markdown

1 Like

Good catch! I'm excited about Radix but it relies on RStudio v1.2, which (to the best of my knowledge) is only available through preview releases. I don't use previews, so I haven't had a chance to try Radix yet, but I suspect it would be fantastic for writing scientific theses!

hi

RStudio v1.2 is available on its cloud platform https://rstudio.cloud. It's a good opportunity to kick the tires.

I am sold on Radix cos of a good review here. I intend to use Radix to showcase some pet project I did when applying for data jobs.

pandoc is the Swiss Army Knife of document format conversions and is under the hood of RMarkdown/knitr. But it's also available standalone and you will get the ability to convert between a wealth of formats. Sometimes you may need to tweak, of course, but anyone needing to go LaTeX => HTML, for example, should have it.

https://goo.gl/0qipE0

2 Likes

I would disagree with this, but only slightly. I wrote my thesis using RMarkdown and I found it was much easier to write than using Overleaf or ShareLatex, for the simple reason that once you have your settings set up there is less code needed to produce your text (which makes it easier to review).

Once you refer to the proper cls document (which my university provided) and write your preamble in LaTeX, rmarkdown will compile the document, and you can even save the .tex file.

Just my 2 cents!

2 Likes