A Technical Introduction to Yihui's personal website

Yihui Xie · 2017/04/25

Hugo theme Disqus Github Netlify blogdown multilingual knitr

If you are new to blogdown and Hugo, I don’t recommend reading the source code of my website. It is very heavily customized due to my OCD and the fact that I want to build an English blog, a Chinese blog, and a few project websites (like knitr and animation) together in one repository. Anyway, my GitHub repository is at https://github.com/rbind/yihui.

Theme

The base theme is called hugo-lithium-theme, which I have already heavily customized from the original theme, mainly to support MathJax and highlight.js out of the box. BTW, I used a git submodule to manage the base theme. I know you probably don’t want to continue reading this post now…

I overrode a lot of files of the base theme using the layouts directory. There are a lot of cool features,[1] but they require substantial technical background knowledge to understand. For example, you may see how I load Disqus in partials/disqus.html. Basically, I used the JS library disqusloader.min.js to lazy-load Disqus, meaning that Disqus comments are not loaded until the comment area is visible in the browser viewport. Personally I don’t like Disqus,[2] but there isn’t a better choice at the moment. I want my pages to load really fast, so I try to avoid loading Disqus when it is unnecessary. Other features of this custom partial template include:

  1. All pages that lead to the 404 error (not found) will share the same Disqus thread;
  2. The comment area will never be loaded when the page is loaded in an iframe, e.g., in RStudio Viewer. This is because when I write new posts, I want the page to be loaded as quickly as possible, and I don’t want to wait for the Disqus scripts to be loaded;
  3. Although I enabled lazy loading, there is an exception: if a reader comes to my page following a link generated by Disqus that has a hash (e.g., /a-post/#comment-12345) pointing to a specific comment, Disqus will be immediately loaded so that the reader can see the comment right away.
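
The three rules above boil down to a small decision. The sketch below is written in Python purely for illustration (the real logic lives in partials/disqus.html and runs as JavaScript in the browser). The function names, the "#comment-" prefix check, and the shared "/404.html" identifier are assumptions, not taken from the template.

```python
def disqus_load_mode(in_iframe: bool, url_hash: str) -> str:
    """Decide how the comment area should be loaded for one page view."""
    if in_iframe:
        # e.g. a preview in RStudio Viewer: skip the comments entirely
        return "never"
    if url_hash.startswith("#comment-"):
        # the reader followed a link to a specific comment
        return "immediate"
    # default: wait until the comment area scrolls into the viewport
    return "lazy"


def thread_identifier(page_path: str, is_404: bool) -> str:
    """Return the thread key; all not-found pages share a single thread."""
    # "/404.html" is a hypothetical shared key
    return "/404.html" if is_404 else page_path
```

Note that the iframe check comes first, so a previewed page never loads comments even if its URL carries a comment hash.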

My personal favorite feature is the Edit button on the toolbar under the title of each page. It is implemented in partials/article_meta.html. In Hugo, the templating variable .File.Path gives you the source path of the page, which you can use to compose a GitHub link for your visitors to edit the current page and submit a pull request to you on GitHub. This is not too difficult to implement when your website is pure Markdown. However, if a page was generated from R Markdown, you have to be careful to point the link to the .Rmd source file instead of the .md or .html output. I use a YAML option from_Rmd to indicate whether a page was created from R Markdown.
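
As a rough illustration of how such a link can be composed, here is a minimal Python sketch (the real implementation is a Hugo template, not Python). It assumes that the source path is relative to the content directory, that the branch is named master, and that GitHub's usual /edit/<branch>/<path> URL layout applies. The function name and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def edit_url(file_path: str, from_rmd: bool = False,
             repo: str = "rbind/yihui", branch: str = "master") -> str:
    """Compose a GitHub "edit this page" link from a page's source path."""
    # assumption: file_path is relative to the content/ directory
    path = PurePosixPath("content") / file_path
    if from_rmd and path.suffix in (".md", ".html"):
        # the page was compiled from R Markdown: link to the .Rmd source
        path = path.with_suffix(".Rmd")
    return f"https://github.com/{repo}/edit/{branch}/{path}"
```

For a hypothetical page en/a-post.md, `edit_url("en/a-post.md", from_rmd=True)` points to content/en/a-post.Rmd in the repository rather than to the generated Markdown file.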

Again, out of OCD, I hated one thing about Hugo: the baseurl in config.toml. I want to avoid hardcoding the base URL to https://yihui.name, because I want the pages to be portable and not tied to a specific domain. However, most Hugo themes seem to use this URL by default. Actually the problem is not baseurl itself but the templating variable .Permalink, which contains the base URL. I strongly recommend that you use .RelPermalink whenever possible, which does not contain the domain. Usually you only need .Permalink in RSS feeds and sitemaps. I have changed most instances of .Permalink to .RelPermalink in the default theme and also in my own layout files.
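
To make the difference concrete, the Python sketch below derives a domain-free link from an absolute one. This only illustrates the idea. Hugo computes .RelPermalink itself, and the function name here is hypothetical.

```python
from urllib.parse import urlsplit


def rel_permalink(permalink: str) -> str:
    """Drop the scheme and domain from an absolute URL, keeping the rest."""
    parts = urlsplit(permalink)
    rel = parts.path or "/"  # a bare domain maps to the site root
    if parts.query:
        rel += "?" + parts.query
    if parts.fragment:
        rel += "#" + parts.fragment
    return rel
```

The same page served from https://yihui.name or from https://yihui.netlify.com yields the same relative link, which is what makes the generated pages portable across domains.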

Hugo 0.20.x changed .Content to .Summary in the default RSS template, which annoyed me because now you can only read a brief summary of your post in your RSS reader, and I prefer having the full content in the RSS feed (although it is not recommended by the RSS specs). For that reason, I provided my own RSS template.

config.yaml

My configuration file is config.yaml instead of the more common config.toml. The vast majority of Hugo websites use TOML instead of YAML, but I cannot see any point in inventing yet yet another markup language, TOML, in the case of Hugo websites. YAML is bad and confusing enough. Now we have TOML, a new standard. Sigh.

Content

As I said, I actually have multiple websites under https://yihui.name, so my content directory may look odd. I write in both Chinese and English, so you probably think the Hugo multilingual mode would be the natural solution. No, it is not. I had to use weird tricks like setting slug: cn/about in cn-about.md to make sure the Chinese About page is published under the /cn/ directory, although the Markdown source file is under the root directory.

The animation and knitr websites are relatively straightforward, and the trickiest thing is to create a page that automatically lists all examples. This is achieved via the _default/example.html template. If you are familiar with Hugo, you might think a list page should be naturally generated using the template list.html. No, it is not that simple in these cases.

There are a few Markdown files under the root directory with the prefix pkg-*. These are automatically generated from some of my R package vignettes. I defined the Rmd source documents in a CSV file R/external_Rmd.csv.

The building method

By default, blogdown will generate a .html file for each .Rmd file. I don’t want HTML files but Markdown output files instead, for these reasons:

  1. Markdown will be cleaner than HTML in the source repository;
  2. I don’t want to host images generated from R code chunks in my Git repository (usually I don’t like putting binary files under version control);
  3. I don’t really need the sophisticated features from Pandoc or bookdown.

So I defined my website building method to be "custom" in .Rprofile, and I call a custom building script R/build.R to compile all Rmd files to Markdown. This script calls R/build_one.R to build each individual Rmd to md using knitr::knit() only (instead of rmarkdown::render()), after adjusting some knitr chunk and package options. The Markdown output file is post-processed to add the from_Rmd option to YAML, protect math expressions so that Hugo won’t destroy them by accident, and so on. My R plots are saved under the static/ directory locally in Dropbox, but I ignore them in Git. Eventually they are hosted via Updog.co directly from my Dropbox folder, and that is what the sed substitution in R/build.R does.
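
Two of these post-processing steps can be sketched as plain text transformations. The actual scripts are written in R and use sed. The Python below is only an illustration, and the value true for from_Rmd, the /figures/ path prefix, and the image host are all assumptions. Math protection is left out.

```python
import re


def add_from_rmd(markdown: str) -> str:
    """Insert a from_Rmd option at the top of the YAML front matter."""
    if not markdown.startswith("---\n"):
        return markdown  # no front matter: leave the text untouched
    return markdown.replace("---\n", "---\nfrom_Rmd: true\n", 1)


def rewrite_image_urls(markdown: str, image_host: str) -> str:
    """Point Markdown image links under /figures/ at an external host."""
    host = image_host.rstrip("/")
    # the capture group keeps the "![alt](" part of each image link
    return re.sub(r"(!\[[^\]]*\]\()/figures/",
                  lambda m: m.group(1) + host + "/figures/",
                  markdown)
```

With a hypothetical host such as https://example.com, an image written as `![plot](/figures/a-1.png)` ends up pointing at `https://example.com/figures/a-1.png`, so the image files never need to be committed to the repository.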

Hosting

My website is hosted on Netlify using its free plan, which is enough for me. My Netlify subdomain is yihui.netlify.com, and I pointed the CNAME record of yihui.name to it. Because I want to use the apex domain (without www), this is actually quite tricky. In theory, you cannot set a CNAME record on an apex domain, but Cloudflare provides a workaround.

Conclusion

The configurations of my own website are quite complicated, and I’m probably the only one who knows how to do a full build. There are many many little features that I didn’t mention in this post, such as how I center images automatically on a page. My overall goals are:

  1. The source should be as clean as possible (e.g., Markdown instead of HTML, no binary images in Git, and so on);
  2. The pages should be loaded as quickly as possible (e.g., use async on <script>, and avoid loading JS libraries when unnecessary[3]).

Anyway, I’m extremely happy with what I have got so far. I have used many tools in the past, including WordPress and Jekyll, and ended up hating them. I have revised some of my websites over and over again but have never been truly satisfied, such as the animation website. The current version is the fourth version in ten years (care to know the first version I built in Dreamweaver?), and I feel I have finally reached the destination. Similarly, I stopped blogging for a long while, because I felt so uncomfortable with creating a new post in Jekyll. Now I have created an RStudio addin to do that. The commands jekyll new and hugo new are just a few tiny steps away from being perfect! I cannot say the RStudio addin “New Post” is perfect for everyone, but at least I feel addicted to blogging again.


  1. If you do not have OCD like me, they may not really be important.
  2. For two reasons: (1) it does not support Markdown and does not seem to care, either; (2) it loads a huge amount of stuff that I don’t know.
  3. For example, when there are no math expressions on the page, do not load MathJax. Similarly, do not load Chinese web fonts when the page is not only for Chinese readers.