DIY static CMS using Node.js

2017-05-25

There is no shortage of existing tools for creating web content, but I have struggled to find the right tools and workflow to meet my requirements. I'd like to

create content easily
have direct control over content structure
maintain content fairly easily
have a toolset that's not bloated from the get go
have room to extend and build on as need arises
be able to host content anywhere

Taken together, these pretty much rule out a classical DB-backed CMS. I want to be able to create content within my text editor, and I want to be able to manage most tasks directly from it. The obvious choice for creating web content such as this is to work directly with static HTML/CSS/JS. But manually creating everything from scratch and maintaining the content on each page quickly becomes a hassle. The first thing that I considered was org-mode, which is excellent at creating and organizing text. Org-mode also supports exporting text to various formats, including HTML. See the user documentation for Ivy Mode as an example of conversion between org and HTML. Admittedly, I never invested the time to master org-mode, for two main reasons:

If the content of an org file is at some point going to be shared with others, it may not be as accessible to them, since not everyone uses Emacs.
Org-mode very quickly becomes its own verbose language. If a document will eventually be rendered through a TeX engine and presented as a PDF, I personally find it easier to write it directly in TeX than to write it in org-mode.

Static Site Generators

Static site generators mitigate the aforementioned problems, and there is no shortage of those either. The most popular one these days seems to be Jekyll, which looks very promising. I haven't actually used it, but combining Markdown with templates and HTML by itself satisfies almost all of my requirements.

One can of course use Python backed by one of the template modules that exist for it, like Jinja. But since we are talking about web content, it is probably worth also considering Node.js as the base tool for creating a personalized workflow that meets my needs.

I have been considering it as the base tool for managing content for a bit and it doesn't seem like a bad choice:

Node.js has a large, active community
Similar to Python, it comes with a decent package manager
It is flexible
Its asynchronous approach seems to be a plus for this specific use case.

And for those starting out with no past exposure, it's fairly easy to be up and running quickly, even when starting from scratch. See for instance an article describing creating an MVC framework from scratch using Node.js.

There is no shortage of template systems for it either. See e.g. Mustache, Handlebars, Nunjucks, EJS, and Pug (formerly Jade). When I started using it, the most time-consuming part was in fact deciding which template system fit my needs.

Node.js isn't all sunshine and rainbows of course. There are a few things that I do not particularly enjoy about it. One is how the solution to any problem is "use this third-party module". Considering that each module also pulls in 20 dependencies, the dependency tree can quickly become bloated even for simple use cases. Can you trust or rely on this module further down the road? Is this module going to be a security vulnerability down the road? Another is how rapidly best practices and modules change and evolve (perhaps the two are directly related).

Let's walk through a minimal candidate workflow for creating static sites using Node.js.

Minimal Example: Code Snippets

Here's a task that I want to accomplish: establish a simple workflow for creating and managing simple web content for personal use, some of which may include code snippets and simple JS applications.

To keep things simple and less obscure, I will only use two modules here that do not come shipped with Node.js:

Nunjucks: A template library maintained by Mozilla (essentially a port of Jinja)
Prism: A lightweight syntax highlighter

By the way, the code highlighter can be eliminated as a dev dependency and included as a front-end dependency instead, potentially bringing the number of direct dependencies down to 1. Nice.

The initial setup is trivial:

Install Node and NPM (Fedora/CentOS/RHEL systems)

# yum install nodejs npm

Create base directory

> mkdir my_site && cd my_site

Install modules

> npm init -f && npm i -D prismjs nunjucks

Quick overview of what just happened:

npm init: designates the current working directory as a Node.js project by creating a package.json file, which tracks dependencies and scripts.
npm i -D prismjs nunjucks: fetches the prismjs and nunjucks packages from the NPM registry and installs them into the current project directory as development dependencies.

That's it. Simple. Now we can actually start. But there are options: two different routes can be taken.

Option A

The easiest approach, requiring the absolute minimal amount of work, is to delegate code highlighting to the client.

Create a Nunjucks template file


option-a-template.njk

<!doctype html>
<html>
  <head>
    <script src="prism.js"></script>
    <link rel="stylesheet" type="text/css" href="prism.css">
  </head>
  <body>
    [...]
    <pre class="language-js"><code> {% include "code_snippet.js" %} </code></pre>
    [...]
  </body>
</html>

The prism.js and prism.css files come with the prismjs module. Serving them yourself can be avoided by loading them from a CDN instead.
The include directive will fetch the content for us when rendering the page (see below).

Write the code snippet you want to present in its own file (e.g. code_snippet.js).

Write the script for rendering the template in its own file (say build.js)

'use strict';

// templating system
var nj = require('nunjucks');
// I/O
var fs = require('fs');

// Configure template generator
nj.configure(__dirname, {autoescape: false});

// Render the HTML from template
var html = nj.render('option-a-template.njk');

// Write the generated HTML to file
var outputFile = 'option-a.html';
fs.writeFile(outputFile, html, function(err){
    if (err){
        throw err;
    }
    console.log('Wrote ' + outputFile);
});

Create the static content and write to file
```
> node build.js 
```

Nunjucks will chuck our code snippet into the HTML code when it is "built", and prism.js will highlight the code when the page is viewed in a web browser. Of course, there is no need to write the content to a file: it can be served directly by creating an HTTP server if we choose to. But the goal is to have a static CMS, so writing to a file it is.
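For the curious, serving the rendered string instead of writing it to disk could look roughly like the sketch below. It uses only Node's built-in http module. The helper names makeHandler and createPageServer are my own, not part of Nunjucks or Node, and the port in the usage comment is arbitrary.

```javascript
'use strict';
const http = require('http');

// Build a request handler that answers every request with the same
// pre-rendered HTML string (for example, the value returned by nj.render).
function makeHandler(html) {
    return function (req, res) {
        res.writeHead(200, {'Content-Type': 'text/html; charset=utf-8'});
        res.end(html);
    };
}

// Hypothetical helper: create (but do not start) a server for one page.
function createPageServer(html) {
    return http.createServer(makeHandler(html));
}

// Usage (port 8080 is an arbitrary choice):
//   createPageServer(html).listen(8080);
```

The server is deliberately not started inside the helper, so the same handler can be exercised without opening a port.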

What I like most about this is that the code snippet lives in its own file. If it is complete, it can be run and tested by itself, with no need to worry about forgetting to reflect changes in the HTML file. And, unlike going directly from org-mode or Markdown to HTML, complete control over the structure of the document is preserved.
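One caveat with Option A is worth spelling out. Because build.js configures Nunjucks with autoescape: false and the snippet is included verbatim, any < or & in the snippet ends up in the page as raw markup, and the browser may parse it as HTML before Prism ever sees it. A minimal sketch of an escaping helper follows. The name escapeHtml is my own, and this is one possible way to handle it, not something the setup above does.

```javascript
'use strict';

// Replace the characters that are significant in HTML text with entities.
// '&' is handled first so that the entities produced by the later
// replacements are not escaped a second time.
function escapeHtml(source) {
    return source
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}
```

The escaped string could then be handed to the template as a context variable (say snippet, a hypothetical name) and printed with {{ snippet }} in place of the include directive.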

Option B

Let's make this a little more interesting. In the previous example, the highlighting is done client-side by prism.js once the page loads. The snippet file is already being read and brought into the HTML file at build time, so why not directly write the tokenized code snippet instead of its raw content? Whether it's a good idea or not is a different story, but this is easy to do with Node+prism:

Write a script to create the tokenized HTML from an input file. The API that prismjs exposes can be used to generate the HTML directly. Let's write it as a separate module (say highlighter.js):

var prism = require('prismjs');
var fs = require('fs');

// Generate HTML given a 'snippet' in 'language' syntax
exports.generate = function(snippet, language){
    // Generate html code
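    // Tokenize the snippet (the third argument names the language)
    var html = prism.highlight(snippet, prismLanguage(language), language);
    // Wrap in <pre><code>, tagging both elements with the language class
    html = '<pre class="language-' + language + '"><code class="language-' + language + '">'
        + html + '</code></pre>';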
    return html;
};

// Generate HTML content from file
exports.generateFromFile = function(filename){
    // Use file extension to guess language
    var periodIndex = filename.lastIndexOf('.');
    var ext = '';
    if (periodIndex > 0){
        ext = filename.slice(periodIndex+1, filename.length);
    }
    var source = fs.readFileSync(filename, 'utf8');
    return exports.generate(source, ext);
};

/* Helper function for getting prism language tokens */
function prismLanguage(language){
    if (language === 'javascript' || language === 'js'){
        return prism.languages.javascript;
    }
    /*
     * [...]
     */
    // Default to markup
    return prism.languages.markup;
}

Nunjucks lets us call custom functions directly from within the template. Update the contents of the template (saved as option-b-template.njk):

<!doctype html>
<html>
  <head>
    <!-- No JS here -->
    <link rel="stylesheet" type="text/css" href="prism.css">
  </head>
  <body>
    [...]
    {{ codeSnippet('code_snippet.js') }}
    [...]
  </body>
</html>

Write a script to generate the file (say build.js)

// template system
var nj = require('nunjucks');
// I/O
var fs = require('fs');
// Our highlighter module (a local file, hence the relative path)
var ch = require('./highlighter');

function codeHighlighter(filename){
    return ch.generateFromFile(__dirname + '/' + filename);
}

// Configure nunjucks
nj.configure(__dirname, {autoescape: false});

// Render the page
var html = nj.render('option-b-template.njk',
                     {codeSnippet: codeHighlighter});

// Write the generated HTML to file
var outputFile = 'option-b.html';
fs.writeFile(outputFile, html, function(err){
    if (err){
        throw err;
    }
    console.log('Wrote ' + outputFile);
});

Build as before

Now there is only one HTML file, and one CSS file. That's it. Oh, and the client doesn't need JS to view a page that should be static to begin with. If you're like me and disable JS by default (check out NoScript by the way), then our page will still render as expected.
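The extension-guessing step in the highlighter module can also be written as a lookup table instead of a chain of if statements. The sketch below uses Node's path.extname. The table and the function name languageForFile are my own, and the listed languages are the ones I believe ship with the default prismjs build, so treat that list as an assumption.

```javascript
'use strict';
const path = require('path');

// Hypothetical table mapping file extensions to Prism language names.
// Assumption: these languages are bundled with the default prismjs build.
const LANGUAGE_BY_EXTENSION = {
    js: 'javascript',
    css: 'css',
    html: 'markup',
    xml: 'markup',
    svg: 'markup'
};

// Guess the Prism language name for a file, defaulting to markup
// (the same fallback the helper in the highlighter module uses).
function languageForFile(filename) {
    const ext = path.extname(filename).slice(1).toLowerCase();
    if (Object.prototype.hasOwnProperty.call(LANGUAGE_BY_EXTENSION, ext)) {
        return LANGUAGE_BY_EXTENSION[ext];
    }
    return 'markup';
}
```

The returned name could then be used both to pick the grammar (prism.languages[name]) and to build the language- class on the wrapping elements.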

Closing Thoughts

This is by no means a complete workflow. But this workflow, as awkward as it may initially look, seems to meet all my requirements: it is built on top of a powerful environment and package manager, individual content composition is delegated to a decent template framework (Nunjucks), direct control over content layout is preserved, the overall workflow is still simple enough, and I can break content into separate modules (e.g. snippets can live on their own) and create a completely custom build script to compile the site.
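As a rough idea of where such a custom build script might start, the sketch below lists every .njk template in a directory and pairs it with the HTML file it should produce. The function name planBuild and the flat directory layout are assumptions of mine; nothing here is specific to Nunjucks.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// List every template in srcDir and pair it with the HTML file
// it should produce in outDir. Subdirectories are not traversed.
function planBuild(srcDir, outDir) {
    return fs.readdirSync(srcDir)
        .filter(function (name) { return path.extname(name) === '.njk'; })
        .sort()
        .map(function (name) {
            return {
                template: name,
                output: path.join(outDir, path.basename(name, '.njk') + '.html')
            };
        });
}
```

Each entry could then be rendered with nj.render(entry.template) and written to entry.output, the same two steps build.js performs for a single page.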

Of course this is not a complete or general guideline for using Node.js as a CMS. There are many unaddressed problems. For example, what if I want to create and include a plot as an image file during rendering? The easiest way of creating a plot as an image file that I know of is using gnuplot or Python's matplotlib module. But how would they be specified as a dependency of the project? Getting up and running when moving repos between machines would become a two-step process: 1) install node/npm, 2) install Python and the required modules. And you wouldn't know or remember that gnuplot or Python + matplotlib (to make it worse, which version?) was a dependency until after you try to build. But perhaps this issue is related not to the workflow but to the portability of the content itself. What would be the best way of creating persistent links to other content without breaking things when directory structures change? When these issues are addressed, will the result be a suboptimal solution to a problem that had already been solved more optimally?
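For the external-tool problem specifically, one partial mitigation might be a check at the top of the build script that fails early and names what is missing. The sketch below probes each command by trying to launch it with --version, using Node's child_process.spawnSync. The function name findMissingTools is hypothetical, and the check only detects whether a command can be started at all. It says nothing about versions or installed Python modules.

```javascript
'use strict';
const childProcess = require('child_process');

// Return the entries of `commands` that could not be started.
// Each command is probed by running it with `--version`.
function findMissingTools(commands) {
    return commands.filter(function (command) {
        const result = childProcess.spawnSync(command, ['--version'],
                                              {stdio: 'ignore'});
        // spawnSync reports a failure to launch (e.g. ENOENT) via result.error
        return Boolean(result.error);
    });
}

// Usage at the top of build.js (tool names are examples):
//   var missing = findMissingTools(['gnuplot', 'python3']);
//   if (missing.length > 0) {
//       throw new Error('Missing build tools: ' + missing.join(', '));
//   }
```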

These are questions I'll keep in mind moving forward, but the approach seems sustainable. In fact, as of this writing, this page was created using the exact approach described in the article.