How and Why I Migrated this Website

Where I give less of a technical explanation and more of a slight rant on slowly moving away from big tech/proprietary solutions and modern software. There are some notes on the technical “how”, which are to be taken more as an inspiration than as a tutorial.

Reasons for Change

How I Used to Do It

Like many other devs, I followed the fairly standard strategy of using a static website generator (in my case, Hugo) and hosting on GitHub Pages via an action that triggered the build after a push to the main branch. I will not get into details; this is all very well explained in both Hugo’s and GitHub’s docs.

What was Wrong with It

Technically

There were several things I disliked about this, not least of which was the relatively indecent level of hidden/abstracted complexity (in terms of the amount of tools and supporting architecture required, rather than from a user/dev-facing perspective) behind my very simple needs. Said needs were as follows:

I did not need:

There was essentially a profound mismatch between the power that Hugo offers (and hence its abstractions) and my absolutely basic needs (except, maybe, for the mathematical part).

Add to that the fact that I was also relying on GitHub Actions to regenerate, from scratch, every time, a website I had already generated locally. This only adds to the “hidden complexity” I was hinting at earlier. Realistically, it is deeply ridiculous for a blog of this size to rely on a workflow as sophisticated as GitHub’s continuous integration. I need to make a few html pages and push them onto a server; I don’t really need to be precious about versioning…

Ideologically

This is a topic that could be a post (or several) on its own. There are essentially three aspects to it:

  1. Still using GitHub, at this stage, feels like:
    • Reinforcing Microsoft’s hegemony and feeding an all-consuming monster that wishes to control any and all ways to use computers;
    • Implicitly giving my consent to Microsoft to do as they wish with my data, including the training of models whose impact on software has been mostly terrible thus far (and my GitHub code won’t make that any better, believe you me!);
  2. More globally, from a European perspective, still being reliant on American solutions and data centres is more problematic than ever. Codeberg offers a European alternative which strives to champion FOSS, and it is an initiative I stand behind. But an alternative won’t succeed if nobody uses it;
  3. Using easy, ready-made solutions such as Hugo for the website generation is great from a user perspective, but I find myself realising I don’t really learn (or understand) much of the fundamentals by using tools that abstract everything away for me.

How I Generate this Website Now

The first step for me was to find an alternative to Hugo. It handled a few things for me, each of which required a replacement:

  1. Obviously, generating html pages from markdown files;
  2. Rendering maths (Hugo used MathJax by default, iirc, but I used to use KaTeX);
  3. Watching for saves to the source .md files and regenerating content;
  4. Serving on localhost.

Points 1 and 2: generating html files

I considered writing my own solution to this for a moment. I did not think that parsing minimal markdown and generating the appropriate html would have been very hard, considering what little I needed. I thought of leaving the maths untouched, using KaTeX, and being done with it.

Then I was made aware that KaTeX, MathJax, and (obviously) generated images of maths formulas all make the maths inaccessible to screen readers, and that a much better alternative was MathML. The idea of having a website free of JavaScript, where the maths was accessible and semantically embedded in the source, was particularly enticing. Writing in MathML directly, or writing a tool to convert LaTeX to MathML? Much less so.

Enter the one real dependency of my website: pandoc!

Want to generate an output.html from an input.md file, turning your ugly LaTeX into horrifying MathML?

pandoc -f markdown -t html --mathml input.md -o output.html 

I also needed:

  • Syntax highlighting for code blocks;
  • A table of contents;
  • My own html template.

And that’s achieved just as easily as:

pandoc -f markdown -t html \
       --mathml \
       --highlight-style=zenburn \
       --toc \
       --template=template.html \
       input.md -o output.html
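
For reference, the template.html that pandoc is pointed at can be as bare-bones as the sketch below (pandoc substitutes the $…$ variables; this is an illustration, not my actual template):

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>$title$</title>
</head>
<body>
$if(toc)$
<nav>
$table-of-contents$
</nav>
$endif$
<main>
$body$
</main>
</body>
</html>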

Remark: I have to admit that relying on something as big as pandoc (or a LaTeX compiler) is pretty annoying, especially given part of my reasons for migrating. But I don’t see a better solution for both generating html from markdown at a minimal cost and “compiling” maths snippets to html. If you know of one, do let me know.

Now obviously you don’t want to have to do that for each and every post individually (although why not, if you write as little as I do?). I wanted to automate the website generation fully, so I did it as follows.

First, I structured my “source” like so:

sources/
|-- content/
|   |-- blog/
|   |   |-- post1/
|   |   |   |-- asset1.png
|   |   |   \-- index.md
|   |   |-- post2/
|   |   |   |-- asset2.png
|   |   |   \-- index.md
|   |   \-- index.md
|   \-- index.md
\-- template.html

so as to easily produce a static website that mirrors this structure as:

public/
|-- blog/
|   |-- post1/
|   |   |-- asset1.png
|   |   \-- index.html
|   |-- post2/
|   |   |-- asset2.png
|   |   \-- index.html
|   \-- index.html
\-- index.html

This can easily be achieved with any scripting language: copy over or sync the assets, and use pandoc to generate the html pages where needed. You can also check files’ last-modified times to handle incremental builds.

I used make with a Makefile to do that, but again, that’s pretty overkill. I will probably just take the time to write a small bash script instead at some point.
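
For illustration, here is a minimal sketch of what such a script could look like (untested as written, and assuming the exact layout and pandoc flags from above):

#!/bin/sh
# Minimal sketch of a build script; run from sources/.
# Assumes the content/ layout and template.html shown above.
set -e

mkdir -p ../public

# Mirror content/'s directory structure under public/.
find content -mindepth 1 -type d | while read -r dir; do
    mkdir -p "../public/${dir#content/}"
done

find content -type f | while read -r src; do
    out="../public/${src#content/}"
    case "$src" in
        *.md)
            out="${out%.md}.html"
            # Rebuild a page only if its source or the template is newer.
            if [ "$src" -nt "$out" ] || [ template.html -nt "$out" ]; then
                pandoc -f markdown -t html \
                       --mathml \
                       --highlight-style=zenburn \
                       --toc \
                       --template=template.html \
                       "$src" -o "$out"
            fi
            ;;
        *)
            # Copy an asset over only when it has changed.
            if [ "$src" -nt "$out" ]; then
                cp "$src" "$out"
            fi
            ;;
    esac
done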

Point 3: automatically update the website on save

There are many ways that this could be achieved. One thing I thought about was simply changing my editor’s save shortcut locally to also call make after saving, but I decided against it. I don’t want my editor to be responsible for this, especially if I want to write and save without rebuilding every time.

A good Linux utility for that is entr. You can pipe your .md files’ paths (through ls or find, for example) to entr and have it call make, so that your website is rebuilt every time a save occurs. This could also be the place to force a web-browser refresh if you are checking the output on localhost at the same time.
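
Concretely, something along these lines does the trick, run from sources/ (the find expression is just an example):

find content -name '*.md' | entr make

One caveat: entr only watches the files it was handed at startup. Its -d flag makes it exit when a new file shows up in a watched directory, so wrapping the above in a loop is the usual way to also pick up brand-new posts.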

Point 4: serve the public website on localhost

There is not much to it there; python saves the day once more. Just:

python -m http.server    

and open your favourite browser on http://localhost:8000.
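
If you would rather not cd into public/ first, Python 3.7+ also lets you point the server at the directory instead:

python -m http.server --directory public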

Again, maybe I should look into doing it from scratch just to learn a little about how this is achieved, but it is pretty low on my list of priorities.

What does the Deployment Look Like?

The last thing to do was to get off GitHub and onto Codeberg and, additionally, to stop relying on actions and just push the public/ directory to be served on Codeberg Pages.

There are several valid approaches, but I decided to keep the generation side as its own private repo and to have a public pages repo which only contains a copy of the public/ directory (plus the .git/ files). So the last piece of the puzzle was just to write a small script in sources/ that copies the changes in its local public/ directory over to the served one, then cds into it to commit and push automatically, with a prompted commit message.

It’s overall pretty primitive, but it works, saves me a lot of typing, and keeps the generation on my end rather than offloading it god knows where at god knows what cost.
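
Primitive indeed: stripped of any niceties, such a script essentially boils down to the following (PAGES_DIR is a placeholder for wherever the local clone of the pages repo lives):

#!/bin/sh
# Sketch of the deploy step described above.
set -e
PAGES_DIR="$HOME/repos/pages"   # placeholder path

# Sync the freshly built site into the pages repo, preserving its .git/.
rsync -av --delete --exclude '.git' public/ "$PAGES_DIR"/

cd "$PAGES_DIR"
git add -A
printf 'Commit message: '
read -r msg
git commit -m "$msg"
git push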

What’s Missing?

A few quality-of-life features are missing. Most notably, it would be comfortable to have the blog/ index page updated automatically whenever a new entry is added. There are definitely ways to do that simply, by leveraging the YAML headers in the .md source files to pass info such as date, title, abstract, etc., and then using a Lua filter in conjunction with pandoc. It’s something I will have to do at some point, especially if the post count grows large enough that the blog index needs splitting into several pages.
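
pandoc already parses those YAML headers natively and exposes the fields to templates and filters, so each entry’s index.md would only need to start with a block like this (values purely illustrative):

---
title: "Some Post Title"
date: 2025-01-01
abstract: "One or two sentences summarising the post."
---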

Another loss is that, with my current setup, it would be pretty awkward to try and update the blog if I didn’t have local clones of both the source and pages repos. Say I spot a typo and want to edit it quickly, straight from the Codeberg interface. Well, now I can’t generate the html, and I have to also edit it on pages to keep things in sync, or just accept that I need to push as soon as I get back on my machine. This could also be addressed, but doing so would defeat the purpose of not relying on remote tooling more than is necessary. Realistically, I only write on my machines anyway.

In the same vein, my solution only really works for me. A command such as make is ubiquitous on Linux boxes, so it makes sense to use it. A Windows user trying the same method may, rightly, find that installing a Linux subsystem or MSYS2 or what-have-you just to build a blog is peak insanity.

Lastly, although I like the accessibility and lightness of MathML, its rendering is really not the best. I will have to explore options that remain JS-free whilst giving a more visually pleasing maths display. This probably means generating .png or .svg files of the maths and linking to them, whilst keeping the MathML as a fallback or as a description for screen readers. That comes at the cost of more storage, which is probably not a load I am willing to put on Codeberg. Maybe if I end up self-hosting…

What Now?

This migration was just one part of wanting to be more involved in the FOSS space overall. I don’t really have the time or the inclination to produce good code outside of work, so I have been pretty shy about showing it online or contributing to actually useful projects. That being said, I can also see the pedagogical value of having some such snippets on a platform that advocates for freedom and openness.

I will be migrating my scrappy code from my private GitHub repos, making it public on Codeberg, and trying to give it some much needed TLC. I will also try to contribute more to this blog. At a time when the answer always seems to be to throw compute at a problem and to use ever more abstract machinery, I feel it’s valuable to be reminded that sometimes, more “primitive” or lower-level solutions are not only more efficient but also much simpler to comprehend.