Using the C Preprocessor as a Templating Engine

Watch as it falls apart right in front of me.

By Hemisputnik, 26th of August, 2025.

I've wanted to make my own blog for about a month now.

At 512b, we have a blogging engine called 512blog, created by Syn to generate its and Mjm's blogs using Markdown.

Personally, I like writing stuff in HTML. Markdown is nice and all, but I only use it when I'm writing down a first draft. Unfortunately, writing raw HTML tends to get tedious when you're following a specific template.

So, in the spirit of 512b, I decided to try using the C preprocessor for my blogs.

# Wtf is the C preprocessor

Ever seen those lines that start with a # in C source code?

#include <stdio.h>

#define PI (3.14159)

int main() {
    printf("The value of pi is %.2f", PI);
    return 0;
}

Technically, those aren't part of the core C language. They're part of the C preprocessor.

The majority of what the preprocessor does is replace text (or expand macros). For example, in the code above, the macro PI is expanded into the text (3.14159).

The preprocessor can also paste other files into the source code. In the code above, it pastes the contents of stdio.h into the file. That file contains various function declarations, so that the compiler knows which functions you're referring to (as in the case of printf).

Believe it or not, the C preprocessor is actually quite good for my use case:

- It can expand macros, such as the blog title, date, etc.
- It can include other files, which makes defining various components super easy. Technically macros can also do this, but it's convenient to define components in a separate file.

What!!! Isn't the C preprocessor ONLY supposed to be used for the C programming language?

Nothing's stopping you from invoking the C preprocessor directly with cpp.

# Baby's first macro expansion

Let's try to preprocess an HTML file.

$ cat hello_world.HTML
#define TITLE hello, world!
<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>TITLE</title>
</head>
<body>

</body>
</html>
$ cpp hello_world.HTML -o hello_world.html
$ cat hello_world.html
# 0 "hello_world.HTML"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "hello_world.HTML"

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>hello, world!</title>
</head>

<body>

</body>

</html>

⁉️

It worked, but... Why is there random junk at the start?

# Getting rid of the random junk at the start

Most of the time, the preprocessor runs as part of the compiler itself, so it hands its tokens over directly instead of writing them out as text, and tells the compiler other important information along the way (such as which file and line each piece of code came from, and which macros were expanded).

However, it was made possible to run the preprocessor as a standalone program whose only output is text. Now there's a problem: how will the preprocessor communicate the important information to the compiler?

The answer is those strange lines at the beginning. They are called linemarkers: each one records the file name and line number that the following text came from. They were generated because the preprocessor thought we were going to run a C compiler afterwards.

After a quick look at the manual, we can find the flag to disable these:

-P  Inhibit generation of linemarkers in the output from the preprocessor.  This might be useful when
    running the preprocessor on something that is not C code, and will be sent to a program which
    might be confused by the linemarkers.

This is our command now:

$ cpp -P hello_world.HTML -o hello_world.html

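The preprocessor also discards anything that looks like a C comment, so we might want to disable that as well: HTML has its own type of comments, and a // in ordinary text (a bare URL, for instance) or a /* */ in inline CSS would otherwise silently vanish.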

-C  Do not discard comments.  All comments are passed through to the output file, except for comments
    in processed directives, which are deleted along with the directive.
    [...]

-CC Do not discard comments, including during macro expansion.  This is like -C, except that comments
    contained within macros are also passed through to the output file where the macro is expanded.

$ cpp -P -CC hello_world.HTML -o hello_world.html
$ cat hello_world.html
/* Copyright (C) 1991-2025 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.
[... snip ....]
/* wchar_t uses Unicode 10.0.0.  Version 10.0 of the Unicode Standard is
   synchronized with ISO/IEC 10646:2017, fifth edition, plus
   the following additions from Amendment 1 to the fifth edition:
   - 56 emoji characters
   - 285 hentaigana
   - 3 additional Zanabazar Square characters */
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>hello, world!</title>
</head>
<body>
</body>
</html>

Okay brochacho???

We can get a hint as to what is happening by looking at the generated linemarkers from one of our previous mistakes:

[...]
# 1 "/usr/include/stdc-predef.h" 1 3 4
[...]

Ah. It is including this header, and pasting all the comments from it into our HTML file.

We can simply order it to not search the standard system directories (again, we're not preprocessing C source code, so we have no use for that).

-nostdinc
    Do not search the standard system directories for header files.  Only the directories explicitly
    specified with -I, -iquote, -isystem, and/or -idirafter options (and the directory of the current
    file, if appropriate) are searched.

Let's try it now:

$ cpp -P -CC -nostdinc hello_world.HTML -o hello_world.html
$ cat hello_world.html
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>hello, world!</title>
</head>
<body>
</body>
</html>

Success!

# Making the system run the command for us

Running cpp -P -CC -nostdinc <whatever>.HTML -o <whatever>.html for all your blog posts is kind of tedious.

If only there was a tool designed for """make"""ing stuff using commands by writing your steps in a """file"""...

So the first solution I considered was a simple bash script:

$ cat build.sh
#!/usr/bin/bash
for file in *.HTML; do
    cpp -P -CC -nostdinc "$file" -o "${file%.HTML}.html"
done

This will work wonderfully out of the box, but it regenerates all posts regardless of whether they have been changed or not.

Now, this is not an issue whatsoever in the 21st century, as it will complete instantly (even if you have hundreds of blog posts).
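For what it's worth, here is a rough sketch of how a script could skip posts that haven't changed, by comparing file timestamps. The function names needs_rebuild and build_changed_posts are made up for this example, and it only looks at the post itself, not at any files the post includes.

```shell
# needs_rebuild SRC OUT: succeed if OUT is missing or older than SRC.
needs_rebuild() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# Rebuild every *.HTML post in the current directory whose output is
# missing or out of date.
build_changed_posts() {
    local src out
    for src in *.HTML; do
        [ -e "$src" ] || continue          # the glob matched nothing
        out="${src%.HTML}.html"
        if needs_rebuild "$src" "$out"; then
            cpp -P -CC -nostdinc "$src" -o "$out"
        fi
    done
}
```

Calling build_changed_posts from the blog directory would then only run cpp for posts that are newer than their generated output.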

On the other hand you have Makefiles, which are slightly more complex to write, and also solve that specific nonexistent issue.

# Writing a Makefile for generating blog posts

Okay, actually Makefiles aren't really that complex to write - they're quite straightforward (just make sure the command lines are indented with a real tab character, not spaces):

target: source1 source2
    command -o target source1 source2

# OR, using the funny special variables which make literally zero sense:
target: source1 source2
    command -o $@ $^
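If the special variables look like line noise, here is a throwaway sketch that makes them print themselves: $@ is the target, $^ is the full list of sources, and $< is just the first source. The file names are made up, and the sketch assumes GNU make 3.82 or newer, because it uses .RECIPEPREFIX to start command lines with ">" instead of a tab (purely so it survives copy-pasting).

```shell
# Scratch directory so the demo does not touch real files.
cd "$(mktemp -d)"

# .RECIPEPREFIX (GNU make 3.82+) lets recipe lines start with ">"
# instead of a tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
out.txt: a.txt b.txt
> @echo "target=$@ first=$< all=$^" > $@
EOF

touch a.txt b.txt
make
cat out.txt
# prints: target=out.txt first=a.txt all=a.txt b.txt
```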

Let's write one for our blog post generator thing:

.PHONY: all
all: hello.html

%.html: %.HTML
    cpp -P -CC -nostdinc -o $@ $<

Let's also add a recipe for cleaning the generated HTML files:

.PHONY: all clean

clean:
    rm -f *.html

$ make
cpp -P -CC -nostdinc -o hello.html hello.HTML
$ ls
hello.html  hello.HTML  Makefile

Great! We're done, and now we don't have to touch the Makefile ever again.

# Subtle foreshadowing

Now that we have a build system, we can start making components to include in our posts.

Let's make a simple component to test this:

$ cat component.inc
my wonderful component

Now, with the power of the C preprocessor, we can:

[...]
<body>
#include "component.inc"
</body>
[...]

$ make
cpp -P -CC -nostdinc -o hello.html hello.HTML
$ cat hello.html
<!DOCTYPE html>
<html>
<head>
	<meta charset="utf-8">
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<title>hello, world!</title>
</head>
<body>
my wonderful component
</body>
</html>

And it works!

Cool. Let's change the text to something different.

$ cat component.inc
my bizarre component
$ make
make: Nothing to be done for 'all'.

Huh? Oh. Time to change the Makefile again.

# Telling `make` which components our blog depends on

At first it seems like an impossible task, but then you realize that this problem is actually quite common in C as well:

#include "dinosaur.h"

int main() {
    draw_a_dinosaur();
    return 0;
}

How do we recompile this source file when dinosaur.h changes?

Fortunately the wise orb-pondering GNU people have thought of this, and added the -M flag to the C compiler (and the C preprocessor):

-M  Instead of outputting the result of preprocessing, output a rule suitable for make describing the
    dependencies of the main source file.  The preprocessor outputs one make rule containing the
    object file name for that source file, a colon, and the names of all the included files,
    including those coming from -include or -imacros command-line options.

Okay, but how does this actually help us? Let's run cpp with this flag and take a look at the output:

$ # -MT = Rule target, -MF = Rule output file, -nostdinc or it will depend on a system include file.
$ cpp -M -MT hello.html -MF hello.d -nostdinc hello.HTML
$ ls
component.inc  hello.d  hello.html  hello.HTML  Makefile
$ cat hello.d
hello.html: hello.HTML component.inc

As the manual page suggests, it generated a tiny Makefile called hello.d, containing a single rule that lists everything hello.html depends on.

We can use this by including this tiny Makefile in the main Makefile, which adds those files to the dependencies of hello.html.

$ cat Makefile
.PHONY: all clean

all: hello.html

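# A rule to generate *.d files from *.HTML files.
%.d: %.HTML
    # -MT names the rule's target: $(<:.HTML=.html) substitutes
    # the .HTML extension for a .html extension.
    # -MF names the file the rule is written to (the .d file itself).
    cpp -M -MT $(<:.HTML=.html) -MF $@ -nostdinc $<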

# We leave this recipe as it was before, except we add %.d to the dependencies to force make to generate the .d file for it.
%.html: %.HTML %.d
    cpp -P -CC -nostdinc -o $@ $<

include hello.d
$ cat hello.d
hello.html: hello.HTML component.inc

Now, hello.html will be remade when either hello.HTML or component.inc changes, and hello.d will be remade when hello.HTML changes so we can update the dependencies! Awesome!!!

Let's refine the Makefile a little. First, let's introduce a sources variable:

.PHONY: all clean

sources = hello.html

all: $(sources)

# [...]

Now, we can include ALL generated .d files (instead of including them one by one), by using funny Makefile syntax:

# [...]

include $(sources:.html=.d)

# Reaping the fruits of our labour

Now that we have set up the """templating engine""", we can ACTUALLY start writing posts.

Here's an example of what a blog post might look like:

$ cat my_blog.HTML
#define BLOG_TITLE hello, world!

#include "blog_post.inc"

BLOG_POST_PREAMBLE

<h1>BLOG_TITLE</h1>

Blah blah blah blah.

<hr>
This post was compiled on __DATE__ __TIME__

BLOG_POST_EPILOGUE

I have set up the BLOG_POST_PREAMBLE and BLOG_POST_EPILOGUE macros, which use the BLOG_* macros defined previously. We can also use the built-in macros, such as __FILE__, __DATE__, __TIME__, etc.

# Where things go wrong

We are pushing the limits of the C preprocessor REALLY hard now. And it's starting to show.

I usually set the titles of my webpages to something like hsp // PAGE_TITLE (for style points). However, we run into a peculiar limitation when we try to do that from within a macro:

$ cat my_blog.HTML
#define BLOG_TITLE my awesome blog post
#include "blog.inc"
BLOG_PREAMBLE
$ cat blog.inc
#define BLOG_PREAMBLE \
    <title>hsp // BLOG_TITLE</title>

<title>hsp /* BLOG_TITLE</title>*/

Right. This happened because we didn't read the fine print on the `-CC` option:

-CC Do not discard comments, including during macro expansion.  This is like -C, except that comments
    contained within macros are also passed through to the output file where the macro is expanded.

    In addition to the side effects of the -C option, the -CC option causes all C++-style comments
    inside a macro to be converted to C-style comments.  This is to prevent later use of that macro
    from inadvertently commenting out the remainder of the source line.

So as far as I know, there is no solution to this, other than just "don't use double slashes in blogs."

# Double quotes my behated

Remember those __DATE__, __TIME__ and __FILE__ macros we talked about? Let's expand one of them:

The current date is __DATE__

The current date is "Aug 25 2025"

Unfortunately, the date is quoted.

Oh, surely the C preprocessor has a way of unquoting strings!

The C preprocessor has NO way of unquoting strings! Fuck!!! Maybe using a tool made specifically for the C programming language as a templating engine for blog posts wasn't such a good idea!!!

# Okay maybe don't actually do this then

You can probably get away with using the C preprocessor for templating for a little while, but you'll eventually run into one of the above limitations.

I ended up using a more generic macro processor for my blog posts: GNU m4. It fills the same role as the C preprocessor (macros, includes), except instead of being designed for C, it is designed for any kind of text.

Even if it didn't work out in the end, I hope you enjoyed reading this descent into madness, and learned something new.

This post was compiled on Wed, 27 Aug 2025 03:00:10 +0300.
Go check out my other blog posts, or my friends' pages!