By Hemisputnik, 26th of August, 2025.
I've wanted to make my own blog for about a month now.
At 512b, we have a blogging engine called 512blog, created by Syn to generate its and Mjm's blogs using Markdown.
Personally, I like writing stuff in HTML. Markdown is nice and all, but I only use it when I'm writing down a first draft. Unfortunately, writing raw HTML tends to get tedious when you're following a specific template.
So, in the spirit of 512b, I decided to try using the C preprocessor for my blogs.
Ever seen those lines that start with a # in C source code?
#include <stdio.h>
#define PI (3.14159)
int main() {
    printf("The value of pi is %.2f", PI);
    return 0;
}
Technically, those aren't part of the core C language. They're part of the C preprocessor.
The majority of what the preprocessor does is replace text (or expand macros).
For example, in the code above, the macro PI is expanded into the text (3.14159).
The preprocessor can also paste other files into the source code.
In the code above, it pastes the contents of stdio.h into the file.
That header contains various function declarations,
so that the compiler knows which functions you're referring to (such as printf).
Believe it or not, the C preprocessor is actually quite good for my use case:
What!!! Isn't the C preprocessor ONLY supposed to be used for the C programming language?
Nothing's stopping you from invoking the C preprocessor directly with cpp.
Let's try to preprocess an HTML file.
$ cat hello_world.HTML
#define TITLE hello, world!
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>TITLE</title>
</head>
<body>
</body>
</html>
$ cpp hello_world.HTML -o hello_world.html
$ cat hello_world.html
# 0 "hello.HTML"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "hello.HTML"
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>hello, world!</title>
</head>
<body>
</body>
</html>
⁉️
It worked, but... Why is there random junk at the start?
Most of the time, the preprocessor is invoked from inside the compiler. In that case it outputs a stream of binary tokens and directly tells the compiler other important information (such as which macros were expanded, and from where).
However, it was made possible to run the preprocessor as a standalone program. Now there's a problem: how will the preprocessor communicate the important information to the compiler?
These strange lines in the beginning are called linemarkers, and they were generated because the preprocessor thought we were going to run a C compiler afterwards.
After a quick look at the manual, we can find the flag to disable these:
-P Inhibit generation of linemarkers in the output from the preprocessor. This might be useful when
running the preprocessor on something that is not C code, and will be sent to a program which
might be confused by the linemarkers.
This is our command now:
$ cpp -P hello_world.HTML -o hello_world.html
The preprocessor also discards C comments. We might want to disable that too, since HTML has its own type of comments.
-C Do not discard comments. All comments are passed through to the output file, except for comments
in processed directives, which are deleted along with the directive.
[...]
-CC Do not discard comments, including during macro expansion. This is like -C, except that comments
contained within macros are also passed through to the output file where the macro is expanded.
$ cpp -P -CC hello_world.HTML -o hello_world.html
$ cat hello_world.html
/* Copyright (C) 1991-2025 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.
[... snip ....]
/* wchar_t uses Unicode 10.0.0. Version 10.0 of the Unicode Standard is
   synchronized with ISO/IEC 10646:2017, fifth edition, plus
   the following additions from Amendment 1 to the fifth edition:
   - 56 emoji characters
   - 285 hentaigana
   - 3 additional Zanabazar Square characters */
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>hello, world!</title>
</head>
<body>
</body>
</html>
Okay brochacho???
We can get a hint as to what is happening by looking at the generated linemarkers from one of our previous mistakes:
[...]
# 1 "/usr/include/stdc-predef.h" 1 3 4
[...]
Ah. It is including this header, and pasting all the comments from it into our HTML file.
We can simply order it to not search the standard system directories (again, we're not preprocessing C source code, so we have no use for that).
-nostdinc
Do not search the standard system directories for header files. Only the directories explicitly
specified with -I, -iquote, -isystem, and/or -idirafter options (and the directory of the current
file, if appropriate) are searched.
Let's try it now:
$ cpp -P -CC -nostdinc hello_world.HTML -o hello_world.html
$ cat hello_world.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>hello, world!</title>
</head>
<body>
</body>
</html>
Success!
Running cpp -P -CC -nostdinc <whatever>.HTML -o <whatever>.html for all your blog posts is kind of tedious.
If only there was a tool designed for """make"""ing stuff using commands by writing your steps in a """file"""...
So the first solution I considered was a simple bash script:
$ cat build.sh
#!/usr/bin/bash
for file in *.HTML; do
    cpp -P -CC -nostdinc "$file" -o "${file%.HTML}.html"
done
This will work wonderfully out of the box, but it regenerates all posts regardless of whether they have been changed or not.
Now, this is not an issue whatsoever in the 21st century, as it will complete instantly (even if you have hundreds of blog posts).
On the other hand you have Makefiles, which are slightly more complex to write, and also solve that specific nonexistent issue.
Okay, actually Makefiles aren't really that complex to write - they're quite straightforward:
target: source1 source2
	command -o target source1 source2

# OR, using the funny special variables which make literally zero sense:
target: source1 source2
	command -o $@ $^
Let's write one for our blog post generator thing:
.PHONY: all

all: hello.html

%.html: %.HTML
	cpp -P -CC -nostdinc -o $@ $<
Let's also add a recipe for cleaning the generated HTML files:
.PHONY: all clean
clean:
	rm -f *.html
$ make
cpp -P -CC -nostdinc -o hello.html hello.HTML
$ ls
hello.html hello.HTML Makefile
Great! We're done, and now we don't have to touch the Makefile ever again.
Now that we have a build system, we can start making components to include in our posts.
Let's make a simple component to test this:
$ cat component.inc
my wonderful component
Now, with the power of the C preprocessor, we can:
[...]
<body>
#include "component.inc"
</body>
[...]
$ make
cpp -P -CC -nostdinc -o hello.html hello.HTML
$ cat hello.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>hello, world!</title>
</head>
<body>
my wonderful component
</body>
</html>
And it works!
Cool. Let's change the text to something different.
$ cat component.inc
my bizarre component
$ make
make: Nothing to be done for 'all'.
Huh? Oh. Time to change the Makefile again.
We need to tell make which components our blog depends on.
At first it seems like an impossible task, but then you realize that this problem is actually quite common in C as well:
#include "dinosaur.h"
int main() {
    draw_a_dinosaur();
    return 0;
}
How do we remake this source file when dinosaur.h changes?
Fortunately the wise orb-pondering GNU people have thought of this,
and added the -M flag to the C compiler (and the C preprocessor):
-M Instead of outputting the result of preprocessing, output a rule suitable for make describing the
dependencies of the main source file. The preprocessor outputs one make rule containing the
object file name for that source file, a colon, and the names of all the included files,
including those coming from -include or -imacros command-line options.
Okay, but how does this actually help us? Let's run cpp with this flag and take a look at the output:
$ # -MT = Rule target, -MF = Rule output file, -nostdinc or it will depend on a system include file.
$ cpp -M -MT hello.html -MF hello.d -nostdinc hello.HTML
$ ls
component.inc hello.d hello.html hello.HTML Makefile
$ cat hello.d
hello.html: hello.HTML component.inc
As the manual page suggests, it generated another Makefile called hello.d with a rule that describes the dependencies of hello.HTML.
We can use this by including this tiny Makefile in the main Makefile, which will update the dependencies of the file.
$ cat Makefile
.PHONY: all clean

all: hello.html

# A rule to generate *.d files from *.HTML files.
%.d: %.HTML
# Substitute the .HTML extension
# for a .html extension.
	cpp -M -MT $(<:.HTML=.html) -MF $@ -nostdinc $<

# We leave this recipe as it was before, except we add %.d to the dependencies to force make to generate the .d file for it.
%.html: %.HTML %.d
	cpp -P -CC -nostdinc -o $@ $<

include hello.d
$ cat hello.d
hello.html: hello.HTML component.inc
Now, hello.html will be remade when either hello.HTML or component.inc changes,
and hello.d will be remade when hello.HTML changes so we can update the dependencies!
Awesome!!!
Let's refine the Makefile a little. First, let's introduce a sources variable:
.PHONY: all clean

sources = hello.html

all: $(sources)

# [...]
Now, we can include ALL generated .d files (instead of including them one by one),
by using funny Makefile syntax:
# [...]

include $(sources:.html=.d)
Now that we have set up the """templating engine""", we can ACTUALLY start writing posts.
Here's an example of what a blog post might look like:
$ cat my_blog.HTML
#define BLOG_TITLE hello, world!
#include "blog_post.inc"
BLOG_POST_PREAMBLE
<h1>BLOG_TITLE</h1>
Blah blah blah blah.
<hr>
This post was compiled on __DATE__ __TIME__
BLOG_POST_EPILOGUE
I have set up the BLOG_POST_PREAMBLE and BLOG_POST_EPILOGUE macros,
which use the BLOG_* macros defined previously.
We can also use the built-in macros,
such as __FILE__, __DATE__, __TIME__, etc.
We are pushing the limits of the C preprocessor REALLY hard now. And it's starting to show.
I usually set the titles of my webpages to something like hsp // PAGE_TITLE (for style points),
however we run into a peculiar limitation when we try to do that from within a macro:
$ cat my_blog.HTML
#define BLOG_TITLE my awesome blog post
#include "blog.inc"
BLOG_PREAMBLE
$ cat blog.inc
#define BLOG_PREAMBLE \
<title>hsp // BLOG_TITLE</title>
<title>hsp /* BLOG_TITLE</title>*/
Right. This happened because we didn't read the fine print on the -CC option:
-CC Do not discard comments, including during macro expansion. This is like -C, except that comments
contained within macros are also passed through to the output file where the macro is expanded.
In addition to the side effects of the -C option, the -CC option causes all C++-style comments
inside a macro to be converted to C-style comments. This is to prevent later use of that macro
from inadvertently commenting out the remainder of the source line.
So as far as I know, there is no solution to this, other than just "don't use double slashes in blogs."
Remember those __DATE__, __TIME__ and __FILE__ macros we talked about?
Let's expand one of them:
The current date is __DATE__
The current date is "Aug 25 2025"
Unfortunately, the date is quoted.
Oh, surely the C preprocessor has a way of unquoting strings!
The C preprocessor has NO way of unquoting strings! Fuck!!! Maybe using a tool made specifically for the C programming language as a templating engine for blog posts wasn't such a good idea!!!
You can probably get away with using the C preprocessor for templating for a little while, but you'll eventually run into one of the above limitations.
I ended up using a more generic macro processor for my blog posts: GNU m4.
It works just like the C preprocessor,
except instead of being designed for C, it is designed for any kind of text.
Even if it didn't work out in the end, I hope you enjoyed reading this descent into madness, and learned something new.
This post was compiled on Wed, 27 Aug 2025 03:00:10 +0300.
Go check out my other blog posts, or my friends' pages!