[2025-08-11 Mon]
← Back to rootThis post is about programming. In particular, I use a laborious, esoteric language (esolang) to generate random strings of a symbolic system given in Douglas Hofstadter's excellent book Gödel, Escher, Bach.
My friends and I are reading Gödel, Escher, Bach. In chapter 1, "The MU-puzzle", Hofstadter presents a formal system of symbols; effectively, a set of all valid strings created by the rules below. x and y are placeholders for strings.
The following are all valid strings:
Hofstadter invites the reader to ponder whether MU is valid. This post is not about that question.
Thue is an esoteric programming language named after logician Axel Thue.
Links:
For now, we'll ignore Thue's ability to do input and output. That omission leaves us with a programming system that simply takes in a string and outputs a string.
Thue is nondeterministic; that is, it acts randomly. In an instance of goofing off, I wondered whether I could write a program in Thue that generates a random valid string of the MIU system. (Thue is Turing complete, so it's necessarily possible to write such a program. I wondered whether I could do it.)
Let's look at some simple programs.
a::=Hello!
::=
a
The rule
a::=Hello!
specifies that any occurrence of "a" in the current string can be replaced by "Hello!".
The line containing ::=
alone is not a rule and ends the
list of rules.
Text that comes after the end of the rules is the initial string. Thue takes that string and applies a random applicable substitution. When no more substitutions apply, computation ends and the current string is returned.
Here's another program:
a::=Hello!
a::=Goodbye?
::=
a
This program will either return "Hello!" or "Goodbye?". It can't do
both, since once a substitution is made, there are no a
's left in the current string.
Another one:
?::=intergalactic
?::=planetary
::=
? ?
This will return one of four possible values:
Another dimension; another dimension; another dimension; another dimension; another dimension! Another dimension? Another dimension! Another dimension. Another dimension; another dimension; another dimension; another dimension.
Rules (1), (3), (4) translate pretty easily into Thue since they involve replacements of predetermined, finite strings.
So I wanted to build up to rule (2) first. Doubling an arbitrary string sounded hard to me, so I started by doubling a string I know. Our goal will be to turn MIII, which looks like a string of MIU, into MIIIIII. (That's six I's. We're doubling the part after M.)
_M::=@M+
+II::=.I+I
+I_::=.I@
.I::=II
::=
_MIII_
Note that _MIII_
has its beginning and
end marked with underscores. This is common in Thue, since substitution
rules don't allow matching against the beginning or end of the
string.
This program runs like this:
@
, and add a plus sign after the initial M
. This is the only possible action at the
beginning of the program..
's in its wake before each I
it encounters.I
's.Remember: Thue runs nondeterministically, so those aren't fixed phases. It's up to us, the programmers, to restrict the possible actions using the form of the string.
There are definitely simpler ways to do this, but this works! Nice.
Let's try it with two symbols:
_M::=@M+
+II::=.I+I
+IU::=.I+U
+UI::=.U+I
+UU::=.U+U
+I_::=.I@
+U_::=.U@
.I::=II
.U::=UU
::=
_MIUIIUU_
That's a lot of rules. Again, there are certainly more parsimonious ways to do this, but it works.
This is far from our eventual goal: Mx → Mxx
. When we feed this program
MIUIIUU, the actual rule (2) of MIU would yield
MIUIIUUIUIIUU. Instead, this program yields
MIIUUIIIIUUUU. We're doubling letters in place rather than
appending a new copy of the string.
I decided to handle this problem by duplicating letters in place, since we know how to do that, then moving them to the end of the string. We'll have to get the order right. Let's only worry about movement for now, not copying:
>I_::=_I_
>II::=I>I
>IU::=U>I
>U_::=_U_
>UI::=I>U
>UU::=U>U
::=
_M>I>UUI_
This program runs like this:
I
or U
comes after >
, move the I
or U
ahead of the next I
or U
. Retain the
>
prefix so it keeps moving.>
characters or underscores.This program yields _MUI_I_U_
. Some
unsightly underscores are left over, but the order of characters is
correct.
Now copy letters at the same time:
II_::=I%I#I_
IU_::=I%U#U_
UI_::=U%I#I_
UU_::=U%U#U_
II%I::=%I>I%I
II%U::=%I>I%U
IU%I::=%U>U%I
IU%U::=%U>U%U
UI%I::=%I>I%I
UI%U::=%I>I%U
UU%I::=%U>U%I
UU%U::=%U>U%U
_I%I::=%I>I%I
_I%U::=%I>I%U
_U%I::=%U>U%I
_U%U::=%U>U%U
>I_::=#I_
>I%I::=%I>I
>I#I::=#I#I
>I%U::=%U>I
>I#U::=#I#U
>U_::=#U_
>U%I::=%I>U
>U#I::=#U#I
>U%U::=%U>U
>U#U::=#U#U
::=
_IUUU_
_UIII_
Ouch. The number of rules exploded. But it's not so bad, really! Let's imbue those arcane glyphs with meaning.
#
: The following character cannot be
passed. (Neither can this #
.)%
: The following character has been
copied and must not be copied again.>
: The following character is on a
journey toward the end of the string.The rules are still a lot to take in, but know: this program works kind of like the previous two combined. Action moves from the end of the string to the beginning. Along the way, characters are copied; one is marked ready to move toward the end of the string, and the other is marked as copied. Then all the characters move and take stalwart seats at the end of the string.
And it works! This program outputs these two lines, which developed independently (since no substitution works across a line break):
%I%U%U#I#U#U_
%U%I%I#U#I#I_
Ignoring the %
and #
symbols, this is exactly what we want. With
that, it's time.
_::=
I_::=IU_
I_::=I2#<2
U_::=U2#<2
I2::=2%I>I
U2::=2%U>U
M2::=M>2
>2%I::=I>2
>2%U::=U>2
>2#I::=I>2
>2#U::=U>2
>2#<2::=_
>I%I::=%I>I
>I%U::=%U>I
>U%I::=%I>U
>U%U::=%U>U
>I#::=#I#
>U#::=#U#
I_::=(I*
U_::=(U*
(III::=(U
(UU::=(
I(::=(I
U(::=(U
M(::=M)
)I::=I)
)U::=U)
)*::=_
::=
MI_
There are 28 substitution rules in this program. I find it rather pretty. I will give only a light explanation of its functioning.
The initial string is MI_
. An
underscore marks its end.
Rule (1) of MIU is accomplished by I_::=IU_
.
Rule (2):
<2
. You could read this "2 is running". The
system can't apply any other rules while "2 is running".2
toward the beginning
of the string and back. It drives the copying and moving we've been
developing. On its way back to the end of the string, it removes all the
auxiliary characters used for the copying.<2
with a final underscore
again.Rules (3) and (4):
*
.
Like with (2), no other rule can apply while the string ends with *
.(
, an opening
parenthesis, toward the beginning of the string and back. On its way
there, it can choose to perform the substitutions of rules (3) and (4)
at any applicable time. Or, it can simply keep going.)
, a closing parenthesis, back
from the beginning of the string without it doing anything and replace
*
with a final underscore again.Because the underscore is restored by each rule, any conceivable
sequence of rules can run (with nebulous probabilities). It's possible,
between rules, for the final underscore to disappear (the rule _::=
reads "underscore becomes nothing"), in
which case nothing can be done and the computation ends. The string we
get results from applying some sequence of MIU rules to MI.
Therefore, the result is a random (according to some unknown
distribution) valid string of MIU.
Thank you (and book club!) for coming with me on this adventure. Here are 15 unique strings of MIU:
MIIIIIIIIU
MIIII
MUIUIU
MIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIU
MIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIUIU
MIIIIU
MIUIUIUIUIUIUIUIU
MIIUIIUIIUIIUIIUIIUIIUIIUIIUIIUIIUIIUIIUIIUIIUIIU
MIUIU
MIUIUIUIU
MI
MIU
MIIU
MIIUIIU
MII
I ran these programs using a Thue interpreter I wrote in Common Lisp. It's available on Codeberg. Using it will be a bit annoying because it depends on other Common Lisp packages of mine that aren't on Quicklisp.
The programs are available on Codeberg.