After many years of development, the Vale compiler just hit its 100,000th line of code.
This is an article about how we kept it from collapsing under its own weight and exploding, as many projects do.
Some of these software engineering techniques came from my time at Google, though ironically most came from my work on the Vale compiler and game development, 0 so some of these might be surprising to my engineer comrades out there.
These techniques range from determinism, to testing, to type-system techniques, to general architectural best-practices.
And since I'm a language geek, I'll throw in some nonsense about languages!
I'm not exaggerating when I say assertions are the greatest thing since sliced arrays, and have cut my debugging time by half.
Rule of thumb: Every time you think something is true about your data, and it's not ensured by the type system, add an assertion that checks it.
Don't just occasionally use them, drench your code in them. When your code so much as sneezes, they should shake off like sand after the beach. This is the way.
We once had a bug in the post-parser which caused some data corruption, and thank goodness it was caught by an assertion in the final stage of the compiler. 1 If that assertion hadn't caught it, the corruption would have made it into the LLVM-generated code, and would have taken hours to track down.
Such miracles are why the Vale compiler has an entire 1,795 assertions.
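As a rough illustration of the rule of thumb, here's a minimal Python sketch. The function `assign_local_slots` and its arguments are hypothetical, not taken from the Vale compiler; the point is only to show assertions guarding beliefs that the type system can't express.

```python
from typing import Dict, List


def assign_local_slots(function_name: str,
                       declared: List[str],
                       used: List[str]) -> Dict[str, int]:
    """Hypothetical late compiler stage: gives each local variable a slot number."""
    # Belief: an earlier stage already rejected duplicate declarations.
    assert len(set(declared)) == len(declared), (
        f"duplicate local declared in {function_name}: {declared}")

    # Belief: every local that is used was declared somewhere.
    undeclared = sorted(set(used) - set(declared))
    assert not undeclared, (
        f"{function_name} uses undeclared locals: {undeclared}")

    slots = {name: slot for slot, name in enumerate(declared)}

    # Belief about our own output: slots are dense (0..n-1, no gaps).
    assert sorted(slots.values()) == list(range(len(declared)))
    return slots
```

If an earlier stage corrupts the data, a call like `assign_local_slots("main", ["a"], ["b"])` fails right here with a readable message, instead of producing bad output further downstream. One caveat for this particular language: Python skips `assert` statements when run with `-O`.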
Note 0: Though I did later apply a lot of these architecture techniques to Google Earth.
Note 1: Previously known as the Hammer stage, though now it has the more boring name of FinalAstSimplifier.
Even if you're using a statically typed language, you can still make your code more statically typed.
For example, a string can serve many purposes: an ID, a first name, a URL, and so on. Sometimes, we can mix up which strings are fulfilling which purposes. When it gets real bad, we sometimes call our code "stringly typed".
When you find yourself accidentally passing an ID string argument into a first name string parameter, it's a hint that you shouldn't be passing around strings, and perhaps you should wrap them in different kinds of structs. This is sometimes called the New Type pattern.
And when you find yourself using strings to represent multiple pieces of data (perhaps split by a special character like :), break it apart into a struct with multiple fields instead.
Any statically-typed language has this ability, so make sure to lean on it!
Languages like Vale, Austral, and Rust are particularly strong in this regard because of their type-state programming 2 and especially Vale's Higher RAII. 3
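Here's a small sketch of both ideas, written in Python with type hints (so the "static" part assumes you run a type checker such as mypy). All of the names below (`UserId`, `FirstName`, `HostPort`, `greet`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserId:
    """Wraps a string that is specifically a user ID."""
    value: str


@dataclass(frozen=True)
class FirstName:
    """Wraps a string that is specifically a first name."""
    value: str


@dataclass(frozen=True)
class HostPort:
    """Replaces a "host:port" string with a struct that has two fields."""
    host: str
    port: int

    @staticmethod
    def parse(text: str) -> "HostPort":
        # Split on the last ':' so the string is taken apart exactly once,
        # at the boundary of the program, instead of everywhere it is used.
        host, separator, port = text.rpartition(":")
        if not separator or not host or not port.isdecimal():
            raise ValueError(f"expected 'host:port', got {text!r}")
        return HostPort(host, int(port))


def greet(name: FirstName) -> str:
    # Accepts only a FirstName, so an ID can't be passed in by accident
    # without the type checker noticing.
    return f"Hello, {name.value}!"
```

With these in place, `greet(UserId("u-123"))` is flagged by the type checker, where two plain `str` parameters would have been accepted silently.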
In 2019, I came across some of our code for generational references. 4
"That's odd," I said, "This really should be simpler," and I started refactoring.
However, I quickly ran into a wall, and discovered why our code looked odd: it was actually handling some pretty complicated requirements that required that odd approach.
In 2021, I came across the same code.
"That's odd," I said, "This really should be simpler," and I started refactoring.
Then I proceeded to run into that same wall. It was only then that I remembered my first attempt.
This is why we leave comments!
However, comments can become out-of-date, and you might not be lucky enough to stumble across the right comment before you embark on a refactoring adventure.
For this reason, we also scatter clues around, like a "see SAIRFU" note.
"See (acronym)" means to search our internal docs for more explanations and discussion. In this case, the documentation's SAIRFU section says:
This is good for many reasons:
TL;DR: Have many links to centralized documentation.
Have you ever had a bug that you could only reproduce one out of every ten tries? What an adventure!
And then there's the bugs that you might only reproduce every hundredth try. A harrowing endeavor. You try and you try, and eventually give up and close the bug report as "Cannot Reproduce".
This unpredictability is caused by non-determinism, in other words, when there are some random factors that make it hard to predict whether something will happen. For example:
The biggest case of this in the Vale compiler was when Scala's Arrays were hashing nondeterministically. I wrote about it more in Hash Codes, Non-Determinism, and Other Eldritch Horrors, so if you're into horror stories, check it out.
If non-determinism creeps into your program, you'll be testing your application, find a bug, and then never be able to reproduce it. Avoid it when you can!
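To make that concrete, here's a minimal Python sketch of two habits that help. Python randomizes string hashes per process by default, so the iteration order of a set of strings can differ from run to run; sorting first removes that factor, and an explicitly seeded generator removes the other. The function `turn_order` is hypothetical.

```python
import random
from typing import Iterable, List


def turn_order(player_names: Iterable[str], seed: int) -> List[str]:
    """Returns a shuffled turn order that depends only on the names and the seed."""
    # Sort first: the input may be a set, whose iteration order is not
    # something we want our results to depend on.
    ordered = sorted(player_names)

    # Use a private generator with an explicit seed, rather than the shared
    # module-level one that other code might also be drawing from.
    rng = random.Random(seed)
    rng.shuffle(ordered)
    return ordered
```

Calling `turn_order({"ann", "bob", "cy"}, seed=7)` gives the same list on every run and on every machine running the same Python version, which is exactly what you want when you're trying to reproduce a bug report.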
Note 2: Type-state programming is a way for the compiler to ensure that a particular class doesn't fall into an invalid state.
Note 3: Higher RAII will make sure you never forget to call a particular function.
Note 4: Generational references are an alternative mechanism for memory safety, like reference counting, garbage collection, or borrow checking.
Note 5: I was shocked when C#'s string.GetHashCode broke my determinism in the 2020 7DRL Challenge.
Even in a compiler, which needs no network requests or animations, 6 we still have plenty of nondeterminism from asynchronous file IO and thread scheduling.
So how might a language let us reproduce bugs, even in the presence of nondeterminism?
The answer is something I like to call perfect replayability. We've prototyped it in Vale, and it's working pretty well!
In perfect replayability, the compiler will instrument 7 the code to:
While you're developing or testing, your program records to these files. When you find a bug, fire up replay mode, and enjoy the time you saved!
In the 2020 7DRL Challenge, on the penultimate evening, I launched thousands of games automatically played by a "random AI" player. Three of them crashed, and I was able to fix the bugs because I could reproduce those crashes with deterministic replayability.
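Vale does this through compiler instrumentation, which can't be shown here. As a loose, hand-written analogy only, the Python sketch below wraps a single nondeterministic input, records what it returned, and feeds the recording back on a second run. `Recorder`, `Replayer`, `simulate`, and `record_then_replay` are all invented names, and the JSON string stands in for the recording file.

```python
import json
import random
from typing import Callable, List, Tuple


class Recorder:
    """Wraps a nondeterministic source and logs every value it returns."""

    def __init__(self, source: Callable[[], int]) -> None:
        self._source = source
        self.log: List[int] = []

    def __call__(self) -> int:
        value = self._source()
        self.log.append(value)
        return value


class Replayer:
    """Returns previously recorded values, in order, instead of fresh ones."""

    def __init__(self, log: List[int]) -> None:
        self._values = iter(log)

    def __call__(self) -> int:
        return next(self._values)


def simulate(roll: Callable[[], int], turns: int) -> int:
    """Toy program: its result depends entirely on what roll() returns."""
    position = 0
    for _ in range(turns):
        position += roll()
    return position


def record_then_replay(turns: int) -> Tuple[int, int]:
    # Recording run: real randomness, logged as it happens.
    recorder = Recorder(lambda: random.randint(1, 6))
    first_result = simulate(recorder, turns)
    recording = json.dumps(recorder.log)  # in practice, written to a file

    # Replay run: no randomness at all, only the recording.
    replayed_result = simulate(Replayer(json.loads(recording)), turns)
    return first_result, replayed_result
```

The two results returned by `record_then_replay` always match, even though the first run used real randomness. A real system has to capture every such input (clock, files, thread interleaving), which is why having the compiler do it is so appealing.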
Note 6: Hold my beer. A compiler... with animations. Best error messages ever! Who's with me?
Note 7: Instrumentation is where the compiler emits additional instructions to serve a particular side-goal, such as debuggability, observability, or replayability like we see here.
In a small project, only a few thousand lines, it's easy to fix a bug without causing any more bugs because you know the system in and out.
Once you start approaching 10,000 lines, fixing one bug will often cause multiple other bugs. Not just easy bugs, but obscure bugs that your users find six months later.
However, with tests, you'll know instantly whether your fix caused any other bugs. You can then try a better fix.
If you don't have a vast suite of tests, your project might get slower and slower until changing anything feels like pulling teeth. 8
Some languages are easier to test with. JavaScript's monkey patching is a wonderful alternative to mocking, and can make testing much easier. 9
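Python allows the same trick, so here's a minimal sketch of monkey patching there. `PaymentGateway`, `checkout`, and `with_patched_charge` are hypothetical names; the idea is simply to swap a method at runtime for the duration of a test, then put the original back.

```python
from typing import Callable


class PaymentGateway:
    """Hypothetical dependency that we never want to call for real in a test."""

    def charge(self, cents: int) -> bool:
        raise RuntimeError("would contact a real payment provider")


def checkout(gateway: PaymentGateway, cents: int) -> str:
    """The code under test: it only knows about the gateway's charge method."""
    return "paid" if gateway.charge(cents) else "declined"


def with_patched_charge(fake_charge: Callable[[PaymentGateway, int], bool],
                        action: Callable[[], str]) -> str:
    """Temporarily replaces PaymentGateway.charge, runs action, then restores it."""
    original = PaymentGateway.charge
    PaymentGateway.charge = fake_charge  # the monkey patch
    try:
        return action()
    finally:
        PaymentGateway.charge = original  # always undo, even if action raises
```

For example, `with_patched_charge(lambda self, cents: True, lambda: checkout(PaymentGateway(), 500))` exercises `checkout` without any interface or mock object being threaded through the code.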
In the early days of the Vale compiler, we had a lot of unit tests for our various components.
For those unfamiliar, a unit test is one that specifically tests just one piece of code. You craft some inputs, feed those into your code, and check the outputs.
Unit tests are nice because they tell you exactly where the bug is, because they test only a small piece of your code.
However, the data being passed between these components was changing very often, because the project was evolving rapidly in response to user feedback and experiments.
And unfortunately, every time this happened, we had to change the unit tests. Quite irksome!
Instead, we've switched over to end-to-end tests. An end-to-end test is where a script will open up your application and click on buttons and type inputs in the right sequence to indirectly run some specific code in your program.
For the Vale compiler, that means we compile some Vale source code, run the resulting program, and make sure it produces the right output. As of this writing, the Vale compiler has 1,308 end-to-end tests.
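The shape of such a test can be sketched in a few lines. The Vale compiler's command line isn't shown in this article, so this Python sketch uses the Python interpreter as a stand-in for "compile and run"; `run_program` and `check_end_to_end` are hypothetical helpers.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_program(source: str) -> str:
    """Writes source to a temporary file, runs it, and returns what it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "program.py"
        path.write_text(source, encoding="utf-8")
        # Stand-in for "invoke the compiler, then run its output".
        # check=True turns a nonzero exit code into an exception.
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=30, check=True)
        return result.stdout


def check_end_to_end(source: str, expected_stdout: str) -> None:
    """One end-to-end test: only the final, user-visible output is checked."""
    actual = run_program(source)
    assert actual == expected_stdout, (
        f"expected {expected_stdout!r}, got {actual!r}")
```

Notice that nothing here mentions the data passed between internal stages, which is why tests like this survive refactors that would break a unit test.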
Some caveats on this advice:
It's good practice to add a test whenever you stumble upon a bug.
However, let's take that advice one step further.
Whenever you have a test that discovers a bug, ask yourself, "could a more specific test have caught this too?" and then add that more specific test.
This approach has a hidden benefit. If you're refactoring a nearby area of your code and you break this functionality, you now have a much more specific test failure to tell you what exactly is going wrong.
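As a made-up example, imagine a tiny prefix-sum calculator where a broad test once failed because the tokenizer mishandled parentheses with no spaces around them. The broad test stays, and a narrower one pins down the exact behavior. Everything in this Python sketch is hypothetical.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Splits '(+ 1(+ 2 3))' into tokens, padding parentheses with spaces first."""
    for paren in "()":
        text = text.replace(paren, f" {paren} ")
    return text.split()


def evaluate(text: str) -> int:
    """Evaluates nested prefix sums such as '(+ 1 (+ 2 3))'."""
    tokens = tokenize(text)

    def parse(pos: int) -> Tuple[int, int]:
        # Returns (value, position just past the parsed expression).
        if tokens[pos] == "(":
            assert tokens[pos + 1] == "+", "only '+' is supported"
            pos += 2
            total = 0
            while tokens[pos] != ")":
                value, pos = parse(pos)
                total += value
            return total, pos + 1
        return int(tokens[pos]), pos + 1

    value, end = parse(0)
    assert end == len(tokens), "trailing tokens after expression"
    return value


def test_nested_sum_end_to_end() -> None:
    # The broad test that first exposed the (hypothetical) bug.
    assert evaluate("(+ 1(+ 2 3))") == 6


def test_tokenize_parens_without_spaces() -> None:
    # The more specific test added afterward: if this breaks during a
    # refactor, the failure points straight at the tokenizer.
    assert tokenize("1(+ 2)") == ["1", "(", "+", "2", ")"]
```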
If you're not careful, your development speed can slow to a crawl. Here are some ways to keep yourself nimble:
Use a language with good compile speeds. If you go doomscrolling on reddit while you're building, you know your compile times are too long.
Use a memory-safe language. Memory safety doesn't just help with security, it helps you avoid bugs that are very difficult to diagnose.
Prioritize looser coupling. If you have to change code way over there to accommodate a feature way over here, take a step back.
Find a way to harness object-oriented benefits:
...without incurring object-oriented drawbacks (implementation inheritance's brittleness).
Use a flexible language. The best languages let you focus on the problem you're trying to solve, rather than the constraints of the language which don't make much sense for your use cases.
For example, if you're making a turn-based roguelike game, C# or TypeScript could be better choices than Haskell or Rust, whose extra constraints might cause extra refactoring and complexity. 10
Statically-typed, garbage collected languages like Java 11 can sometimes be the best in this regard. They may not be flashy but they're flexible, much more multi-paradigm, and have good compile speeds.
Bonus points to languages like Scala that let you temporarily "turn off" the type system via Nothing, so you can work on your feature now and fix the types for unrelated code afterward.
Let's take assertions to the next level!
Let's say you have an id_to_account HashMap<ID, UserAccount>. Unfortunately, to find a user by name, you have to loop through the entire map, because it's keyed by ID, not by name.
So then you add a separate name_to_account HashMap<str, UserAccount>, and you try to keep these two maps in sync. However, if you accidentally remove an account from only one, you now have a data inconsistency.
After you add your normal assertions, also consider periodically calling a sanityCheck function:
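A minimal sketch of what such a check could look like for the two maps above, written in Python (so it's `sanity_check` rather than `sanityCheck`, and dictionaries stand in for the hash maps). The `AccountStore` class and its methods are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class UserAccount:
    id: int
    name: str


@dataclass
class AccountStore:
    id_to_account: Dict[int, UserAccount] = field(default_factory=dict)
    name_to_account: Dict[str, UserAccount] = field(default_factory=dict)

    def add(self, account: UserAccount) -> None:
        self.id_to_account[account.id] = account
        self.name_to_account[account.name] = account
        self.sanity_check()

    def remove(self, account_id: int) -> None:
        account = self.id_to_account.pop(account_id)
        del self.name_to_account[account.name]
        self.sanity_check()

    def sanity_check(self) -> None:
        """Checks that the two maps describe exactly the same set of accounts."""
        assert len(self.id_to_account) == len(self.name_to_account), (
            "the maps have different sizes")
        for account_id, account in self.id_to_account.items():
            # Each account is filed under its own ID...
            assert account.id == account_id
            # ...and the name map points back at the very same account.
            assert self.name_to_account.get(account.name) == account
```

Here the check runs after every mutation, which is fine for small stores. For larger state you might call it less often, or only in debug and test builds.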
In one case, we have an 80-line sanity check function to check that all the state in the generics solver is consistent.
When I worked on Google Earth, I made a 200-line sanityCheck function run before and after every click, which made sure the application was in a sane state. It saved countless hours of debugging.
Lean hard on this technique, it will serve you well.
Thanks for visiting, and I wish you the best of luck in your first 100,000 lines!
In the coming weeks, I'll be posting the next article in the Implementing a New Memory Safety Approach series, so subscribe to our RSS feed, Twitter, or the r/Vale subreddit, and come hang out in the Vale Discord.
If you found this interesting or entertaining, please consider sponsoring me:
With your help, I can write this kind of nonsense more often!
Note 8: There are exceptions to this. In game development, requirements change so much that even end-to-end tests can have a negative return-on-investment. Other methods of testing are preferred, such as setting two random AI players against each other. Be sure to save the random seed!
Note 9: Part of me wonders if we can get a statically typed language to do the same thing... Java's newProxyInstance is particularly interesting here.
Note 10: This often depends on the domain, I've found. CLI apps, stateless servers, and smaller projects can much more easily work with functional programming and borrow checking. Games, apps, or stateful programs clash with these paradigms more.
Note 11: And this is coming from me, who generally doesn't prefer garbage collection!