Forgive me, for I have sinned. I wrote a bad type.

It wasn't just bad. It was the kind of type that knows its mom's birthday only from the digits of her credit card PIN number. It was sneaky, conceited, insubordinate and churlish. It threw up all over my codebase like a drunken guest and, despite my efforts to clean up, a bit of the stench remains.

Mind you, it's not the first bad type I've written, and it certainly won't be the last. Its badness did, however, send me on a bit of a personal journey both to understand what exactly made it bad, and to figure out a strategy to prevent its kind from haunting me again.

I won't show you the type, not only out of shame (though there's a bit of that) but mostly because it's pretty large and full of extraneous detail that would just bog us down. I also think there's little value in me justifying the bad decisions that shaped it, and we might even end up feeling a bit sorry for it. No mercy! Onward!

Instead, what we'll do is craft some types similar to it, which showcase particular aspects of its awfulness. I'll then walk you through an approach I've used ever since to identify and rework types like it, in the hope it will be useful to you too.

Be warned: I'm likely going to reinvent anywhere between the wheel and the entire car in this post. I'm grazing the huge, scary world of type theory, in which I'm definitely not an expert. As mentioned elsewhere, I'm just a bit banger who likes to punch above their weight, so please take what I say with a grain of salt, and if you know established terms for these concepts, please leave a comment!

Let's write some bad code

Alright, buckle up.

// Please allow me the sweet embrace of death
pub struct Shape {
   pub is_circle: bool,
   pub is_square: bool,
   pub radius: i32,
   pub side: i32,
}

Okay, what I wrote wasn't quite this bad. I want some bedrock of horribleness so we can have a laugh and satisfy our primal urges (yes, I too have seen types this bad in the wild) and mostly to start developing a bit of a shared vocabulary.

So what makes this type so stinky? There are enough reasons that I'm tempted to give you the headlines in bullet point form:

  • It's wasteful. At any given time, either the radius or the side is just unused bits.
  • It has a lot of obviously invalid states. What if is_circle and is_square are both true? And both false?
  • Signedness doesn't make sense for its dimension variables. What's a circle of -1 radius?
  • Its fields are public, which allows careless users to create invalid objects. It exposes its entrails to the sun like a fleeing sea cucumber; it won't be long until they attract flies and worms.
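To make the second and last bullets concrete, here's a minimal sketch of what any caller can get away with. The `is_valid` helper and the two constructor functions are hypothetical, written only for this illustration, and treating a zero dimension as invalid is an assumption on my part.

```rust
// Same public-field struct as above, repeated so the sketch compiles on its own.
pub struct Shape {
    pub is_circle: bool,
    pub is_square: bool,
    pub radius: i32,
    pub side: i32,
}

/// Hypothetical helper (not part of the original type): spells out the rules
/// that the struct definition itself fails to enforce.
pub fn is_valid(shape: &Shape) -> bool {
    // Exactly one of the two flags must be set.
    let one_kind = shape.is_circle != shape.is_square;
    // The dimension that matters must be strictly positive (assumption: zero is out).
    let dimension = if shape.is_circle { shape.radius } else { shape.side };
    one_kind && dimension > 0
}

/// Nothing stops a caller from building this chimera.
pub fn circle_square_chimera() -> Shape {
    Shape { is_circle: true, is_square: true, radius: -1, side: 7 }
}

/// ...or a shape that is nothing at all.
pub fn shapeless_shape() -> Shape {
    Shape { is_circle: false, is_square: false, radius: 3, side: 3 }
}
```

Both functions compile without a peep from the compiler, and both return values that `is_valid` rejects.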

Anyone with Rust experience already has a solution for this problem in mind. Yes, I can see your platonic enum Shape { Circle(u32), Square(u32) } from here, dear hypothetical reader, slowly rotating under a beam of heavenly light. Perhaps you've put some personal touches like named fields for the variants, or you've gone for a Shape trait. Let's not get ahead of ourselves though; you've come here to see bad code and I'm not letting you go until your eyes bleed.

Invincible, invigorating invariants

Let me play devil's advocate for a minute.

How can we make the type above not suck with the least amount of work? Well, there is a bare minimum we have to do to make it sound. If the type isn't sound (i.e. it doesn't do what it says on the tin) there's little point talking about high-minded things like good design.

The key operating word is invariants. I'm going to boldly assume we've all seen circles and squares before, so we all share an intuitive understanding of the invariants that our Shape has to hold:

  • A shape must be a square or a circle.
  • A shape can't be both a square and a circle at the same time.
  • A shape must have a strictly positive associated dimension (respectively side and radius).

As u/IshKebab rightfully pointed out on Reddit, zero-radius circles are mathematically valid! Apologies to any of my readers that also happen to be circles; I did not mean to discriminate. That said, invariants are going to depend on the specific application, so for the purposes of this post we'll pretend zero-radius circles are not valid.

It is possible, though not ideal, to encode all of these invariants in the API. In fact, as long as we make sure to encapsulate the type properly, encoding the invariants in the API is all that's needed to make the type sound, and for all intents and purposes immune to misuse:

// I'm still horrible, but at least my awfulness is hidden!
pub struct Shape {
   is_circle: bool,
   is_square: bool,
   radius: i32,
   side: i32,
}

impl Shape {
    pub fn circle(radius: i32) -> Result<Self, &'static str> {
        if radius <= 0 {
            Err("Must have positive radius")
        } else {
            Ok(Self {
                is_circle: true,
                is_square: false,
                radius,
                side: 0,
            })
        }
    }

    pub fn square(side: i32) -> Result<Self, &'static str> {
        if side <= 0 {
            Err("Must have positive side")
        } else {
            Ok(Self {
                is_circle: false,
                is_square: true,
                side,
                radius: 0,
            })
        }
    }

    pub fn is_circle(&self) -> bool { self.is_circle }
    pub fn is_square(&self) -> bool { self.is_square }
    pub fn radius(&self) -> Result<i32, &'static str> {
        if self.is_circle {
            Ok(self.radius)
        } else {
            Err("Only circles have a radius")
        }
    }
    pub fn side(&self) -> Result<i32, &'static str> {
        if self.is_square {
            Ok(self.side)
        } else {
            Err("Only squares have sides")
        }
    }
}

The above is a perfectly safe, completely sound type. Thanks to the strength of Rust's type system, that rotting corpse of an abstraction is allowed to spend its miserable existence unnoticed, tightly sealed behind limited construction and access rules.

Still, even if we leave compactness aside (which is an easy fix anyway, by just using a common dimension field that doubles as radius and side), it's clearly a very poor type. Its bad design radiates outward polluting everything it touches, and has forced us to write an awkward, clunky API to allow the external world to interface with it.
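For reference, the compactness fix mentioned above might look something like this. It's only a sketch: the `dimension` field name is my own choice for illustration, and it deliberately keeps the two-boolean design so the clunkiness stays in plain view.

```rust
pub struct Shape {
    is_circle: bool,
    is_square: bool,
    // Hypothetical shared field: the radius for circles, the side for squares.
    dimension: i32,
}

impl Shape {
    pub fn circle(radius: i32) -> Result<Self, &'static str> {
        if radius <= 0 {
            return Err("Must have positive radius");
        }
        Ok(Self { is_circle: true, is_square: false, dimension: radius })
    }

    pub fn square(side: i32) -> Result<Self, &'static str> {
        if side <= 0 {
            return Err("Must have positive side");
        }
        Ok(Self { is_circle: false, is_square: true, dimension: side })
    }

    pub fn radius(&self) -> Result<i32, &'static str> {
        if self.is_circle { Ok(self.dimension) } else { Err("Only circles have a radius") }
    }

    pub fn side(&self) -> Result<i32, &'static str> {
        if self.is_square { Ok(self.dimension) } else { Err("Only squares have sides") }
    }
}
```

The wasted field is gone, but every accessor still has to ask "am I a circle?" at runtime, which is exactly the smell the rest of this post is about.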

The reason why this example is interesting to me is that invariant driven development, as I've seen it called, is not enough to prevent this kind of design mishap. It does help tremendously to identify potential bugs and build safe abstractions at the API level, but it doesn't necessarily help drive the design of a module.

In a limited example like this the better path is obvious, but as is often the case with restricted examples, the entire point is to put the spotlight on an issue that may appear in subtle, insidious forms, so it can be fought effectively when encountered. A mental vaccine, if you will, to use a contemporary hot analogy.

Quantifying badness

Let's have a bit of a palate cleanser:

// Boy, I sure feel bad for that *other* Shape
pub enum Shape { Square(Square), Circle(Circle) }
pub struct Circle(u32);
pub struct Square(u32);

impl Circle {
    pub fn from_radius(r: u32) -> Self { Self(r) }
    pub fn radius(&self) -> u32 { self.0 }
}

impl Square {
    pub fn from_side(s: u32) -> Self { Self(s) }
    pub fn side(&self) -> u32 { self.0 }
}

// Convenience fallible API
impl Shape {
    pub fn radius(&self) -> Result<u32, &'static str> {
        match self {
            Shape::Square(_) => Err("Only circles have a radius"),
            Shape::Circle(c) => Ok(c.radius()),
        }
    }
    pub fn side(&self) -> Result<u32, &'static str> {
        match self {
            Shape::Square(s) => Ok(s.side()),
            Shape::Circle(_) => Err("Only squares have sides"),
        }
    }
    pub fn is_circle(&self) -> bool { matches!(self, Shape::Circle(_)) }
    pub fn is_square(&self) -> bool { matches!(self, Shape::Square(_)) }
}

This shape seems a lot better than the one before. Not ideal, but better. It feels habitable; methods are short and to the point. However, to play devil's advocate again, those benefits are vague and subjective. Believe it or not, this one is less safe than the one before. Can you spot why? That's right, in the pursuit of elegance we've dropped the invariant that shapes must have a strictly positive dimension. Degenerate squares and circles (with side and radius of 0, respectively) are now possible. Whoops!
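Here's a quick sketch of that hole, reusing the definitions above so it stands on its own. The `degenerate_shapes` function is just a name I made up for the demonstration.

```rust
pub enum Shape { Square(Square), Circle(Circle) }
pub struct Circle(u32);
pub struct Square(u32);

impl Circle {
    pub fn from_radius(r: u32) -> Self { Self(r) }
    pub fn radius(&self) -> u32 { self.0 }
}

impl Square {
    pub fn from_side(s: u32) -> Self { Self(s) }
    pub fn side(&self) -> u32 { self.0 }
}

/// Hypothetical demonstration: the infallible constructors happily accept
/// zero, so both of these degenerate shapes can be built without complaint.
pub fn degenerate_shapes() -> [Shape; 2] {
    [
        Shape::Circle(Circle::from_radius(0)),
        Shape::Square(Square::from_side(0)),
    ]
}
```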

Wouldn't it be cool, then, if we could come up with a quantifiable, objective metric of badness, that we could dispassionately apply to both the types above?

Enter tightness.

The tightness of a type is the proportion of invariants that are upheld in the type definition as opposed to its methods.

NOTE: I've asked the few type-theory people I know that still talk to me, looking for a preexisting term to define this. I don't think such a word exists, so I figure I might as well coin one. If it turns out there's one though I'm happy to defer to it for clarity.

There is a way to indirectly measure tightness and give it a very concrete number: simply work out the proportion of representable states of your type that are valid. Let's apply it to our shapes.

// Why do you keep dragging me into this? I just want to live a quiet life...
pub struct Shape {
   is_circle: bool,
   is_square: bool,
   radius: i32,
   side: i32,
}

Let's look at the fields in turn. We have two booleans for a total of four possible combinations, of which only two are valid (is_circle XOR is_square). This cuts our representable space in half, so we are already down to a tightness of 50%.

We then get to radius and side. Only one of them is meaningful at any given time, so for the purposes of this informal heuristic we can simply ignore the other. This leaves us with 2^32 states for the dimension, of which half (give or take the zero) are valid. Multiplying the two together, we get a total tightness of around 25%. Only one fourth of the values that our type can hold correspond to meaningful circles and squares.
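If you'd rather count than trust my arithmetic, here is a rough sketch that brute-forces the same heuristic on a scaled-down copy of the type, with `i8` standing in for `i32` so the loop finishes instantly. The function name is invented for this example and, like the paragraph above, it only looks at the one dimension that is meaningful.

```rust
/// Fraction of representable states that are valid, for a miniature version of
/// the struct: two booleans plus a single `i8` dimension (an assumption made
/// purely to keep the state space small enough to enumerate).
pub fn mini_shape_tightness() -> f64 {
    let mut total = 0u32;
    let mut valid = 0u32;
    for is_circle in [false, true] {
        for is_square in [false, true] {
            for dimension in i8::MIN..=i8::MAX {
                total += 1;
                // Valid means: exactly one flag set, strictly positive dimension.
                if (is_circle ^ is_square) && dimension > 0 {
                    valid += 1;
                }
            }
        }
    }
    f64::from(valid) / f64::from(total)
}
```

Out of 4 * 256 = 1024 states, 2 * 127 = 254 are valid, so this comes out a hair under 25%, in line with the estimate for the full-size type.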

How does our second candidate fare?

// Ha, I'm sure I'm going to get a perfect score
pub enum Shape { Square(Square), Circle(Circle) }
pub struct Circle(u32);
pub struct Square(u32);

It certainly seems promising. It's a sum type, and we know by definition that both sides (squares and circles) are valid. Then each branch has 2^32 representable states, of which everything but 0 is valid. That leaves us with a tightness of 100 * (1 - 1 / 2^32), which is... 99.99999997671694%.

// Hey, what do you mean 99, you jerk?
pub enum Shape { Square(Square), Circle(Circle) }

Uhh... Let me double check my math, but I'm pretty sure it's correct. You're nearly perfectly tight, but not quite.

// Double check your $%$#@#. I'm pure elegance and perfection!
pub enum Shape { Square(Square), Circle(Circle) }

Why am I talking to a type? Anyway, the problem is that you can represent exactly one invalid circle, and one invalid square, see? You can be Shape::Square(Square(0)) or Shape::Circle(Circle(0)).

// And how is that my problem? It's not my fault that some dumb coder
// forgot to make my constructor safe. Can't you just assert that `radius > 0`
// and `side > 0`? Even I can think of that, and I'm eight bytes long. You should
// be ashamed.
pub enum Shape { Square(Square), Circle(Circle) }

But you're missing the point. If I did that, you'd indeed be safe to use, but you wouldn't be any tighter. My metric of tightness only cares about the representable states encoded in your definition, not your methods.

// That's idiotic. How is anything tight, then? A boolean is one byte long,
// which means 256 different variants. Only two are correct, so is every boolean
// less than 1% tight?
pub enum Shape { Square(Square), Circle(Circle) }

That's not what I mean. I'm not talking about bitwise representation literally. I'm only talking about well defined states for your fields, those that you can only reach by correctly interacting with their API.

// So what am I supposed to do, then? Come on, my methods
// must do *some* of the work, right? It's not my responsibility to be always
// correct. There isn't even a way to promise to be positive in my definition!
pub enum Shape { Square(Square), Circle(Circle) }

Actually, there is. It's even in the standard library. Come on, try this on for u-size! Har har har.

// Uh... I feel a bit queasy
pub enum Shape { Square(Square), Circle(Circle) }
pub struct Circle(NonZeroU32);
pub struct Square(NonZeroU32);

That's just the feeling of perfect tightness. NonZeroU32 promises to be nonzero, and since we only care about its valid states, we know that every one of its possible values corresponds to a valid shape. Welcome to the 100% club!
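For the curious, here's a small sketch of how the rest of the API might change once the definition carries the invariant. `NonZeroU32::new` and `get` are the standard library's; the `circle_from_raw` helper is hypothetical and only there to show where the zero check ends up.

```rust
use std::num::NonZeroU32;

pub enum Shape { Square(Square), Circle(Circle) }
pub struct Circle(NonZeroU32);
pub struct Square(NonZeroU32);

impl Circle {
    // Infallible: the argument type has already ruled out zero.
    pub fn from_radius(r: NonZeroU32) -> Self { Self(r) }
    pub fn radius(&self) -> u32 { self.0.get() }
}

impl Square {
    pub fn from_side(s: NonZeroU32) -> Self { Self(s) }
    pub fn side(&self) -> u32 { self.0.get() }
}

/// Hypothetical convenience for callers that start from a plain `u32`:
/// the zero check happens once, at the boundary, via `NonZeroU32::new`.
pub fn circle_from_raw(radius: u32) -> Option<Shape> {
    NonZeroU32::new(radius).map(|r| Shape::Circle(Circle::from_radius(r)))
}
```

The constructors stay as short as before, yet there is no longer any sequence of calls that produces a zero-sized shape.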

The two rules of tightness

There are two rules that apply to this idea of tightness, that I'll informally define now and attempt to justify in the sections to follow.

  • Rule 1: Unless your type's sole responsibility is to restrict the values of its fields, it can always be 100% tight (in Rust).
  • Rule 2: All other factors being equal, tighter types are better.

And as a corollary:

  • You should strive to make your types 100% tight.

Giving it your 100%

The first claim is already a bit dubious. Is it really always possible? What about limited integer ranges? What about types with a gigantic representable space like strings, hash maps, linked lists? With the current limitations of const generics, there's no way to guarantee at compile time that their values will be in the appropriate ranges.

It's a valid question, and to answer it I have to highlight that tightness has nothing to do with compile time decisions. Sure, pulling safety checks back into compile time is great (if you've read more of my blog you'll know I'm a huge fan of typestate programming) and you should certainly strive to do as much of the heavy lifting as you can before the program actually runs. However, tightness is more about defining responsibilities. Let's look at an example:

// I despair at my looseness.
struct Account {
   // Must be alphabetical and less than 8 characters long.
   username: String,
   // Must be alphanumeric and between 8 and 16 characters.
   password: String,
}

That's... just not tight. In fact, it's nearly infinitely loose, as there is a practically endless number of character combinations that don't comply with the restrictions expressed in the comments. We could, of course, carefully write the API so that the invariants are maintained at every point, but that would do nothing to increase this particular type's tightness. Let's then be infinitely lazy and just delegate it to a second type that will solve all of our problems.

// Ahh! Such a weight off my shoulders.
struct Account {
   username: Username,
   password: Password,
}

struct Password(String);
struct Username(String);

impl Password {
    fn new(raw: &str) -> Option<Self> {
        // Count characters rather than bytes, and make both ends of the
        // "between 8 and 16" rule inclusive.
        let len = raw.chars().count();
        ((8..=16).contains(&len) && raw.chars().all(char::is_alphanumeric))
            .then_some(Self(raw.to_owned()))
    }
    fn get(&self) -> &str { &self.0 }
}

impl Username {
    fn new(raw: &str) -> Option<Self> {
        // Fewer than 8 characters, all of them alphabetic.
        (raw.chars().count() < 8 && raw.chars().all(char::is_alphabetic))
            .then_some(Self(raw.to_owned()))
    }
    fn get(&self) -> &str { &self.0 }
}

Tada! Our Account type is now 100% tight. It doesn't matter that Username and Password enforce the invariants at runtime. What matters is that, given they're soundly implemented and properly defined, they are completely responsible for maintaining the invariants that Account requires. None of their representable states make for an invalid Account, and thus Account is tighter than submarine rivets.

Unfortunately, it's also grown a bit of boilerplate, and these new types aren't really all that comfortable to use. Wouldn't it be cool if a skilled, humble and very handsome coder had come up with a library to make this problem a lot simpler?

Ahem...

... Yeah, unfortunately it was written by me instead, but it'll have to do. I wrote the tightness crate as a companion to this article, to provide a one-size-fits-all solution to encoding invariants as part of types. It's naturally less expressive and powerful than specialized types like NonZeroUsize, but serves as a good fallback for the general case.

This crate also provides an answer to the question "Can all my types be completely tight?" I mentioned that the only exception holding us back from a resounding "yes" is the case where our type's responsibility is to restrict the values of its fields. Thankfully, that's exactly the case this library takes care of in a general way, so it ensures everything can be taken to 100% tightness.

Using this library, we can rework our Account type to be a lot shorter:

struct Account {
   username: Username,
   password: Password,
}

// NOTE: For maximum safety, you'd probably want these to be on a separate
// module so their internals aren't reachable by the methods in `Account`.

bound!(Username: String where |s| {
    s.chars().count() < 8 && s.chars().all(char::is_alphabetic)
});

bound!(Password: String where |s| {
    (8..=16).contains(&s.chars().count()) && s.chars().all(char::is_alphanumeric)
});

By definition, the Username and Password types always fulfill these conditions, which means the Account type is 100% tight. You could argue that this is all pointless, and that all we've done is move invariant checks from the methods of Account into the methods of these new types instead. And while you're absolutely correct, imaginary strawman, I'll argue that it brings many benefits of varying degrees of subtlety, which I'll try to defend in the next section.

The takeaway of this post should not be that you should use my library; this is a technical piece, not an ad. Still, if you want to experiment and write some code following this approach, I've written it to make it as simple as possible to apply the concept of 100% tight types. Let me know any thoughts or pain points you have when messing with it!

Reaping the benefits of perfect tightness

So what does all of this nonsense afford us? Why go through the trouble of thinking of type tightness first, and developing the API later? Here are the benefits I've found when applying this approach:

1 - Pushes you towards the right abstractions

When you define a structured type with multiple interrelated fields, held together by a bunch of logic, it's generally a sign that there are missing abstractions. Take this silly example:

struct Fleet {
   number_of_cars: u32,
   // Must be at most 4 * number_of_cars
   total_wheels: u32,
   monthly_revenue: u32,
}

Using the tightness crate, the only way to enforce invariants across two fields would be to define a type that wraps them. The process of solving that problem would naturally lead to the missing Car abstraction, with an invariant over its number of wheels.
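As a rough sketch of where that process might land, here's one way the fleet could look using plain newtypes rather than the crate. Everything here is assumed for illustration: the `WheelCount` and `Car` names, and the rule that a car has between one and four wheels, are my guess at what the comment in `Fleet` was getting at.

```rust
/// Hypothetical invariant holder: a car has between 1 and 4 wheels.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WheelCount(u8);

impl WheelCount {
    pub fn new(wheels: u8) -> Option<Self> {
        (1..=4).contains(&wheels).then_some(Self(wheels))
    }
    pub fn get(self) -> u8 { self.0 }
}

pub struct Car {
    pub wheels: WheelCount,
}

pub struct Fleet {
    pub cars: Vec<Car>,
    pub monthly_revenue: u32,
}

impl Fleet {
    // Both numbers are derived instead of stored, so they can never drift
    // out of sync with each other.
    pub fn number_of_cars(&self) -> usize { self.cars.len() }

    pub fn total_wheels(&self) -> u32 {
        self.cars.iter().map(|car| u32::from(car.wheels.get())).sum()
    }
}
```

The cross-field rule has dissolved: since every `Car` is valid on its own, no `Fleet` can hold more wheels than its cars allow.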

2 - More expressiveness from safely exposed internals

Let's go back to our Account example, with a twist:

// Woah, why is everyone looking at me?
pub struct Account {
    pub username: Username,
    pub password: Password,
}

You'll notice we've made both fields public. If all invariants are held by the type definition and not its methods, this means it's always safe to expose the internals to the users. This has a bunch of practical benefits, such as allowing them to use expressive tools like destructuring, pattern matching and split borrows. The alternative of offering getters and setters for fields can be rigid and limiting in comparison.
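To give a flavour of what that buys the user, here's a small sketch of a split borrow and a destructuring move. The two free functions are hypothetical, and the newtypes are hand-written stand-ins for the validated types from earlier, with my reading of their rules.

```rust
pub struct Username(String);
pub struct Password(String);

impl Username {
    pub fn new(raw: &str) -> Option<Self> {
        (raw.chars().count() < 8 && raw.chars().all(char::is_alphabetic))
            .then_some(Self(raw.to_owned()))
    }
    pub fn get(&self) -> &str { &self.0 }
}

impl Password {
    pub fn new(raw: &str) -> Option<Self> {
        ((8..=16).contains(&raw.chars().count()) && raw.chars().all(char::is_alphanumeric))
            .then_some(Self(raw.to_owned()))
    }
    pub fn get(&self) -> &str { &self.0 }
}

pub struct Account {
    pub username: Username,
    pub password: Password,
}

/// Split borrow: one field stays borrowed while a different field is replaced.
/// Going through `&self` getters and `&mut self` setters instead would borrow
/// the whole struct each time, and the borrow checker would object.
pub fn replace_password(account: &mut Account, new_password: Password) -> &str {
    let name = &account.username; // shared borrow of one field
    account.password = new_password; // write to a different field
    name.get()
}

/// Destructuring: take the account apart without cloning anything.
pub fn into_parts(account: Account) -> (Username, Password) {
    let Account { username, password } = account;
    (username, password)
}
```

Neither function can produce an invalid `Account`, because the only values it can put in the fields are already valid.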

The flexibility extends to giving the user tools to store, process and manipulate the invariant holders in whichever way they see fit. Rather than go to Account's module documentation and methods to ensure they're building a String the right way, their attention can be limited to the single point of Username and Password definition.

This doesn't mean that everything should be public all the time, though wanting a field of a 100% tight type to be private can be a code smell. If the field is truly irrelevant to the user, does it have to be there?

3 - APIs are cleaner, with fewer error returns

Typically, invariants will be reflected in the API in one way or another. At worst, methods will panic when fed invalid input. At best, there will be some Result returns with expressive Error types. Why not just do away with all that?

// Old `Account`
impl Account {
    pub fn set_password(&mut self, password: String) -> Result<(), PasswordError> { /*...*/ }
    pub fn set_username(&mut self, username: String) -> Result<(), UsernameError> { /*...*/ }
}

enum PasswordError {
    PasswordTooShort,
    PasswordTooLong,
    PasswordNotAlphanumeric,
}

enum UsernameError {
    UsernameTooLong,
    UsernameNotAlphabetical
}

Uh, so many ways to go wrong. By tightening Account, we can outright remove not only the error variants, but the functions themselves. Even if we keep the fields private for whatever reason, they'd become significantly simpler:

// New `Account`. Look Ma, no errors!
impl Account {
   // It's probably best to not have these methods at all, but still
   pub fn set_password(&mut self, password: Password) { /*...*/ }
   pub fn set_username(&mut self, username: Username) { /*...*/ }
}
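The checks don't vanish, of course; they move to wherever a raw string first becomes a `Password`. One way to sketch that boundary without the crate is a `TryFrom` implementation. The error enum loosely mirrors the old one, and the inclusive 8 to 16 length rule is my reading of the comment on the original field.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
pub enum PasswordError {
    TooShort,
    TooLong,
    NotAlphanumeric,
}

#[derive(Debug)]
pub struct Password(String);

impl TryFrom<String> for Password {
    type Error = PasswordError;

    // The single place where a raw string gets validated.
    fn try_from(raw: String) -> Result<Self, Self::Error> {
        let len = raw.chars().count();
        if len < 8 {
            Err(PasswordError::TooShort)
        } else if len > 16 {
            Err(PasswordError::TooLong)
        } else if !raw.chars().all(char::is_alphanumeric) {
            Err(PasswordError::NotAlphanumeric)
        } else {
            Ok(Self(raw))
        }
    }
}

pub struct Account {
    password: Password,
}

impl Account {
    // Infallible: an invalid `Password` cannot exist, so there is nothing to check.
    pub fn set_password(&mut self, password: Password) {
        self.password = password;
    }
}
```

The errors still exist, but they now belong to the one type that owns the rule, and `Account` never has to mention them.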

4 - No regressions when adding functionality

When working with a "loose" type, it doesn't matter if the current behaviour perfectly respects invariants; we have to remain ever vigilant that new methods don't invalidate all our previous work. Let's say that in a day of drunken coding we decide to add a new method to make our passwords UBER SECURE.

// Something feels off...
struct Account {
   username: String,
   password: String,
}

impl Account {
   /// Rotates each character a number of positions in the ascii table
   pub fn rotate_password(&mut self, amount: u8) {
      self.password = self.password.chars().map(|c| (c as u8).wrapping_add(amount) as char).collect();
   }
}

Spot the problem? Yeah, we've blown a huge hole in the invariants of our struct. Even if our setters and getters enforce the Password rules, we've now made it possible to create a password that isn't alphanumeric. With our tightness based type, this wouldn't have been possible:

struct Account {
   username: Username,
   password: Password,
}

impl Account {
   /// Rotates each character a number of positions in the ascii table
   pub fn rotate_password(&mut self, amount: u8) {
       // This will panic as soon as the result isn't a valid `Password`!!!
       self.password.mutate(|p| *p = p.chars().map(|c| (c as u8).wrapping_add(amount) as char).collect());
   }
}

Delegating the invariants to our type definition saves the day: our rotate_password method is caught red-handed (in this case with a panic, but tightness also offers less explosive ways to mutate your bounded types).
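I won't reproduce the crate's API here, but to give an idea of what a non-panicking mutation can look like, here's a hand-rolled sketch. `try_mutate` is a hypothetical method written for this example, not something taken from `tightness`: it applies the change to a scratch copy and only commits it if the invariant still holds.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Password(String);

impl Password {
    // Assumed rule: 8 to 16 alphanumeric characters, both ends inclusive.
    fn is_valid(raw: &str) -> bool {
        (8..=16).contains(&raw.chars().count()) && raw.chars().all(char::is_alphanumeric)
    }

    pub fn new(raw: &str) -> Option<Self> {
        Self::is_valid(raw).then_some(Self(raw.to_owned()))
    }

    pub fn get(&self) -> &str { &self.0 }

    /// Hypothetical fallible mutation: work on a scratch copy, re-check the
    /// invariant, and leave `self` untouched if the check fails.
    pub fn try_mutate(&mut self, f: impl FnOnce(&mut String)) -> Result<(), &'static str> {
        let mut scratch = self.0.clone();
        f(&mut scratch);
        if Self::is_valid(&scratch) {
            self.0 = scratch;
            Ok(())
        } else {
            Err("mutation would break the password invariant")
        }
    }
}

/// The rotation from the example above, now unable to corrupt the password.
pub fn rotate_password(password: &mut Password, amount: u8) -> Result<(), &'static str> {
    password.try_mutate(|p| {
        *p = p.chars().map(|c| (c as u8).wrapping_add(amount) as char).collect();
    })
}
```

A rotation that happens to land on another valid password goes through; one that doesn't comes back as an `Err`, and the old password stays exactly as it was.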

In a similar way to how unsafe helps us focus our attention on the critical bits during code review, making our types tight means we only have to worry about invariants when working on the type definition, and not so much during methods.

Conclusion

Again, I don't claim to be treading new ground here. It's likely that the ideas behind this approach have been expressed in other terms many times before, and I wouldn't be surprised if the exact concept I'm talking about is old news to people who really know what they're doing with types.

My attempt with this post is to give you a set of simple tools and heuristics to experiment with that have helped me design better types and, as a result, safer and more elegant code.

If you've enjoyed the read and think there's some merit to it, I challenge you to a game: write your next small project with a tightness-first approach. Before writing a single impl block, ensure your type is incapable by construction of holding invalid data. You're welcome to use my tightness crate or create your own types. Then when you're done, please let me know how it went via email, Reddit comment or Discord DM (Corax#0426). Even if you hate my guts at the end of it, I'll be glad to hear it!

As always, thanks for sticking around. Happy rusting!