During last episode, we went on a tour of the pleasantly boring world of flash memory, drafted some driver interfaces, and toyed with the idea of generics at both levels of crate visibility. What's conspicuously absent is, well, working code. It's one thing to lay down data structures, and another entirely to weave generics between the raw ones and zeros of bare metal firmware. Today we'll gaze long into the void*, and, hopefully, the void* won't gaze back into us.

A thoroughly unnecessary warmup🔗

In the section about flash memory we explored, with the help of a dodgy analogy, how it's possible to directly overwrite a 1 with a 0 but not the reverse. This quirk is what makes writing efficient flash writers a bit tricky. When asked to write over a memory range, a naive driver would simply erase every sector overlapped by it, then write back the original data "merged" with the desired bytes. We want our driver a bit smarter than that; it should first check whether it's possible to write the bytes directly. In other words, we want to ensure that every 1 in our intended write is also a 1 in the target space.

pub trait BitSubset {
    /// Check that every '1' bit is a '1' on the right hand side.
    fn is_subset_of(&self, rhs: &Self) -> bool;
}

That looks like a straightforward, simple trait to express the condition we want to check. Here comes the unnecessary part: We're going to implement it generically. Let's be honest, we both know we want to implement this for [u8]. There shouldn't be much use for it other than slices of bytes. However, doing it this way will help flex our generic muscles and illustrate a couple of important points.

impl<T: Copy + Eq + BitOr<Output=T>> BitSubset for [T] {
    fn is_subset_of(&self, rhs: &Self) -> bool {
        if self.len() > rhs.len() {
            false
        } else {
            self.iter().zip(rhs.iter()).all(|(a, b)| (*a | *b) == *b)
        }
    }
}

Yes, I feel that dread too. There are enough sigils up there to summon half the Lesser Key of Solomon. We have five types of brackets (<(|[{) and we're barely getting started. Fortunately symbols are much like ant invasions—live with them long enough and you'll stop noticing. They can even be endearing.

When I started playing with Rust generics a few years ago, it often helped me to approach functions by parts; focusing first on the constraints, moving to the implementation only when I've painted a clear picture of the elements involved. I find there's a bit of ritual in deciphering constraints, not unlike the infamous spiral rule for C declarations. For me, it takes the form of describing them in a manner of plain English pseudocode. For the above it would look like this:

We have something that can be copied and compared.
It can be bitwise or'd with itself.
We are implementing over an array of it.

For more complex lists I'd sing these conditions in my head a few times until the abstract types coalesce into something more manageable, at which point it becomes possible to push the sigil soup away and focus on behaviour.

All the behaviour of these type parameters is summarized in these constraints. Nothing is implicit. That's very cool.

In this case, all that's left is a bit of iterator magic and bit wrangling. self.iter().zip(rhs.iter()).all(Condition) "zips" two iterators together in pairs, then verifies that every pair satisfies a condition. In our case the condition is the closure |(a, b)| (*a | *b) == *b. For the condition to hold, all 1 bits in element *a must also be 1 in *b.

We'll use this later to compare, as we anticipated, slices of bytes. What have we gained from doing it generically?

Even though T: Copy + Eq + BitOr<Output=T> is certainly a lot wordier than u8, it's also purer in some sense, because it more accurately summarizes what we care about. Weirdly, the shorter u8 carries a lot more extraneous, distracting baggage:
- It is 8 bits long, which is a hardware detail we don't care about.
- It is unsigned, which is a mathematical detail we don't care about.
In contrast, our platonic T is refreshingly simple: only does the things that the implementation needs. And as such, the implementation will be correct now and forever after, which leads to the next point:
When our constraints match the requirements of the implementation so tightly, we can resort to blanket implementations, knowing that future users will opt into this behaviour automatically if they define a type that matches these requirements. Crates like the impressively elegant Bevy take this idea further, using their prelude to bring blanket implementations over a wide range of types without exposing the user to the trait. This leads, I think, to very ergonomic APIs.
Not that we'd have any reason to suspect otherwise, but the concrete and generic versions produce identical assembly.
We've increased our sigil count, getting closer to summoning Kimaris, Marquis of Hell, ruler of 20 demonic legions, teacher of grammar, logic and rhetoric. Maybe he'll finally produce an intuitive definition of a monad.

Turning up the heat: generic buffer overlaps🔗

At the end of the last entry, we sketched out the constraints on our minimal concept of address and region:

pub trait Address: Ord + Copy + Add<usize, Output = Self> {}
pub trait Region<A: Address> {
    fn contains(&self, address: A) -> bool;
}

But cuervo, you ask, that's not how it looked like last week. You added an Add constraint. Okay, hypothetical attentive reader, you got me. I may have forgotten to copy that as I was curating the code examples. Yes, the core::ops::Add bound will be necessary, as we'll see in a bit. This gives us an excuse to sing the generics song, which works just as well for supertrait definitons:

An Address is something that can be sorted and copied. It can be added together with an offset, yielding another address.
A Region is something that may contain an address.

What can we build with this? We know that one of the goals of our flash driver is to smartly negotiate the different write and erase granularities, so that for any given buffer write, only the required regions are erased. In order to do this, there's a requirement we can accurately summarize in this way:

For a buffer of given length, and for a base address in the target memory map, we need to relate each buffer segment to every region it overlaps with.

Here rendered through the power of ASCII:

0    5    10   15   20   25   30   35
-----|ooooooooooooooooooooooooooooo|------ : Buffer of length 30 at address 5
----------|====|====|====|====|----------- : Four regions of length 5 at address 10

Given the two inputs above, we need to produce the following four views into the buffer:


0    5      10      15      20      25      30   35
-----|ooooo [ooooo] [ooooo] [ooooo] [ooooo] ooooo|-----
             view1   view2   view3   view4

Our goal in this section is to define such operation generically, and implement it generically, in such a way that it will work with last week's sectors, subsectors and pages from both drivers.

Now, if you're allergic to symbols, be warned: Here be dragons. Henceforth be dragon turtles, all the way down. We are about to reach ancient Egyptian shopping in Akihabara levels of symbol overload. Believe me that after some time none of them will feel superfluous or overwrought. As with human language, precise definitions lead to better understanding.

Okay, let's take a deep breath.

/// Iterator producing block-region pairs,
/// where each memory block maps to each region
pub struct OverlapIterator<'a, A, R, I>
where
    A: Address,
    R: Region<A>,
    I: Iterator<Item = R>,
{
    memory: &'a [u8],
    regions: I,
    base_address: A,
}

pub trait IterableByOverlaps<'a, A, R, I>
where
    A: Address,
    R: Region<A>,
    I: Iterator<Item = R>,
{
    fn overlaps(self, block: &'a [u8], base_address: A) -> OverlapIterator<A, R, I>;
}

Hmm. Not too bad thus far. We have the tools necessary to break down the where clause, which is identical for both the struct and the trait. We are dealing with:

Some address A.
Some region R which contains such addresses.
Some iterator I that yields such regions.

The explicit lifetime, a requisite of all structs containing references, simply denotes that the memory field is a reference to data stored elsewhere, and guarantees said data will exist at least as long as our struct.

What we've defined above is an iterator struct, and a trait for types that can be iterated in a certain way. If you come from C, iterators can be a bit of a mind bender. They feel heavy, but ultimately they're very simple. I like to think of iterator structs as bookmarks for physical books; progress trackers with just enough state to find your position in a sequence.

Looking at our iterator struct above, we see we can track our progress through the inner regions iterator, until we run out of regions. We also have a means of producing this kind of iterator—the overlaps function in the IterableByOverlaps trait—so all that's left is to define the rules of iteration. The implementation I'm going to show you can do with some manual optimization, and indeed it doesn't look exactly like this in the final Loadstone code. I went with the slightly more readable option to better illustrate the points in this article.

impl<'a, A, R, I> Iterator for OverlapIterator<'a, A, R, I>
where
    A: Address,
    R: Region<A>,
    I: Iterator<Item = R>,
{
    type Item = (&'a [u8], R, A);

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(region) = self.regions.next() {
            let mut block_range = (0..self.memory.len())
                .skip_while(|index| !region.contains(self.base_address + *index))
                .take_while(|index| region.contains(self.base_address + *index));
            if let Some(start) = block_range.next() {
                let end = block_range.last().unwrap_or(start) + 1;
                return Some((&self.memory[start..end], region, self.base_address + start));
            }
        }
        None
    }
}

impl<'a, A, R, I> IterableByOverlaps<'a, A, R, I> for I
where
    A: Address,
    R: Region<A>,
    I: Iterator<Item = R>,
{
    fn overlaps(self, memory: &'a [u8], base_address: A) -> OverlapIterator<A, R, I> {
        OverlapIterator { memory, regions: self, base_address }
    }
}

Wow, it's getting real. Let's break it down piece by piece.

impl<'a, A, R, I> Iterator for OverlapIterator<'a, A, R, I>
where
    A: Address,
    R: Region<A>,
    I: Iterator<Item = R>,

So far our struct was an iterator in name only. Here we're actually implementing the core::iter::Iterator trait for it. We're already familiar with the generics here, the where clause is the same one we've seen in the iterator struct and trait definition. Moving on!

type Item = (&'a [u8], R, A);

This is the associated type we'll be iterating over. Our iterator will yield tuples of three elements:

A subslice with the already familiar lifetime 'a. That is, a view over the memory buffer we want to write.
A region R, corresponding to that view.
An address A at the start of that view.

 fn next(&mut self) -> Option<Self::Item> {
     while let Some(region) = self.regions.next() {
         // [...]
     }
     None
 }

The above is a common pattern when wrapping an internal iterator. It means that we'll greedily consume the regions iterator, finally yielding None only when we run out of regions.

let mut block_range = (0..self.memory.len())
    .skip_while(|index| !region.contains(self.base_address + *index))
    .take_while(|index| region.contains(self.base_address + *index));

Here's the naive bit, which works well enough for our exploration. We deduce the range in buffer indices for our slice by skipping all addresses outside the region, and taking all addresses within the region.

if let Some(start) = block_range.next() {
   let end = block_range.last().unwrap_or(start) + 1;
   return Some((&self.memory[start..end], region, self.base_address + start));
}

Finally, we yield a slice on those indices, with some provisioning for edge cases. Next time this next function is called, it will pick up where it left off.

impl<'a, A, R, I> IterableByOverlaps<'a, A, R, I> for I
where
    A: Address,
    R: Region<A>,
    I: Iterator<Item = R>,
{
    fn overlaps(self, memory: &'a [u8], base_address: A) -> OverlapIterator<A, R, I> {
        OverlapIterator { memory, regions: self, base_address }
    }
}

This is the part that glues it all together: a blanket implementation for any iterator over regions. This is what makes it possible to convert your maps, your zips and your chains to the iterator of tuples we care about. All this functionality becomes available just from implementing the much, much simpler Region and Address traits.

There was some weeping and gnashing of teeth from my colleagues at the single-letter type parameter names. On our usual languages of choice at Bluefruit Software, we always prefer descriptive and thorough to terse and abbreviated. However, I believe there is a strong argument for keeping type parameters short: it highlights their flexible, incorporeal nature and draws attention to the where clause and trait bounds.

Putting the pieces together🔗

Writing a flash driver that respects bit subsets, transparently merges data and erases sectors only when required in C is pretty finicky. It often involves multiple nested loops, off-by-one-prone operations, confusion, pain, dread and hardfault. This time, we have crafted enough generic glue that tackling the write method on both our flash drivers is going to be a comparative pleasure.

Going back to last week's trait definition, we're looking to implement this function for both of our drivers:

fn write(&mut self, address: Self::Address, bytes: &[u8]) -> Result<(), Self::Error>;

We'll start with the internal MCU flash, which you'll remember has a memory map encoded as a const array of sectors of this form:

#[derive(Copy, Clone, Debug, PartialEq)]
struct Sector {
    block: Block,
    location: Address,
    size: usize,
}

In the MCU flash context, this is the concrete type for our Region trait. We get Address for free via blanket implementation, so let's manually implement Region:

impl Region<Address> for Sector {
    fn contains(&self, address: Address) -> bool {
        (self.start() <= address) && (self.end() > address)
    }
}

impl Sector {
    const fn start(&self) -> Address { self.location }
    const fn end(&self) -> Address { Address(self.start().0 + self.size as u32) }
}

That's all we need to do to have access to all the powerful iterator stuff we wrote before. The write method unfolds cleanly:

 fn write(&mut self, address: Address, bytes: &[u8]) -> Result<(), Self::Error> {
     // [..] Some boring boundary/range/alignment checks here

     for (block, sector, address) in MemoryMap::sectors().overlaps(bytes, address) {
         let merge_buffer = &mut [0u8; max_sector_size()][0..sector.size];
         let offset_into_sector = address.0.saturating_sub(sector.start().0) as usize;

         self.read(sector.start(), merge_buffer)?;
         if block.is_subset_of(&merge_buffer[offset_into_sector..sector.size]) {
             self.write_bytes(block, &sector, address)?;
         } else {
             self.erase(&sector)?;
             merge_buffer
                 .iter_mut()
                 .skip(offset_into_sector)
                 .zip(block)
                 .for_each(|(byte, input)| *byte = *input);
             self.write_bytes(merge_buffer, &sector, sector.location)?;
         }
     }

     Ok(())
}

Some clarifications and notes:

write_bytes, erase and read are very low level, chip-specific functions that handle the commands and register writes required to actually operate the flash. There's not much to gain from attempting to approach them generically, so I won't expand them here.
max_sector_size() is a const fn that loops over all sectors in the memory map, rather than a const value which could go out of sync with the memory map or be at risk of copy-paste error.
The other functions from our ReadWrite trait are a lot simpler than write, where all the fun read-write-cycle logic happens, so they don't deserve much analysis.

Different chip, not so different world🔗

Now that we've assembled our write function, let's move back to the external Micron N25Q128. Last week, we modeled this driver's memory map pretty differently, as a collection of functions returning existential iterator types. Another departure is that we don't care about sectors anymore, we care about subsectors and pages (our new erase and write granularities, respectively), which we modeled like this:

pub struct Subsector(usize);
pub struct Page(usize);

Well, fret not! They may look very different, but deep down they're a Region waiting to happen. Let's make it so:

impl Region<Address> for Subsector {
    fn contains(&self, address: Address) -> bool {
        let start = Address((SUBSECTOR_SIZE * self.0) as u32);
        (address >= start) && (address < start + SUBSECTOR_SIZE)
    }
}

impl Region<Address> for Page {
    fn contains(&self, address: Address) -> bool {
        let start = Address((PAGE_SIZE * self.0) as u32);
        (address >= start) && (address < start + PAGE_SIZE)
    }
}

Once again, that implementation is all that's needed to unlock our generic powers. With this, we can finally approach the write method:

 fn write(&mut self, address: Address, bytes: &[u8]) -> Result<(), Self::Error> {
     for (block, subsector, address) in MemoryMap::subsectors().overlaps(bytes, address) {
         let offset_into_subsector = address.0.saturating_sub(subsector.location().0) as usize;
         let mut merge_buffer = [0x00u8; SUBSECTOR_SIZE];
         self.read(subsector.location(), &mut merge_buffer)?;
         if block.is_subset_of(&merge_buffer[offset_into_subsector..]) {
             for (block, page, address) in subsector.pages().overlaps(block, address) {
                 self.write_page(&page, block, address)?;
             }
         } else {
             self.erase_subsector(&subsector)?;
             merge_buffer
                 .iter_mut()
                 .skip(offset_into_subsector)
                 .zip(block)
                 .for_each(|(byte, input)| *byte = *input);
             for (block, page, address) in
                 subsector.pages().overlaps(&merge_buffer, subsector.location())
             {
                 self.write_page(&page, block, address)?;
             }
         }
     }

     Ok(())
 }

Notes:

As you can see, this method looks a lot like the one for our MCU flash. However, there are a few differences. This time our write granularity is different to our erase granularity, so rather than writing subsectors directly, there is an inner iteration of the form for (block, page, address) in subsector.pages().overlaps(block, address).
Again, write_page, erase_subsector and read are low level, chip specific functions (in this case sending a few commands over QSPI), beyond the scope of this series.

Here we see the real benefit of a concrete/generic mixed approach. We're seamlessly weaving chip specific logic—the subsector.pages() cascading memory map style—with our generic overlaps function, without sacrificing expressiveness in either side.

That's it. We've done it. We've fought demons, dragons, turtles, increased our tolerance for brackets of multiple shapes, and grown a little bit more generic. I hope we came out of this with more sanity than we've lost.

I realize I left many details out. This is, after all, a tiny slice of a much bigger project, and I unapologetically ignored module visibility, dozens of helper functions, and the justification for many small decisions. I also suspect—or rather, I know—that I've done many things suboptimally. There's always so much to learn, so let me know in any of the links at the end! Speaking of which, here's a little appendix covering some of the questions I received after last entry:

Questions from last week🔗

If you get the chance, can you include a post on your environment and tooling? It's been a while since I dipped my toes into embedded development, and I'm curious what the process and development environment for embedded rust looks like. - /u/ricree

In case my references to demon summoning haven't made it abundantly clear I'm a bit of a nerd, so I like to stick to neovim with coc-rust-analyzer for editing (works amazingly), and raw command line gdb for debugging with an SVD reader plugin. The debugging experience is not ideal, so I'm working on my own rust TUI debugger called beak. I'll talk about it more in the future, but here's a sneak peek into what it looks like (if the embed looks too small, try following this link):

Can you add RSS? - multiple people

I kid you not, last week I had no idea what RSS was. Thankfully uncle Google did, so now I have a global RSS feed (linked at the bottom of the main page) and a per-category RSS feed in case you want to filter out any awful fiction I write in the future. You can find the rust feed under the categories/rust section.

Why not make the KB! macro a const fn instead? - multiple people

It did start that way, actually! I ended up making it a macro as I wanted to throw in literals of various types without worrying about conversions, i.e:

let bytes = KB!(100i32);
let bytes = KB!(100u32);
let bytes = KB!(100i64);

Thanks for reading!🔗

Again, thanks for sticking around and for your encouraging comments. I'll come back with more Rust programming when I find something exciting to talk about. Thanks especially to my colleagues at Bluefruit for sharing my enthusiasm about Rust. As always, I intend this to be a conversation so I'll be happy to answer your comments on the reddit thread and over email.

Happy rusting!