Ossature can now catch code that compiles but is wrong

Ossature generates code from specs. You write what a piece of software should do, it turns that into a plan you can review, then it generates the code with an LLM and runs a verify command after each step. Until 0.1, that verify command often just compiled the code, which tells you the code builds, not that it does what you asked. 0.1 adds a way to write down how the code should behave, and a step that checks the generated code against it.

A real example: I was testing it on a small Rust clone of yes, the command that prints a line to stdout over and over until the pipe closes. The architecture defines the entry point rules like this:

**Contracts:**

- main returns ExitCode::SUCCESS (0) on normal termination or broken pipe
- main returns ExitCode::FAILURE (1) on non-broken-pipe IO errors

Those are contracts, new in 0.1. An AMD component can list short rules like these next to its function signatures and they define what the code has to do, which a signature alone can't.

In my test, the generated entry point compiled fine and the verify step (running cargo check) passed, but the generated code handled every write error the same way: it printed the error and exited with 1. A broken pipe is a write error, so if the reader went away the program would exit 1, while the contract says it should exit 0.

This is where the reviewer comes in, another new thing in 0.1. After a task's code is generated and its verify step passes, the code is passed to another agent along with the spec and the contracts. The sole job of this agent is to flag exactly this kind of problem. Here it reported that main broke the two exit-code contracts and should have checked for ErrorKind::BrokenPipe. The finding went back through the fixer agent, main got fixed to return success on a broken pipe, and the re-review passed. Without the reviewer, that program would still exit 1 on a closed pipe.
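To make the difference concrete, here is a minimal sketch of the behavior the two contracts describe. It is not the code Ossature generated: the names write_until_error, status_for, and ClosingPipe are made up for illustration, and ClosingPipe is a stand-in writer so the example terminates instead of printing forever.

```rust
use std::io::{self, ErrorKind, Write};

/// Writes `line` repeatedly until the writer reports an error.
/// For a `yes` clone, an error is the only way out of the loop.
fn write_until_error(out: &mut impl Write, line: &[u8]) -> io::Error {
    loop {
        if let Err(e) = out.write_all(line) {
            return e;
        }
    }
}

/// The two contracts as code: a broken pipe counts as normal
/// termination (0), any other IO error is a failure (1).
fn status_for(err: &io::Error) -> u8 {
    if err.kind() == ErrorKind::BrokenPipe {
        0
    } else {
        1
    }
}

/// Hypothetical stand-in for stdout: accepts `remaining` writes,
/// then fails every write with the configured error kind.
struct ClosingPipe {
    remaining: usize,
    kind: ErrorKind,
}

impl Write for ClosingPipe {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.remaining == 0 {
            return Err(io::Error::from(self.kind));
        }
        self.remaining -= 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    // Reader went away: the contract says this is a clean exit.
    let mut closed = ClosingPipe { remaining: 3, kind: ErrorKind::BrokenPipe };
    let err = write_until_error(&mut closed, b"y\n");
    println!("{:?} -> exit {}", err.kind(), status_for(&err));

    // Any other write error is a real failure.
    let mut failing = ClosingPipe { remaining: 3, kind: ErrorKind::PermissionDenied };
    let err = write_until_error(&mut failing, b"y\n");
    println!("{:?} -> exit {}", err.kind(), status_for(&err));
}
```

In a real entry point, the writer would be a locked stdout and main would return ExitCode::from(status_for(&err)). The version the reviewer flagged is equivalent to status_for always returning 1, which cargo check has no way to object to.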

The reviewer runs by default, and a failed review goes into the fixer the same way a failed verify does. You can turn the reviewer off with review = false under [build] in your project config.

Another fix came out of adding this: a contract should belong to the task that finalizes a file, not to a task that only scaffolds it. Early on, a task whose only job was to create a placeholder for a file was failed for not meeting that file's contracts, and the fixer would then generate the real implementation too early, in a file that a later task rewrites anyway.

Some other smaller things were added in 0.1: The planner reads the project's language and plans for it, so a Rust project gets cargo commands instead of generic ones. A static check rejects plans that try to build a file before the source it needs exists. And a task can copy files straight from the context directory with no model call, for outputs that should be reproduced exactly (like binary assets).
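The ordering check is easy to picture as a single pass over the plan. The sketch below is only an illustration of that idea, assuming a plan is an ordered list of tasks that each read and write some files. The Task struct, its fields, and first_ordering_violation are hypothetical and do not reflect Ossature's actual plan format or implementation.

```rust
use std::collections::HashSet;

/// Hypothetical shape of a planned task (not Ossature's real format).
struct Task<'a> {
    name: &'a str,
    /// Files that must already exist when the task runs.
    reads: Vec<&'a str>,
    /// Files the task creates.
    writes: Vec<&'a str>,
}

/// Walks the plan in order and returns the first (task, file) pair
/// where a task needs a file that nothing earlier has produced.
fn first_ordering_violation<'a>(
    plan: &[Task<'a>],
    existing: &[&'a str],
) -> Option<(&'a str, &'a str)> {
    // Start from the files that are already on disk.
    let mut available: HashSet<&'a str> = existing.iter().copied().collect();
    for task in plan {
        for file in &task.reads {
            if !available.contains(file) {
                return Some((task.name, *file));
            }
        }
        // Outputs become available to every later task.
        available.extend(task.writes.iter().copied());
    }
    None
}

fn main() {
    // The build task is planned before the task that writes its source.
    let plan = vec![
        Task { name: "build-binary", reads: vec!["src/main.rs"], writes: vec!["target/yes"] },
        Task { name: "write-main", reads: vec![], writes: vec!["src/main.rs"] },
    ];
    match first_ordering_violation(&plan, &["Cargo.toml"]) {
        Some((task, file)) => println!("rejected: {task} needs {file} before it exists"),
        None => println!("plan ok"),
    }
}
```

Because the check only looks at the plan, it can run before any model call is made, which is the point of making it static.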

v0.0.5 checked that the code compiles; v0.1 also checks it against the contracts you write.

The main thing I want to get to over the next few releases is self-hosting. For a compiler, self-hosting is a milestone: it builds itself from its own source. So the question is whether Ossature can tame the non-determinism of an LLM reliably enough to build itself from its own spec and architecture files. I'd like to try, but a couple of things have to come first. Editing one spec can't mean rebuilding all of them, and right now the interface boundary between specs only holds across separate builds, not within the build that changes a spec, so that has to change. And verify would need to run the real test suite, so that "it built" means the tests pass, not just that it compiled. The reviewer is part of why this feels closer than it did. A tool this full of specific behavior needs something checking the behavior, not just the build.