One of the indelible problems with AuDHD is that I don’t deal well with broken promises, which is ironic given the time management hole I happily live in the rest of the time. The fungibility of my own plans is fine: there’s always an “I got this” neon sign flashing in the back of my mind, even though it’s a thing past-me regularly said to future-me which future-me has learned to stop believing.

Past-me is, by any objective view, an unreliable narrator.

When it comes to me letting others down, or others letting me down - it’s different. That sits somewhere deeper than language in a place that isn’t negotiable, and feels physical in a way I struggle to describe to people who don’t have the same wiring. It’s somewhere between “I backed into a sharp corner and bruised 1cm to the left of my right shoulder blade” and “my elbows are being rubbed with Tiger Balm Red”.

As I’ve started to build again, I’ve felt that pain and smelled that balm. It was easy to start out with good intentions, but watching commitments erode the way “Don’t Be Evil” did was really hard to stomach.

I began to wonder if I could codify the way I behave (a terrifying consideration). Instead of negative guard rails in a prompt, what if I implemented them as immutably as my brain chemistry implements its “lists of acceptable forks and spoons”?

The reality is that guard rails are really hard for AI coding assistants. Not that I would ever use claude --dangerously-skip-permissions (cough), but the temptation to set a grand vision and click ‘ok’ on diffs means that the decades-old many-eyes-make-all-bugs-shallow concept is diluted. I found myself wondering how to help the agent building on my behalf keep my word, for me.

I’ve been calling this Promise Driven Development - not because I think I’ve invented the underlying controls, but because I wanted a name for a workflow that starts with a promise to a user and then drags that promise all the way through hooks, review, and tests.

This post is what I’ve learned over the last 12 months of building with this idea. I’ll try to give you enough to decide whether any of it is worth borrowing.

If you want the short version: PDD is my opinionated cut of a behavioural harness for coding agents. Harness engineering is the umbrella; the thing I’m choosing to put at the centre of it is explicit user promises, written as thresholds the agent can actually defend.

Coffee Thoughts

To make this concrete: the app I work on most days is FastingBestie, an intermittent fasting app for women. We’re live on iOS (Swift), with a public website (Astro) and a backend API (Node).

Before we launched, we knew trust was imperative to what we were building - and we’d decided on key principles for our development. The first promise we came to was:

If the app asks for data, it will use it within 3 screens.

It feels small and obvious, but it’s something that slips into disrepair easily as feature development ramps up.

When I caught Claude trying to break this promise, I was poorly attempting to revive my Rancilio espresso machine and had one eye on the remote-control job running. It had been asked to add onboarding steps to improve the tailoring of recommendations for users. The app already captured what a user had tried before, their goals, their dietary preferences - and another one snuck in: “What matters most to you?”

Leaving the steamy explosion of burned coffee behind, I did a quick check of the rules engine’s implementation and saw we hadn’t planned to factor this in. If a user said they cared about “long-term results”, we didn’t yet act on it in any meaningful way, so the question became an onboarding placebo.

The reaction I had wasn’t really at the agent. It was at the realisation that there was nothing in my setup that would’ve caught this anyway - the code reviewer was me, the tests were green, I had 228 other GitHub issues open, and the product spec hadn’t been properly read for months. The agent had done its job and broken my word, and I didn’t have a system that would have stopped it.

So I started codifying the controls I wished I’d had.

Why Agents Drift

The interesting failure case here isn’t that AI coding agents are malicious (in fact, they’re mostly inert) but that they’re optimisers, and optimisers are extraordinarily good at routing around constraints they weren’t told to defend.

This kind of behaviour is years old in a reinforcement learning context - DeepMind have a catalogue of specification gaming examples where agents satisfied the letter of an objective by gutting its spirit - like the boat-racing agent that learned to spin in circles collecting power-ups instead of finishing the race. I loved the story of how ChatGPT reward hacked with the calculator - performing 1+1s silently to hit its own internal target.

The version that’s easy to recognise is the junior engineer who closed a ticket by deleting the failing test. The agent doesn’t even need to delete the test - it just needs to make the failing-test path slightly less constraining by raising a cap, softening an assertion, or quietly adding an exception for the edge case that was failing. Each individual change is justifiable in isolation, but the sum is a slow erosion of every guarantee you ever wrote.

For those of us who are cursed with pattern-matching brain chemistry, this is a familiar shape. I wrote about it before in the context of emdashes and AI-generated text - the things that give a system away are the small, statistically predictable choices it makes when nobody is watching. With AI agents, the give-away is which corners get rounded off when there’s pressure to ship and one “pls fix” prompt too many.

Considering the innocuous but troublesome 1+1 calculations that ChatGPT had done for its own reward maximisation, I percolated for a few weeks on how to build a workflow where the agent couldn’t drift, even if it wanted to.

Why CLAUDE.md Isn’t Enough

The first instinct, and the one I followed for a few challenging weeks, is to write the rules down. Claude reads a file called CLAUDE.md when it enters a project, so you fill it with instructions: don’t raise the notification cap, don’t collect a new field without a user-visible justification, don’t ship analytics events with PII. You feel safer through negative reinforcement.

You should not.

CLAUDE.md is one document among hundreds in the agent’s working memory, and by the time the agent is twelve tool calls deep in a refactor and trying to make a stubborn test pass, the prose at the top of CLAUDE.md is functionally invisible to it. The agent hasn’t forgotten the rule as such; it just has no way to know that this particular edit, right now, in this particular file, is the one the rule was written about.

CLAUDE.md also scales badly. Promises aren’t the only thing in there - you’ve got build guidance, code style, commit conventions, naming. Every promise you add dilutes the others, and the signal-to-noise of “oh, this matters” goes down with every paragraph you append. You hit context limits - by the time the agent reaches the bottom, the top has already been forgotten.

Atul Gawande’s The Checklist Manifesto makes the point well, even if he was writing about surgery rather than software. Experts don’t fail because they lack knowledge - they fail because the right piece of knowledge doesn’t surface at the moment they need it. Operating room checklists don’t exist to teach surgeons their job; they exist to put the right reminder in front of the right person at the right second. CLAUDE.md is a medical school textbook, but I needed a “pre-incision laminated checklist”.

When I started implementation, I knew three things: the promises had to be written down in a way that was machine-readable, not just human-admirable; they had to be something the agent would pick up when the relevant code was about to be touched (not after); and I needed a catcher in case anything slipped through.

Promises as Contracts

At AWS we learned that “Leadership Principles aren’t inspirational wall hangings”, and it’s similarly too easy to treat promises as marketing copy. Treating them as a contract instead means they’re written into a file in the repository, with structure, and with numbers as commitments. FastingBestie’s promises live in Git and look like this:

"data_promises": {
    "principle": "Every piece of data we collect from the user must deliver explicit, visible value back to them. No data collection without a purpose the user can see.",
    "threshold": {
      "max_screens_until_value_delivered": 3
    }
}

The part that does the heavy lifting is the threshold block. Without a number, a promise is just a vibe and a prayer; “we respect your data” is unfalsifiable, but “use it within 3 screens or don’t ask” can be a unit test. The prose is the version to read, and the threshold is the version the agent has to keep true.

If harness engineering is the question of how to steer and sense an agent, PDD is one opinionated answer to what that control system should defend: the promises the product has made to its users.

The same shape works for everything else users care about: data privacy (“zero unencrypted PII references in persistent storage”), notifications (“no more than 3 notifications per day, none outside user-set quiet hours”), onboarding speed (“under 120 seconds to dashboard on a baseline device”), accessibility (“WCAG 2.1 AA, contrast 4.5:1, Dynamic Type up to AX5”), personalisation (“≥20% recommendation divergence across three fixture profiles”), performance (“cold launch under 2s, 60fps on core flows”). Every one of them ends up with a sentence a user could understand and a number a test can fail on.
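To make that less abstract, here is roughly how two more of those promises sit in the same file. This is an illustrative sketch - the key and field names below are mine, with the numbers lifted straight from the sentences above:

"notification_promises": {
  "principle": "We never buy engagement with the user's attention. Notifications stay rare and stay out of quiet hours.",
  "threshold": {
    "max_notifications_per_day": 3,
    "respect_user_quiet_hours": true
  }
},
"onboarding_promises": {
  "principle": "A new user reaches their dashboard before their patience runs out.",
  "threshold": {
    "max_seconds_to_dashboard_on_baseline_device": 120
  }
}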

If this is starting to sound like a Service Level Objective, that’s because I spent years in the ITIL machine and - well - it is. Google’s SRE handbook has been arguing for years that operational reliability becomes manageable only when you replace handwaving with measurable thresholds the team has agreed to defend, and the trick PDD borrows is exactly the same: less prose, more numbers - but for product behaviour rather than just availability and latency. “Let our users trust us” is the product equivalent of “the site should be up”, and you can’t really defend either one until you’ve committed to a tangible measure.

There is a trap to watch out for here that Charles Goodhart got to first: a measure that becomes a target ceases to be a good measure. A threshold of three notifications a day doesn’t stop you from sending three useless ones, and the threshold is meant to be a floor under bad behaviour rather than a ceiling on good. We don’t want to end up doing 1+1 calculations just because it won’t violate a promise. I still keep a much softer process for asking whether a given promise is still the right one, and that mostly happens in conversation rather than in JSON. The numbers are what the agent defends, but the judgement is still human.

Surfacing Promises At The Right Moment

I wanted to make sure I was being efficient with both time and API usage - so I leaned heavily on the PreToolUse hook. Claude Code lets you intercept tool calls before they happen (docs), and the hook runs whenever the agent is about to edit, write, or multi-edit a file, with the ability to inject a system reminder back into the agent’s context.
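The registration lives in the project’s Claude Code settings. A minimal sketch - the script path and name here are my convention rather than anything Claude Code mandates:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/promise-check.sh"
          }
        ]
      }
    ]
  }
}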

The hook matches on both the file path and the file contents. If the agent is about to touch the notification service, that maps to notification_frequency. If the file mentions UNUserNotificationCenter or calls notifications.schedule, same mapping. Touching analytics, anything calling the capture API, or any string matching the persistence key prefix maps to data_privacy. Editing the onboarding views maps to onboarding_speed. Editing the theme file or anything that adds an .accessibilityLabel maps to accessibility. Editing PROMISES.json itself maps to all of them, because if you’re rewriting the constitution you should probably stop and read the constitution first. A trimmed sketch of that mapping follows.
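The mapping itself is just data the hook script walks. The paths and patterns below are invented to mirror the examples in the paragraph above, not FastingBestie’s real tree:

{
  "notification_frequency": {
    "paths": ["Sources/Notifications/"],
    "contents": ["UNUserNotificationCenter", "notifications.schedule"]
  },
  "data_privacy": {
    "paths": ["Sources/Analytics/"],
    "contents": ["capture("]
  },
  "onboarding_speed": {
    "paths": ["Sources/Onboarding/"]
  },
  "accessibility": {
    "paths": ["Sources/Theme/"],
    "contents": [".accessibilityLabel"]
  }
}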

When a match fires, the hook reads the relevant promises out of the JSON and prints them as a system reminder, prefixed with this:

PROMISE CHECK. This edit touches an area covered by the app’s promises. Verify the change does not violate any of the following before completing it. If it might, stop and surface the conflict to the user (quote the promise, explain the risk, ask for explicit override). Do not silently implement; do not bury the conflict in an end-of-turn summary.

I pair the hook with a doctrine in CLAUDE.md - a short Promise Impact Analysis template that the agent has to produce as its first response on any user-facing change. It quotes the promise, lists the risks, proposes a mitigation, and waits for approval. If no promise is touched, the agent says so in one line and proceeds. The point of the template is to make the agent’s reasoning legible before it starts editing, so I get to read its argument before it writes any code.
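The template itself is short. Mine boils down to something like the following, where the order matters more than the wording - conflict first, code last:

Promise Impact Analysis
- Promises touched: quote the relevant principle and threshold, or state “none” in one line and proceed.
- Risk: how this change could breach the threshold.
- Mitigation: the approach that keeps the threshold true.
- Approval: wait for an explicit go-ahead before editing anything.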

The hook itself doesn’t block execution - instead it makes sure the agent can never honestly claim “You’re right, I shouldn’t have done that!” after the fact. The doctrine half of the system says that a promise risk isn’t a caveat to mention at the end of a response but the response itself, and that the agent should lead with it, not bury it.

This has changed working with the agent more than anything else I’ve tried. Most edits don’t touch a promise area and the hook stays silent. When it does fire, the agent visibly slows down, quotes the promise back, and either proposes a different change that doesn’t touch it, or just stops and asks me. As far as I can tell, the number of times an agent has shipped a quiet weakening of a guarantee since I put this in place is zero.

The other thing it does, which I didn’t expect, is push back on me when I’m the one being lazy. I asked it once to “just bump the daily notification cap to four while we test the engagement model”, and it refused, quoted the promise back, and reminded me what I’d agreed about not buying engagement at the cost of the user’s attention. I had a valid reason to test and experiment - but the pushback helped me reframe the experiment in the context of a user. That’s a partnership I’ve not felt with software since I first committed tcpdump -A -s0 -n -i any [...] to muscle memory.

Letting The Test Suite Police The Contract

The final step is the one that survives drift: tests - simultaneously my favourite and my sword of Damocles.

Every promise has a corresponding Swift Testing tag - max_screens_until_value_delivered is .maxScreensUntilValueDelivered, data_privacy is .dataPrivacy, and so on - declared in a single file. Tests that verify a promise carry the matching tag, so filtering the test run by tag tells me at any point which tests are doing the work of holding which promise up.
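In Swift Testing that declaration stays small. A sketch, with the tag names mirroring the promise keys and the test below being a placeholder shape rather than the real one:

import Testing

// One tag per promise key, declared in a single file so the
// meta-test has exactly one place to look.
extension Tag {
    @Tag static var maxScreensUntilValueDelivered: Self
    @Tag static var dataPrivacy: Self
    @Tag static var notificationFrequency: Self
}

// A test defending the 3-screen promise carries the matching tag.
@Test("Onboarding answers visibly shape recommendations within 3 screens",
      .tags(.maxScreensUntilValueDelivered))
func onboardingDataIsUsedWithinThreeScreens() {
    // ...drive a fixture profile through onboarding and assert that
    // the recommendations screen reflects each captured answer...
}

Xcode test plans can then include or exclude tests by these tags, which is what makes the “which tests hold this promise up” question cheap to answer.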

A meta-test then reads PROMISES.json from disk during the test run, walks the test source tree, and matches every .tags(.foo) usage. It asserts that every promise in the JSON has at least one tagged test in the source, or appears in an exemptedKeys dictionary that documents where verification actually happens (some of mine run in the UI test bundle, which is a different binary). The contract gets enforced from both ends, so adding a promise to the JSON without tagging a test for it fails the build with a message telling you which tag is missing, and tagging a test for a promise key that no longer exists in the JSON fails the build with a message telling you to remove the stale tag. Documenting an exemption for a promise that no longer exists fails the build the same way.
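Here’s a compressed sketch of what that meta-test can look like. The repo layout, the exemption table, and the assumption that each promise key maps to one camelCase tag are all mine, so treat it as a shape rather than a drop-in:

import Foundation
import Testing

@Suite("Promise traceability")
struct PromiseContractTests {

    // Promises whose verification lives in another binary (e.g. the
    // UI test bundle). Keys and reasons here are illustrative.
    static let exemptedKeys: [String: String] = [
        "onboarding_speed": "measured in the UI test bundle"
    ]

    @Test("Every promise has a tagged test, every tag has a promise")
    func contractIsTraceable() throws {
        // Assumed layout: PROMISES.json at the repo root, tests under Tests/.
        let repoRoot = URL(fileURLWithPath: #filePath)
            .deletingLastPathComponent()   // .../Tests/ContractTests
            .deletingLastPathComponent()   // .../Tests
            .deletingLastPathComponent()   // repo root
        let promises = try JSONSerialization.jsonObject(
            with: Data(contentsOf: repoRoot.appendingPathComponent("PROMISES.json"))
        ) as? [String: Any] ?? [:]
        let promiseKeys = Set(promises.keys)

        // Walk the test sources and collect every .tags(.foo) usage.
        var taggedKeys = Set<String>()
        let files = FileManager.default.enumerator(
            at: repoRoot.appendingPathComponent("Tests"),
            includingPropertiesForKeys: nil)
        while let file = files?.nextObject() as? URL {
            guard file.pathExtension == "swift" else { continue }
            let source = try String(contentsOf: file, encoding: .utf8)
            for match in source.matches(of: /\.tags\(\.([A-Za-z0-9]+)\)/) {
                taggedKeys.insert(snakeCased(String(match.1)))
            }
        }

        // The contract is enforced from both ends.
        for key in promiseKeys where !taggedKeys.contains(key) {
            #expect(Self.exemptedKeys[key] != nil,
                    "Promise '\(key)' has no tagged test and no documented exemption")
        }
        for key in taggedKeys where !promiseKeys.contains(key) {
            Issue.record("Tag '\(key)' matches no promise in PROMISES.json; remove the stale tag")
        }
        for key in Self.exemptedKeys.keys where !promiseKeys.contains(key) {
            Issue.record("Exemption for '\(key)' names a promise that no longer exists")
        }
    }

    // .maxScreensUntilValueDelivered -> max_screens_until_value_delivered
    private func snakeCased(_ tag: String) -> String {
        tag.reduce(into: "") { out, ch in
            if ch.isUppercase { out += "_" + ch.lowercased() } else { out.append(ch) }
        }
    }
}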

This layer is crucial, as everything above it can be silently bypassed: the hook can be removed, the doctrine ignored, CLAUDE.md rewritten. The meta-test can’t be removed by accident, because getting past it means deliberately dismantling a build failure that names a specific promise and points at a specific missing test. That asymmetry between the cost of removing the safety net and the cost of keeping it is doing most of the heavy lifting in the system.

What you end up with is a contract that can’t quietly rot. Promises don’t become outdated and forgotten, because removing one means actively cleaning up the test that holds it up. New promises don’t float in undefended, because the build won’t let you ship until a test asserts them. Whatever the agent of three months from now does to your code base, the contract keeps showing up in the build output and failing in the right places.

The Provenance

As with most things, none of the individual pieces here are new - but I found it fun to piece them together.

The closest umbrella term I’ve found for this space comes from Birgitta Böckeler’s post on Harness Engineering: a behaviour harness. Reading that gave me a useful “oh, right, yes, that” moment. PDD sits inside that umbrella; the particular shape I’m describing here is measurable promise thresholds, pre-edit reminder hooks, and build-time traceability between promises and tests.

I started by reading about Design by Contract (Bertrand Meyer - Design by Contract). The nuance of DbC is that it puts the contract in the code rather than next to it, and PDD is doing the same trick but sooner - DbC contracts are between functions, whereas PDD contracts are between builder, codebase, and user.

I’d worked with ThoughtWorks using Behaviour Driven Development (Dan North - Behaviour Driven Development) in the late 2000s. BDD’s core principle is that tests be written in a common language shared by customers and engineers, so that the test names tell the story of what the software is for. There’s a match in the framing but a difference in the artefact - BDD’s Given/When/Then describes a slice of the product’s behaviour, while a promise dictates the product’s behaviour overall. A BDD scenario can pass while the product is still hostile, because that hostility lives between the scenarios; a promise wouldn’t let it - that hostility is exactly what the promise was written to forbid.

From years of running production platforms, I know that an aspirational "site should be up" loses to an SLO of "the site is up for 99.99% of the time, measured this way, alerted in this way". PDD is in many ways just SLOs for product behaviour, and the reason it’s taken this long to write up is probably that pre-LLMs, we didn’t have a forcing function to write product SLOs. Code review was the validation step in most teams I’ve worked on, but with AI agents doing most of the typing, reviews have shifted upstream of the code itself: into the system prompt, the hooks, and the test contracts. The pressure has to land somewhere, and PDD is one place to put it.

What still felt fuzzy to me, though, was a concrete pattern that combined three things at once: measurable user promises, a pre-edit interrupt, and a test suite that proves each promise is still defended. DbC, BDD, and SLOs are all things you check after the fact (you write the contract, you run the test, you read the alert). PDD adds the pre-write reminder, so the agent gets told about the promise before it edits the file, and the doctrine forbids it from finishing the edit silently if the promise is at risk. That’s the part that would be uncomfortable with a human collaborator. No one likes being interrupted before they’ve started typing, but with an AI collaborator - it’s much more natural, and interruption can improve results.

Costs and Embarrassments

There is, of course, some friction. The hook adds latency to edits in covered areas. The Promise Impact Analysis adds a turn to the conversation before the agent starts editing. Adding a new promise is an order of magnitude more work than writing a sentence in a values document, because you have to care about a threshold, a test, a tag, and mapping all at once.

There’s also the embarrassment of writing the first version. My initial PROMISES.json was full of soft adjectives and zero numbers - “notifications are respectful”, “the app feels personal”, “we honour user privacy”, that sort of thing. Lovely on a slide or blogpost, useless to a test runner. Pushing each one through the threshold question - “what number would I publicly defend?” - wasn’t a five-minute job. It was a conversation with my cofounder and a week of squirming. Several promises died in that week when defensible numbers weren’t possible. A promise we can’t defend is a marketing claim, not a contract.

That trade is most of the point of the whole thing. The cost is paid up front, by you, when you’re awake and thinking, and the benefit is paid back later, by the agent, when you aren’t.

Coda

There has been a deeper, semi-therapeutic relief in coding the way I am wired.

I can’t process broken promises in life, so by building a workflow that doesn’t produce them, I’ve been able to reduce cognitive load and masking. The AuDHD part of my brain that flags an emdash on a LinkedIn post is the same part that flags a silently widened rate limiter and twitches when a friend adjusts a plan. That pattern recognition is the same instrument no matter what you point it at. The only new thing here is that I’ve decided to point it at a development workflow and to encode the result in JSON.

I genuinely don’t know how well this generalises. A larger team might find the doctrine too heavy, or might already get the same effect from code review. This might just be a new three-letter acronym for something I wasn’t able to Google. A team without AI agents in the loop might not need the hook at all, because a human reviewer would catch the same drift. The shape of PDD I’ve described here is sized for a small team shipping a product that touches a sensitive surface, so other teams will want a different cut of it.

The principle is more portable than the implementation: decide what you’ve promised your users, write it down somewhere both builders and users can see it, surface it in front of whoever or whatever edits code, and make the safety net louder to remove than to keep.