Spell It Out: Rules for an AI Coding Assistant

This is the third post in my series about working with an AI coding assistant. The first two covered getting started and early observations. This one is about what happened when I started writing down the rules.

Just another tool

An AI coding assistant is just another tool. Not very different from a spell checker or autocomplete. It has a good foundation for making guesses, and the guesses are often good. But they are guesses. To someone non-technical it might look like magic, but it is pattern matching on a large scale, and it needs a human to tell it when the pattern is wrong.

That is why rules matter. The better you describe what you expect, the better the guesses get. Not unlike explaining something to a junior developer who doesn't know your codebase yet.

Explicit rules are still king

I knew from the start that the assistant needs explicit rules. What I didn't realize was how many rules I actually wanted. It has the mental state of a three-year-old. If you leave something out, it will fill in the blanks on its own. Sometimes it gets lucky. Often it doesn't.

A colleague looked at some database code the assistant had written and asked me about it. There were multiple write statements without a transaction. She also said this wasn't my regular level of quality. She was right. The assistant did exactly what I told it to do. I just hadn't told it about transactions, and I had missed it in my own reviews.

The rule I added after that is blunt: every repository method that performs more than one write must wrap all statements in a transaction. A method that issues two or more writes without a transaction is a bug. Partial writes corrupt data silently and are nearly impossible to diagnose after the fact. I also added a rule that every transaction must have a rollback test that proves the transaction actually rolls back on failure. No exceptions.
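To make the rule concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The repository, the tables, and the helper names are hypothetical and only there for illustration; my real code looks different. The point is the shape: two writes inside one transaction, and a test that forces the second write to fail and then checks that the first one is gone too.

```python
import sqlite3


def create_schema(connection: sqlite3.Connection) -> None:
    # Hypothetical schema. The CHECK constraint gives the test a way
    # to make the second write fail.
    connection.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER)"
    )
    connection.execute(
        "CREATE TABLE stock (item TEXT PRIMARY KEY, "
        "quantity INTEGER CHECK (quantity >= 0))"
    )
    connection.execute("INSERT INTO stock VALUES ('widget', 5)")
    connection.commit()


class OrderRepository:
    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def place_order(self, order_id: int, item: str, quantity: int) -> None:
        # Two writes, so both go inside one transaction. Used as a context
        # manager, the connection commits if the block succeeds and rolls
        # back if it raises.
        with self.connection:
            self.connection.execute(
                "INSERT INTO orders (id, item, quantity) VALUES (?, ?, ?)",
                (order_id, item, quantity),
            )
            self.connection.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
                (quantity, item),
            )


def count_orders(connection: sqlite3.Connection) -> int:
    return connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def stock_of(connection: sqlite3.Connection, item: str) -> int:
    row = connection.execute(
        "SELECT quantity FROM stock WHERE item = ?", (item,)
    ).fetchone()
    return row[0]


def test_place_order_rolls_back_when_second_write_fails() -> None:
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    repository = OrderRepository(connection)

    try:
        # Ordering 10 of 5 violates the CHECK constraint on the second write.
        repository.place_order(1, "widget", 10)
    except sqlite3.IntegrityError:
        pass
    else:
        raise AssertionError("expected the stock update to fail")

    # The first write must not survive the failed second write.
    assert count_orders(connection) == 0
    assert stock_of(connection, "widget") == 5
```

If the two statements were committed one at a time instead, the order row would survive the failed stock update. That is exactly the silent partial write the rule is meant to prevent.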

Show me the skeleton first

Before I added this rule, the assistant would write a full test and immediately write the production code to make it pass. Sometimes I had to stare at the test for a while just to figure out what it was trying to verify. That is not a good position to be in when you are supposed to be reviewing the work.

Now the assistant shows me the test as a skeleton first. Just the test name, the setup, and what it will assert, but no production code yet. I read it, decide if it makes sense, and then let it continue. It slows things down a little, but I actually understand what is being tested before any code is written.
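A skeleton could look something like the sketch below. It uses Python's unittest, and the scenario and names are made up for illustration. The test has a name, a setup, and a description of what it will assert, and then it skips itself so nothing pretends to be verified yet.

```python
import unittest


class PlaceOrderSkeleton(unittest.TestCase):
    def test_order_is_rejected_when_stock_is_insufficient(self):
        # Setup: a stock of 5 widgets (hypothetical example data).
        stock = {"widget": 5}
        requested_quantity = 10

        # Will assert:
        #   - placing the order raises an error
        #   - the stock is still 5 afterwards
        #   - no order has been stored
        self.skipTest("skeleton only, waiting for review before implementation")
```

The skip is a design choice in this sketch. A skipped test shows up in the test report, so a skeleton that was never filled in is hard to forget.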

I also ask for manual test steps when something touches the UI or an API. Simple things like "open the browser, go to this page, click this button, expect that result." Having those written down makes my life easier. I can verify the change myself without having to figure out how to exercise it.

Never trust a test you haven't seen fail

The TDD cycle moved too fast sometimes. The assistant would write a test and make it pass in one go. The tests were pretty good, but when the steps were too large I couldn't always tell if they were protecting the right thing.

So I added a rule: never trust a test you haven't seen fail. And fail for the right reason. The TDD cycle is now closer to what I would do manually. Tests are red a few more times, which is what I want.
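What "the right reason" means is easier to show than to describe. In the sketch below, which is a made-up example in Python and not taken from my project, the red step uses a stub that exists but returns the wrong value. The test then fails on its assertion, with a message about the behaviour, rather than on a missing function or an import error.

```python
def apply_discount_stub(price, percent):
    # Red step: the function exists but deliberately ignores the discount,
    # so the test fails on the assertion and not on a missing name.
    return price


def apply_discount(price, percent):
    # Green step: the simplest implementation that makes the test pass.
    return price * (100 - percent) / 100


def check_discount(implementation):
    # The same test body is run against the stub first, then the real code.
    result = implementation(200, 25)
    assert result == 150, f"expected 150, got {result}"


def fails_for_the_right_reason():
    try:
        check_discount(apply_discount_stub)
    except AssertionError as error:
        # The failure message talks about the discount, which is the point.
        return str(error)
    raise RuntimeError("the test passed against the stub, so it protects nothing")
```

Calling fails_for_the_right_reason() returns the assertion message from the red step, and check_discount(apply_discount) then passes without raising.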

The rules I added

Over the last month I have added quite a few rules to my assistant's configuration. Some are general, some are specific to the backend or frontend.

General rules:

Do not inline method calls as arguments
Require proper assertions in tests
Writing style rules
Commit message style rules
Never trust a test you haven't seen fail

Backend specific:

Stronger refactoring rule
Stronger wording to actually do the work and not just ask for permission
Show the planned tests as skeletons before implementing them

Frontend specific:

Prefer black-box testing when possible
Do not swallow exceptions
Better test name instructions

None of these are surprising if you have been writing code for a while. Most of them are the same things we used to put in coding guidelines that no one ever read. The difference is that now the assistant at least tries to follow them, even if it misses some.
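As an illustration of the first general rule in the list above, this is roughly what I mean by not inlining method calls as arguments. The functions are hypothetical and written in Python just to show the shape. Both versions compute the same thing, but the second one gives every intermediate value a name that can be read in a review and inspected in a debugger.

```python
PRICES_IN_CENTS = {"widget": 250}  # hypothetical example data


def parse_quantity(raw: str) -> int:
    return int(raw.strip())


def unit_price(item: str) -> int:
    return PRICES_IN_CENTS[item]


def line_total(quantity: int, price: int) -> int:
    return quantity * price


def total_inlined() -> int:
    # Calls nested as arguments: compact, but harder to read and to debug.
    return line_total(parse_quantity(" 3 "), unit_price("widget"))


def total_named() -> int:
    # Each call gets its own line and a descriptive name.
    quantity = parse_quantity(" 3 ")
    price = unit_price("widget")
    return line_total(quantity, price)
```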

London school, still

In my second post I mentioned that the assistant prefers to build from the bottom up while I prefer to drive the design from the outside in. That friction has not improved. The assistant still wants to start at the database and work upwards.

I'm not sure it matters much anymore. The course corrections are quick.

Resources

TDD with an AI assistant - first post in this series
AI Coding Assistant: More Observations from a Practitioner - second post
Thomas Sundberg - the author