We talk about Score, Compose, Rehearse, Perform as a cycle. It’s printed on the website. We say it in Score sessions. The clients we work with eventually start saying it back to us. What we don’t talk about as often is what the cycle actually feels like from the inside, which is less linear than the four-step framing suggests.
Here’s the version we tell each other in the office.
Score is humbling
You go in expecting to find sophisticated processes that need clever AI agents. You usually find one or two beautiful workflows that already work, six things that would benefit more from a better spreadsheet than from an LLM, and one or two genuine opportunities where an agent would make a real difference. The honest Score reports get shorter every month we do this work. We’ve had clients open a Score expecting twenty recommendations and walk away with three. The three were the right three.
Compose is the moment everything starts looking expensive
This is when the agent isn’t a slide anymore; it’s an architecture. Integration points become real. Authentication flows become real. The compliance team needs to be looped in. The vendor whose system you need to integrate with becomes a project of its own. Most projects that die quietly die in Compose, in the gap between “we should do this” and “here’s exactly what we’re doing.” A good Compose phase is brutal about scope and forgiving about ambition. We try to keep the first agent small enough to ship in eight weeks and large enough that it actually changes someone’s day.
Rehearse is where the demo happens, and then where reality happens
Everyone loves the rehearsal. The model performs beautifully on curated test cases. Then we point it at three months of real production data and the real failure modes appear. The data is messier than the test set. The edge cases are more numerous and weirder than expected. There’s always at least one thing the team forgot to mention because it’s so normal to them they didn’t think of it. (Last month: a client whose entire vendor onboarding workflow depends on a Word document a partner emails them every Monday. Nobody had mentioned it. The agent had no idea Word documents were involved at all.)
Perform is where it stops being our project
We’re still there: monitoring, tuning, on-call for the first month or two. But the agent is the team’s now. They name it. They develop opinions about it. They protect it from internal politics that we can’t navigate. The handover isn’t a handshake; it’s a slow trust transfer that takes longer than the build did.
And then the cycle starts again. Each Performance surfaces new dissonance. The team that lives with the production agent for a few months starts seeing things they didn’t see before. The next Score is a deeper one because they’ve already bought in. The next Compose is faster because the architecture exists. The next Rehearse benefits from production data the previous Perform generated.
The version of this we don’t say out loud is that we have favourite parts of the cycle and we’ve had to learn to do all four well. Score is intellectual. Compose is design. Rehearse is craft. Perform is care. The temperament for each is different. The teams that try to do this work with one personality type get good at one quadrant and bad at the others.
If you’re considering an AI program in your business, the question isn’t “do we have someone who can build the agent?” It’s “do we have, or can we hire, the four temperaments?” Most SMBs don’t, internally. That’s what we exist to bring. But the day a client tells us they’ve absorbed all four into their own team is the day we know we’ve actually shipped.