    Before You Hire an AI Consultancy, Ask for the Audit Trail

    By Zestic AI, 2026-05-19

    The question we hear most often from boards before they commission AI work is not "can your AI do this?" They already believe it can. The question is: "If something goes wrong, can you show us what happened?"

    Most AI firms cannot answer that question. Not fully. Not in a way that would satisfy a regulator, a non-executive director, or a client whose data was involved. They built the capability first and thought about governance second, which is the natural order of things when a technology moves fast and the pressure to ship is high. But it creates a problem that gets passed directly to the organisations that hire them.

    When you commission AI-delivered work and something goes wrong, the firm's audit gap becomes your governance gap.

    The accountability problem nobody is pricing in

    AI consultancies are proliferating. Most are talented. Many are genuinely capable. But the industry has developed a habit of treating AI agent systems the way early software teams treated source control: as something you add later, once the important work is done.

    The result is a generation of AI delivery firms that cannot tell you which agent made a specific decision, what information it was given, whether any human reviewed it before it affected a client outcome, or why two identical-looking inputs produced different outputs on different days.

    In a low-stakes context, that is a quality problem. In a regulated industry, with enterprise data, or where AI outputs are feeding real business decisions, it is a liability problem. And it belongs to whoever commissioned the work, not just whoever delivered it.

    Boards are beginning to understand this. The questions are changing. Procurement teams at serious organisations now ask AI vendors to demonstrate explainability as a contractual requirement, not a nice-to-have. Regulators across financial services, healthcare, and the public sector are moving in the same direction. The window for closing the gap between what most AI firms can demonstrate and what the governance environment will soon require is narrowing fast.

    What good AI governance looks like in delivery

    We built the AI Dark Factory with governance as an architectural principle, not an afterthought. The reason is straightforward: we deliver AI to clients in regulated industries, and we cannot afford to be the firm that cannot explain itself.

    There are three elements that matter.

    Separation of production from governance. The agents that write and build must never be the same agents that evaluate whether what was built is safe, correct, and appropriate. This sounds obvious. Very few systems actually enforce it. In our factory, it is a hard architectural rule. No agent reviews its own output. A separate governance layer makes that call, independently, every time.
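    As an illustration only, a rule like "no agent reviews its own output" could be enforced at the point where a reviewer is assigned. The sketch below is not the factory's actual code: the names `Artifact`, `SelfReviewError`, and `assign_reviewer`, and the idea of a fixed set of governance agent IDs, are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A piece of work plus the agent that produced it (hypothetical shape)."""
    artifact_id: str
    produced_by: str


class SelfReviewError(Exception):
    """Raised when an agent is asked to review its own output."""


def assign_reviewer(
    artifact: Artifact, reviewer_id: str, governance_agents: frozenset[str]
) -> str:
    """Return the reviewer ID only if the separation rule holds."""
    # Rule 1: the producer of an artifact can never be its reviewer.
    if reviewer_id == artifact.produced_by:
        raise SelfReviewError(
            f"{reviewer_id} produced {artifact.artifact_id} and cannot review it"
        )
    # Rule 2: only agents in the governance layer may issue reviews.
    if reviewer_id not in governance_agents:
        raise PermissionError(f"{reviewer_id} is not a governance agent")
    return reviewer_id
```

    The point of raising an error rather than logging a warning is that the rule cannot be bypassed quietly: a self-review attempt stops the run.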

    Binary verdicts. Code review in our system produces one of two outputs: go or no-go. Not "here are some thoughts." Not "this looks mostly fine." Go or no-go. Merge decisions are all-pass or none-pass. The value of this for a board is not technical. It is that there is no ambiguity about what was decided, and no room for the kind of softly worded hedging that makes post-hoc accountability impossible. Every decision point produces a clear record.
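    A minimal sketch of what a binary verdict with an all-pass merge rule might look like follows. The `Verdict` enum and `merge_decision` function are hypothetical names, and treating an empty set of reviews as no-go is an assumption of this example, not something stated about the factory.

```python
from enum import Enum


class Verdict(Enum):
    """The only two outcomes a review may produce."""
    GO = "go"
    NO_GO = "no-go"


def merge_decision(verdicts: list[Verdict]) -> Verdict:
    """All-pass or none-pass: a single no-go blocks the merge."""
    # Assumption: with no reviews on record, nothing was approved, so block.
    if not verdicts:
        return Verdict.NO_GO
    if all(v is Verdict.GO for v in verdicts):
        return Verdict.GO
    return Verdict.NO_GO
```

    Because the return type admits only two values, there is nowhere for a "mostly fine" to hide in the record.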

    Full telemetry on every action. Every task our factory handles is logged with the agent that handled it, the model used, the provider called, the time taken, the routing logic that assigned it, and the outcome. That log exists not as a backup, but as the primary record. If something goes wrong on a client engagement, we can reconstruct exactly what happened, at which step, and why. That is not a feature. It is a governance model.
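    To make the shape of such a record concrete, here is a hedged sketch of a per-task log entry carrying the fields listed above, written as one JSON line per task. The field names, the `TaskRecord` class, and the `reconstruct` helper are assumptions for illustration; the real schema is not described in this article.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskRecord:
    """One log entry per task (hypothetical field names)."""
    task_id: str
    agent: str
    model: str
    provider: str
    duration_ms: int
    routing_reason: str
    outcome: str


def to_log_line(record: TaskRecord) -> str:
    """Serialise a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def reconstruct(log_lines: list[str], task_id: str) -> list[dict]:
    """Return every logged step for one task, in the order it was written."""
    records = [json.loads(line) for line in log_lines]
    return [r for r in records if r["task_id"] == task_id]
```

    With entries like these, answering "what happened on this task, at which step, and why" becomes a filter over the log rather than an exercise in recollection.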

    TruthForge, our narrative intelligence product, makes this concrete. TruthForge processes sensitive signals about an organisation's public reputation and produces strategic communications recommendations, often in situations where the stakes are high and the timeline is short. Our clients in that context need to know exactly which signals informed a recommendation, which agents processed them, what the review chain looked like, and whether any human approved the output before it reached their desk. The governance requirements are not hypothetical. They are the product. The factory provides that chain of accountability on every run.

    The practical difference becomes clear in a specific scenario. An AI-delivered system produces an output that causes a problem: a recommendation that turns out to be wrong, a piece of generated code with a security flaw, a data summary that missed something important. Your regulator or legal team asks for a full account of how the output was produced.

    With most AI delivery firms, that conversation ends in generalities. With us, it ends in a structured log with a traceable chain from input to output, every decision point accounted for.

    Why this is a board question, not just a technical one

    We are not raising this as a commercial point. We are raising it because the regulatory trajectory is clear, and the organisations that have thought about it now will be ahead of the ones that have to retrofit it later.

    The FCA's expectations around AI explainability in financial services are hardening. ICO guidance on automated decision-making is becoming more specific. Sector-specific frameworks in healthcare, legal, and public sector procurement are all moving in the same direction. The common thread is accountability: if an AI system contributed to a decision, someone needs to be able to explain what it did and why.

    That accountability does not sit only with the AI firm you hired. It sits with the board that commissioned the work.

    Non-executive directors in particular are navigating new ground here. AI is now regularly on the agenda at audit and risk committees. Most boards have approved AI programmes without asking the question that matters most: if this goes wrong in a way we cannot explain, who is on the hook? The answer, in most governance frameworks, is the organisation that deployed it. The AI firm's audit gap becomes a problem that the board owns.

    The governance gap in AI delivery is not an abstract risk. It is a specific, immediate exposure that most boards are currently underpricing.

    Three questions to ask any AI firm before you sign

    These are not technical questions. Any competent AI firm should be able to answer them in plain language for a board audience.

    Can you show us a full log of an AI agent's decision chain on a real engagement? Not a diagram. Not a slide. An actual record. If the answer is "we don't keep that level of detail," you are about to inherit their governance problem.

    What happens when two identical inputs produce different outputs? Almost every AI system shows some variance. The question is whether the firm has built a system that can detect it, flag it, and explain it. Deterministic routing, where the same task description always produces the same agent assignment regardless of model drift or update cycles, is one way to narrow down where variance can come from. Ask whether they have it.
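    One simple way to get deterministic routing is to derive the assignment from a hash of the task description. The sketch below shows that approach as an assumption; it is not a description of how any particular firm routes tasks, and the `route` function and agent names are hypothetical.

```python
import hashlib


def route(task_description: str, agents: list[str]) -> str:
    """Map a task description to an agent, identically on every call."""
    if not agents:
        raise ValueError("at least one agent is required")
    # SHA-256 is stable across processes and machines, unlike Python's hash().
    digest = hashlib.sha256(task_description.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(agents)
    # Sort so the result does not depend on the order agents were listed in.
    return sorted(agents)[index]
```

    Note the limit of this sketch: the assignment stays fixed only while the agent pool is unchanged. Adding or removing an agent can move tasks, so a change to the pool is itself an event worth logging.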

    Who reviewed the AI's work before it affected your environment? The answer should name a specific governance layer, not a general process. If the answer is "the same model that produced the output also checked it," that is a red flag. Governance requires separation.

    The audit trail as a proxy for everything else

    We have found that the audit trail question is a useful proxy for the maturity of an AI delivery firm across the board. Firms that have built genuine governance infrastructure tend to have also built robust testing, proper cost controls, and disciplined separation of concerns throughout their systems. The audit trail does not exist by accident. It is the product of an engineering culture that takes accountability seriously from the start.

    The converse is also true. Firms that cannot answer the audit trail question usually cannot fully answer the cost question, the variance question, or the escalation question either. The governance gap is rarely isolated.

    Before you commission AI work, ask for the trail. What you learn from that single question will tell you more about the firm's readiness than any capability demonstration.
