What to Prove Before You Build the AI

Over the past year, we’ve been pulled into more AI conversations than I can count. Some have turned into projects, some haven’t. But almost every one of them opens with the same question: how much time and cost would it take to build something like this?

I think it’s often too early to answer that properly.

Not because building doesn’t matter. But because most AI ideas have one or two assumptions buried inside them that, if wrong, make the whole thing fall apart. And nobody knows which assumptions those are until the thing is already half-built.

That’s the moment I usually pause the conversation and ask something else. What, specifically, would have to be true for this to be worth doing? And how would you know, before you commit?

A client we’ve worked with for years brought us one of those ideas. They wanted an AI assistant on their international product catalog. Industrial products, technically dense, multiple languages, multiple markets. Visitors should be able to ask questions in their own language and get accurate, specification-level answers, instead of digging through catalog PDFs.

It was a great idea on paper. It was also full of assumptions.

Could the system actually pull spec-level answers from technical PDFs accurately, or would it hallucinate when the question got specific? Would the language detection hold across all the markets they operate in? Would the answers feel local, or read like a translated brochure? Would the cost per answer still make sense once real customers started using it? And the one that mattered most — would they trust it enough to put it on their public site, in front of real customers?

They asked if we could prove some of this before committing to a full build. We knew them well. We trusted the relationship. So we said yes, we’d do a small proof of concept. On us.

In hindsight, that’s where POC Lab started.

We took a representative slice of their catalog, plugged it into a working chat interface, and tested it with real questions across languages and markets. The point wasn’t to build the product. The point was to make the next decision obvious.

Some assumptions held. Some needed work. One of them changed how we’d approach the architecture entirely. By the end, the conversation had shifted. They weren’t asking “should we build this?” anymore. They were asking “when do we start, and what should the scope be?”

The full product followed. You can see where it landed.

What stayed with me afterward wasn’t the build itself. It was how much sharper every conversation got once we had something working in the room. Stakeholders stopped debating the idea in the abstract. They started reacting to evidence. That changed everything.

There’s a part of this I don’t see talked about enough.

Before AI, when you built a digital product, the question you de-risked was: does this work? Once it shipped, the costs were mostly fixed. Servers, support, maintenance. Predictable.

AI is different. You’re not just betting that the thing works. You’re betting that it works at a per-query cost that holds up when real users find it. Inference is not free. Every conversation, every retrieval, every regeneration adds up. A feature that’s brilliant at 100 users a day can quietly become a financial drag at 10,000. And by the time you find that out, it’s already in production, already in customer hands, already part of what people expect.
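To make that concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is an assumption picked for illustration (queries per user, tokens per query, price per 1,000 tokens), not a figure from the client project or from any provider's price list. The function name and parameters are hypothetical too. The point is only the shape of the curve: the cost scales linearly with usage, so a 100x jump in users is a 100x jump in the bill.

```python
def monthly_inference_cost(
    users_per_day: int,
    queries_per_user: float,
    tokens_per_query: int,
    usd_per_1k_tokens: float,
    days: int = 30,
) -> float:
    """Estimate monthly inference spend in USD.

    All inputs are assumptions you supply; swap in measured values
    from a proof of concept once you have them.
    """
    total_tokens = users_per_day * queries_per_user * tokens_per_query * days
    return total_tokens / 1000 * usd_per_1k_tokens


# Hypothetical inputs: 3 questions per visitor, about 4,000 tokens per
# answer (retrieved catalog context plus the generated reply), and an
# assumed price of $0.01 per 1,000 tokens.
for users in (100, 10_000):
    cost = monthly_inference_cost(
        users_per_day=users,
        queries_per_user=3,
        tokens_per_query=4_000,
        usd_per_1k_tokens=0.01,
    )
    print(f"{users:>6} users/day: about ${cost:,.0f} per month")
```

Under these made-up inputs, 100 users a day comes to roughly $360 a month and 10,000 users a day to roughly $36,000. The real value of a proof of concept is replacing the guessed inputs with measured ones.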

A good POC surfaces both questions at once. Does the idea actually work, with real data and real users? And does it work at a cost that survives contact with real usage? You can answer both in weeks, instead of finding out after months of build.

That’s why we turned what started as a pre-sales investment into a service. Not because we wanted another offering on the page. Because we kept watching the same pattern. Teams in love with an AI idea, with no way to pressure-test it before committing real money.

POC Lab is time-boxed, hypothesis-driven, and scoped to make the next decision clear. Not a discovery phase. Not a workshop. A working artifact you can click through, with evidence next to it — including what it costs to run at scale.

If you have an AI idea sitting in a “should we?” loop, the better conversation may be about what needs to be proven first.

And if nothing else, it might help you get past the part where everyone has an opinion, into the part where everyone has evidence.
