The Canonical Customer

The next decade of B2B software will be built on agents reasoning about customers.

Walk into any Series B SaaS company this quarter and you'll find the evidence. Onboarding agents. Churn-risk agents. Renewal copilots. MCP connections wired into Salesforce and Intercom. A CS lead asking Claude questions about her book on a Tuesday morning because nothing in her CRM can answer them. Most of these companies are shipping their second or third customer-facing agent this year, faster than anyone is redesigning the data underneath them.

None of those agents agree on who the customer is.

Ask three of them who Acme Inc is and you'll get three answers. The sales agent says 500 seats. The CS agent says 800. Finance is billing for 650. A CSM catches it on a Tuesday, prepping a QBR, when the number on slide four doesn't match the number she's been telling the customer for three months. She spends the afternoon reconciling agents instead of reconciling tools. The human API, the person who carries data between systems by hand, never went away. The tool sprawl just gave it a harder job.

This is what sits under most agent programs right now. Every agent, plugged into every tool, learns a different version of the same customer, and none of them know they disagree. It breaks most teams' second agent and kills their third.

The agents aren't the problem. The problem is that there's no shared definition of the customer underneath them.

That shared definition has a name. We call it the canonical customer.

The term, plainly

A canonical customer is the single, resolved, structured record of a customer that every agent in your company reads from. One Acme. Resolved across CRM, support, product analytics, billing, and whatever the CS team is writing into Notion this week. One schema the agents agree on. One place the reconciliation happens, once, so it doesn't happen inside every agent's context window separately.

It's a boring definition. It has to be. The interesting part is what becomes possible once you have one, and what keeps failing until you do.

Not the warehouse. Not the CDP.

The instinct at this point is that we've seen this movie. Move the data into Snowflake. Stand up a data lake. Unify the sources, point the agents at the warehouse, call it done.

It isn't done. Warehouses were built for analysts and dashboards. The reader is a human running a query, patient, tolerant of a field being a day stale, comfortable joining three tables to answer one question. An agent reasoning across support, usage, and revenue isn't that reader. It needs the resolution already done, the schema already agreed, the answer shaped for the question it's actually asking, at the moment it asks. A warehouse can sit underneath a canonical customer. It is not one.

The other instinct is the CDP. Segment, mParticle, RudderStack solved a version of this for analytics and marketing activation ten years ago. The output was an event stream and an audience, feeding Mixpanel, feeding Braze, feeding a lookalike model. Those consumers are tolerant too. A campaign can live with 2% of emails going to the wrong segment. An agent cannot live with the CSM's renewal playbook going to the wrong account because the identity resolution was approximate.

Agents need the same entity, the same schema, the same version of every field, at the moment they reason. With provenance, so when the agent says "Acme's usage dropped 40% after the March release" the claim can be traced back to the event table and believed or not. Different read patterns, different freshness requirements, different correctness bar.

A warehouse can be upstream. A CDP can be upstream. Neither is the thing.

Why every agent invents its own Acme

Plug an agent into Salesforce and it learns Salesforce's version of Acme: the opportunity, the contact records, the ARR field someone set three quarters ago and stopped updating. Plug the next agent into Intercom and it learns Intercom's version: the workspace, the conversations, the user emails that don't match the Salesforce contact emails because half the buyers signed up with personal Gmail addresses during the trial.

Both agents believe they know the customer. Neither does. Each one is holding a slice and mistaking it for the whole.

It compounds as you add agents. By the third one, you're not building intelligence. You're building a distributed system where every node has a different opinion about your biggest account, and the reconciliation logic lives in a Slack thread between two humans who joined six months apart.

The quiet cost is trust. Every time an agent outputs a number that doesn't match another agent's number, the team learns to check both. Once they're checking both, they might as well pull the data themselves. The whole point of the agent, which was to stop pulling the data themselves, is gone.

What a canonical customer actually contains

Four things, in order of how often teams underestimate them.

Resolved identity. One Acme, not seven. This means the workspace in Intercom, the account in Salesforce, the billing customer in Stripe, and the cohort in Mixpanel all resolve to the same entity. The work of resolution is mostly joins, some fuzzy matching, and a lot of rules that only make sense once you've watched them fail on a specific customer who signed their MSA under a parent company name.
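To make the resolution step concrete, here is a minimal sketch, not a production resolver. It assumes each source exposes a name and, sometimes, a company email domain. `SourceRecord`, `NAME_ALIASES`, `FREE_MAIL_DOMAINS`, and `resolve` are all hypothetical names invented for this illustration. Records are merged when they share a normalized name or a non-free-mail domain, using a small union-find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    source: str      # e.g. "salesforce", "intercom", "stripe"
    record_id: str   # the ID inside that source
    name: str        # account / workspace / billing name as that source spells it
    domain: str      # company email domain, "" if the source has none


# Hypothetical hand-written rule: the name an MSA was signed under,
# mapped to the name the rest of the stack uses.
NAME_ALIASES = {"acme holdings llc": "acme inc"}

# Assumption: personal mail domains say nothing about which company a user belongs to.
FREE_MAIL_DOMAINS = {"gmail.com"}


def normalize_name(name: str) -> str:
    cleaned = " ".join(name.lower().replace(",", " ").replace(".", " ").split())
    return NAME_ALIASES.get(cleaned, cleaned)


def resolve(records: list[SourceRecord]) -> list[list[SourceRecord]]:
    """Group records that share a normalized name or a company domain."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: dict[tuple[str, str], int] = {}
    for i, rec in enumerate(records):
        keys = [("name", normalize_name(rec.name))]
        domain = rec.domain.lower()
        if domain and domain not in FREE_MAIL_DOMAINS:
            keys.append(("domain", domain))
        for key in keys:
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two groups
            else:
                first_seen[key] = i

    groups: dict[int, list[SourceRecord]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


records = [
    SourceRecord("salesforce", "001A", "Acme Inc.", "acme.com"),
    SourceRecord("intercom", "ws_9", "acme", "acme.com"),
    SourceRecord("stripe", "cus_7", "Acme Holdings, LLC", ""),
    SourceRecord("salesforce", "001B", "Globex", "globex.com"),
]
entities = resolve(records)
```

The alias table is where the "rules that only make sense once you've watched them fail" would accumulate. A real resolver would likely add fuzzy matching and a review queue for low-confidence merges, which this sketch leaves out.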

Structured interactions. Support tickets, usage events, emails, meetings, contract changes, product feedback, all typed, timestamped, attributed to the resolved entity. The agent doesn't need the raw Gong transcript. It needs the turn-by-turn structure that lets it say "the champion mentioned procurement was blocked on July 12" and point at the span.
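One way to picture "typed, timestamped, attributed" is the sketch below. The `Kind`, `Turn`, and `Interaction` types and the `find_mentions` helper are assumptions made up for this example, not a schema the article prescribes. The point is only that a claim about a call can point at a specific turn rather than at a raw transcript.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Kind(Enum):
    SUPPORT_TICKET = "support_ticket"
    USAGE_EVENT = "usage_event"
    EMAIL = "email"
    MEETING = "meeting"
    CONTRACT_CHANGE = "contract_change"
    PRODUCT_FEEDBACK = "product_feedback"


@dataclass(frozen=True)
class Turn:
    speaker: str
    text: str
    start_s: float  # offset into the recording, in seconds
    end_s: float


@dataclass(frozen=True)
class Interaction:
    customer_id: str        # the resolved entity, not a per-tool ID
    kind: Kind
    occurred_at: datetime
    source_ref: str         # hypothetical pointer back to the source record
    turns: tuple[Turn, ...] = ()


def find_mentions(
    interactions: list[Interaction], customer_id: str, phrase: str
) -> list[tuple[Interaction, Turn]]:
    """Return every (interaction, turn) pair where the phrase was said."""
    needle = phrase.lower()
    return [
        (interaction, turn)
        for interaction in interactions
        if interaction.customer_id == customer_id
        for turn in interaction.turns
        if needle in turn.text.lower()
    ]


call = Interaction(
    customer_id="acme",
    kind=Kind.MEETING,
    occurred_at=datetime(2025, 7, 12, 15, 0),
    source_ref="call-recording/example-123",  # illustrative value only
    turns=(
        Turn("champion", "Procurement is blocked on the security review.", 312.0, 318.5),
        Turn("csm", "Understood, I will send the questionnaire today.", 318.5, 323.0),
    ),
)
hits = find_mentions([call], "acme", "procurement")
```

Each hit carries the date of the interaction and the span of the turn, which is what lets an agent cite the moment instead of paraphrasing the whole call.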

Enforced schema. Every agent writes and reads the same fields the same way. MRR is a number, not a string. Contract end date is a date, not "sometime in Q3." When the schema changes, it changes once. Every agent inherits the change. This is where most in-house builds quietly break: the schema exists in three engineers' heads and two Notion docs.
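As an illustration of what "enforced" could mean in practice, the sketch below rejects bad values at the boundary instead of letting them reach an agent. `CustomerFields` and `parse_fields` are hypothetical names, and the choice of `Decimal` for MRR and ISO 8601 strings for dates is an assumption of this example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class CustomerFields:
    customer_id: str
    mrr: Decimal        # a number, never a formatted string
    contract_end: date  # a real date, never "sometime in Q3"

    def __post_init__(self) -> None:
        # Dataclasses do not check annotations at runtime, so check explicitly.
        if not isinstance(self.mrr, Decimal):
            raise TypeError(f"mrr must be Decimal, got {type(self.mrr).__name__}")
        if not isinstance(self.contract_end, date):
            raise TypeError(
                f"contract_end must be date, got {type(self.contract_end).__name__}"
            )


def parse_fields(raw: dict[str, str]) -> CustomerFields:
    """Coerce raw source strings into the schema, failing loudly on bad input."""
    return CustomerFields(
        customer_id=raw["customer_id"],
        mrr=Decimal(raw["mrr"]),
        contract_end=date.fromisoformat(raw["contract_end"]),
    )


ok = parse_fields(
    {"customer_id": "acme", "mrr": "4200.00", "contract_end": "2026-09-30"}
)

try:
    parse_fields(
        {"customer_id": "acme", "mrr": "4200.00", "contract_end": "sometime in Q3"}
    )
    rejected = False
except ValueError:
    rejected = True  # date.fromisoformat refuses free text
```

Because the schema lives in one definition, a change to it happens in one place, and every reader and writer that imports it picks up the change.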

Provenance on every field. Every value points back to the source event or record it came from. An agent saying "Acme has 800 seats" should be able to show you the Stripe subscription line, the Salesforce override, and which one it chose to believe and why. Without provenance, agent outputs aren't claims. They're vibes.
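A field with provenance might look something like the sketch below. Everything here is an assumption for illustration: the `Observation` and `ResolvedField` types, the `SOURCE_PRIORITY` table, and the rule that the billing system outranks a CRM override. A real team would pick its own precedence rules. What matters is that the losing values and the reason are kept alongside the winner.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    value: int
    source: str        # which system reported the value
    record_ref: str    # hypothetical pointer to the exact source record
    observed_at: datetime


@dataclass(frozen=True)
class ResolvedField:
    name: str
    chosen: Observation
    rejected: tuple[Observation, ...]
    reason: str


# Assumed precedence for this example only: lower number wins.
SOURCE_PRIORITY = {"stripe": 0, "salesforce": 1}


def resolve_field(name: str, observations: list[Observation]) -> ResolvedField:
    """Pick one value, and keep what was passed over and why."""
    if not observations:
        raise ValueError(f"no observations for field {name!r}")
    # Newest first, then a stable sort by source priority on top of that.
    ranked = sorted(observations, key=lambda o: o.observed_at, reverse=True)
    ranked.sort(key=lambda o: SOURCE_PRIORITY.get(o.source, len(SOURCE_PRIORITY)))
    chosen, rest = ranked[0], tuple(ranked[1:])
    reason = (
        f"{chosen.source} has the highest source priority; "
        "newest observation from that source"
    )
    return ResolvedField(name, chosen, rest, reason)


seats = resolve_field(
    "seats",
    [
        Observation(800, "salesforce", "account-override/example", datetime(2025, 6, 1)),
        Observation(650, "stripe", "subscription-line/example", datetime(2025, 5, 1)),
    ],
)
```

With this shape, an agent that reports a seat count can also show the record it trusted, the record it did not, and the rule that decided between them.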

Why it has to come before agent #2

The most expensive mistake in agent deployments right now is sequencing. Teams ship agent #1 on whatever data is nearest. It works. They ship agent #2 the same way. Then #3. Somewhere around #3, the return on each new agent stops adding up, and the team can't figure out why, because each agent in isolation still looks fine.

What broke wasn't the agents. It was the absence of shared ground. Every agent reinvented customer resolution, badly, inside its own integration layer. The cost to fix this after the fact is not the cost of building a canonical customer. It is the cost of building a canonical customer plus the cost of retrofitting every agent you already shipped, plus the cost of whatever decisions were made on bad numbers in between.

The teams that get this right build the canonical customer first and treat it as infrastructure. One layer. Read by every agent, ours or theirs, through the same interface. When a new agent ships, it inherits everything the previous ones already knew about Acme. The second agent costs a week instead of a quarter. The fifth one costs an afternoon.

Segment did this for analytics. Snowflake did it for the warehouse. dbt did it for transformation. Each one built the boring layer once and let the rest of the stack compound on top. The agent layer is the next one, and it's happening this cycle whether the incumbents show up for it or not.

The bet

In 2028, the companies running fifty customer-facing agents across their org won't be the ones with the best prompts. They'll be the ones whose agents all agree on what a customer is. The prompt is a weekend. The canonical customer is a year, if you start now.

"Customer 360" meant one thing when the reader was a human looking at a dashboard. In the agent era the reader is different and the question is different. It isn't "how is this account trending." It's "should I email this account today, and what should I say." The canonical customer is what a 360 view has to become when the reader is an agent and the stakes are a decision, not a slide. One record. Resolved, structured, schema-enforced, with provenance.

If your stack has three agents and three Acmes, you already know what the next year of work is. The only question is whether you do it on purpose or you do it by accident, one agent at a time, until the CSM who used to be the human API becomes the agent API, and nobody is happier than before.

We'd rather do it on purpose.
