You're giving feedback on the wrong layer — and what to do instead

Joanne Chang
Apr 20

The problem isn't that your founder gives bad feedback. It's that he's giving the right input at the wrong layer.

Imagine a legal tech team building a dataset to train a contract analysis model. They're rigorous. They hire annotators, build a labeling interface, run QA passes on every batch. The schema they've designed is clean: clause type, obligation strength, party role, risk level.

Then they bring in a lawyer to review the outputs.

She looks at the results and either doesn't know where to begin or starts flagging almost everything. These aren't errors, exactly: the labels are consistent with the schema. But the schema itself is wrong. "Obligation strength" misses the crucial legal distinction between shall and may. "Risk level" collapses a dozen different types of exposure into a single axis. "Clause type" uses categories that feel intuitive to engineers but don't map to how lawyers actually reason about contracts.

Everything downstream — the annotations, the training runs, the model behavior — was built on top of a representation that a lawyer would never have chosen.
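To make the shall/may point concrete, here is a minimal sketch of the two schemas side by side. The field names in ClauseLabelV1 come from the example above; the value ranges, the Modality enum, the sample clauses, and the scores assigned to them are all assumptions made up for illustration, not a real legal taxonomy.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ClauseLabelV1:
    """The engineer-designed schema from the example (value ranges are assumed)."""
    clause_type: str
    obligation_strength: int  # assumed scale: 1 (weak) to 3 (strong)
    party_role: str
    risk_level: int           # assumed scale: 1 (low) to 5 (high)


class Modality(Enum):
    """Hypothetical field a lawyer might ask for: a difference in kind, not in strength."""
    DUTY = "shall"
    DISCRETION = "may"


@dataclass(frozen=True)
class ClauseLabelV2:
    clause_type: str
    modality: Modality
    party_role: str
    risk_level: int


# "Supplier shall deliver within 30 days."
duty_v1 = ClauseLabelV1("delivery", 3, "supplier", 2)
# "Buyer may terminate at any time." An annotator could plausibly score this 3 as well.
right_v1 = ClauseLabelV1("termination", 3, "buyer", 4)

duty_v2 = ClauseLabelV2("delivery", Modality.DUTY, "supplier", 2)
right_v2 = ClauseLabelV2("termination", Modality.DISCRETION, "buyer", 4)

print(duty_v1.obligation_strength == right_v1.obligation_strength)  # True
print(duty_v2.modality == right_v2.modality)                        # False
```

Under the first schema, a firm duty and a discretionary right can land on the same value, and every annotation made that way is still "correct." Only a schema that names the distinction can preserve it.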

The team didn't exclude the lawyer because they didn't value her expertise. They just consulted her at the wrong moment. By the time she was in the room, the schema had already calcified.

This pattern repeats itself constantly in creative and technical product development, and it almost always goes unnoticed — because the people with the most relevant expertise are still in the process. They're just operating at the wrong layer.

The layers that actually matter

When a team builds something — a model, a product, a feature — there are roughly three levels of decision-making happening at different points in the process:

The schema layer (what exists, what matters) is where the fundamental vocabulary gets set. What entities does the system recognize? What dimensions are worth measuring? What distinctions will be preserved, and what will be collapsed? This layer answers the question: what is the world made of, in the context of this problem?

The implementation layer (how it's built) is where engineers work. Data pipelines, model architecture, feature logic, code structure. This is the layer that's most visible and most legible to a technical team.

The product layer (what users need) is where product managers work. User flows, prioritization, use cases, metrics that connect behavior to business outcomes.

Most teams are well-staffed at the implementation and product layers.

The schema layer is where things quietly go wrong.

What happens when the schema is wrong

In ML, the consequences are concrete. A model for visual quality can only learn what the schema gives it. If the schema defines quality as "balance: 1–5" and "visual clutter: low/medium/high," that's what the model will optimize for. Not focal hierarchy. Not the tension between density and breathing room. Not the way a strong subject-background separation creates a sense of intention.

Those concepts aren't in the schema, so they can't be in the model. The model will perform well on its own metrics and produce outputs that feel, to a trained eye, subtly but consistently wrong. The team will run more experiments, tune hyperparameters, collect more data. None of it will help. The representation space was defined without the people who actually understand the domain — and no amount of engineering fixes a shallow representation.

This is why teams with strong ML pipelines still produce models that miss the point. It's not a modeling problem. It's a schema problem.

The founder version of the same mistake

A founder with deep domain expertise — the kind of person who has spent years or decades developing intuition about how a product should feel, what makes it genuinely different, what the right primitives are — often ends up contributing to the process at the implementation phase. A feature is designed, built, handed off for review. The founder looks at it and says: this isn't right. This isn't what I meant.

And they're correct. But by that point, what can actually change?

The surface behavior can change. The UI can shift. Individual interactions can be adjusted. But the underlying architecture — the way the system thinks about the domain, what relationships it models, what transformations it considers valid — has already been decided. Someone else's assumptions have been encoded. The schema has set.

So the founder's feedback becomes lower-level than it needs to be. Not because their insight is shallow, but because the structural decisions that would allow deeper changes have already been locked in. They're pushing on outputs when what they need to push on is the representation space itself.

This is a leverage problem, not a communication problem. Late input is structurally low-leverage input — regardless of how good the input is.

Take highlight detection as an example. The engineering question is: can we scan a photo library and surface the best moments in seconds? But the schema question underneath it is: what is a "best moment," actually? Is it sharpness? A detected face? Or is it something harder to name — the photo where everyone's looking at each other instead of the camera, the one that captures the feeling of the day rather than just its events? If that definition never gets formally constructed before the model is built, the team will optimize for proxies — technically measurable signals that approximate the real thing but miss it. The founder will look at the results and know immediately that it's wrong, but won't be able to say why in terms the system can act on. That's the schema gap. That's the grammar the product needs to be built on.
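A toy version of that proxy problem might look like the sketch below. The Photo fields, the weights in proxy_score, and the two sample photos are invented for illustration; the point is only that the score cannot read a signal it was never given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    name: str
    sharpness: float        # 0.0 to 1.0, assumed to come from an upstream detector
    faces_detected: int
    # What the founder notices. Nothing in proxy_score reads this field.
    subjects_engaged_with_each_other: bool


def proxy_score(photo: Photo) -> float:
    """Score a photo using only the technically measurable signals (weights are arbitrary)."""
    return photo.sharpness + 0.1 * photo.faces_detected


photos = [
    Photo("posed_group_shot", sharpness=0.9, faces_detected=4,
          subjects_engaged_with_each_other=False),
    Photo("candid_laugh", sharpness=0.6, faces_detected=4,
          subjects_engaged_with_each_other=True),
]

best = max(photos, key=proxy_score)
print(best.name)  # posed_group_shot
```

Tuning the weights changes which proxy wins, not whether the candid shot can be recognized for the reason the founder cares about.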

The people who can answer these questions often aren't thinking of themselves as doing design work. But this is design work — arguably the most consequential design work in a technical system. It's the design of the language the system uses to perceive the world.

Why schema decisions are irreversible multipliers

The reason this matters so much is that schema decisions don't stay local. They propagate.

In ML: once a schema is chosen, data is collected according to it. Annotators develop intuitions calibrated to it. Models are trained on it. Features are built to expose what the model has learned. By the time outputs feel wrong, you're not looking at a problem — you're looking at a consequence of a decision made much earlier, now distributed across every layer of the system.

In product development: once an architecture is chosen — once the team has decided what the atomic units are, what relationships are modeled, what transformations are possible — every feature gets built on top of that. The founder's vision either fits within that vocabulary or it doesn't. If it doesn't, the fix isn't more feedback. The fix is going back to the beginning, which is usually not on the roadmap.

This is why the schema layer has a different kind of importance than the implementation or product layers. Implementation decisions can be refactored. Product decisions can be reprioritized. Schema decisions, once they've propagated downstream, are extraordinarily costly to undo.

Getting them right the first time isn't a nice-to-have. It's the highest-leverage intervention available.

What "schema input" actually looks like

The good news is that operating at the schema layer doesn't require a different kind of person — it requires a different kind of question, asked earlier in the process.

Instead of: does this feature feel right?

Ask: what are the atomic units of this domain?

Instead of: does this output look good?

Ask: what distinctions would a trained human eye notice that our current labels would collapse?

Instead of: what should this interaction do?

Ask: what relationships actually exist in this problem space, and are we modeling them?
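The second of these questions can be turned into a rough audit. The sketch below assumes you can get an expert to give a short free-form judgment on a sample of already-labeled items; it then reports every schema label under which the expert's judgments disagree. The function name, the sample data, and the label format are hypothetical.

```python
from collections import defaultdict


def find_collapsed_distinctions(items):
    """Group expert judgments by schema label and report labels hiding more than one.

    items: iterable of (item_id, schema_label, expert_judgment) tuples.
    Returns {schema_label: sorted list of distinct expert judgments}.
    """
    judgments_by_label = defaultdict(set)
    for _item_id, schema_label, expert_judgment in items:
        judgments_by_label[schema_label].add(expert_judgment)
    return {
        label: sorted(judgments)
        for label, judgments in judgments_by_label.items()
        if len(judgments) > 1
    }


samples = [
    ("img_01", ("balance=4", "clutter=low"), "clear focal hierarchy"),
    ("img_02", ("balance=4", "clutter=low"), "no clear focal point"),
    ("img_03", ("balance=2", "clutter=high"), "no clear focal point"),
]

collapsed = find_collapsed_distinctions(samples)
for label, judgments in collapsed.items():
    print(label, "hides:", judgments)
```

Each label that shows up in the result is a place where the schema treats two things as identical that the expert does not. That list is a reasonable starting agenda for a schema conversation.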

For a designer working on a compositional AI system, the schema question isn't "rate this layout 1–5." It's this: what is focal hierarchy, and how is it different from mere visual weight? What makes a crop feel intentional versus careless? When does asymmetry create tension and when does it create rhythm? Those are the distinctions that need to be in the representation before the first example is labeled.

For a founder with deep video intuition: the schema question isn't "is this feature right?" It's — what is a cut, really? What makes a transition meaningful or empty? What relationships exist between pacing, sound, and emotional register that the system should understand? That's the grammar the product needs to be built on.

The structural fix

None of this requires a big reorg or a new role. It requires a phase — a deliberate schema design phase that happens before implementation begins, with the right people in the room.

The output of that phase isn't code or a dataset. It's a vocabulary: a set of named distinctions, with enough precision that engineers can make them learnable and annotators can apply them consistently. It's the answer to the question: if the system were perfect, what would it need to understand?
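One way to test whether a named distinction is precise enough for annotators to "apply consistently" is to have two people label the same items and measure their agreement. The sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic; using it here is a suggestion rather than something the process above prescribes, and the labels and annotator data are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled at random with their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Two hypothetical annotators applying the distinction "intentional vs. careless crop".
annotator_1 = ["intentional", "intentional", "careless", "careless"]
annotator_2 = ["intentional", "intentional", "careless", "intentional"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.5
```

A low kappa on a new distinction usually means the definition is still too loose, which is useful to learn before any large-scale labeling starts.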

The forcing question matters. Not: what data do we have? Not: what's easy to label? But: what mental model are we trying to encode — and who actually holds that mental model?

The answer is almost never the engineering team.

It's the designer who has spent years looking at images until she can feel when something is compositionally wrong. It's the founder who has built such a precise internal model of the domain that he can tell, immediately, when a product decision violates its grammar.

Those people need to be in the room before the schema is set. Not reviewing the outputs. Defining the vocabulary.

The shift worth making

The underlying realization here is subtle but significant. We tend to think of domain expertise as something that improves a system by catching its mistakes — a layer of review, a quality check, a correction mechanism.

But in creative and technical domains, the deepest expertise isn't most valuable as review. It's most valuable as representation. The question isn't just "is this output right?" It's "does this system even have the vocabulary to get it right?"

Most teams never ask that second question explicitly. They assume the representation is adequate and debug the implementation. They get a visionary's input on features when what they needed was that visionary's input on primitives.

The highest-leverage thing a domain expert can contribute isn't feedback on what was built. It's the vocabulary the system uses to perceive the world — and that vocabulary has to be defined before the building starts.