When Design Is the Domain: A Different Kind of AI Product Development

Joanne Chang
Apr 12
7 min read

Updated: Apr 16

What happens when the knowledge the AI needs to learn is yours

There's a version of the "designer's role in AI" conversation that's become fairly standard. Designers should be at the table during problem definition. Designers should advocate for data quality. Designers should own the prompting layer. Designers should define what good UX looks like on top of the model.

All of that is true, and it matters.

But there's a different and less-discussed scenario that changes the stakes considerably: what happens when the domain knowledge the AI needs to learn is design itself? Not design as a process. Not design as a way of centering the user. But design as a body of substantive expertise — composition, color theory, visual hierarchy, layout grammar, the principles that distinguish a harmonious arrangement from a mediocre one.

When that knowledge is what the model is being trained to understand, the designer's role transforms completely. You stop being the advocate for the experience layer and become the subject matter expert whose knowledge the entire pipeline depends on.

____________________________________________________________________________

To make this concrete, it helps to place the argument inside the full AI product design lifecycle. The six phases this post covers don't touch every cell in that lifecycle — they cluster in the places where domain expertise is most irreplaceable.

The highlighted phases concentrate in the first two rows — where domain expertise is most load-bearing. The bottom row matters, but it's where the designer's role starts to look more like general AI product practice.

____________________________________________________________________________

The problem with how we talk about design knowledge in AI

Most AI systems that touch visual or creative domains measure design quality through proxy metrics. They optimize for what is measurable: CLIP scores, FID (Fréchet Inception Distance) scores, aesthetic ratings aggregated from large populations. These metrics capture something real, but they're largely blind to the principles that practicing designers actually care about.

A generated image can score well on every standard quality metric and still have poor visual hierarchy. A layout can be technically coherent and compositionally weak. A color palette can be aesthetically pleasant in isolation and clash destructively with the other elements on the canvas.

The gap exists because the people building the evaluation infrastructure are not, in most cases, domain experts in design. They're optimizing for what they can measure, which is not the same as what matters. This creates an opening — and a responsibility.

____________________________________________________________________________

Phase one: problem definition as knowledge excavation

In the standard AI product lifecycle, problem definition is about understanding what users need and what the business is trying to solve. Those questions still matter. But when design is the domain, there's a prior and harder question underneath them: can you actually articulate what you know?

Design expertise is largely tacit. Experienced designers feel when something is off before they can explain why — not by consciously running through rules, but through internalized pattern recognition built over years of practice. That tacit knowledge is enormously valuable. It is also, in its tacit form, completely unusable for training a model.

The problem definition phase becomes an exercise in externalizing implicit expertise. You have to ask: what are the actual principles at work? What makes a layout readable? What constitutes visual balance? How do you define hierarchy in a way that isn't just "I know it when I see it"?

This is not a straightforward task. It requires moving between intuition and formalization — making a judgment call, then interrogating it, then trying to write a rule that captures it, then testing that rule against edge cases until it either holds or breaks.
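One way to picture that loop is to write a candidate rule as code and run it against layouts where the intuition is already clear. The sketch below is an illustration only, not a claim about how balance should be defined. It assumes a layout is a list of rectangles, approximates visual weight as area times a hand-assigned emphasis factor, and calls a layout balanced when the weighted centroid sits near the canvas center. The Element type, the emphasis field, and the tolerance value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    x: float  # left edge, in canvas units
    y: float  # top edge, in canvas units
    width: float
    height: float
    emphasis: float = 1.0  # hypothetical stand-in for color or contrast weight


def visual_weight(e):
    """Crude proxy for visual weight: area scaled by emphasis."""
    return e.width * e.height * e.emphasis


def weighted_centroid(elements):
    """Return the (x, y) center of the layout, weighted by visual weight."""
    total = sum(visual_weight(e) for e in elements)
    if total <= 0:
        raise ValueError("layout has no visual weight")
    cx = sum((e.x + e.width / 2) * visual_weight(e) for e in elements) / total
    cy = sum((e.y + e.height / 2) * visual_weight(e) for e in elements) / total
    return cx, cy


def is_balanced(elements, canvas_w, canvas_h, tolerance=0.1):
    """Candidate rule: the centroid lies within `tolerance` (as a fraction
    of each canvas dimension) of the canvas center."""
    cx, cy = weighted_centroid(elements)
    dx = abs(cx - canvas_w / 2) / canvas_w
    dy = abs(cy - canvas_h / 2) / canvas_h
    return dx <= tolerance and dy <= tolerance
```

The useful part is what happens next: feeding the rule a layout an expert considers balanced but the rule rejects (or the reverse) shows exactly where the formalization breaks and needs another pass.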

But this work is the foundation everything else depends on. If you can't articulate the principles, you can't build the schema. If you can't build the schema, you can't label the data. If you can't label the data, you can't train the model.

____________________________________________________________________________

Phase two: data is only as good as the expert labeling it

In a standard AI development context, designers advocate for data quality and inclusivity. When design is the domain, the role is more fundamental than advocacy. Designers are the ones who know what good data looks like — and they may be the only ones who do.

A data scientist can tell you whether an image is high resolution or whether it contains a face. They cannot reliably tell you whether a composition has strong visual hierarchy, whether the layout respects grid principles, whether the visual weight is balanced — not without guidance from someone who has internalized these things through years of practice.

This means designers are not just reviewing data for quality. They are defining the annotation schema, training the annotators, auditing the labels for errors that would be invisible to a non-expert, and making the calls on edge cases that the schema doesn't cleanly resolve.

There's also a tension worth being honest about: design knowledge is not fully consensus-based. Some principles draw near-universal agreement among experienced designers, while others are genuinely contested: two expert designers examining the same layout may reach different conclusions about whether the hierarchy is clear. That disagreement isn't a failure of expertise; it's an honest reflection of how the domain works. Both types of knowledge, the settled and the contested, are real and worth capturing, but they need to be treated differently in the schema.
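One way a schema might keep agreed and disputed judgments apart is to store every annotator's label rather than a single merged one, and to derive a status from the level of agreement. This is a minimal sketch under assumptions: the LabeledItem structure, the example question, and the 0.8 agreement threshold are hypothetical choices, not an established standard.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LabeledItem:
    item_id: str
    question: str  # e.g. "Is the visual hierarchy clear?"
    votes: dict = field(default_factory=dict)  # annotator id -> label

    def agreement(self):
        """Share of annotators who chose the most common label."""
        if not self.votes:
            raise ValueError("no votes recorded")
        counts = Counter(self.votes.values())
        top_count = counts.most_common(1)[0][1]
        return top_count / len(self.votes)

    def status(self, threshold=0.8):
        """'consensus' if agreement meets the threshold, else 'contested'."""
        return "consensus" if self.agreement() >= threshold else "contested"
```

Keeping the raw votes means a contested item is never silently flattened into a majority label; downstream code can decide whether to train on it, weight it differently, or route it back to an expert.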

____________________________________________________________________________

Phase three: feasibility means setting the right bar

The standard feasibility question in AI product development is: can we perform this task at the quality users will expect, given the data and tools we have? When design is the domain, that question has to be refined: can this model learn design principles at the quality an expert designer would accept?

That is a considerably higher and more specific bar. And it requires the domain expert to be involved in setting it — because only someone with design expertise can evaluate whether the model's outputs are actually good, as opposed to merely plausible.

This is where the absence of appropriate evaluation metrics becomes a practical problem, not just a theoretical one. If your evaluation metrics can't detect compositional failures, you'll get false confidence in your results. The feasibility study, done properly, includes an honest inventory of what can currently be measured and what can't.
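A rough way to put a number on that false confidence is to compare an automated metric's pass/fail verdicts with expert verdicts on the same outputs. The sketch below assumes each output has already been given both verdicts; the function name and the data shape are hypothetical.

```python
def blind_spot_rate(results):
    """Fraction of expert-rejected outputs that the automated metric passed.

    `results` is a list of (metric_passed, expert_passed) boolean pairs.
    Returns 0.0 when the expert rejected nothing.
    """
    metric_verdicts_on_rejects = [m for m, e in results if not e]
    if not metric_verdicts_on_rejects:
        return 0.0
    # True counts as 1, so the sum is the number of misses the metric let through.
    return sum(metric_verdicts_on_rejects) / len(metric_verdicts_on_rejects)
```

A high rate means the metric is approving work an expert would send back, which is the kind of gap the feasibility inventory should record.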

____________________________________________________________________________

Phase four: prompting as encoded aesthetic judgment

Notice where this phase sits in the lifecycle grid above — not in a dedicated "prompting" cell, but inside Design. That placement is the argument in miniature: prompting belongs to design, not to engineering. When the domain is composition, the prompt is where aesthetic judgment gets encoded directly into the model's operating rules.

The difference between a prompt written by a generalist and one written by someone who deeply understands visual weight, grid systems, and color harmony is not just a matter of polish. It's a matter of substance. The domain expert can specify what kinds of errors to avoid, what tradeoffs to make when principles conflict, what outputs are genuinely good versus merely acceptable.

This is not something you can delegate. A prompt that says "create a visually balanced layout" is almost useless compared to one that specifies what balance means in this context, what signals imbalance, and what the edge cases are. Only someone who has internalized the domain can write the second kind of prompt.
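To show the shape of that second kind of prompt, here is a small sketch that assembles a prompt from a structured specification of each principle. Everything in it is an assumption for illustration: the PrincipleSpec fields and the render_prompt helper are hypothetical, and the actual definitions, failure signals, and edge cases would have to come from the domain expert.

```python
from dataclasses import dataclass, field


@dataclass
class PrincipleSpec:
    name: str
    definition: str  # what the principle means in this context
    failure_signals: list = field(default_factory=list)  # what signals a violation
    edge_cases: list = field(default_factory=list)  # how to resolve hard cases


def render_prompt(task, principles):
    """Turn a task and a list of PrincipleSpec objects into prompt text."""
    lines = [task, ""]
    for p in principles:
        lines.append(f"Principle: {p.name}")
        lines.append(f"Definition: {p.definition}")
        for signal in p.failure_signals:
            lines.append(f"Avoid: {signal}")
        for case in p.edge_cases:
            lines.append(f"Edge case: {case}")
        lines.append("")
    return "\n".join(lines).strip()
```

The code is trivial on purpose. The value sits entirely in the text that fills the fields, which is the part only someone who has internalized the domain can supply.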

____________________________________________________________________________

Phase five: evaluation infrastructure is the contribution

In the standard AI lifecycle, designers help shape test plans and evaluate whether outputs are meeting user expectations. But there's an additional layer specific to design-as-domain contexts: defining what "correct" means is itself the research contribution.

Standard image quality metrics were not built to capture compositional principles. Building the evaluation infrastructure for those things — the metrics, the benchmarks, the labeled datasets that ground the assessments — is meaningful technical and intellectual work. It's the kind of contribution that makes subsequent research possible, not just for your own projects but for the field.

This is why a well-constructed annotated dataset with a thoughtful schema is worth publishing as a standalone contribution. It's not just training data. It's a formalization of design knowledge in a form that the research community can build on.

____________________________________________________________________________

Phase six: post-launch is where the domain expert stays involved

In most AI product contexts, post-launch monitoring focuses on behavioral drift, bias emergence, and user trust erosion. Those concerns apply here too. But when design is the domain, there's an additional dimension: the model's design judgment can degrade in ways that only someone with design expertise is likely to catch.

Consider a model that starts recommending layouts with weakening hierarchy, or color combinations that are technically varied but harmonically poor, or compositions that feel crowded without violating any explicit rule. Failures like these are unlikely to show up in standard monitoring metrics.

They require someone with design expertise looking at the outputs and noticing that something has gotten worse.

This is a case for keeping domain experts involved in ongoing evaluation well past launch, not just in the initial development cycle.
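In practice, ongoing involvement could be as light as a recurring expert review of sampled outputs, with the scores compared against a launch baseline. The sketch below assumes designers rate a sample on a numeric scale at each review; the function name, the scale, and the max_drop threshold are hypothetical.

```python
from statistics import mean


def expert_score_drift(baseline_scores, recent_scores, max_drop=0.5):
    """Compare mean expert ratings at launch with a recent review.

    Returns (drop, flagged), where drop is the baseline mean minus the
    recent mean and flagged is True when the drop exceeds max_drop.
    """
    if not baseline_scores or not recent_scores:
        raise ValueError("both score lists must be non-empty")
    drop = mean(baseline_scores) - mean(recent_scores)
    return drop, drop > max_drop
```

The check only works because the scores come from people who can see the problem; it is a way to schedule and track expert attention, not a replacement for it.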

____________________________________________________________________________

The argument, put simply

When design is the domain, the standard framing — designers as advocates, collaborators, experience owners — is necessary but not sufficient. The deeper claim is this:

The designer is not just a participant in the AI development pipeline. They are the knowledge source the pipeline depends on. Their expertise isn't needed to make the product usable. It's needed to make the model learnable at all.

This changes what design contribution looks like at every stage. It makes the knowledge externalization work — the frameworks, the schemas, the annotation guidelines, the evaluation criteria — not just useful preparation for model training, but valuable intellectual output in its own right.

The designers best positioned to contribute to this kind of work are not necessarily the ones with the most AI fluency. They're the ones with the deepest domain expertise, the clearest ability to articulate what they know, and the rigor to hold the bar at the level the domain actually demands. That combination is rare. And right now, it's exactly what this space needs.
