
Domain-Driven Design for Data Scientists

·1204 words·6 mins

Data scientists today sit at the intersection of business, statistics, data engineering, and machine learning. As models keep getting larger and automation becomes easier, understanding why the data exists and what it represents matters more than ever. Borrowing lessons from Domain-Driven Design helps ground the work in shared meaning instead of rushing through code and metrics. This is not always feasible given company priorities and the rush of sprints, and SMEs rarely have the time or patience to train a data scientist in the domain, so it falls on data scientists to learn the domain and align with the business. With that alignment, data scientists can build models that are not just accurate but consistent with the way the business actually sees the world. This matters even more amid the current rise of AI. What follows is a series of lessons I picked up working across different domains.

Why Shared Language Matters

One of the most underrated ideas in data science is the importance of shared language. Software engineers have talked about it for years under the name Domain-Driven Design, but the same principle applies to analytics and machine learning. Before tuning a model or debating which architecture to use, the real value comes from aligning with domain experts and stakeholders on what the data really means. I have previously written about the Shape Up methodology that helped us get better at this: Shape up.

When you get the shared vocabulary right, when you agree on what each metric, entity, and constraint represents, everything else becomes easier. You stop fighting misinterpretations and start modeling reality instead of a convenient abstraction. Many data teams face the same problem that software engineers do: how to build real understanding across people who only want to do their piece of the job. Data scientists are often caught in the middle, expected to understand the business and the technology at the same time, but without the authority to influence either side. When management treats data science like a prediction API instead of a partner, the process collapses into ticket delivery and dashboard generation. This is not a good place to be.
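One lightweight way to make a shared vocabulary concrete is to encode the agreed definitions as types in code, so the business meaning travels with the data. Here is a minimal sketch in Python; the `Customer` entity, the churn wording, and the 90-day rule are hypothetical stand-ins, not definitions from any real domain:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical entities: the names, fields, and the 90-day threshold
# stand in for whatever the domain experts actually agree on.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    signup_date: date

@dataclass(frozen=True)
class CustomerActivity:
    """Churn as the business might define it: no purchase for more
    than 90 days, not merely a cancelled subscription."""
    customer: Customer
    last_purchase: date

    def is_churned(self, as_of: date) -> bool:
        # The agreed definition lives in one reviewable place.
        return (as_of - self.last_purchase).days > 90
```

With this, "churned" means the same thing in a notebook, a pipeline, and a stakeholder meeting, and changing the definition becomes a small, reviewable diff rather than a hunt through feature code.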

For data scientists, this alignment often feels difficult. Many projects begin with a vague problem statement and a messy dataset. There is pressure to deliver insights quickly, but little incentive to slow down and build a mental model of the domain. The worst thing you can do at this point is rush and build something together with AI. The result is predictable: features that do not generalize, metrics that mislead, and models that fail and never see the light of production. The real cause is rarely technical; we are trained, and we can deliver a model. It comes from the absence of a shared conceptual map between us and the domain people. The best work happens when data scientists act as translators of the domain, not just model builders. That means spending time with customers, product managers, and domain specialists, listening to how they describe the world and what problems they actually face. I have written about one such journey I took to learn my current domain: Deep dive into Shrimp domain. That trip paid dividends in the kinds of models I built and the priorities I set. Once that foundation is in place, modeling becomes an implementation detail. Ideas flow naturally, and trade-offs between accuracy, interpretability, and the cost of change are easier to explain.

When AI Replaces Communication

Some data scientists have turned to LLMs like Claude, ChatGPT, or Gemini to fill that gap. This kind of thinking seems to be everywhere. Management and product can fall into the same trap when synthetic LLM personas are used to survey the domain instead of engaging real customers; there are startups that sell these kinds of fake surveys to companies. It can feel refreshing to brainstorm features or transformations with a model that is always ready to respond and please. But this is where the danger begins. The answers sound confident, but they often miss the details of the real domain. The model may help you write code faster or clean data more easily, but it cannot decide whether your assumptions make sense. Without that connection to the real world, the results drift away from what matters.

MLOps and Production Alignment

Domain-Driven Design fits naturally into how mature data teams manage production systems through MLOps. Shared definitions become taxonomies or versioned artifacts. Data contracts act like interfaces between analytics and engineering. Model outputs mirror the same entities that the business tracks. Monitoring, retraining, and deployment become simpler because everyone knows exactly what each metric means. This alignment also helps when things go wrong. When a drift alert triggers or a dashboard looks off, the team can trace whether the issue lies in data distribution, business logic, or feature construction. Instead of endless meetings between analysts, engineers, and product managers, everyone can follow the same conceptual map to find the root cause.
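The idea of data contracts as interfaces can be made tangible with a small validation check. Here is a minimal sketch in plain Python; the column names, types, and the probability range are hypothetical assumptions, and a real team would likely reach for a schema library and version the contract alongside the model artifact:

```python
# A toy data contract between analytics and engineering.
# The schema below is illustrative, not from any real system.
EXPECTED_SCHEMA = {
    "customer_id": str,
    "churn_probability": float,
}

def contract_violations(record: dict) -> list[str]:
    """Return a list of contract violations for one model output record."""
    errors = []
    for col, expected_type in EXPECTED_SCHEMA.items():
        if col not in record:
            errors.append(f"missing column: {col}")
        elif not isinstance(record[col], expected_type):
            errors.append(
                f"{col}: expected {expected_type.__name__}, "
                f"got {type(record[col]).__name__}"
            )
    # Business constraint, agreed with the domain: probabilities in [0, 1].
    p = record.get("churn_probability")
    if isinstance(p, float) and not 0.0 <= p <= 1.0:
        errors.append("churn_probability outside [0, 1]")
    return errors
```

Because the contract names the same entities the business tracks, a failed check points directly at a conversation to have, not just a pipeline to debug.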

Practical Advice for Teams

AI assistants are useful, but they should be treated as helpers, not replacements for domain experts. They are great for writing quick scripts, cleaning data, or producing test ideas, but they do not understand meaning. Treat an AI model like a clever intern who moves quickly but needs supervision. Always verify its output and assumptions. Whenever you create a new metric or feature, validate it with a domain expert or against principles you derive from historical data. This habit alone prevents a lot of costly errors. It also helps to keep simple documentation habits. Maintain a glossary of business terms in the same repository as your notebooks or code. Make it part of the review process, not an afterthought. Ask reviewers to check whether the code and metrics still match the shared definitions. These small steps can make collaboration feel natural again.
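The glossary habit can even be checked mechanically in review. A minimal sketch, where the glossary entries and metric names are hypothetical examples; in practice the glossary might live in a YAML file next to the code:

```python
# Illustrative glossary of business terms, kept in the same repository
# as the notebooks and pipelines. The definitions below are made up.
GLOSSARY = {
    "active_customer": "Customer with at least one purchase in the last 90 days",
    "churn_rate": "Share of last quarter's active customers no longer active",
}

def undefined_metrics(metric_names: list[str]) -> list[str]:
    """Flag metric names used in code that have no agreed business definition."""
    return [name for name in metric_names if name not in GLOSSARY]
```

A reviewer (or a CI step) can then ask a concrete question: every metric a pull request introduces either appears in the glossary or triggers a conversation with a domain expert before merge.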

The real craft of modern data science lies in curiosity and negotiation. It is about asking questions that uncover how the business really works, turning loose goals into measurable signals, and ensuring that everyone interprets results in the same way. Sometimes it is better to sit with a domain expert and notice the small details that explain why the data looks the way it does. My current team does this with me and I am glad that we have this bond. Domain-Driven Design, in this sense, is not a framework or methodology. It is a way of thinking that helps data scientists build models aligned with the organization’s intent, not just its datasets. The future of data science will not be shaped by who trains the biggest model or automates the most processes. It will depend on who understands their domain best and can share that understanding clearly with others. Shared language, not shared infrastructure, will separate teams that sustain value from those that burn out. The teams that slow down to think together will eventually move faster, because they are not just building models. They are building understanding, and understanding scales further than any model ever will.