Phase 2
Data Understanding
Check what data exists, what it means, and where it is weak.
What this phase actually is
Data understanding is the audit before the build. The team inspects the available sources, checks how each field is defined, measures how much data is missing, and asks whether the data can answer the business question honestly.
This phase is not just technical profiling. It is also domain work: a “view” may mean a trailer start, a full episode, a household profile, or a bot-like event unless someone checks.
The useful output is a map of available evidence, known weaknesses, and gaps that must be accepted, fixed, or reframed before modeling starts.
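One small piece of that map is a per-field count of missing values. The sketch below is a minimal illustration, not a prescribed method: the helper `missing_share`, the sample events, and the field names (`profile_id`, `view_type`, `device`) are all hypothetical, and it assumes a value counts as missing when the field is absent, `None`, or an empty string.

```python
from collections import Counter


def missing_share(records, fields):
    """Return, per field, the share of records where it is absent, None, or ""."""
    total = len(records)
    missing = Counter()
    for row in records:
        for field in fields:
            value = row.get(field)  # absent keys come back as None
            if value is None or value == "":
                missing[field] += 1
    return {f: (missing[f] / total if total else 0.0) for f in fields}


# Hypothetical viewing events; names and values are illustrative only.
events = [
    {"profile_id": "p1", "view_type": "full_episode", "device": "tv"},
    {"profile_id": "p2", "view_type": None, "device": "mobile"},
    {"profile_id": None, "view_type": "trailer_start", "device": ""},
    {"profile_id": "p4", "view_type": "full_episode"},  # no device key at all
]

report = missing_share(events, ["profile_id", "view_type", "device"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

A number like this only says where the data is thin. Whether a gap can be accepted, fixed, or forces a reframed question is still a domain decision.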
How this looks at Bertelsmann
Pitfalls
- Assuming a field name tells the whole story.
- Ignoring missing data because the dashboard still renders.
- Confusing data that is available today with data that would have been available at prediction time.
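The last pitfall can be checked mechanically when each value carries a timestamp for when it was recorded. The sketch below assumes such a timestamp exists. The `recorded_at` key, the feature names, and the helper `available_at` are hypothetical and only illustrate the idea.

```python
from datetime import datetime


def available_at(records, prediction_time):
    """Keep only records that had already been recorded at prediction_time."""
    return [r for r in records if r["recorded_at"] <= prediction_time]


# Hypothetical feature values with the time each one was written.
features = [
    {"name": "views_last_7d", "recorded_at": datetime(2024, 3, 1, 8, 0)},
    {"name": "cancelled_flag", "recorded_at": datetime(2024, 3, 20, 12, 0)},
]

prediction_time = datetime(2024, 3, 5)
usable = [r["name"] for r in available_at(features, prediction_time)]
print(usable)
```

Here `cancelled_flag` exists in today's table but was written after the prediction date, so a model trained on it would be using information it could not have had.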