What I Do Before a Data Science Project to Ensure Success

Highlights

  • First, Draw the Map to the Destination (One-Pager). The map covers the intent, desired outcome, deliverable, and constraints.
  • Intent (why?): What problem are we trying to solve, or what opportunity do we want to capture? How will customers benefit? Why are we doing this, and why does it matter?
  • “Management is doing things right; leadership is doing the right things.” - Peter Drucker
  • But by taking the time to think through the problem and intent, we might realise that, hey, maybe we don’t need to fix it after all.
  • Now that we have the intent, we can discuss what success looks like. How well should we solve this problem? How do we measure it? In data science, this is usually a business metric such as conversion rate, savings from reduced fraud, or net promoter score.
  • Specifying the desired outcome, in quantifiable terms, prevents us from falling into the trap of chasing a moving target.
  • Solving a problem to 95% could take 3-4x the effort of solving to 90%; solving to 99% might take 10-100x more.
  • With the intent and desired outcome defined, we can design a deliverable that meets them. How should we solve this problem? The solution should satisfy both while also integrating with the existing system.
  • Say an e-commerce platform has the intent of improving how customers discover and purchase products. To achieve this, should we improve search? Or recommendations? Or email campaigns? If it’s a recommender, how will we deploy it?
  • This doesn’t have to be especially detailed; for now, we don’t need the full architecture and specs. But a rough sketch is useful for getting upfront buy-in from the business, product, and tech teams.
  • How not to solve a problem is often more important than how to solve it. Unfortunately, this doesn’t get addressed enough.
  • Providing teams with boundaries and constraints counterintuitively leads to greater creativity and freedom. Without constraints, we don’t know what we cannot do.
  • “Constraints drive innovation and force focus. Instead of trying to remove them, use them to your advantage.”
  • We write the intent, desired outcome, deliverable, and constraints in simple language on a single-page document. This can be shared with stakeholders for their review, feedback, and buy-in.
  • This requires a certain amount of discipline from stakeholders. After all, the work comes at no cost to them. There’s nothing stopping them from changing their minds halfway through the project.
  • Most projects start with a solution, then estimate the effort for each component and for the overall design.
  • I tend to do the opposite. Given a budget (read: time-box), how can we design a solution that fits? The intent and desired outcome determine the time-box, and the time-box determines the solution design. This is how Basecamp does it too: they have different appetites for different problems and scope the solutions accordingly.
  • The time-box will vary across the project stages. At the start, when we’re still exploring and uncertainty is high, we’ll want tighter time-boxes to limit wild goose chases.
  • The first stage is a feasibility assessment. With our existing data and technology, are we able to solve the problem? If so, to what extent? In this stage, we aim for a quick and dirty investigation. I usually time-box this at 1-2 weeks.
  • After determining feasibility, we proceed with a proof of concept (POC). In this stage, we hack together a prototype to assess if our solution is technically achievable. Ideally, we also test the integration points with upstream data providers and downstream consumers. Can we meet the technical constraints (e.g., latency, throughput)? Is model performance satisfactory? This usually takes a month or two.
  • We’ll want to time-box the stage after the POC too. An overly generous timeline invites non-essential features and never-ending development; until the solution is actually deployed, no one benefits from it. This stage usually takes 3-6 months, including infra, job orchestration, testing, monitoring, documentation, etc.
  • Having the one-pager and time-box improves a project’s chances of success, and usually it’s sufficient. Nonetheless, it can help to break the project down into components, especially if it involves unfamiliar data or technology.
  • The breakdown shows which components are harder to implement or riskier; these usually involve things we haven’t done before. We want to front-load the risk and tackle these scary bits first.
  • When breaking a project down, I often consult seniors with more expertise and experience. They usually have better intuition for potential gotchas and blockers that deserve more attention.
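The time-boxing and risk front-loading described in these highlights can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the author's tooling. Assumptions: the stage name "build and deploy" is ours (the highlights don't name the stage after the POC), month ranges are converted to weeks at roughly 4 weeks per month, and the component names and 1-3 risk scores are made up for the recommender example. The sketch sums per-stage time-boxes into an overall range and orders components so the riskiest come first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """A project stage with a time-box expressed as a range of weeks."""
    name: str
    min_weeks: int
    max_weeks: int


# Hypothetical stages; week ranges assume roughly 4 weeks per month.
STAGES = [
    Stage("feasibility assessment", 1, 2),  # "1-2 weeks"
    Stage("proof of concept", 4, 8),        # "a month or two"
    Stage("build and deploy", 12, 24),      # "3-6 months" (stage name is assumed)
]


def total_timebox(stages: list[Stage]) -> tuple[int, int]:
    """Sum per-stage time-boxes into an overall (min, max) range in weeks.

    Assumes the stages run back to back with no overlap.
    """
    return (
        sum(s.min_weeks for s in stages),
        sum(s.max_weeks for s in stages),
    )


def front_load_risk(components: dict[str, int]) -> list[str]:
    """Order components by risk score, highest first, so the scary bits go first.

    Components with equal scores keep their original order, because
    sorted() stays stable even with reverse=True.
    """
    return sorted(components, key=lambda name: components[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical components, scored 1 (done before) to 3 (never done before).
    components = {
        "batch email job": 1,
        "real-time serving endpoint": 3,
        "candidate retrieval model": 2,
    }
    print(total_timebox(STAGES))        # (17, 34)
    print(front_load_risk(components))  # ['real-time serving endpoint', 'candidate retrieval model', 'batch email job']
```

Summing the ranges treats the stages as strictly sequential; if stages overlap in practice, the overall time-box would be shorter than this sum.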