Data Analysis Projects: From Data to Decisions
Originally published on the Discontinuity blog, May 2016.
Data products use data science and engineering to improve product performance: better search results, recommendations, automated decisions. Decision science uses data to analyze business metrics (growth, engagement, margins, feedback) to inform strategy and key business decisions.
The difference between analytics and decision science isn’t always clear. But decision science should do more than produce reports and dashboards. Data scientists shouldn’t be doing work that can be delivered using off-the-shelf business intelligence tools.
Not every problem needs the data-science bazooka. Some decisions are too small to justify the investment. Others may be important, but the business lacks the data to meaningfully analyze them. In those cases, rely on intuition and experimentation. Good decision scientists know their own limitations.
Entry checklist
- Are you committed to using data science to either inform strategic decisions or build data products?
- Will you be able to collect the data you need, and, more importantly, act on it?
Collecting data isn’t enough. Data science only matters if data drives action. The available signal depends not only on data volume, but also on the signal-to-noise ratio. Data should inform product changes and drive the organization’s KPIs.
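To make the point about volume versus signal-to-noise concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two group means. The function name `samples_per_group` and the default significance and power levels are illustrative assumptions, not something the original post prescribes.

```python
import math
from statistics import NormalDist


def samples_per_group(effect, noise_sd, alpha=0.05, power=0.8):
    """Approximate observations needed per group to detect a difference
    in means of size `effect` when single observations have standard
    deviation `noise_sd` (two-sided z-test, equal group sizes).

    Hypothetical helper for illustration only.
    """
    if effect == 0:
        raise ValueError("effect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # Required n grows with the square of noise / signal.
    n = 2 * ((z_alpha + z_power) * noise_sd / effect) ** 2
    return math.ceil(n)


print(samples_per_group(effect=1.0, noise_sd=1.0))
print(samples_per_group(effect=0.5, noise_sd=1.0))
```

Under these assumptions, halving the effect size (or doubling the noise) roughly quadruples the data required, which is why a large but noisy dataset can still carry too little signal to act on.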
Build vs. buy
Building a data science team is hard and expensive. If you can get away with outsourcing your data science needs, you probably should. Use an off-the-shelf solution for your domain that ingests data, builds models, automates actions, and reports on key analytics. It’s often worth compromising to accelerate your business and keep your core team focused.
When do you need data science to be a core competency? When the problem is critical to your success or your approach is unique (collecting new kinds of data, using results in novel ways) and off-the-shelf tools are too rigid.
Build a diverse team whose members have very different backgrounds, skill sets, and world views. Over time, such a team’s impact will be far higher than a homogeneous one’s.
Understanding ML tasks
At a high level, there are three common kinds of ML tasks: classification, regression, and ranking. Classification predicts categories (e.g., image recognition: is this a photo of a dog or a car?). Regression predicts numerical values (e.g., the future value of a home). Ranking predicts an ordering of items (e.g., the search results most relevant to a given query and user profile).
Not having an evaluation metric is a very bad sign: define up front how the system will be measured on its task. A common and frustrating reality is that more complex ML technology does not necessarily improve evaluation metrics; especially with limited data, simple techniques frequently outperform complex ones.
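As a rough illustration, each of the three task types can be paired with a simple evaluation metric, plus a trivial baseline to compare against. The metric choices (accuracy, mean absolute error, NDCG) and all function names below are assumptions made for this sketch; the right metric depends on the problem.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Classification: fraction of labels predicted exactly right."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Regression: average absolute gap between prediction and truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def dcg(relevances):
    """Discounted cumulative gain: relevance counts less at lower ranks."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances_in_ranked_order):
    """Ranking: DCG of the produced ordering divided by the ideal DCG."""
    ideal = dcg(sorted(relevances_in_ranked_order, reverse=True))
    return dcg(relevances_in_ranked_order) / ideal if ideal > 0 else 0.0


def majority_class_baseline(train_labels, n_predictions):
    """Simplest possible classifier: always predict the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_predictions


# Toy, made-up labels purely for illustration.
truth = ["dog", "car", "dog", "dog"]
baseline = majority_class_baseline(["dog", "dog", "car"], len(truth))
print(accuracy(truth, baseline))
```

A more complex model is only worth its cost if it beats a baseline like this on the chosen metric by a margin that matters.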
The trickiest part is understanding which business metrics an improvement on the ML task will affect, and by how much. That connection matters more than the technology itself.
Recommendation: after validating the MVP, consider investing in an internal data-science capability.