Friday, April 27, 2018

Setting Up Data Science Projects

At a high level, a data science project has six components in three classes:
  1. Objective 
    1. What is the intended benefit?
    2. How is the problem going to be solved?
  2. Data
    1. What data will be output?
    2. What is the data that will be coming in?
  3. Technical
    1. What will be the analytic approach?
    2. What code will be written?
These aren't steps. Projects often go back and forth between the various components. For instance, we'll realize that the code resources we have available won't make the solution we want possible, so we go all the way back to component 1.2 to see if there is a different way of solving the problem.

The components are ordered in terms of importance! Getting the right problem to solve is vastly more important than writing the best code.
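
One way to keep all six components in view is to write them down as a small project brief before any modeling starts. The sketch below is only an illustration: the `ProjectBrief` class, its field names, the `LABELS` mapping, and the sample answers are all hypothetical, chosen to mirror the outline above rather than taken from any particular tool. Fields are declared in order of importance, so the first unanswered item is the one most worth resolving.

```python
from dataclasses import dataclass, fields
from typing import List

# Hypothetical labels, numbered to match the outline in this post.
LABELS = {
    "benefit": "1.1 intended benefit",
    "solution": "1.2 how the problem will be solved",
    "output_data": "2.1 output data",
    "input_data": "2.2 input data",
    "analytic_approach": "3.1 analytic approach",
    "code": "3.2 code to be written",
}


@dataclass
class ProjectBrief:
    # Declared from most to least important component.
    benefit: str = ""
    solution: str = ""
    output_data: str = ""
    input_data: str = ""
    analytic_approach: str = ""
    code: str = ""

    def unanswered(self) -> List[str]:
        """Return labels of components still blank, most important first."""
        return [
            LABELS[f.name]
            for f in fields(self)  # fields() preserves declaration order
            if not getattr(self, f.name).strip()
        ]


# Illustrative values only: a brief where the code is planned
# but most of the higher-value questions are still open.
brief = ProjectBrief(benefit="Reduce customer churn", code="Python notebook")
print(brief.unanswered())
```

Because these aren't steps, nothing here enforces an order of work; the list is just a reminder of which open questions matter most when you loop back.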

However, almost all of the papers, talks, and blogs concentrate on component 3.2. The least important component actually gets the most attention.

That's the problem. I don't know of a good way of getting better at the higher-value components except to always try to understand what you are doing. For instance, radically changing component 3.1 may mean you are really changing the business problem being addressed; just make sure that is what you want to do.
