
The Data Science Lifecycle

In data science, we use large and diverse data sets to draw conclusions about the world. In this book, we discuss the principles and techniques of data science through the dual lens of computational and inferential thinking. Practically speaking, this involves the following process:

  1. Formulating a question or problem
  2. Acquiring and cleaning data
  3. Conducting exploratory data analysis
  4. Using prediction and inference to draw conclusions

New questions and problems often emerge after the last step of this process, so we can repeat the procedure to discover new characteristics of our world. This positive feedback loop is so central to our work that we call it the data science lifecycle.

If the data science lifecycle were as easy to conduct as it is to state, there would be no need for textbooks about the subject. Fortunately, each step in the lifecycle contains numerous challenges that reveal powerful and often surprising insights, and these insights form the foundation for making thoughtful decisions using data.

As in Data 8, we will begin with an example.