# HIDDEN
# Clear previously defined variables
%reset -f

# Set directory for data loading to work properly
import os
os.chdir(os.path.expanduser('~/notebooks/03'))

Working with Tabular Data

Tabular data, like the datasets we worked with in Data 8, is one of the most common and useful forms of data for analysis. We introduce tabular data manipulation using pandas, the standard Python library for working with tabular data. Although the syntax of pandas is more challenging than that of the datascience package used in Data 8, pandas provides significant performance improvements and is the current tool of choice in both industry and academia for working with tabular data.

It is more important that you understand which operations are useful on data than that you memorize the exact details of pandas syntax. For example, knowing when to use a group or a join is more useful than knowing how to call the pandas function that groups data. Once you know the right operation to use, it is relatively easy to look up the function you need. All of the table manipulations in this chapter will appear again in a new syntax when we cover SQL, so it will help you to understand them now.
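To make the distinction between a group and a join concrete, here is a minimal sketch. The `sales` and `stores` tables, their column names, and their values are all hypothetical, invented only for illustration. The point is the shape of each operation rather than the exact calls.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only
sales = pd.DataFrame({
    'store': ['A', 'A', 'B', 'B', 'C'],
    'amount': [10, 20, 5, 15, 30],
})
stores = pd.DataFrame({
    'store': ['A', 'B', 'C'],
    'city': ['Berkeley', 'Oakland', 'Albany'],
})

# Group: collapse all rows that share a key into one summary row per key
totals = sales.groupby('store')['amount'].sum().reset_index()

# Join: attach columns from a second table by matching rows on a shared key
totals_with_city = totals.merge(stores, on='store', how='inner')

print(totals_with_city)
```

A group answers questions of the form "what is the total (or mean, or count) for each category?", while a join answers "what additional information from another table belongs to each row?".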

Because we will cover only the most important pandas functions in this textbook, you should bookmark the pandas documentation for reference when you conduct your own data analyses.