pandas

| Function | Chapter | Description |
| --- | --- | --- |
| `pd.DataFrame(data)` | Tabular Data and pandas | Create a DataFrame from a two-dimensional array or dictionary `data` |
| `pd.read_csv(filepath)` | Tabular Data and pandas | Import a CSV file from `filepath` as a pandas DataFrame |
| `pd.DataFrame.head(n=5)`, `pd.Series.head(n=5)` | Tabular Data and pandas | View the first `n` rows of a DataFrame or Series |
| `pd.DataFrame.index`, `pd.DataFrame.columns` | Tabular Data and pandas | View a DataFrame's index and column values |
| `pd.DataFrame.describe()`, `pd.Series.describe()` | Exploratory Data Analysis | View descriptive statistics about a DataFrame or Series |
| `pd.Series.unique()` | Exploratory Data Analysis | View unique values in a Series |
| `pd.Series.value_counts()` | Exploratory Data Analysis | View the number of times each unique value appears in a Series |
| `df[col]` | Tabular Data and pandas | From DataFrame `df`, return column `col` as a Series |
| `df[[col]]` | Tabular Data and pandas | From DataFrame `df`, return column `col` as a DataFrame |
| `df.loc[row, col]` | Tabular Data and pandas | From DataFrame `df`, return rows with index label `row` and columns with label `col`; `row` can alternatively be a boolean Series |
| `df.iloc[row, col]` | Tabular Data and pandas | From DataFrame `df`, return rows at integer position `row` and columns at integer position `col`; `row` can alternatively be a boolean array (a boolean Series is not accepted) |
| `pd.DataFrame.isnull()`, `pd.Series.isnull()` | Data Cleaning | Return a boolean mask marking the missing values in a DataFrame or Series |
| `pd.DataFrame.fillna(value)`, `pd.Series.fillna(value)` | Data Cleaning | Fill in missing values in a DataFrame or Series with `value` |
| `pd.DataFrame.dropna(axis)`, `pd.Series.dropna()` | Data Cleaning | Drop rows or columns with missing values from a DataFrame or Series |
| `pd.DataFrame.drop(labels, axis)` | Data Cleaning | Drop rows or columns named `labels` from a DataFrame along `axis` |
| `pd.DataFrame.rename()` | Data Cleaning | Rename specified rows or columns in a DataFrame |
| `pd.DataFrame.replace(to_replace, value)` | Data Cleaning | Replace `to_replace` values with `value` in a DataFrame |
| `pd.DataFrame.reset_index(drop=False)` | Data Cleaning | Reset a DataFrame's index; by default, the old index is retained as a new column unless `drop=True` is specified |
| `pd.DataFrame.sort_values(by, ascending=True)` | Tabular Data and pandas | Sort a DataFrame by the specified columns `by`, in ascending order by default |
| `pd.DataFrame.groupby(by)` | Tabular Data and pandas | Return a GroupBy object that contains a DataFrame grouped by the values in the specified columns `by` |
| `GroupBy.<function>` | Tabular Data and pandas | Apply a function `<function>` to each group in a GroupBy object `GroupBy`; e.g. `mean()`, `count()` |
| `pd.Series.<function>` | Tabular Data and pandas | Apply a function `<function>` to a Series with numerical values; e.g. `mean()`, `max()`, `median()` |
| `pd.Series.str.<function>` | Tabular Data and pandas | Apply a function `<function>` to a Series with string values; e.g. `len()`, `lower()`, `split()` |
| `pd.Series.dt.<property>` | Tabular Data and pandas | Extract a property `<property>` from a Series with Datetime values; e.g. `year`, `month`, `date` |
| `pd.get_dummies(data, columns=None, drop_first=False)` | --- | Convert the categorical variables in `data` (only those listed in `columns`, if given) to dummy variables; all levels are retained unless `drop_first=True` is specified |
| `pd.merge(left, right, how, on)` | Exploratory Data Analysis; Databases and SQL | Merge two DataFrames `left` and `right` together on the specified columns `on`; the type of join depends on `how` |
| `pd.read_sql(sql, con)` | Databases and SQL | Run a SQL query `sql` on a database connection `con`, and return the result as a pandas DataFrame |
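
The column and row selection entries above can be hard to tell apart from their descriptions alone. The following is a minimal sketch on a small DataFrame whose names and values are made up for illustration. It contrasts `df[col]` with `df[[col]]` and label-based `.loc` with position-based `.iloc`.

```python
import pandas as pd

# Hypothetical data, made up for this illustration.
df = pd.DataFrame(
    {"name": ["Ann", "Bo", "Cy", "Di"],
     "year": [2001, 2002, 2001, 2003],
     "count": [12, 7, 30, 5]},
    index=["a", "b", "c", "d"],
)

col_series = df["count"]      # single brackets: a Series
col_frame = df[["count"]]     # double brackets: a one-column DataFrame

# Label-based selection: row label "b", column label "name".
by_label = df.loc["b", "name"]

# Position-based selection: second row, first column (the same cell).
by_position = df.iloc[1, 0]

# A boolean Series works as the row selector for .loc ...
mask = df["year"] == 2001
rows_2001 = df.loc[mask, ["name", "count"]]

# ... while .iloc needs a plain boolean array, so convert the mask first.
rows_2001_iloc = df.iloc[mask.to_numpy(), [0, 2]]

year_counts = df["year"].value_counts()   # occurrences of each unique value
unique_years = df["year"].unique()        # array of the distinct values
```

Both filtered results contain the rows labeled `a` and `c`. The `.iloc` version refers to the same two columns by position rather than by name.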
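
The Data Cleaning entries above are usually applied together. The sketch below assumes a small DataFrame with gaps; the column names, values, and the `"n/a"` placeholder are all hypothetical. It applies each cleaning function once, so the effect of each call can be inspected separately.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps, made up for this illustration.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", None, "Pune"],
     "temp": [3.5, np.nan, 21.0, 30.0],
     "notes": ["n/a", "ok", "ok", "n/a"]}
)

missing_per_column = df.isnull().sum()    # count of missing values by column

filled = df.fillna({"temp": 0.0})         # fill only the temp column
complete_rows = df.dropna(axis=0)         # drop rows with any missing value

no_notes = df.drop("notes", axis=1)       # drop a column by name
renamed = df.rename(columns={"temp": "temp_c"})
replaced = df.replace("n/a", "unknown")   # swap a placeholder value

# dropna keeps the original row labels (0 and 3 here);
# reset_index(drop=True) renumbers from 0 without keeping the old labels.
renumbered = complete_rows.reset_index(drop=True)

# Sort by temperature, highest first; missing values sort last by default.
by_temp = df.sort_values("temp", ascending=False)
```

None of these calls modifies `df` in place. Each returns a new object, which is why every result is assigned to its own name.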
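
As a rough illustration of the grouping, accessor, dummy-variable, and merge entries above, the sketch below uses two tiny tables. The `sales` and `managers` tables and all of their contents are assumptions made for this example.

```python
import pandas as pd

# Hypothetical tables, made up for this illustration.
sales = pd.DataFrame(
    {"store": ["north", "south", "north", "east"],
     "date": pd.to_datetime(
         ["2020-01-15", "2020-02-03", "2021-07-09", "2021-07-30"]),
     "amount": [100, 250, 50, 75]}
)
managers = pd.DataFrame(
    {"store": ["north", "south"], "manager": ["Kim", "Lee"]}
)

# GroupBy.<function>: total amount per store.
totals = sales.groupby("store")["amount"].sum()

# Series.str.<function> and Series.dt.<property>.
store_len = sales["store"].str.len()
sale_year = sales["date"].dt.year

# One indicator column per store level; drop_first=True omits the first
# level in sorted order ("east").
dummies = pd.get_dummies(sales, columns=["store"], drop_first=True)

# how="left" keeps every row of sales; "east" has no matching manager,
# so its manager value is missing.
merged = pd.merge(sales, managers, how="left", on="store")
```

Changing `how` to `"inner"` would drop the `east` row instead of keeping it with a missing manager.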