
# HIDDEN
# Clear previously defined variables
%reset -f

# Set directory for data loading to work properly
import os
os.chdir(os.path.expanduser('~/notebooks/20'))

pandas

Function	Chapter	Description
`pd.DataFrame(data)`	Tabular Data and pandas	Create a DataFrame from a two-dimensional array or dictionary `data`
`pd.read_csv(filepath)`	Tabular Data and pandas	Import a CSV file from `filepath` as a pandas DataFrame
`pd.DataFrame.head(n=5)` `pd.Series.head(n=5)`	Tabular Data and pandas	View the first `n` rows of a DataFrame or Series
`pd.DataFrame.index` `pd.DataFrame.columns`	Tabular Data and pandas	View a DataFrame's index and column values
`pd.DataFrame.describe()` `pd.Series.describe()`	Exploratory Data Analysis	View descriptive statistics about a DataFrame or Series
`pd.Series.unique()`	Exploratory Data Analysis	View unique values in a Series
`pd.Series.value_counts()`	Exploratory Data Analysis	View the number of times each unique value appears in a Series
`df[col]`	Tabular Data and pandas	From DataFrame `df`, return column `col` as a Series
`df[[col]]`	Tabular Data and pandas	From DataFrame `df`, return column `col` as a DataFrame
`df.loc[row, col]`	Tabular Data and pandas	From DataFrame `df`, return rows with index name `row` and column name `col`; `row` can alternatively be a boolean Series
`df.iloc[row, col]`	Tabular Data and pandas	From DataFrame `df`, return rows at integer position `row` and columns at integer position `col`; `row` can alternatively be a boolean array (convert a boolean Series first, e.g. with `.to_numpy()`)
`pd.DataFrame.isnull()` `pd.Series.isnull()`	Data Cleaning	Return a boolean mask marking the missing values in a DataFrame or Series
`pd.DataFrame.fillna(value)` `pd.Series.fillna(value)`	Data Cleaning	Fill in missing values in a DataFrame or Series with `value`
`pd.DataFrame.dropna(axis)` `pd.Series.dropna()`	Data Cleaning	Drop rows or columns with missing values from a DataFrame or Series
`pd.DataFrame.drop(labels, axis)`	Data Cleaning	Drop rows or columns named `labels` from a DataFrame along `axis`
`pd.DataFrame.rename()`	Data Cleaning	Rename specified rows or columns in a DataFrame
`pd.DataFrame.replace(to_replace, value)`	Data Cleaning	Replace `to_replace` values with `value` in a DataFrame
`pd.DataFrame.reset_index(drop=False)`	Data Cleaning	Reset a DataFrame's index; by default, retains the old index as a new column unless `drop=True` is specified
`pd.DataFrame.sort_values(by, ascending=True)`	Tabular Data and pandas	Sort a DataFrame by specified columns `by`, in ascending order by default
`pd.DataFrame.groupby(by)`	Tabular Data and pandas	Return a GroupBy object that contains a DataFrame grouped by the values in the specified columns `by`
`GroupBy.<function>`	Tabular Data and pandas	Apply a function `<function>` to each group in a GroupBy object `GroupBy`; e.g. `mean()`, `count()`
`pd.Series.<function>`	Tabular Data and pandas	Apply a function `<function>` to a Series with numerical values; e.g. `mean()`, `max()`, `median()`
`pd.Series.str.<function>`	Tabular Data and pandas	Apply a function `<function>` to a Series with string values; e.g. `len()`, `lower()`, `split()`
`pd.Series.dt.<property>`	Tabular Data and pandas	Extract a property `<property>` from a Series with Datetime values; e.g. `year`, `month`, `date`
`pd.get_dummies(data, columns=None, drop_first=False)`	---	Convert the categorical variables in `data` (optionally only those named in `columns`) to dummy variables; by default, retains all levels unless `drop_first=True` is specified
`pd.merge(left, right, how, on)`	Exploratory Data Analysis; Databases and SQL	Merge two DataFrames `left` and `right` together on specified columns `on`; type of join depends on `how`
`pd.read_sql(sql, con)`	Databases and SQL	Run a SQL query `sql` on a database connection `con`, and return the result as a pandas DataFrame
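
The selection entries in the table above (`df[col]`, `df[[col]]`, `df.loc`, `df.iloc`) can be sketched as follows. The DataFrame and its values are hypothetical, invented only for illustration. Note that `.iloc` takes a plain boolean array as a row mask, so the boolean Series is converted with `.to_numpy()` first.

```python
import pandas as pd

# Hypothetical data, made up for this sketch.
df = pd.DataFrame(
    {"name": ["Ada", "Ben", "Cy"], "age": [36, 29, 41]},
    index=["a", "b", "c"],
)

col_series = df["age"]     # single brackets: a Series
col_frame = df[["age"]]    # double brackets: a one-column DataFrame

by_label = df.loc["b", "age"]   # label-based lookup
by_position = df.iloc[1, 1]     # position-based lookup of the same cell

mask = df["age"] > 30                      # boolean Series
older_loc = df.loc[mask, "name"]           # .loc accepts the Series directly
older_iloc = df.iloc[mask.to_numpy(), 0]   # .iloc needs a boolean array
```

Both `older_loc` and `older_iloc` select the same rows; they differ only in whether rows and columns are addressed by label or by position.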
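
The Data Cleaning entries in the table above can be combined roughly as in this minimal sketch. The `raw` DataFrame and the fill values are assumptions chosen for illustration, not data from the textbook.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
raw = pd.DataFrame(
    {"city": [None, "Oslo", "Lima"], "temp": [21.0, np.nan, 4.0]}
)

missing_mask = raw.isnull()                # boolean DataFrame, True where missing
n_missing = int(missing_mask.sum().sum())  # total count of missing cells

filled = raw.fillna({"city": "unknown", "temp": 0.0})  # per-column fill values
complete_rows = raw.dropna(axis=0)         # keep only rows with no missing values
reindexed = complete_rows.reset_index(drop=True)  # discard the old index

renamed = raw.rename(columns={"temp": "temp_c"})
no_city = raw.drop("city", axis=1)         # axis=1 drops a column
replaced = raw.replace("Oslo", "OSLO")
```

Each call returns a new object and leaves `raw` unchanged, which is why the results are assigned to separate names.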
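
The sorting, grouping, accessor, dummy-variable, and merge entries in the table above might be used together as in the sketch below. The `sales` and `managers` tables are hypothetical and exist only to make the example self-contained.

```python
import pandas as pd

# Hypothetical data, made up for this sketch.
sales = pd.DataFrame({
    "store": ["north", "south", "north", "south"],
    "item": ["Tea", "tea", "Coffee", "Tea"],
    "amount": [2.0, 6.0, 4.0, 2.0],
    "sold_on": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-02-11", "2022-03-01"]
    ),
})
managers = pd.DataFrame({"store": ["north", "south"], "manager": ["Kim", "Lee"]})

sorted_sales = sales.sort_values("amount", ascending=False)
mean_by_store = sales.groupby("store")["amount"].mean()  # one mean per store

item_lower = sales["item"].str.lower()   # string method via .str
sale_year = sales["sold_on"].dt.year     # datetime property via .dt

# One indicator column per store; drop_first=True would omit the first level.
dummies = pd.get_dummies(sales, columns=["store"], drop_first=False)

# Left join: keep every sale and attach the matching manager.
merged = pd.merge(sales, managers, how="left", on="store")
```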
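
For `pd.read_sql(sql, con)`, a minimal sketch using an in-memory SQLite database from Python's standard library is shown below. The `scores` table and its contents are assumptions made for illustration; a real analysis would connect to an existing database instead.

```python
import sqlite3

import pandas as pd

# In-memory database so the example leaves no files behind.
con = sqlite3.connect(":memory:")

# Hypothetical table, written with DataFrame.to_sql for convenience.
pd.DataFrame({"id": [1, 2, 3], "score": [90, 75, 82]}).to_sql(
    "scores", con, index=False
)

# Run the query and collect the result as a DataFrame.
high = pd.read_sql(
    "SELECT id, score FROM scores WHERE score > 80 ORDER BY id", con
)
con.close()
```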