## pandas
| Function | Chapter | Description |
|---|---|---|
| `pd.DataFrame(data)` | Tabular Data and pandas | Create a DataFrame from a two-dimensional array or dictionary `data` |
| `pd.read_csv(filepath)` | Tabular Data and pandas | Import a CSV file from `filepath` as a pandas DataFrame |
| `pd.DataFrame.head(n=5)`, `pd.Series.head(n=5)` | Tabular Data and pandas | View the first `n` rows of a DataFrame or Series |
| `pd.DataFrame.index`, `pd.DataFrame.columns` | Tabular Data and pandas | View a DataFrame's row index and column labels |
| `pd.DataFrame.describe()`, `pd.Series.describe()` | Exploratory Data Analysis | View descriptive statistics about a DataFrame or Series |
| `pd.Series.unique()` | Exploratory Data Analysis | View the unique values in a Series |
| `pd.Series.value_counts()` | Exploratory Data Analysis | View the number of times each unique value appears in a Series |
| `df[col]` | Tabular Data and pandas | From DataFrame `df`, return column `col` as a Series |
| `df[[col]]` | Tabular Data and pandas | From DataFrame `df`, return column `col` as a DataFrame |
| `df.loc[row, col]` | Tabular Data and pandas | From DataFrame `df`, return rows with index label `row` and column label `col`; `row` can alternatively be a boolean Series |
| `df.iloc[row, col]` | Tabular Data and pandas | From DataFrame `df`, return rows at integer position `row` and columns at integer position `col`; `row` can alternatively be a boolean array (a boolean Series is not accepted) |
| `pd.DataFrame.isnull()`, `pd.Series.isnull()` | Data Cleaning | View which values are missing in a DataFrame or Series |
| `pd.DataFrame.fillna(value)`, `pd.Series.fillna(value)` | Data Cleaning | Fill in missing values in a DataFrame or Series with `value` |
| `pd.DataFrame.dropna(axis)`, `pd.Series.dropna()` | Data Cleaning | Drop rows or columns with missing values from a DataFrame, or missing entries from a Series |
| `pd.DataFrame.drop(labels, axis)` | Data Cleaning | Drop rows or columns named `labels` from a DataFrame along `axis` |
| `pd.DataFrame.rename()` | Data Cleaning | Rename specified rows or columns in a DataFrame |
| `pd.DataFrame.replace(to_replace, value)` | Data Cleaning | Replace `to_replace` values with `value` in a DataFrame |
| `pd.DataFrame.reset_index(drop=False)` | Data Cleaning | Reset a DataFrame's index to the default integer index; the old index is kept as a new column unless `drop=True` is specified |
| `pd.DataFrame.sort_values(by, ascending=True)` | Tabular Data and pandas | Sort a DataFrame by the columns specified in `by`, in ascending order by default |
| `pd.DataFrame.groupby(by)` | Tabular Data and pandas | Return a GroupBy object that groups a DataFrame by the values in the columns specified in `by` |
| `GroupBy.<function>` | Tabular Data and pandas | Apply a function `<function>` to each group in a GroupBy object; e.g. `mean()`, `count()` |
| `pd.Series.<function>` | Tabular Data and pandas | Apply a function `<function>` to a Series with numerical values; e.g. `mean()`, `max()`, `median()` |
| `pd.Series.str.<function>` | Tabular Data and pandas | Apply a function `<function>` to a Series with string values; e.g. `len()`, `lower()`, `split()` |
| `pd.Series.dt.<property>` | Tabular Data and pandas | Extract a property `<property>` from a Series with datetime values; e.g. `year`, `month`, `date` |
| `pd.get_dummies(data, columns=None, drop_first=False)` | --- | Convert the categorical variables in `data` (or only those listed in `columns`) to dummy variables; one dummy per category is kept unless `drop_first=True` is specified |
| `pd.merge(left, right, how, on)` | Exploratory Data Analysis; Databases and SQL | Merge two DataFrames `left` and `right` on the columns specified in `on`; the type of join depends on `how` |
| `pd.read_sql(sql, con)` | Databases and SQL | Run the SQL query `sql` on the database connection `con` and return the result as a pandas DataFrame |
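A minimal sketch of the creation and inspection functions from the table, using a small hypothetical employee table (the column names and values are made up for illustration). The CSV text is wrapped in `io.StringIO` so `pd.read_csv` can run without a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file; read_csv also accepts file-like objects.
csv_text = """name,dept,salary
Ann,eng,120
Bob,ops,95
Cal,eng,110
Dee,eng,130
Eve,ops,90
Fay,hr,80
"""
df = pd.read_csv(io.StringIO(csv_text))

# The same table built directly from a dictionary of columns.
same = pd.DataFrame({
    "name": ["Ann", "Bob", "Cal", "Dee", "Eve", "Fay"],
    "dept": ["eng", "ops", "eng", "eng", "ops", "hr"],
    "salary": [120, 95, 110, 130, 90, 80],
})

first_three = df.head(3)                 # first 3 rows; n defaults to 5
row_labels = list(df.index)              # default integer index 0..5
col_labels = list(df.columns)            # column labels
summary = df.describe()                  # count, mean, std, min, quartiles, max of numeric columns
depts = df["dept"].unique()              # unique values, in order of first appearance
dept_counts = df["dept"].value_counts()  # occurrences of each value, most frequent first
```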
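The selection rows (`df[col]`, `df[[col]]`, `df.loc`, `df.iloc`) differ mainly in whether they work with labels or positions. The sketch below uses hypothetical string row labels so that labels and positions are clearly distinct. Note that `iloc` is given the boolean mask as a NumPy array via `to_numpy()`, since passing a boolean Series to `iloc` raises an error.

```python
import pandas as pd

# Hypothetical data with string row labels, so labels differ from integer positions.
df = pd.DataFrame(
    {"dept": ["eng", "ops", "eng"], "salary": [120, 95, 110]},
    index=["ann", "bob", "cal"],
)

s = df["salary"]    # single brackets: a Series
d = df[["salary"]]  # double brackets: a one-column DataFrame

by_label = df.loc["bob", "salary"]  # row label "bob", column label "salary"
mask = df["salary"] > 100           # boolean Series aligned on the row labels
high = df.loc[mask, "dept"]         # loc accepts the boolean Series directly

by_position = df.iloc[1, 1]                # row position 1, column position 1
high_iloc = df.iloc[mask.to_numpy(), 0]    # iloc needs a plain boolean array
```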
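A short sketch of the Data Cleaning functions applied to one small hypothetical table containing a missing city and a missing temperature. Each call returns a new object, so `raw` itself is left unchanged throughout.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing string and one missing number.
raw = pd.DataFrame({
    "city": ["SF", "LA", None, "SD"],
    "temp": [61.0, np.nan, 70.0, 68.0],
    "junk": [1, 2, 3, 4],
})

missing = raw.isnull()       # True wherever a value is missing
n_missing = missing.sum()    # summing booleans counts missing values per column

# Fill only the temp column, using its mean (NaN is skipped when computing the mean).
filled = raw.fillna({"temp": raw["temp"].mean()})

no_na_rows = raw.dropna(axis=0)                 # drop any row containing a missing value
no_junk = raw.drop(labels=["junk"], axis=1)     # drop a column by name
renamed = raw.rename(columns={"temp": "temp_f"})
recoded = raw.replace(to_replace="SD", value="San Diego")

# After dropping rows the index has gaps; reset it and discard the old labels.
reindexed = no_na_rows.reset_index(drop=True)
```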
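The sorting, grouping, and accessor rows can be illustrated together. This is a sketch on hypothetical data; `pd.to_datetime` is used here only to create a datetime column so that the `.dt` accessor applies.

```python
import pandas as pd

# Hypothetical data; the hire dates are parsed into a datetime column.
df = pd.DataFrame({
    "name": ["Ann Lee", "Bob Ray", "Cal Fox", "Dee Kim"],
    "dept": ["eng", "ops", "eng", "ops"],
    "salary": [120, 95, 110, 90],
    "hired": pd.to_datetime(["2019-03-01", "2020-07-15", "2018-11-30", "2021-01-04"]),
})

ranked = df.sort_values(by="salary", ascending=False)  # highest salary first

# groupby returns a GroupBy object; an aggregation then runs once per group.
dept_mean = df.groupby("dept")["salary"].mean()
dept_count = df.groupby("dept")["salary"].count()

# Numeric Series methods reduce the whole column to one value.
top = df["salary"].max()
median = df["salary"].median()

# .str applies string methods elementwise; .str[0] takes the first piece of each split.
lower = df["name"].str.lower()
first = df["name"].str.split().str[0]

# .dt exposes datetime properties elementwise.
years = df["hired"].dt.year
```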
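A sketch of `pd.get_dummies` and `pd.merge` on hypothetical tables. With `drop_first=True`, the first category in sorted order (`"M"` here) loses its dummy column. For the merge, an inner join keeps only keys present in both tables, while a left join keeps every row of `left` and fills unmatched columns with missing values.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({"id": [1, 2, 3], "size": ["S", "M", "S"]})

# One dummy column per category; the untouched columns come first.
# Dummy dtype is bool in recent pandas versions (uint8 in older ones).
dummies = pd.get_dummies(df, columns=["size"])
reduced = pd.get_dummies(df, columns=["size"], drop_first=True)

# Hypothetical tables to join on a shared key column.
left = pd.DataFrame({"emp": ["ann", "bob", "cal"], "dept_id": [1, 2, 3]})
right = pd.DataFrame({"dept_id": [1, 2, 4], "dept": ["eng", "ops", "hr"]})

inner = pd.merge(left, right, how="inner", on="dept_id")      # only dept_id 1 and 2 match
left_join = pd.merge(left, right, how="left", on="dept_id")   # cal kept, with a missing dept
```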
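`pd.read_sql` needs a database connection. This sketch assumes an in-memory SQLite database created with Python's built-in `sqlite3` module (pandas accepts a `sqlite3` connection directly); the `emp` table and its rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database with a small table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
con.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("ann", "eng", 120), ("bob", "ops", 95), ("cal", "eng", 110)],
)
con.commit()

# The query runs inside the database; pandas turns the result rows into a DataFrame.
result = pd.read_sql(
    "SELECT dept, AVG(salary) AS avg_salary FROM emp GROUP BY dept ORDER BY dept",
    con,
)
con.close()
```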