
scikit-learn

Models and Model Selection

| Import | Function | Section | Description |
| --- | --- | --- | --- |
| `sklearn.model_selection` | `train_test_split(*arrays, test_size=0.2)` | Modeling and Estimation | Returns two random subsets of each array passed in: the first (training) subset holds 0.8 of the rows and the second (test) subset holds the remaining 0.2 |
| `sklearn.linear_model` | `LinearRegression()` | Modeling and Estimation | Returns an ordinary least squares linear regression model |
| `sklearn.linear_model` | `LassoCV()` | Modeling and Estimation | Returns a Lasso (L1-regularized) linear model that chooses its regularization strength by cross-validation |
| `sklearn.linear_model` | `RidgeCV()` | Modeling and Estimation | Returns a Ridge (L2-regularized) linear model that chooses its regularization strength by cross-validation |
| `sklearn.linear_model` | `ElasticNetCV()` | Modeling and Estimation | Returns an ElasticNet (L1- and L2-regularized) linear model that chooses its regularization parameters by cross-validation |
| `sklearn.linear_model` | `LogisticRegression()` | Modeling and Estimation | Returns a logistic regression classifier |
| `sklearn.linear_model` | `LogisticRegressionCV()` | Modeling and Estimation | Returns a logistic regression classifier that chooses its regularization strength by cross-validation |
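
As a rough illustration of how these pieces fit together, the sketch below splits a small synthetic dataset and fits two of the models listed. The data, the `random_state` values, and the grid of candidate `alphas` are assumptions made up for this example; they are not part of the original reference.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3*x0 - 2*x1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Each array passed in comes back as a (train, test) pair, in order
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)

# Ordinary least squares fit on the training subset
ols = LinearRegression().fit(X_train, y_train)
print(ols.coef_)

# Ridge regression; the penalty strength is picked by cross-validation
# from a hypothetical grid of candidate values
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X_train, y_train)
print(ridge.alpha_)  # the alpha that cross-validation selected
```

Passing `random_state` makes the split reproducible; without it, each call shuffles the rows differently.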

Working with a Model

Assuming you have a model variable that is a scikit-learn object:

| Function | Section | Description |
| --- | --- | --- |
| `model.fit(X, y)` | Modeling and Estimation | Fits the model to the `X` and `y` passed in |
| `model.predict(X)` | Modeling and Estimation | Returns the model's predictions for the `X` passed in |
| `model.score(X, y)` | Modeling and Estimation | Returns how well the model's predictions on `X` match the correct values `y`: mean accuracy for classifiers, the coefficient of determination (R^2) for regressors |
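
The three methods can be sketched on one regressor and one classifier, as below. The synthetic data is an assumption for illustration only. Note that the meaning of `score` depends on the model type: scikit-learn regressors report R^2, while classifiers report mean accuracy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic features (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Regression target: score() reports R^2
y_reg = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
reg = LinearRegression()
reg.fit(X, y_reg)
r2 = reg.score(X, y_reg)
print(r2)

# Classification target: score() reports mean accuracy
y_clf = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression()
clf.fit(X, y_clf)
preds = clf.predict(X)
accuracy = clf.score(X, y_clf)

# For a classifier, score() equals the fraction of correct predictions
print(np.isclose(accuracy, np.mean(preds == y_clf)))
```

Both scores here are computed on the same data used for fitting, which keeps the sketch short; an honest estimate of performance would score on a held-out test subset instead.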