Step 1: Importing the Modules
As always, the first step in building our model is to import the necessary packages and modules.
In Python we have the AdaBoostClassifier and AdaBoostRegressor classes in the scikit-learn library. For our case we import AdaBoostClassifier, since our example is a classification task. The train_test_split method is used to split our dataset into training and test sets. We also import datasets, from which we will load the Iris dataset.
from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import metrics
Step 2: Exploring the data
You can use any classification dataset, but here we'll use the traditional Iris dataset for a multi-class classification problem. This dataset contains four features about different types of Iris flowers (sepal length, sepal width, petal length, petal width). The target is to predict the type of flower from three possibilities: Setosa, Versicolour, and Virginica. The dataset is available in the scikit-learn library, or you can download it from the UCI Machine Learning Repository.
Next, we load the data from the datasets package using the load_iris() method and assign it to the iris variable.
Then we split the dataset into the input variable X, which contains the features sepal length, sepal width, petal length, and petal width, and the target variable y, the class we have to predict: Iris Setosa, Iris Versicolour, or Iris Virginica. Below is an example of what our data looks like.
iris = datasets.load_iris()
X = iris.data
y = iris.target
print(X)
print(y)
Output:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5.8 4. 1.2 0.2]
[5.7 4.4 1.5 0.4]
. . . .
. . . .
]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2]
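If you prefer human-readable names over the raw integer codes above, the Bunch object returned by load_iris() also exposes feature_names and target_names. A quick sketch:
print(iris.feature_names)  # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
The integer labels 0, 1, and 2 in the target array correspond to setosa, versicolor, and virginica, respectively.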
Step 3: Splitting the data
Splitting the dataset into separate training and test sets lets us check whether the model classifies data points correctly on data it has never seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
Here we split our dataset into 70% training and 30% test data, which is a common split.
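Note that train_test_split shuffles the data randomly, so the exact accuracy you get later may differ from ours. If you want a reproducible split, you can pass a fixed seed (the value 42 below is arbitrary):
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)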
Step 4: Fitting the Model
Now we build the AdaBoost model. By default, AdaBoost uses a decision tree (a depth-1 decision stump) as its base learner. We create an AdaBoostClassifier object and name it abc. A few important parameters of AdaBoostClassifier are base_estimator (the weak learner to boost, a decision stump by default; called estimator in recent scikit-learn releases), n_estimators (the maximum number of weak learners to train, 50 by default), and learning_rate (which shrinks the contribution of each weak learner):
abc = AdaBoostClassifier(n_estimators=50, learning_rate=1)
We then fit our abc object to the training dataset; the fitted estimator is stored in model.
model = abc.fit(X_train, y_train)
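If you want to peek inside the fitted ensemble, the short sketch below prints how many weak learners were actually trained and the weight AdaBoost assigned to each of them; estimators_ and estimator_weights_ are attributes of the fitted classifier:
print(len(model.estimators_))    # number of weak learners (decision stumps) in the ensemble
print(model.estimator_weights_)  # the weight of each weak learner in the final vote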
Step 5: Making the Predictions
Our next step is to see how well our model predicts the target values on the test set.
y_pred = model.predict(X_test)
Here we use the predict() method on the fitted model to predict the class of every observation in the unseen test set. You can also take a single sample observation and classify it, as sketched below.
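For example, to classify a single new flower rather than the whole test set, pass a 2-D array with one row. The measurements below are made up for illustration:
sample = [[5.1, 3.5, 1.4, 0.2]]  # sepal length, sepal width, petal length, petal width (hypothetical values)
print(model.predict(sample))     # prints the predicted class index, e.g. [0] for Setosa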
Step 6: Evaluating the model
The model accuracy tells us the fraction of test observations for which our model predicted the correct class.
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
Output:
Accuracy: 0.8666666666666667
You get an accuracy of about 86.67%, which is not bad (your exact number may differ because the train/test split is random). You can also experiment with other base learners, such as a Support Vector Machine or Logistic Regression, which might give you higher accuracy; see the sketches below.
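For a per-class breakdown rather than a single number, the metrics module we already imported also provides classification_report:
print(metrics.classification_report(y_test, y_pred, target_names=iris.target_names))
And here is a minimal sketch of swapping in a different base learner. Note that parameter names vary across scikit-learn versions: base_estimator was renamed to estimator in scikit-learn 1.2, so use whichever name your version expects. probability=True lets the SVC expose class probabilities, which the default boosting algorithm can use:
from sklearn.svm import SVC

svc = SVC(probability=True, kernel='linear')  # an alternative weak learner
abc_svc = AdaBoostClassifier(base_estimator=svc, n_estimators=50, learning_rate=1)
model_svc = abc_svc.fit(X_train, y_train)
y_pred_svc = model_svc.predict(X_test)
print("SVC Accuracy:", metrics.accuracy_score(y_test, y_pred_svc))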
Happy Pythoning