Random Forest Classifier
Embedded method: feature selection is built into the classifier algorithm itself. During training, the classifier adjusts its internal parameters and determines the appropriate weight/importance of each feature to produce the best classification accuracy.
Load Libraries
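A minimal set of imports for the steps below might look like this (pandas for data handling, scikit-learn for the split and the model, matplotlib for the plot):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt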
Read Data
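Assuming the same customer churn file used in the Gradient Boosting example below, the data can be read into a pandas DataFrame:
# Read the customer churn dataset and preview the first three rows
df = pd.read_csv('d:/custchurn.csv')
df.head(3)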
Drop unwanted features and separate features and target
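The exact columns to drop depend on the dataset; as a sketch, suppose there is a non-predictive identifier column named custid (a hypothetical name) alongside the churn target:
# Drop the hypothetical identifier column and separate predictors from the target
features = df.drop(['custid', 'churn'], axis=1)
target = df['churn']
X = features
y = target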
Split data into train and test dataset
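A common choice is to hold out 30% of the rows for testing; the test_size and random_state values below are illustrative, not prescribed:
# Split into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)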
Find shape of train and test datasets
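Checking the shapes confirms the split worked as intended:
# Number of rows and columns in each partition
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)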
Random Forest Classifier Script
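A minimal sketch of the training script, with n_estimators=100 as an illustrative setting: after fitting, the forest's built-in feature_importances_ attribute exposes the weight it assigned to each feature during training, which is exactly the embedded selection step described above.
# Train the random forest and score it on the held-out data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))
# Importances computed during training, sorted for plotting
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values()
print(importances)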
Horizontal bar plot
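The sorted importances can then be drawn as a horizontal bar plot:
# Horizontal bar plot of feature importances, least to most important
importances.plot(kind='barh')
plt.xlabel('Importance')
plt.title('Random Forest Feature Importances')
plt.show()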
Gradient Boosting Machines (GBMs): Gradient Boosting Classifier
Gradient boosting is another embedded method: it builds trees sequentially, each one correcting the errors of the previous, and likewise exposes the learned importance of each feature through feature_importances_.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
df = pd.read_csv('d:/custchurn.csv')
df.head(3)
features = df.drop(['churn'], axis=1)
target = df['churn']
X = features
y = target
# Generate a synthetic dataset for a self-contained demonstration
# (note: this overwrites the CSV-based X and y defined above)
X, y = make_classification(n_samples=1000, n_features=11, n_informative=9, n_redundant=2, random_state=42)
# Create a Gradient Boosting model
model = GradientBoostingClassifier(n_estimators=100, random_state=42)
# Fit the model to the data
model.fit(X, y)
# Get feature importances
feature_importances = model.feature_importances_
# Create a DataFrame to display feature importances
feature_names = [f'Feature {i}' for i in range(X.shape[1])]
importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importances})
# Sort features by importance
importance_df = importance_df.sort_values(by='Importance', ascending=False)
# Display the feature importances
print(importance_df)
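As with the random forest, the gradient boosting importances can be visualized with a horizontal bar plot (matplotlib assumed):
import matplotlib.pyplot as plt
# Horizontal bar plot of gradient boosting importances
importance_df.sort_values(by='Importance').plot(x='Feature', y='Importance', kind='barh', legend=False)
plt.xlabel('Importance')
plt.title('Gradient Boosting Feature Importances')
plt.show()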