The chi-square test evaluates whether a categorical feature is independent of the target variable. It computes the chi-square statistic, which measures how far the observed counts deviate from the counts expected under independence. It is used for feature selection in classification problems where both the features and the target are categorical.
Example: In a survey dataset, the chi-square test can determine whether there is a significant association between gender (feature) and preference for a product (target variable).
Workflow:
Step 1: Calculate the chi-square statistic and p-value for each feature.
Step 2: Compare the p-values to a pre-defined significance level (e.g., 0.05).
Step 3: Select features with p-values below the significance level.
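The survey example can be sketched with a small contingency table. The counts below are hypothetical and chosen only for illustration. `scipy.stats.chi2_contingency` is used here as one way to run the test, with the Yates continuity correction switched off so the statistic matches the plain sum of (observed - expected)^2 / expected.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical survey counts (not from the dataset used in this notebook).
# Rows = gender, columns = product preference.
observed = np.array([
    [30, 20],   # female: prefers A, prefers B
    [20, 30],   # male:   prefers A, prefers B
])

# correction=False disables the Yates continuity correction that SciPy
# applies to 2x2 tables by default.
chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

alpha = 0.05  # pre-defined significance level
is_associated = p_value < alpha

print(f"chi2={chi2_stat:.3f}, p={p_value:.4f}, dof={dof}")
print("Expected counts under independence:\n", expected)
print("Significant association:", is_associated)
```

With these counts every expected cell is 25, so the statistic is 4 * (5^2 / 25) = 4.0 on one degree of freedom, which falls just below the 0.05 threshold.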
Load Libraries
Read data
Fill missing values using mode function
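A minimal sketch of mode imputation with pandas, assuming a small made-up frame. The column names echo ones mentioned later in this notebook, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data with a few missing entries.
df = pd.DataFrame({
    "Gender": ["F", "M", None, "F"],
    "Card Type": ["Gold", None, "Silver", "Gold"],
})

# Replace missing entries in each column with that column's most frequent value.
# mode() can return several values on a tie; [0] takes the first one.
for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])

# Confirm that no missing values remain.
print(df.isnull().sum())
```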
Data shape
Check for any remaining missing values in the data
Convert categorical variables to numeric codes using LabelEncoder
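The encoding step might look like the sketch below. The frame and the list `categorical_cols` are assumptions made for illustration. One encoder is kept per column so the integer codes can be mapped back to labels later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: two categorical columns and one numeric column.
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "Card Type": ["Gold", "Silver", "Platinum", "Gold"],
    "Age": [34, 51, 27, 45],
})

categorical_cols = ["Gender", "Card Type"]  # assumed list of columns to encode

# Fit one LabelEncoder per column; classes are assigned codes in sorted order.
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

print(df)
print(list(encoders["Card Type"].classes_))
```

Note that the integer codes carry no real ordering; they are simply positions in the sorted list of labels.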
Separate features and target using X and y
Calculate the chi-square statistic and p-value for each feature against the target variable
Display the features with the highest chi-square scores
Features with a p-value above 0.05 show no significant association with the target and are treated as low-importance
Gender, Card Type, and exp type show no significant association with the target, so they may be removed
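One way to score every feature at once is `sklearn.feature_selection.chi2`, sketched here on synthetic data (the feature names `informative` and `noise` are made up). Be aware that this function treats the non-negative feature values as counts, so on label-encoded columns its result is not identical to a contingency-table test and depends on how the categories were coded.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    # Agrees with the target about 80% of the time.
    "informative": np.where(rng.random(n) < 0.8, y, 1 - y),
    # Drawn independently of the target.
    "noise": rng.integers(0, 3, size=n),
})

# chi2 returns one statistic and one p-value per column of X.
chi2_scores, p_values = chi2(X, y)

results = (
    pd.DataFrame({"feature": X.columns, "chi2": chi2_scores, "p_value": p_values})
    .sort_values("chi2", ascending=False)
    .reset_index(drop=True)
)
print(results)
```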
Find the selected features based on Chi-Square method
Display selected features
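A sketch of the selection step, assuming features are kept when their p-value is below 0.05 as the workflow describes. The two columns are constructed by hand so the outcome is easy to verify: `strong` tracks the target closely, while `balanced` has the same total in both classes.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2

# Hand-built illustrative data: 50 rows per class.
y = np.array([0] * 50 + [1] * 50)

strong = y.copy()
strong[:5] = 1      # flip 5 class-0 rows
strong[50:55] = 0   # flip 5 class-1 rows

X = pd.DataFrame({
    "strong": strong,
    "balanced": np.tile([0, 1], 50),  # 25 ones in each class
})

_, p_values = chi2(X, y)

alpha = 0.05  # significance level
selected_features = X.columns[p_values < alpha].tolist()
X_selected = X[selected_features]

print("Selected features:", selected_features)
```

Here `balanced` has identical observed and expected totals per class, so its statistic is 0 and it is dropped.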
Of the 10 features, 8 were selected by the chi-square method. The data is now ready for further analysis.
Redefine the target variable y
Fit a logistic regression model for y using statsmodels
Model summary
Prediction using sklearn
Beta Coefficients and intercept
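The scikit-learn version could be sketched as follows, again on synthetic data. The 70/30 split and the default `LogisticRegression` settings are assumptions; the notebook's actual choices are not shown in the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: the first column drives the target, the second is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# coef_ has shape (1, n_features) for a binary target; intercept_ has shape (1,).
print("Coefficients:", clf.coef_)
print("Intercept:", clf.intercept_)
```

Unlike statsmodels, scikit-learn fits the intercept by default and applies L2 regularization, so the coefficients from the two libraries will generally differ slightly.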
Confusion Matrix
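As a small illustration, the confusion matrix for a hand-written set of labels (invented for this sketch) can be obtained with `sklearn.metrics.confusion_matrix`. For binary labels 0 and 1, rows are true classes and columns are predicted classes, so `ravel()` yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Unpack the four cells of the 2x2 matrix.
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```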
Calculation of accuracy, precision, recall, and F1 score
Accuracy
Precision
Recall
F1 score
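The four metrics follow directly from the confusion-matrix counts. The sketch below uses hypothetical counts and plain Python so that each formula is visible.

```python
# Hypothetical confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 20, 30

# Accuracy: share of all predictions that are correct.
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Precision: share of predicted positives that are truly positive.
precision = tp / (tp + fp)

# Recall: share of actual positives that were found.
recall = tp / (tp + fn)

# F1 score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, "
      f"recall={recall:.3f}, f1={f1:.3f}")
```

The same values are available from `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics` when working from label arrays rather than counts.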
Metrics
Classification Report
Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC)
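A short sketch of the ROC and AUC computation, using four made-up scores. ROC analysis needs predicted probabilities or scores for the positive class rather than hard 0/1 predictions; with a fitted scikit-learn classifier these would typically come from `predict_proba(X_test)[:, 1]`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted positive-class scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold and returns the false positive
# rate and true positive rate at each step.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# AUC is the probability that a random positive is scored above a random negative.
auc = roc_auc_score(y_true, y_score)

print("FPR:", fpr)
print("TPR:", tpr)
print(f"AUC = {auc:.2f}")
```

Here three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75. Plotting `tpr` against `fpr` produces the ROC curve.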