CHI-SQUARE AUTOMATIC INTERACTION DETECTION (CHAID)

A popular decision tree technique.
It originally used the chi-square test of independence to choose splits.
It partitions the data into mutually exclusive, exhaustive subsets that best describe the categorical dependent variable.
It is an iterative procedure that examines the predictors (classification variables) and uses them in order of their statistical significance.

Splitting Rules

The test used depends on the type of the dependent (response/target) variable:
For a continuous dependent variable, the F-test is used.
For a categorical dependent variable, the chi-square test of independence is used.
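The rule above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library and computes the test statistics, not the p-values. The function names and the target_is_continuous flag are hypothetical and not part of any CHAID package.

```python
from collections import Counter, defaultdict

def chi_square_statistic(predictor, target):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(target)
    observed = Counter(zip(predictor, target))
    row_totals = Counter(predictor)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

def f_statistic(predictor, target):
    """One-way ANOVA F statistic: continuous target grouped by predictor level.

    Assumes at least two levels and some variation within the groups.
    """
    groups = defaultdict(list)
    for level, y in zip(predictor, target):
        groups[level].append(y)
    n, k = len(target), len(groups)
    grand_mean = sum(target) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((y - sum(g) / len(g)) ** 2
                    for g in groups.values() for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def split_statistic(predictor, target, target_is_continuous):
    """Pick the test by target type, as in the splitting rule."""
    if target_is_continuous:
        return "F", f_statistic(predictor, target)
    return "chi-square", chi_square_statistic(predictor, target)
```

For example, split_statistic(["a", "a", "b", "b"], [1, 2, 3, 4], True) selects the F-test, while the same call with a categorical target and False selects the chi-square test.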

CHAID PROCEDURE

Step 1: Examine each predictor (independent) variable for the statistical significance of its association with the dependent variable.
Step 2: a) Use the F-test if the dependent variable is continuous.
        b) Use the chi-square test if the dependent variable is categorical (qualitative).
Step 3: Determine the most significant predictor (the one with the smallest p-value after Bonferroni correction).
Step 4: Divide the data by the levels of the most significant predictor. Each resulting group is then examined individually.
Step 5: For each subgroup, determine the most significant of the remaining predictors and divide the data again.
Step 6: Repeat Step 5 until a stopping criterion is reached.
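The steps above can be sketched as a small recursive function for a categorical target. This is an illustration under stated assumptions, not a full CHAID implementation: it skips category merging, uses only maximum depth as a stopping rule, and applies a simplified Bonferroni correction (the p-value is multiplied by the number of predictors tested, whereas CHAID software derives the multiplier from the number of possible category groupings). All function names are hypothetical.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = total = math.exp(-half)           # closed form for df = 2
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))           # closed form for df = 1
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for i in range(df // 2):                     # step df up by 2 each pass
        total += term
        term *= half / (i + 1.5)
    return total

def chi_square_p_value(xs, ys):
    """Steps 1-2: p-value of the chi-square test of independence."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:                                  # a single level or class: nothing to test
        return 1.0
    stat = sum((observed[(r, c)] - rn * cn / n) ** 2 / (rn * cn / n)
               for r, rn in row_totals.items() for c, cn in col_totals.items())
    return chi2_sf(stat, df)

def best_predictor(rows, predictors, target, alpha=0.05):
    """Step 3: smallest adjusted p-value, or None if nothing is significant."""
    ys = [r[target] for r in rows]
    adjusted = {}
    for name in predictors:
        p = chi_square_p_value([r[name] for r in rows], ys)
        adjusted[name] = min(1.0, p * len(predictors))   # simplified Bonferroni
    name = min(adjusted, key=adjusted.get)
    return (name, adjusted[name]) if adjusted[name] <= alpha else None

def grow(rows, predictors, target, depth=0, max_depth=3, alpha=0.05):
    """Steps 4-6: split on the best predictor and recurse into each level."""
    majority = Counter(r[target] for r in rows).most_common(1)[0][0]
    node = {"n": len(rows), "prediction": majority}
    if depth >= max_depth or not predictors:
        return node
    best = best_predictor(rows, predictors, target, alpha)
    if best is None:
        return node
    name, p = best
    remaining = [q for q in predictors if q != name]
    node["split_on"], node["p_value"] = name, p
    node["children"] = {
        level: grow([r for r in rows if r[name] == level],
                    remaining, target, depth + 1, max_depth, alpha)
        for level in sorted({r[name] for r in rows})
    }
    return node
```

Here rows is a list of dictionaries, one per record, and grow(rows, ["region", "gender"], "bought") would return a nested dictionary describing the tree.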

Besides splitting, CHAID also supports merging.

In merging, the groups (categories) that differ least significantly are merged to form one class. This is done to reduce the number of categories of the variable.
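A minimal sketch of the merging step, assuming a binary target (so every pairwise test is a 2x2 table with one degree of freedom) and a nominal predictor whose categories may be merged in any combination. The names pair_p_value, merge_categories and alpha_merge are hypothetical.

```python
import math
from itertools import combinations

def pair_p_value(counts_a, counts_b):
    """Chi-square p-value (df = 1) comparing two categories on a binary target.

    Each argument is (n_class0, n_class1) and must contain at least one case.
    """
    n = sum(counts_a) + sum(counts_b)
    col = [counts_a[0] + counts_b[0], counts_a[1] + counts_b[1]]
    if 0 in col:                  # only one target class present: no difference
        return 1.0
    stat = 0.0
    for row in (counts_a, counts_b):
        for j in range(2):
            expected = sum(row) * col[j] / n
            stat += (row[j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square, df = 1

def merge_categories(counts, alpha_merge=0.05):
    """Repeatedly merge the least significantly different pair of categories.

    counts maps category -> (n_class0, n_class1).
    Returns a dict mapping tuples of merged category names to pooled counts.
    """
    groups = {(name,): tuple(c) for name, c in counts.items()}
    while len(groups) > 1:
        (a, b), p = max(
            (((a, b), pair_p_value(groups[a], groups[b]))
             for a, b in combinations(groups, 2)),
            key=lambda item: item[1],
        )
        if p <= alpha_merge:      # every remaining pair differs significantly
            break
        pooled = tuple(x + y for x, y in zip(groups[a], groups[b]))
        del groups[a], groups[b]
        groups[a + b] = pooled
    return groups
```

With counts such as {"low": (40, 10), "mid": (38, 12), "high": (10, 40)}, the "low" and "mid" categories have similar outcome distributions and are pooled, while "high" stays separate.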

CHAID STOPPING CRITERIA

The maximum tree depth is reached (the depth is pre-defined).
A node has fewer cases than the minimum required for a parent node (also pre-defined, for example 100).
A split would create a child node with fewer cases than the minimum required for a child node (for example 50).
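These criteria can be expressed as a single check. The sketch below is illustrative only: the function name and parameter names are hypothetical, the defaults 100 and 50 are the figures quoted above, and the default depth of 3 is an arbitrary assumption.

```python
def should_stop(n_cases, depth, child_sizes=(),
                max_depth=3, min_parent=100, min_child=50):
    """Return True if a node must not be split any further.

    n_cases     -- number of cases in the node being considered
    depth       -- depth of that node (root = 0)
    child_sizes -- sizes of the child nodes the proposed split would create
    """
    if depth >= max_depth:                 # maximum tree depth reached
        return True
    if n_cases < min_parent:               # too few cases to act as a parent
        return True
    if any(size < min_child for size in child_sizes):
        return True                        # a child node would be too small
    return False
```

For instance, a node with 120 cases at depth 1 may be split into children of 70 and 50 cases, but not into children of 90 and 30.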

CHAID INPUT

Significance level for partitioning (splitting) on a variable
Significance level for merging categories of a variable
Minimum number of records per cell
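The three inputs could be collected in one settings object, as in this minimal sketch. The class name, field names and default values are assumptions for illustration and do not correspond to any particular software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChaidSettings:
    """Hypothetical container for the CHAID inputs listed above."""
    alpha_split: float = 0.05      # significance level for partitioning
    alpha_merge: float = 0.05      # significance level for merging
    min_cell_records: int = 50     # minimum number of records per cell

    def __post_init__(self):
        # Significance levels are probabilities strictly between 0 and 1.
        for name in ("alpha_split", "alpha_merge"):
            value = getattr(self, name)
            if not 0.0 < value < 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        if self.min_cell_records < 1:
            raise ValueError("min_cell_records must be at least 1")
```

A stricter configuration would then be written as ChaidSettings(alpha_split=0.01, min_cell_records=100).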

Using SPSS

CHAID (Chi-square Automatic Interaction Detection) uses chi-square tests to identify the best splits while building a decision tree. It compares the cross-tabulation (contingency table) of each input field against the outcome and uses the chi-square test of independence to determine whether the two variables are significantly associated. Among the fields with a significant relationship, it selects the one with the smallest p-value. CHAID also merges categories that show no significant difference in the outcome, and continues until all remaining categories differ at the specified significance level (for example 0.05, i.e. 5%). SPSS builds these classification and prediction models automatically. Decision trees provide clarity and transparency, which makes the model easier to explain to stakeholders.

CHAID is a predictive model: it can be used to forecast scenarios and draw conclusions. It is applied in a variety of fields, such as market segmentation, brand tracking, and new product development.

A statistically significant result from a chi-square test indicates that the two variables are not independent, and there is a relationship between them.
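For reference, the standard Pearson chi-square statistic behind this test is written out below, followed by a small worked example. The counts in the example are made up for illustration. O and E are the observed and expected cell counts, R and C the row and column totals, and N the total number of cases.

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N},
\qquad \mathrm{df} = (r-1)(c-1)

% Worked example: a 2x2 table with rows (18, 2) and (3, 17), N = 40.
% Row totals R = (20, 20), column totals C = (21, 19),
% so the expected counts are (10.5, 9.5) in each row and every |O - E| = 7.5.
\chi^2 = 2 \cdot \frac{7.5^2}{10.5} + 2 \cdot \frac{7.5^2}{9.5} \approx 22.56
```

With one degree of freedom the 5% critical value is 3.84, so a statistic of about 22.56 is significant and the hypothesis of independence is rejected.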

A Decision Tree is a classification model that works by recursively splitting data into subsets based on specific decision rules. Each internal node in the tree represents a “decision point,” and the branches indicate different outcomes based on the chosen decision rule. The terminal nodes (or leaves) represent the final classification or prediction outcomes.
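The structure described above can be sketched as a small data type with a lookup routine. The Node class, its field names and the fallback for unseen levels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    """A tree node: a leaf if split_on is None, otherwise a decision point."""
    prediction: str                          # majority outcome at this node
    split_on: Optional[str] = None           # field tested at a decision point
    children: Dict[str, "Node"] = field(default_factory=dict)

def predict(node, record):
    """Follow the branches that match the record until a leaf is reached."""
    while node.split_on is not None:
        child = node.children.get(record.get(node.split_on))
        if child is None:                    # unseen level: use this node's outcome
            break
        node = child
    return node.prediction

# A two-level tree: split on region, then on gender within "south".
tree = Node("yes", "region", {
    "north": Node("yes"),
    "south": Node("no", "gender", {"m": Node("no"), "f": Node("yes")}),
})
```

Calling predict(tree, {"region": "south", "gender": "f"}) walks from the root through the "south" branch to the "f" leaf.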

MMK TECHNOLOGIES

B32, F1, C-COLONY, MUTHU SUNDARI APARTMENTS, WATER TANK SOUTH STREET, PERUMALPURAM

+91 9840922213

km301252@gmail.com
