Machine learning, a field of Artificial Intelligence, integrates statistics, mathematics, and computer science to build algorithms. In machine learning, the machine learns under supervised and unsupervised methods. These algorithms enable you to predict population parameters based on a sample (the training dataset). Among the many benefits of machine learning, its predictive ability stands out. The primary function of a supervised algorithm is to let the machine learn a model (formula) from the given data (training dataset) and apply it to unseen data (test dataset). This is the major aim of a classification algorithm.
Decision Trees come under supervised machine learning. They enable you to create an easy-to-interpret model by splitting the variables/features of a dataset into nodes and branches in the form of a tree structure. The tree structure enables researchers to make decisions easily, and to explore and understand complex real-world datasets in fields such as marketing, healthcare, and risk management.
The main purpose of machine learning is to predict the target/response/dependent attribute based on the explanatory/independent attributes. The target attribute/feature is represented by y and the explanatory attributes/features are represented by X. The target attribute/feature can be either categorical or numerical.
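For instance, a minimal sketch of this split in Python (assuming a hypothetical file customers.csv whose churn column is the target):

```python
import pandas as pd

# Hypothetical dataset: every column except "churn" is an explanatory feature.
df = pd.read_csv("customers.csv")

X = df.drop(columns=["churn"])  # explanatory attributes/features
y = df["churn"]                 # target attribute/feature
```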

Categorical Target Variable/Feature
If the target variable of a dataset is categorical in nature, then we are dealing with a classification problem: we have to predict the category to which a customer belongs.
For example, if the target value in a customer-churning feature is dichotomous (say, yes or no), then we can predict whether the customer is going to churn or not. Say the Bata company has many customers. When deciding whether to buy shoes from Bata, some will remain loyal customers and some will churn and move away from Bata. The classification task is to predict which of the two groups each customer belongs to.
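A minimal sketch of this classification setting, using scikit-learn's DecisionTreeClassifier on a small churn table (the ages, incomes, and labels below are invented for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical churn data: age, income, and a yes/no churn label.
df = pd.DataFrame({
    "age":    [25, 34, 45, 52, 23, 40, 60, 48],
    "income": [30_000, 48_000, 60_000, 52_000, 28_000, 75_000, 41_000, 66_000],
    "churn":  ["yes", "no", "no", "yes", "yes", "no", "yes", "no"],
})

X, y = df[["age", "income"]], df["churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict(X_test))  # predicted "yes"/"no" churn for unseen customers
```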
Continuous Target Variable/Feature
Say the target variable is expressed as a numerical value, for example a sales value. Then we have to use regression to predict future sales.
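A matching regression sketch, with made-up advertising-spend and sales figures:

```python
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: advertising spend (X) and sales value (y).
X = [[10], [20], [30], [40], [50], [60]]
y = [55.0, 80.0, 95.0, 130.0, 150.0, 170.0]

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[35]]))  # predicted sales for an unseen spend level
```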
Classification attempts to predict the target class with the highest possible accuracy. The classification algorithm finds the relationship between the input features and the output feature in order to construct a model; this is the training process. An optimal decision tree is one that classifies the data correctly while containing a minimum number of nodes.
Decision Trees are used to solve classification problems:
- Decision Trees are used to solve classification problems with quantitative variables.
- The Naive Bayes algorithm is used to solve classification problems with qualitative variables (textual/unstructured data).
Assumptions:
- Sufficient sample size: you need a large enough dataset for meaningful splits.
- Appropriate variables: Both categorical and continuous variables can be used, but the dependent variable should match the tree’s goal (classification or regression).
- No multicollinearity: highly correlated predictors should be avoided, as they can distort the decision-making process (see the sketch after this list).
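The multicollinearity mentioned in the last assumption can be screened for with a simple correlation scan. A sketch, assuming the features already sit in a pandas DataFrame (the spend column below is deliberately made an almost exact duplicate of income):

```python
import pandas as pd

# Hypothetical feature matrix; in practice X comes from your own dataset.
X = pd.DataFrame({
    "age":    [25, 34, 45, 52, 23, 40],
    "income": [30_000, 48_000, 60_000, 52_000, 28_000, 75_000],
    "spend":  [31_000, 47_000, 61_000, 51_000, 29_000, 74_000],
})

corr = X.corr().abs()
# Flag feature pairs whose absolute correlation exceeds, say, 0.9.
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if corr.loc[a, b] > 0.9:
            print(f"Highly correlated: {a} and {b} ({corr.loc[a, b]:.2f})")
```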
Hypothesis of the Decision Tree Analysis:
This concerns whether certain input variables (predictors) effectively classify or predict the target variable. For example, with customer data, the hypothesis might test whether features such as age and income level are significant predictors of whether a customer will make a purchase.
The null hypothesis assumes no relationship between the predictors and the target variable, while the alternative hypothesis states that at least one of the predictors significantly affects the target outcome.
Decision Tree:
- A decision tree is a tree-based technique.
- The root node is determined first.
- Each path from the root node is split by a Boolean question (test condition).
- Leaf nodes provide the final outcome (for example, yes or no), and the result of the classification is achieved.
Decision tree Algorithm:
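As a minimal sketch of the core procedure shared by most decision tree algorithms, the following code greedily chooses the split that most reduces Gini impurity (the criterion used in CART) and recurses until the nodes are pure. It is illustrative rather than production code:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the (feature index, threshold) that most reduces Gini impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels):
    """Recursively split until nodes are pure or no useful split exists."""
    split = best_split(rows, labels)
    if split is None:  # leaf: predict the majority class
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return {
        "feature": f, "threshold": t,
        "left": build_tree(*map(list, zip(*left))),
        "right": build_tree(*map(list, zip(*right))),
    }

# Tiny made-up example: predict churn from age and income.
rows = [[25, 30_000], [34, 48_000], [45, 60_000], [52, 52_000]]
labels = ["yes", "no", "no", "yes"]
print(build_tree(rows, labels))
```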

Structure of Decision Tree:
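[Figure: a sample decision tree, with a dark green root node at the top, blue internal nodes, and red leaf nodes at the bottom.]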

Explanations
Root node:
- The starting node from which the tree starts
- Only one root node
- No incoming edge or arrow to this node
- It has only child nodes. (zero or more child nodes)
- The root node does not have a parent node.
- The dark green node in the figure above
Internal nodes/nodes:
- All nodes in between the root node and the leaf nodes are internal nodes,
- simply called nodes.
- Internal nodes have both a parent and at least one child.
- The blue nodes in the figure above
- Exactly one incoming edge
- Two or more outgoing edges
Leaf Node/Leaf/Terminal Node
- Nodes at the end of the tree
- They do not have any children; also simply called a leaf
- The red nodes in the figure above
- Exactly one incoming edge for this node
- No outgoing edge from this node
- Each leaf node is assigned a class label
- Each non-leaf node contains a test condition(Boolean) on one of the features
Parent node:
In any two connected nodes, the one in the higher position is the parent node. In the figure, B is the parent of C, D, and E; F is the parent of G; and E is the parent of I and J.
Child node:
In any two connected nodes, the one in the lower position (hierarchically) is the child node. C, D, and E are children (child nodes) of B; G is a child of F; and I and J are children of E.
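A minimal sketch of these parent/child relationships in code; the node names mirror the B, C, D, E, F, G, I, J example above:

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # child nodes (empty for a leaf)
        for child in self.children:
            child.parent = self         # each child points back to its parent

# B is the parent of C, D, E; E is the parent of I, J; F is the parent of G.
tree_b = Node("B", [Node("C"), Node("D"), Node("E", [Node("I"), Node("J")])])
tree_f = Node("F", [Node("G")])
```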
Splitting:
- Dividing a node into two or more child nodes, or
- adding two or more children to a node.
Decision node:
When a parent node splits into two or more child nodes, that node is called a decision node.
Pruning:
- When you remove a sub-node of a decision node, it is called pruning.
- It is the opposite process of splitting.
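A sketch of pruning in practice, using scikit-learn's cost-complexity pruning parameter ccp_alpha (a larger value removes more sub-nodes); the iris dataset is used only as a convenient stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

# Pruning is the opposite of splitting: the pruned tree has fewer nodes.
print(full.tree_.node_count, "->", pruned.tree_.node_count)
```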
Branch/Sub-tree:
A subsection of the entire tree is called a branch or sub-tree.
Difference between Classification Tree and Regression Tree
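In brief:
- A classification tree predicts a categorical target; each leaf is assigned a class label (typically the majority class of its training records), and splits are commonly chosen using impurity measures such as Gini or entropy.
- A regression tree predicts a continuous target; each leaf predicts a numerical value (typically the mean of its training records), and splits are commonly chosen by minimizing variance or mean squared error.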

Problems we face
Say we have a dataset (in tabular form) containing many explanatory variables and one target variable. How do we create a decision tree? The problems we face here are:
- Which feature should be selected as the root node?
- On what basis should a node be split? (See the sketch after this list.)
- Remember, the splitting strategy may affect the decision tree's accuracy, and overfitting may occur.
- The purity of each node should increase with respect to the target variable after each split.
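The usual answer to the first two questions is an impurity measure: choose the feature and split point that most increase purity. A minimal sketch computing entropy and the information gain of a candidate split (the labels below are invented):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Reduction in entropy achieved by splitting `parent` into `children`."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Candidate split of 10 records (6 "yes", 4 "no") into two child nodes.
parent = ["yes"] * 6 + ["no"] * 4
left, right = ["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3
print(information_gain(parent, [left, right]))  # higher gain = purer children
```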
Types of Decision Tree Algorithms
We have several decision tree algorithms by which the features of a given dataset are split.
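Commonly used algorithms include:
- ID3: splits on categorical features using entropy and information gain.
- C4.5: extends ID3 with gain ratio and support for continuous features and missing values.
- CART: builds binary trees using Gini impurity for classification and variance reduction for regression.
- CHAID: uses chi-square tests to choose multiway splits.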

Characteristics of methods

The tree diagram can be displayed at up to 200% of its size; this is known as scaling the tree.

Importance of Independent Variables (feature importance)
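A sketch of how feature importance can be read from a fitted tree, using scikit-learn's feature_importances_ attribute (the iris dataset is used only as a convenient stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Importances sum to 1; higher values mean the feature drove more splits.
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```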




