Using Python – MMKTECH CONSULTANTS

Steps Involved

Import data
See the data view and variable view
Provide codes to qualitative/categorical variables in variable view
Report Code book
Descriptive statistics
Analysis
Correlate
Regression
Classify
Graph

LIBRARY USED: STATSMODELS.API

Before importing library you must have installed packages in your system. After installing the latest python 3.13.2 downloaded from the website https://www.python.org/downloads, open command line interface in your windows system. In search box type cmd you will get cmd prompt. Run as an administrator. In CLI go to your directory where you have installed python and go to Scripts directory and pass the following command

D:\python313\Scripts > pip install pandas, numpy, statsmodels,jupyter notebook

Then open jupyter notebook

D:\python313>jupyter notebook

Jupyter notebook will open in microsoft edge/internet explorer with url : localhost:8888/tree as shown below

Now press new tab and select python 3(ipykernel)

Now jupyter notebook editor opens up

Load the libraries

Read Data

We use the same data Cellphone.csv. Containing one dependent variable ‘Price’. It is continuous in nature. We have 13 explanatory variables. We have to predict the price of a cellphone based on the equation which may contain all the 13 explanatory variables or reduced explanatory variables. The data is stored in the data frame object called df

Information about the data we handle

To know the variable name, types of the data, null data if any we pass the command df.info(). we use info() for this

Total variables used are 14 (1 dependent and 13 explanatory variables).No.of observations handled are 161 and the types could be int64 or float64 or categorical

Describe the data

Use object.describe() – describe function. Here you get the number of observations of each features, mean, std, minimum, maximum first quartile, median, third quartile values

If you want to see the above in a tabular form use object.describe().T

Find the shape of the data

PRE-PROCESSING THE DATA: Find if any null values exists

Use object.isnull().sum()

No null value exists. If null values/ missing values found the system will not proceed further. It will show error message

Separate dependent and independent variables from df

Dependent variable will be called as target variable

independent variables will be called as feature variables. use object.drop function to separe the target variable ‘price’ and assign it to X. use axis=1. Assign the df[‘Price’] to y

Find mean, variance, standard deviation of all features using program

Output

Convert random variable X and y as array with float64 type

Ordinary Least Square using Statsmodels.api

import statsmodels.api as sm

Find metrics using statsmodels.api

Once Again Read Data:

Standard Error of Betas using Python Program

Library Used: SKlearn without spliting the data

Box Plot

Bar Plot

Linear method (sales vs price)

sns.lmplot(data=df, x=’Sale’,y=’Price’)

Residual plot

Metrics

output

Using sklearn with spliting of given data

once again read data

Separate features and target

Split data into training and test data sets

Model Parameters(Coefficients and intercepts)

Metrics for split method

Lazy predict – Supervised Learning

Install Lazy predict using pip install Lazy predict

IDENTIFY WHICH MODEL IS BEST AND WHICH MODEL IS WORST FOR THE GIVEN DATASET

results_df1 = pd.DataFrame(models, columns=[‘Model’,’Adjusted R-squared’,’R-squared’,’RMSE’,’Time Taken’]

results_format_df1 =results_df1.style.hightlight_max(subset=[‘Adjusted R-squared’,’R-squared’],color=’green’).highlight_min(subset=[‘Adjusted R-squared’.’R-squared’],color=’red’)

display(results_format_df1)