Load the data from a CSV file into R with the read.csv function and convert it to a data.frame (read.csv already returns one, so the conversion is only an explicit safeguard). Assign the individual columns to separate variables. Then fit the regression with a linear model (the lm function) and print the summary of the regression. Finally, predict the value of y when x = 40.
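The steps above can be sketched end to end as follows. This is a minimal sketch, not the original script: the file path, the column names x and y, and the data values are all made up for illustration, since the post's CSV file is not shown.

```r
# Illustrative data only (made up for this sketch, not the post's dataset).
csv_path <- tempfile(fileext = ".csv")   # hypothetical file location
write.csv(
  data.frame(x = c(10, 15, 20, 25, 30, 35, 45, 50),
             y = c(52, 73, 88, 104, 125, 138, 175, 189)),
  csv_path, row.names = FALSE
)

# Load the CSV; read.csv() already returns a data frame,
# so as.data.frame() is a harmless explicit conversion.
dat <- as.data.frame(read.csv(csv_path))
print(dat)

# Assign the columns to separate variables.
a <- dat$y   # dependent variable
b <- dat$x   # independent variable (feature)

# Fit the simple linear regression of a on b and print its summary.
fit <- lm(a ~ b)
print(summary(fit))

# Predict y when x = 40; the newdata column must be named b,
# matching the predictor name used in the formula.
pred_40 <- predict(fit, newdata = data.frame(b = 40))
print(pred_40)
```

Naming the newdata column b matters: predict() matches new values to the predictor by name, so a column called x would not be picked up by a model fitted as a ~ b.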

Show the data in R

Assign the dependent variable y to a

Assign the independent variable x (feature) to b

 

Simple linear regression equation:

y = 19.994 + 3.414x

Summary:

Plot the simple linear regression line of y on x
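One way to draw such a plot in base R is a scatter plot with the fitted line added by abline(). This is a sketch on made-up data (the vectors a and b below are illustrative, not the post's dataset), and the axis labels and colors are arbitrary choices.

```r
# Illustrative data only (not the post's dataset).
b <- c(10, 15, 20, 25, 30, 35, 45, 50)        # independent variable x
a <- c(52, 73, 88, 104, 125, 138, 175, 189)   # dependent variable y

fit <- lm(a ~ b)

# Scatter plot of the observations.
plot(b, a,
     xlab = "x (independent variable)",
     ylab = "y (dependent variable)",
     main = "Simple linear regression of y on x")

# Add the fitted regression line; abline() reads the intercept
# and slope directly from the lm object.
abline(fit, col = "red", lwd = 2)
```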

Result: the value of y when x = 40. Substituting into the rounded equation above gives y ≈ 19.994 + 3.414 × 40 = 156.554.

PROBLEM 2:

Regression line of Y on X

Using R

Regression line of X on Y
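The two regression lines differ only in which variable is placed on the left of the formula. The sketch below uses made-up vectors x and y (the data for Problem 2 is not shown in the post) and checks a standard identity: the product of the two slopes equals the squared correlation coefficient.

```r
# Illustrative data only (not the data of Problem 2).
x <- c(10, 15, 20, 25, 30, 35, 45, 50)
y <- c(52, 73, 88, 104, 125, 138, 175, 189)

# Regression line of Y on X: y = intercept + slope * x
fit_yx <- lm(y ~ x)
print(coef(fit_yx))

# Regression line of X on Y: x = intercept + slope * y
fit_xy <- lm(x ~ y)
print(coef(fit_xy))

# The product of the two slopes equals r^2.
slope_product <- unname(coef(fit_yx)["x"]) * unname(coef(fit_xy)["y"])
print(c(slope_product = slope_product, r_squared = cor(x, y)^2))
```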

Regression Analysis

Regression analysis lets you estimate the relationship between a dependent variable and one or more independent variables: how the dependent variable changes when an independent variable changes by one unit. It is used for prediction and forecasting. The regression equation is obtained from the coefficient estimates for the intercept and the slope. Under the least squares method, the following things are to be considered:

Residuals

Residuals are the differences between the observed values and the calculated (fitted) values of y.
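This definition can be checked directly in R. The sketch below, on made-up data, subtracts the fitted values from the observed values and compares the result with what resid() reports.

```r
# Illustrative data only (not the post's dataset).
b <- c(10, 15, 20, 25, 30, 35, 45, 50)        # independent variable x
a <- c(52, 73, 88, 104, 125, 138, 175, 189)   # dependent variable y

fit <- lm(a ~ b)

# Residual = observed value - calculated (fitted) value.
manual_resid <- a - fitted(fit)

print(cbind(observed = a,
            fitted   = fitted(fit),
            residual = manual_resid))

# resid() returns the same numbers.
print(all.equal(unname(manual_resid), unname(resid(fit))))
```

With an intercept in the model, the least squares residuals sum to zero up to floating-point error.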

Summary

Part 1:

The call that produced the model: lm(formula = a ~ b)

Part 2:

Display of the residuals

Min (minimum residual)
1Q (first quartile, 25% of the residuals lie below it)
Median (second quartile, 50%)
3Q (third quartile, 75%)
Max (maximum residual)
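These five numbers can be reproduced with quantile() applied to the residuals. The sketch below uses made-up data; the labels are assigned by hand to match the ones printed by summary().

```r
# Illustrative data only (not the post's dataset).
b <- c(10, 15, 20, 25, 30, 35, 45, 50)        # independent variable x
a <- c(52, 73, 88, 104, 125, 138, 175, 189)   # dependent variable y

fit <- lm(a ~ b)

# quantile() with its defaults returns the 0%, 25%, 50%, 75% and 100% points.
res_summary <- quantile(resid(fit))
names(res_summary) <- c("Min", "1Q", "Median", "3Q", "Max")
print(res_summary)
```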

Part 3:

Coefficient estimates for the intercept and slope

Standard error for the intercept and slope

After finding the slope b and the intercept a, we can compute the calculated value of y for each observed value of x. (Here a and b denote the coefficients, not the R variables a and b assigned earlier.)
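As a sketch of that calculation on made-up data, the coefficients are pulled out of the fitted model and applied to every observed x. The names intercept and slope are used here to avoid a clash with the R variables a and b.

```r
# Illustrative data only (not the post's dataset).
b <- c(10, 15, 20, 25, 30, 35, 45, 50)        # independent variable x
a <- c(52, 73, 88, 104, 125, 138, 175, 189)   # dependent variable y

fit <- lm(a ~ b)

intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# Calculated value of y for each observed x.
y_hat <- intercept + slope * b
print(cbind(x = b, observed_y = a, calculated_y = y_hat))
```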

Coefficient – t-statistics
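Each t-statistic in the coefficient table is the estimate divided by its standard error. The sketch below, on made-up data, reads the table from summary() and recomputes the t value column by hand.

```r
# Illustrative data only (not the post's dataset).
b <- c(10, 15, 20, 25, 30, 35, 45, 50)        # independent variable x
a <- c(52, 73, 88, 104, 125, 138, 175, 189)   # dependent variable y

fit <- lm(a ~ b)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(fit))
print(coefs)

# t-statistic = estimate / standard error.
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
print(t_manual)
```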

Residual Standard Error

Multiple R-Squared:

 

Adjusted R-Squared

F-Statistics
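The residual standard error, multiple R-squared, adjusted R-squared and F-statistic can all be recomputed from the residuals using their standard definitions. The sketch below does so on made-up data and compares each value with the one stored in summary(); p is the number of predictors, which is 1 for simple linear regression.

```r
# Illustrative data only (not the post's dataset).
b <- c(10, 15, 20, 25, 30, 35, 45, 50)        # independent variable x
a <- c(52, 73, 88, 104, 125, 138, 175, 189)   # dependent variable y

fit <- lm(a ~ b)
s   <- summary(fit)

n   <- length(a)               # number of observations
p   <- 1                       # number of predictors
rss <- sum(resid(fit)^2)       # residual sum of squares
tss <- sum((a - mean(a))^2)    # total sum of squares

rse    <- sqrt(rss / (n - p - 1))                     # residual standard error
r2     <- 1 - rss / tss                               # multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)        # adjusted R-squared
f_stat <- ((tss - rss) / p) / (rss / (n - p - 1))     # F-statistic

print(rbind(manual  = c(rse, r2, adj_r2, f_stat),
            summary = c(s$sigma, s$r.squared, s$adj.r.squared,
                        unname(s$fstatistic["value"]))))
```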

P-Value

Based on the t-statistic and the degrees of freedom, we determine the p-value. The p-value is the smallest significance level at which we can reject the null hypothesis H0. The p-value can be found in three ways, depending on the type of test:
1. p-value for a right-tail test
2. p-value for a left-tail test
3. p-value for a two-tail test

p-value for a right-tail test

The p-value is the area under the sampling distribution to the right of the test statistic.

p-value for a left-tail test

The p-value is the area under the sampling distribution to the left of the test statistic.

p-value for a two-tails test

The p-value is the area under the sampling distribution to the right of the positive test statistic plus the area to the left of the negative test statistic.

p-value using Excel

In Excel we use the formula TDIST(x, deg_freedom, tails). For the p-value of the intercept use TDIST(3.13995759693781, 61, 2), and for the slope use TDIST(17.4650878615404, 61, 2). The t-value must be passed as an absolute value; with a negative value TDIST returns the #NUM! error.

p-value using R

To find the p-value associated with a t-score in R, we can use the pt() function with the following syntax:

pt(q, df, lower.tail = TRUE)
where:
q: The t-score
df: The degrees of freedom
lower.tail: If TRUE (the default), the probability to the left of q in the t distribution is returned. If FALSE, the probability to the right is returned.
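As a sketch of how pt() covers the three kinds of test, the block below applies it to the t-values and the 61 degrees of freedom quoted in the Excel section. The variable names are chosen for this example only; the two-tailed results correspond to the TDIST calls with tails = 2.

```r
# t-values and degrees of freedom quoted in the Excel section.
t_intercept <- 3.13995759693781
t_slope     <- 17.4650878615404
dof         <- 61

# Right-tail test: area to the right of the t-score.
p_right <- pt(t_intercept, dof, lower.tail = FALSE)

# Left-tail test: area to the left of the t-score.
p_left <- pt(t_intercept, dof, lower.tail = TRUE)

# Two-tail test: area to the right of +|t| plus the area to the left of -|t|.
# The t distribution is symmetric, so this is twice the right-tail area.
p_two_intercept <- 2 * pt(abs(t_intercept), dof, lower.tail = FALSE)
p_two_slope     <- 2 * pt(abs(t_slope), dof, lower.tail = FALSE)

print(c(right = p_right, left = p_left))
print(c(intercept = p_two_intercept, slope = p_two_slope))
```

Taking abs() first means the two-tail formula also works when the t-score is negative, unlike TDIST, which requires a non-negative value.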