# Linear Regression

Types of Linear Regression

Linear regression is widely used in biological, behavioral and social sciences to describe possible relationships between variables. It ranks as one of the most important tools used in these disciplines. A trend line represents a trend, the long-term movement in time series data after other components have been accounted for.

## Linear Regression using Python

It tells whether a particular data set say GDP, oil prices or stock prices have increased or decreased over the period of time. A trend line could simply be drawn by eye through a set of data points, but more properly their position and slope is calculated using statistical techniques like linear regression.

Trend lines typically are straight lines, although some variations use higher degree polynomials depending on the degree of curvature desired in the line. Trend lines are sometimes used in business analytics to show changes in data over time. This has the advantage of being simple. Trend lines are often used to argue that a particular action or event such as training, or an advertising campaign caused observed changes at a point in time. This is a simple technique, and does not require a control group, experimental design, or a sophisticated analysis technique.

However, it suffers from a lack of scientific validity in cases where other potential changes can affect the data. Early evidence relating tobacco smoking to mortality and morbidity came from observational studies employing regression analysis.

In order to reduce spurious correlations when analyzing observational data, researchers usually include several variables in their regression models in addition to the variable of primary interest. For example, in a regression model in which cigarette smoking is the independent variable of primary interest and the dependent variable is lifespan measured in years, researchers might include education and income as additional independent variables, to ensure that any observed effect of smoking on lifespan is not due to those other socio-economic factors. However, it is never possible to include all possible confounding variables in an empirical analysis.

For example, a hypothetical gene might increase mortality and also cause people to smoke more. For this reason, randomized controlled trials are often able to generate more compelling evidence of causal relationships than can be obtained using regression analyses of observational data. When controlled experiments are not feasible, variants of regression analysis such as instrumental variables regression may be used to attempt to estimate causal relationships from observational data. The capital asset pricing model uses linear regression as well as the concept of beta for analyzing and quantifying the systematic risk of an investment.

This comes directly from the beta coefficient of the linear regression model that relates the return on the investment to the return on all risky assets.

## What is Linear Regression?

Linear regression is the predominant empirical tool in economics. For example, it is used to predict consumption spending ,  fixed investment spending, inventory investment , purchases of a country's exports ,  spending on imports ,  the demand to hold liquid assets ,  labor demand ,  and labor supply. Linear regression finds application in a wide range of environmental science applications.

In Canada, the Environmental Effects Monitoring Program uses statistical analyses on fish and benthic surveys to measure the effects of pulp mill or metal mine effluent on the aquatic ecosystem. Linear regression plays an important role in the field of artificial intelligence such as machine learning. The linear regression algorithm is one of the fundamental supervised machine-learning algorithms due to its relative simplicity and well-known properties. Least squares linear regression, as a means of finding a good rough linear fit to a set of points was performed by Legendre and Gauss for the prediction of planetary movement.

### Isn’t Linear Regression from Statistics?

Therefore, we will only go through it briefly. For example, in a regression model in which cigarette smoking is the independent variable of primary interest and the dependent variable is lifespan measured in years, researchers might include education and income as additional independent variables, to ensure that any observed effect of smoking on lifespan is not due to those other socio-economic factors. One can get overwhelmed by the number of articles in the web about machine learning algorithms. One measure of goodness of fit is the coefficient of determination , or R 2 pronounced r-square. Linear regression models the relation between a dependent, or response, variable y and one or more independent, or predictor, variables x 1 ,.

Main article: Trend estimation. Main article: Econometrics. This section needs expansion. You can help by adding to it. January Statistics portal. Analysis of variance Blinder—Oaxaca decomposition Censored regression model Cross-sectional regression Curve fitting Empirical Bayes methods Errors and residuals Lack-of-fit sum of squares Line fitting Linear classifier Linear equation Logistic regression M-estimator Multivariate adaptive regression splines Nonlinear regression Nonparametric regression Normal equations Projection pursuit regression Segmented linear regression Stepwise regression Structural break Support vector machine Truncated regression model.

Freedman Statistical Models: Theory and Practice. Cambridge University Press. A simple regression equation has on the right hand side an intercept and an explanatory variable with a slope coefficient. Seal The earliest form of the linear regression was the least squares method, which was published by Legendre in , and by Gauss in Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the sun.

The Annals of Statistics. Criminal Justice Review. Gifted Child Quarterly. Journal of the American Statistical Association. The American Statistician. Craig International Statistical Review.

## Linear Regression for Machine Learning

Understanding Consumption. Oxford University Press. International Economics: Theory and Policy 9th global ed. Harlow: Pearson. New York: Harper Collins. Modern Labor Economics 10th international ed. London: Addison-Wesley. University of Pittsburgh. Cambridge: Harvard. Cohen, J. Here is where Linear Regression LR comes into play.

The essence of LR is to find the line that best fits the data points on the plot, so that we can, more or less, know exactly where the stock price is likely to fall in the year Obviously, this is an oversimplified example. However, the process stays the same. Linear Regression as an algorithm relies on the concept of lowering the cost to maximize the performance. We will examine this concept, and how we got the red line on the plot next.

To get the technicalities out of the way. What I described in the previous section is referred to as Univariate Linear Regression, because we are trying to map one independent variable x-value to one dependent variable y-value. This is in contrast to Multivariate Linear Regression, where we try to map multiple independent variables i. Any straight line on a plot follows the formula:.

islemaldives.com/components/newport/3179-office-mac.php In terms of Machine Learning, this follows the convention:. Where W0 and W1 are weights, X is the input feature, and h X is the label i. The way Linear Regression works is by trying to find the weights namely, W0 and W1 that lead to the best-fitting line for the input data i. X features we have. The best-fitting line is determined in terms of lowest cost.

Cost could take different forms, depending on the Machine Learning application at hand. However, in general, cost refers to the loss or error that the model yields in terms of how off it is from the actual Training data. Ti is the actual y-value at index i.

### Get your FREE Algorithms Mind Map

And finally, n is the total number of data points in the data set. All what our cost function is doing is basically getting the distance e. Euclidean distance between what y-value the model predicted and what the actual y-value resident in the data set is for every data point, then squaring this distance and dividing it by the number of data points we have to get the average cost.

Said distances are illustrated in the above figure as error vectors. Training a Machine Learning model is all about using a Learning Algorithm to find the weights W0, W1 in our formula that minimize the cost. Although it is a fairly simple topic, Gradient Descent deserves its own post. Therefore, we will only go through it briefly. In the context of Linear Regression, training is basically finding those weights and plugging them into the straight line function so that we have best-fit line with W0, W1 minimizing the cost.

The algorithm basically follows the pseudo-code:. The gist of this partial differentiation is basically the derivatives:.