In statistics,

**linear regression** is an approach for modeling the relationship between a scalar

dependent variable *y* and one or more explanatory variables (or independent variables) denoted

*X*. The case of one explanatory variable (independent variable) is called

*simple linear regression*. For more than one explanatory variable (independent variable), the process is called

*multiple linear regression*. (This term is distinct from

*multivariate linear regression*, where multiple correlated dependent variables are predicted, rather than a single scalar variable.)

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called

*linear models*. Most commonly, the

conditional mean of

*y* given the value of

*X* is assumed to be an

affine function of

*X*; less commonly, the

median or some other quantile of the conditional distribution of

*y* given

*X* is expressed as a

linear function of

*X*. Like all forms of

regression analysis, linear regression focuses on the conditional

probability distribution of

*y* given

*X*, rather than on the joint probability distribution of

*y* and

*X*, which is the domain of

multivariate analysis.

Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to

fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

- If the goal is prediction, or forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of
*y* and *X* values. After developing such a model, if an additional value of *X* is then given without its accompanying value of *y*, the fitted model can be used to make a prediction of the value of *y*.
- Given a variable
*y* and a number of variables *X*_{1}, ..., *X*_{p} that may be related to *y*, linear regression analysis can be applied to quantify the strength of the relationship between *y* and the *X*_{j}, to assess which *X*_{j} may have no relationship with *y* at all, and to identify which subsets of the *X*_{j} contain redundant information about *y*.

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares loss function as in ridge regression (

*L*^{2}-norm penalty) and lasso (

*L*^{1}-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.