---
title: "LinearRegression"
language: "en"
type: "Method"
summary: "LinearRegression (Machine Learning Method) Method for Predict. Predict values using a linear combination of features. The linear regression predicts the numerical output y using a linear combination of numerical features x={x_ 1,x_ 2,\\[Ellipsis],x_n}. The conditional probability P(y|x) is modeled according to P(y|x)\\[Proportional]exp(-(y-f(\\[Theta],x))^2/(2 \\[Sigma]^2)), with f(\\[Theta],x)=x.\\[Theta]. The estimation of the parameter vector \\[Theta] is done by minimizing the loss function ( 1 ) / ( 2 ) UnderoverscriptBox[\\[Sum], RowBox[{i, =, 1}], m](y_i-f(\\[Theta],x_i))^2+\\[Lambda]_ 1UnderoverscriptBox[\\[Sum], RowBox[{i, =, 1}], n]TemplateBox[{SubscriptBox[\\[Theta], i]}, Abs]+ ( \\[Lambda]_ 2 ) / ( 2 ) UnderoverscriptBox[\\[Sum], RowBox[{i, =, 1}], n]\\[Theta]_i^2, where m is the number of examples and n is the number of numerical features. The following suboptions can be given: Possible settings for the OptimizationMethod option include: For this method, Information[PredictorFunction[\\[Ellipsis]],Function] gives a simple expression to compute the predicted value from the features."
keywords: 
- MachineLearning
- Regression method
- Normal distribution
- Prediction
- Linear approximation
canonical_url: "https://reference.wolfram.com/language/ref/method/LinearRegression.html"
source: "Wolfram Language Documentation"
---
# "LinearRegression" (Machine Learning Method)

* Method for ``Predict``.

* Predict values using a linear combination of features.

---

## Details & Suboptions

* The linear regression predicts the numerical output ``y`` using a linear combination of numerical features $x=\left\{x_1,x_2,\ldots ,x_n\right\}$. The conditional probability $P(y|x)$ is modeled according to $P(y|x)\propto \exp \left(-(\text{\textit{$y$}}-\text{\textit{$f$}}(\theta ,\text{\textit{$x$}}))^2/\left.(2 \sigma ^2\right)\right)$, with $f(\theta ,x)=x.\theta$.

* The estimation of the parameter vector ``θ`` is done by minimizing the loss function $\frac{1}{2}\sum _{i=1}^m \left(y_i-f\left(\theta ,x_i\right)\right){}^2+\lambda _1\sum _{i=1}^n \left| \theta _i\right| +\frac{\lambda _2}{2} \sum
_{i=1}^n \theta _i{}^2$, where ``m`` is the number of examples and ``n`` is the number of numerical features.

* The following suboptions can be given:

|                       |           |                                                                            |
| --------------------- | --------- | -------------------------------------------------------------------------- |
| "L1Regularization"    | 0         | value of  $\lambda _1$ in the loss function  |
| "L2Regularization"    | Automatic | value of  $\lambda _2$ iin the loss function |
| "OptimizationMethod"  | Automatic | what optimization method to use                                            |

* Possible settings for the ``"OptimizationMethod"`` option include:

|                             |                                  |
| --------------------------- | -------------------------------- |
| "NormalEquation"            | linear algebra method            |
| "StochasticGradientDescent" | stochastic gradient method       |
| "OrthantWiseQuasiNewton"    | orthant-wise quasi-Newton method |

* For this method, ``Information[PredictorFunction[…], "Function"]`` gives a simple expression to compute the predicted value from the features.

---

## Examples (7)

### Basic Examples (2)

Train a predictor on labeled examples:

```wl
In[1]:= p = Predict[{1, 2, 3, 4} -> {.3, .4, .6, 9}, Method -> "LinearRegression"]

Out[1]=
PredictorFunction[Association["ExampleNumber" -> 4, 
  "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
      Association["Input" -> Association["f1" -> Association["Type" -> "Numerical"]], 
       "Output" -> As ...  -> DateObject[{2025, 6, 25, 1, 13, 
       34.3484699`9.288482292003287}, "Instant", "Gregorian", -5.], "ProcessorCount" -> 4, 
    "ProcessorType" -> "x86-64", "OperatingSystem" -> "Windows", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Look at the ``Information`` :

```wl
In[2]:= Information[p]

Out[2]=
MachineLearning`MLInformationObject[PredictorFunction[Association["ExampleNumber" -> 4, 
   "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
       Association["Input" -> Association["f1" -> Association["Type" -> ... DateObject[{2025, 6, 25, 1, 13, 
        34.3484699`9.288482292003287}, "Instant", "Gregorian", -5.], "ProcessorCount" -> 4, 
     "ProcessorType" -> "x86-64", "OperatingSystem" -> "Windows", "SystemWordLength" -> 64, 
     "Evaluations" -> {}]]]]
```

Predict a new example:

```wl
In[3]:= p[1.3]

Out[3]= 1.6797
```

---

Generate two-dimensional data:

```wl
In[1]:=
data = Table[x -> x + RandomVariate[NormalDistribution[]], {x, RandomReal[{-5, 5}, 40]}];
ListPlot[List@@@data]

Out[1]= [image]
```

Train a predictor function on it:

```wl
In[2]:= p = Predict[data, Method -> "LinearRegression"]

Out[2]= PredictorFunction[…]
```

Compare the data with the predicted values and look at the standard deviation:

```wl
In[3]:=
Show[Plot[{
	p[x], 
	p[x] + StandardDeviation[p[x, "Distribution"]], p[x] - StandardDeviation[p[x, "Distribution"]]}, 
	{x, -2, 6}, 
	PlotStyle -> {Blue, Gray, Gray}, 
	Filling -> {2 -> {3}}, 
	Exclusions -> False, 
	PerformanceGoal -> "Speed", PlotLegends -> {"Prediction", "Confidence Interval"}
	], 
	ListPlot[List@@@data, PlotStyle -> Red, PlotLegends -> {"Data"}]
	]

Out[3]= [image]
```

### Options (5)

#### "L1Regularization" (2)

Use the ``"L1Regularization"`` option to train a predictor:

```wl
In[1]:= p = Predict[{1, 2, 3, 4} -> {.3, .4, .6, 9}, Method -> {"LinearRegression", "L1Regularization" -> 1}]

Out[1]=
PredictorFunction[Association["ExampleNumber" -> 4, 
  "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
      Association["Input" -> Association["f1" -> Association["Type" -> "Numerical"]], 
       "Output" -> As ...  -> DateObject[{2020, 7, 14, 8, 18, 
       48.5807346`9.439038935819923}, "Instant", "Gregorian", -5.], "ProcessorCount" -> 4, 
    "ProcessorType" -> "x86-64", "OperatingSystem" -> "Windows", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

---

Generate a training set and visualize it:

```wl
In[1]:=
trainingset = Flatten[Table[{x, y} -> x + RandomReal[1], {x, RandomReal[{-5, 5}, 20]}, {y, RandomReal[{-5, 5}, 20]}], 1];
ListPointPlot3D[Flatten /@ List@@@trainingset]

Out[1]= [image]
```

Train two predictors by using different values of the ``"L1Regularization"`` option:

```wl
In[2]:= p0 = Predict[trainingset, Method -> {"LinearRegression", "L1Regularization" -> 0}]

Out[2]=
PredictorFunction[Association["ExampleNumber" -> 400, 
  "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
      Association["Input" -> Association["f1" -> Association["Type" -> "NumericalVector", 
           "Len ... "Date" -> DateObject[{2020, 7, 14, 8, 18, 58.0636423`9.516479106364804}, "Instant", 
      "Gregorian", -5.], "ProcessorCount" -> 4, "ProcessorType" -> "x86-64", 
    "OperatingSystem" -> "Windows", "SystemWordLength" -> 64, "Evaluations" -> {}]]]

In[3]:= p7 = Predict[trainingset, Method -> {"LinearRegression", "L1Regularization" -> 7}]

Out[3]=
PredictorFunction[Association["ExampleNumber" -> 400, 
  "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
      Association["Input" -> Association["f1" -> Association["Type" -> "NumericalVector", 
           "Len ...  "Date" -> DateObject[{2020, 7, 14, 8, 19, 1.9244177`8.036874329142776}, "Instant", "Gregorian", 
      -5.], "ProcessorCount" -> 4, "ProcessorType" -> "x86-64", "OperatingSystem" -> "Windows", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Look at the predictor function to see how the larger L1 regularization has forced one parameter to be zero:

```wl
In[4]:= Information[p0, "Function"]

Out[4]= 0.489516  + 1.00198 #1 + 0.00537362 #2&

In[5]:= Information[p7, "Function"]

Out[5]= 0.495406  + 0.985138 #1&
```

#### "L2Regularization" (2)

Use the ``"L2Regularization"`` option to train a predictor:

```wl
In[1]:= p = Predict[{1, 2, 3, 4} -> {.3, .4, .6, 9}, Method -> {"LinearRegression", "L2Regularization" -> 1}]

Out[1]=
PredictorFunction[Association["ExampleNumber" -> 4, 
  "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
      Association["Input" -> Association["f1" -> Association["Type" -> "Numerical"]], 
       "Output" -> As ...  -> DateObject[{2020, 7, 14, 8, 19, 
       22.2951886`9.100786079649875}, "Instant", "Gregorian", -5.], "ProcessorCount" -> 4, 
    "ProcessorType" -> "x86-64", "OperatingSystem" -> "Windows", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

---

Generate a training set and visualize it:

```wl
In[1]:=
trainingset = Table[x -> x + RandomReal[{-3, 3}], {x, RandomReal[{-5, 5}, 20]}];
ListPlot[List@@@trainingset]

Out[1]= [image]
```

Train two predictors by using different values of the ``"L2Regularization"`` option:

```wl
In[2]:= p0 = Predict[trainingset, Method -> {"LinearRegression", "L2Regularization" -> 0}]

Out[2]=
PredictorFunction[Association["ExampleNumber" -> 20, 
  "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
      Association["Input" -> Association["f1" -> Association["Type" -> "Numerical"]], 
       "Output" -> A ... "Date" -> DateObject[{2020, 7, 14, 8, 19, 33.7447211`9.280780742201365}, "Instant", 
      "Gregorian", -5.], "ProcessorCount" -> 4, "ProcessorType" -> "x86-64", 
    "OperatingSystem" -> "Windows", "SystemWordLength" -> 64, "Evaluations" -> {}]]]

In[3]:= p5 = Predict[trainingset, Method -> {"LinearRegression", "L2Regularization" -> 5}]

Out[3]=
PredictorFunction[Association["ExampleNumber" -> 20, 
  "Input" -> Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
      Association["Input" -> Association["f1" -> Association["Type" -> "Numerical"]], 
       "Output" -> A ... "Date" -> DateObject[{2020, 7, 14, 8, 19, 35.8245553`9.306755701783109}, "Instant", 
      "Gregorian", -5.], "ProcessorCount" -> 4, "ProcessorType" -> "x86-64", 
    "OperatingSystem" -> "Windows", "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Look at the predictor functions to see how the L2 regularization has reduced the norm of the parameter vector:

```wl
In[4]:= {f1, f2} = Information[#, "Function"]& /@ {p0, p5}

Out[4]= {-0.521866 + 1.0874 #1&, -0.583965 + 0.869922 #1&}

In[5]:=
{theta1, theta2} = Most[Cases[#, _ ? NumericQ, Infinity]]& /@ {f1, f2};
Norm /@ {theta1, theta2}

Out[5]= {1.20615, 1.04775}
```

#### "OptimizationMethod" (1)

Generate a large training set:

```wl
In[1]:=
n = 20000;
dim = 20;
trainingset = RandomReal[{-5, 5}, {n, dim}] -> RandomReal[1, n];
```

Train predictors with different optimization methods and compare their training times:

```wl
In[2]:= AbsoluteTiming[p1 = Predict[trainingset, Method -> {"LinearRegression", "OptimizationMethod" -> "NormalEquation"}];]

Out[2]= {3.55401, Null}

In[3]:= AbsoluteTiming[p2 = Predict[trainingset, Method -> {"LinearRegression", "OptimizationMethod" -> "OrthantWiseQuasiNewton"}];]

Out[3]= {24.2047, Null}
```

## See Also

* [`Predict`](https://reference.wolfram.com/language/ref/Predict.en.md)
* [`PredictorFunction`](https://reference.wolfram.com/language/ref/PredictorFunction.en.md)
* [`LinearModelFit`](https://reference.wolfram.com/language/ref/LinearModelFit.en.md)
* [`Fit`](https://reference.wolfram.com/language/ref/Fit.en.md)
* [`LeastSquares`](https://reference.wolfram.com/language/ref/LeastSquares.en.md)
* [`GeneralizedLinearModelFit`](https://reference.wolfram.com/language/ref/GeneralizedLinearModelFit.en.md)
* [`DecisionTree`](https://reference.wolfram.com/language/ref/method/DecisionTree.en.md)
* [`GaussianProcess`](https://reference.wolfram.com/language/ref/method/GaussianProcess.en.md)
* [`GradientBoostedTrees`](https://reference.wolfram.com/language/ref/method/GradientBoostedTrees.en.md)
* [`NearestNeighbors`](https://reference.wolfram.com/language/ref/method/NearestNeighbors.en.md)
* [`NeuralNetwork`](https://reference.wolfram.com/language/ref/method/NeuralNetwork.en.md)
* [`RandomForest`](https://reference.wolfram.com/language/ref/method/RandomForest.en.md)

## Related Links

* [An Elementary Introduction to the Wolfram Language: Machine Learning](https://www.wolfram.com/language/elementary-introduction/22-machine-learning.html)

## History

* [Introduced in 2014 (10.0)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn100.en.md)