Let's formalize the problem in dimension 2. The objective of linear regression is to find the coefficients $\beta_0$, $\beta_1$ and $\beta_2$ that minimize $$f(\beta) = \sum_{i=1}^n (y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_0))^2.$$
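
To make this concrete, here is a minimal numpy sketch of $f$; the data below is made up for the example, not tied to any particular dataset:

```python
import numpy as np

# Made-up example data: 100 observations, 2 predictors (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.5, size=100)

def f(beta0, beta1, beta2, X, y):
    """Residual sum of squares of the 2-predictor linear model."""
    residuals = y - (beta1 * X[:, 0] + beta2 * X[:, 1] + beta0)
    return np.sum(residuals ** 2)

print(f(1.0, 3.0, -2.0, X, y))  # small near the true coefficients
```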

Penalisation adds a constraint to this problem:

- Ridge: the constraint is $\beta_1^2 + \beta_2^2 \le \mu$, which, thanks to a Lagrange multiplier, reformulates the problem as minimizing $\sum_{i=1}^n (y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_0))^2 + \lambda (\beta_1^2 + \beta_2^2)$
- LASSO: the constraint is $|\beta_1| + |\beta_2| \le \mu$, which, thanks to a Lagrange multiplier, reformulates the problem as minimizing $\sum_{i=1}^n (y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_0))^2 + \lambda (|\beta_1| + |\beta_2|)$
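
Keeping the notation above, the two penalised objectives could be sketched as follows (reusing `f` and the made-up data from the previous snippet; `lam` stands for $\lambda$):

```python
# Lagrangian form of the penalised objectives; note the intercept
# beta0 is not penalised, matching the formulas above.
def ridge_objective(beta0, beta1, beta2, X, y, lam):
    return f(beta0, beta1, beta2, X, y) + lam * (beta1 ** 2 + beta2 ** 2)

def lasso_objective(beta0, beta1, beta2, X, y, lam):
    return f(beta0, beta1, beta2, X, y) + lam * (abs(beta1) + abs(beta2))
```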

If we expand the formula, we get:

$$
\begin{aligned}
\sum_{i=1}^n (y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_0))^2
&= \sum_{i=1}^n y_i^2 + (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_0)^2 - 2(\beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_0) y_i \\
&= \sum_{i=1}^n y_i^2 + \beta_1^2 x_{i,1}^2 + \beta_2^2 x_{i,2}^2 + \beta_0^2 + 2 \beta_1 \beta_2 x_{i,1} x_{i,2} + 2 \beta_0 \beta_1 x_{i,1} + 2 \beta_0 \beta_2 x_{i,2} - 2(\beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_0) y_i \\
&= \Big(\sum_{i=1}^n x_{i,1}^2\Big) \beta_1^2 + \Big(2 \sum_{i=1}^n x_{i,1} x_{i,2}\Big) \beta_1 \beta_2 + \Big(\sum_{i=1}^n x_{i,2}^2\Big) \beta_2^2 + \Big(2 \sum_{i=1}^n (\beta_0 - y_i) x_{i,1}\Big) \beta_1 + \Big(2 \sum_{i=1}^n (\beta_0 - y_i) x_{i,2}\Big) \beta_2 + \sum_{i=1}^n (\beta_0 - y_i)^2 \\
&= A \beta_1^2 + B \beta_1 \beta_2 + C \beta_2^2 + D \beta_1 + E \beta_2 + F
\end{aligned}
$$

where:

| Coefficient | Value |
|-------------|-------|
| $A$ | $\sum_{i=1}^n x_{i,1}^2$ |
| $B$ | $2 \sum_{i=1}^n x_{i,1} x_{i,2}$ |
| $C$ | $\sum_{i=1}^n x_{i,2}^2$ |
| $D$ | $2 \sum_{i=1}^n (\beta_0 - y_i) x_{i,1}$ |
| $E$ | $2 \sum_{i=1}^n (\beta_0 - y_i) x_{i,2}$ |
| $F$ | $\sum_{i=1}^n (\beta_0 - y_i)^2$ |
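
We can verify the expansion numerically: holding $\beta_0$ fixed and computing $A$ through $F$ from the made-up data above, the quadratic form should match a direct evaluation of $f$:

```python
# Conic coefficients for a fixed beta0, computed from the data above.
beta0 = 1.0
x1, x2 = X[:, 0], X[:, 1]
A = np.sum(x1 ** 2)
B = 2 * np.sum(x1 * x2)
C = np.sum(x2 ** 2)
D = 2 * np.sum((beta0 - y) * x1)
E = 2 * np.sum((beta0 - y) * x2)
F = np.sum((beta0 - y) ** 2)

# The quadratic form agrees with the direct residual sum of squares.
beta1, beta2 = 3.0, -2.0
quadratic = A*beta1**2 + B*beta1*beta2 + C*beta2**2 + D*beta1 + E*beta2 + F
assert np.isclose(quadratic, f(beta0, beta1, beta2, X, y))
```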

For a fixed $\beta_0$, this is the general equation of a conic in $(\beta_1, \beta_2)$, whose type is given by the sign of the discriminant:

- $B^2-4AC <0$, the equation represents an ellipse
- $B^2-4AC =0$, the equation represents a parabola
- $B^2-4AC >0$, the equation represents a hyperbola

Here $B^2 - 4AC = 4\left[\left(\sum_{i=1}^n x_{i,1} x_{i,2}\right)^2 - \left(\sum_{i=1}^n x_{i,1}^2\right)\left(\sum_{i=1}^n x_{i,2}^2\right)\right] \le 0$ by the Cauchy-Schwarz inequality, so the level sets of $f$ are ellipses (degenerate only when the two predictors are collinear).
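
A one-line numerical check of the discriminant's sign, reusing the coefficients computed above:

```python
# Cauchy-Schwarz gives (sum x1*x2)^2 <= (sum x1^2)(sum x2^2), so this is <= 0.
print(B ** 2 - 4 * A * C)  # negative unless x1 and x2 are collinear
```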

Although we cannot always achieve $f(\beta) = 0$, we want to minimize $f$. For a given value of $\beta$, write $f(\beta) = m_{\beta}$; the corresponding level set is the conic above with constant term $F_{\beta} = F - m_{\beta}$. So in the 3D space $(\beta_1, \beta_2, z)$, the graph of $f$ is a paraboloid, and each cut by a plane $z = \text{constant}$ is an ellipse.

In the reduced form of the equation, the semi-axes of the ellipse grow with $-F_{\beta} = m_{\beta} - F$; thus if $m_{\beta}$ increases, the radii increase.
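
We can visualise these cuts by plotting contours of the quadratic form (matplotlib assumed available, coefficients $A$ through $F$ reused from above):

```python
import matplotlib.pyplot as plt

# Each contour is a cut of the paraboloid at a constant height: an ellipse.
b1, b2 = np.meshgrid(np.linspace(0, 6, 200), np.linspace(-5, 1, 200))
z = A * b1**2 + B * b1 * b2 + C * b2**2 + D * b1 + E * b2 + F
plt.contour(b1, b2, z, levels=15)
plt.xlabel(r"$\beta_1$")
plt.ylabel(r"$\beta_2$")
plt.show()
```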

Now write $g_{Ridge}(\beta) = \beta_1^2 + \beta_2^2$ and $g_{LASSO}(\beta) = |\beta_1| + |\beta_2|$ for the penalty terms. Their level sets are:

- $g_{Ridge}(\beta) = c_{\beta}^{Ridge}$ is a circle of radius $\sqrt{c_{\beta}^{Ridge}}$
- $g_{LASSO}(\beta) = c_{\beta}^{LASSO}$ is a square (a diamond) of half-diagonal $c_{\beta}^{LASSO}$
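
Both boundaries are easy to draw for an illustrative budget $c$:

```python
# Boundaries of the two penalty regions for the same (arbitrary) budget c.
c = 1.0
theta = np.linspace(0, 2 * np.pi, 200)
# Ridge: beta1^2 + beta2^2 = c, a circle of radius sqrt(c).
plt.plot(np.sqrt(c) * np.cos(theta), np.sqrt(c) * np.sin(theta), label="Ridge")
# LASSO: |beta1| + |beta2| = c, a diamond with half-diagonal c.
diamond = np.array([[c, 0], [0, c], [-c, 0], [0, -c], [c, 0]])
plt.plot(diamond[:, 0], diamond[:, 1], label="LASSO")
plt.gca().set_aspect("equal")
plt.legend()
plt.show()
```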

Let's say we have fixed the size of the penalty region (in practice, we test different penalty strengths through the choice of $\lambda$ and perform K-fold cross-validation to select the best one, as sketched below). If the ellipse doesn't meet the penalty region, we have to increase the size of the ellipse until it touches the region. The minimum of the sum of the two terms is hence achieved on the boundary of the penalty region. Because the LASSO square has its corners on the coordinate axes, this first contact often happens at a corner, where one coefficient is exactly zero; the smooth Ridge circle has no corners, so Ridge shrinks coefficients without setting them to zero.
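
In scikit-learn, this selection is what `LassoCV` and `RidgeCV` do ($\lambda$ is called `alpha` there); a minimal sketch on the made-up data from above:

```python
from sklearn.linear_model import LassoCV, RidgeCV

# 5-fold cross-validation over a grid of penalty strengths.
lasso = LassoCV(cv=5).fit(X, y)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("LASSO: alpha =", lasso.alpha_, "coef =", lasso.coef_)
print("Ridge: alpha =", ridge.alpha_, "coef =", ridge.coef_)
```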

I let you imagine what happens in dimensions greater than 2... ;)