Regularized Linear Models

Last time we saw how polynomial regression can fit complex patterns, but as we increased the degree of the polynomial, we encountered overfitting: the model performs well on the training data but poorly on unseen test data. Regularization helps combat overfitting by adding a penalty term to the loss function, discouraging overly complex models. We’ll explore three common regularization techniques: Ridge, Lasso, and Elastic Net. Ridge regression adds a penalty proportional to the squared magnitude of the coefficients:...
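A minimal sketch of the squared-coefficient penalty using scikit-learn's `Ridge`; the toy data, polynomial degree, and `alpha` value are illustrative assumptions, not taken from the post:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Illustrative noisy quadratic data (assumed, not from the post)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=1.0, size=100)

# Ridge adds alpha * sum(w_i^2) to the least-squares loss;
# a larger alpha shrinks the coefficients more strongly.
model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)
```

With a high-degree polynomial and `alpha=1.0`, the fitted coefficients stay small compared to an unregularized fit, which is the overfitting control the excerpt describes.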

June 3, 2025 · 3 min · Lukas Hofbauer

Polynomial Regression

In the Linear Regression notebook, we saw how to model relationships where the target variable depends linearly on the input features. But what if the relationship is non-linear? Does that mean we need an entirely different type of model? Surprisingly, no. We can still use linear regression to model non-linear relationships by transforming the input features. Imagine you’re trying to predict the price of a house based on the size of its plot....
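A hedged sketch of the feature-transformation idea: the plot-size numbers below are made up for illustration, but they show how `PolynomialFeatures` lets a plain linear model capture a curved relationship.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical plot sizes (m^2) and prices (k EUR); values are illustrative only
plot_size = np.array([[200], [400], [600], [800], [1000]])
price = np.array([150, 260, 330, 370, 390])  # diminishing returns with size

# Adding x^2 as an extra feature keeps the model linear in its weights
# while letting it fit the non-linear trend in the data.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(plot_size, price)
print(model.predict([[700]]))
```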

June 1, 2025 · 4 min · Lukas Hofbauer

Gradient Descent

Gradient descent is a general-purpose optimization algorithm that lies at the heart of many machine learning applications. The idea is to iteratively adjust a set of parameters, $\theta$, to minimize a given cost function. Like a ball rolling downhill, gradient descent uses the local gradient of the cost function with respect to $\theta$ to guide its steps in the direction of steepest descent. The most critical hyperparameter in gradient descent is the learning rate....
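A minimal NumPy sketch of the update rule $\theta \leftarrow \theta - \eta \nabla_\theta J(\theta)$ for a least-squares cost; the synthetic data and the learning rate `eta` are assumptions for illustration:

```python
import numpy as np

# Illustrative data: y ≈ 4 + 3x plus noise (assumed, not from the post)
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

X_b = np.c_[np.ones((100, 1)), X]    # add a bias column of ones
eta = 0.1                            # learning rate
theta = rng.standard_normal((2, 1))  # random initialization

for _ in range(1000):
    # Gradient of the MSE cost with respect to theta
    gradients = 2 / len(X_b) * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients         # step in the direction of steepest descent

print(theta)  # should end up close to [[4], [3]]
```

Setting `eta` too small makes convergence slow, while setting it too large makes the steps overshoot, which is why the excerpt calls it the most critical hyperparameter.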

May 31, 2025 · 4 min · Lukas Hofbauer
[Image: Linear Regression Results]

Linear Regression

Linear regression is a fundamental supervised learning algorithm used to model the relationship between a dependent variable $y$ and one or more independent variables $x$. In its simplest form (univariate linear regression), it assumes that the relationship between $x$ and $y$ is linear and can be described by the equation: $$ \hat y = k \cdot x + d $$ But we can have arbitrarily many input features, as long as the prediction is a linear combination of the form: $$ \hat y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n $$...
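A small sketch of fitting the univariate form $\hat y = k \cdot x + d$ with scikit-learn; the sample data below is a placeholder assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data roughly following y = 2x + 1 (illustrative only)
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression()
model.fit(x, y)

# coef_ corresponds to the slope k, intercept_ to the offset d
print("k =", model.coef_[0], "d =", model.intercept_)
```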

May 22, 2025 · 4 min · Lukas Hofbauer