Regularized Linear Models

Last time we saw how polynomial regression can fit complex patterns, but as we increased the degree of the polynomial, we encountered overfitting: the model performs well on the training data but poorly on unseen test data. Regularization helps combat overfitting by adding a penalty term to the loss function, discouraging overly complex models. We’ll explore three common regularization techniques: Ridge, Lasso, and Elastic Net. Ridge regression adds a penalty proportional to the squared magnitude of the coefficients:...
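A minimal sketch of the squared-coefficient penalty using scikit-learn's `Ridge`; the toy data, polynomial degree, and `alpha` value are illustrative assumptions, not taken from the post:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Illustrative noisy quadratic data (assumed, not from the post)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=1.0, size=100)

# Ridge adds alpha * sum(w_i^2) to the least-squares loss;
# a larger alpha shrinks the coefficients more strongly.
model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)
```

With a high-degree polynomial and `alpha=1.0`, the fitted coefficients stay small compared to an unregularized fit, which is the overfitting control the excerpt describes.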

June 3, 2025 · 3 min · Lukas Hofbauer

Polynomial Regression

In the Linear Regression notebook, we saw how to model relationships where the target variable depends linearly on the input features. But what if the relationship is non-linear? Does that mean we need an entirely different type of model? Surprisingly, no. We can still use linear regression to model non-linear relationships by transforming the input features. Imagine you’re trying to predict the price of a house based on the size of its plot....
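A hedged sketch of the feature-transformation idea: the plot-size numbers below are made up for illustration, but they show how `PolynomialFeatures` lets a plain linear model capture a curved relationship.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical plot sizes (m^2) and prices (k EUR); values are illustrative only
plot_size = np.array([[200], [400], [600], [800], [1000]])
price = np.array([150, 260, 330, 370, 390])  # diminishing returns with size

# Adding x^2 as an extra feature keeps the model linear in its weights
# while letting it fit the non-linear trend in the data.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(plot_size, price)
print(model.predict([[700]]))
```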

June 1, 2025 · 4 min · Lukas Hofbauer

Gradient Descent

Gradient descent is a general-purpose optimization algorithm that lies at the heart of many machine learning applications. The idea is to iteratively adjust a set of parameters, $\theta$, to minimize a given cost function. Like a ball rolling downhill, gradient descent uses the local gradient of the cost function with respect to $\theta$ to guide its steps in the direction of steepest descent. The most critical hyperparameter in gradient descent is the learning rate....
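A minimal NumPy sketch of the update rule $\theta \leftarrow \theta - \eta \nabla_\theta J(\theta)$ for a least-squares cost; the synthetic data and the learning rate `eta` are assumptions for illustration:

```python
import numpy as np

# Illustrative data: y ≈ 4 + 3x plus noise (assumed, not from the post)
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

X_b = np.c_[np.ones((100, 1)), X]    # add a bias column of ones
eta = 0.1                            # learning rate
theta = rng.standard_normal((2, 1))  # random initialization

for _ in range(1000):
    # Gradient of the MSE cost with respect to theta
    gradients = 2 / len(X_b) * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients         # step in the direction of steepest descent

print(theta)  # should end up close to [[4], [3]]
```

Setting `eta` too small makes convergence slow, while setting it too large makes the steps overshoot, which is why the excerpt calls it the most critical hyperparameter.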

May 31, 2025 · 4 min · Lukas Hofbauer
[Image: Linear Regression Results]

Linear Regression

Linear regression is a fundamental supervised learning algorithm used to model the relationship between a dependent variable $y$ and one or more independent variables $x$. In its simplest form (univariate linear regression), it assumes that the relationship between $x$ and $y$ is linear and can be described by the equation: $$ \hat y = k \cdot x + d $$ But we can have arbitrarily many input features, as long as the prediction is a linear combination of the form: $$ \hat y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n $$...
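A small sketch of fitting the univariate form $\hat y = k \cdot x + d$ with scikit-learn; the sample data below is a placeholder assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data roughly following y = 2x + 1 (illustrative only)
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression()
model.fit(x, y)

# coef_ corresponds to the slope k, intercept_ to the offset d
print("k =", model.coef_[0], "d =", model.intercept_)
```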

May 22, 2025 · 4 min · Lukas Hofbauer