<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Machine Learning Articles on Lukas Hofbauer</title>
    <link>https://hofbauer.tech/ml-blog/</link>
    <description>Recent content in Machine Learning Articles on Lukas Hofbauer</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Thu, 11 Sep 2025 18:04:53 +0200</lastBuildDate>
    <atom:link href="https://hofbauer.tech/ml-blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Adam Optimizer</title>
      <link>https://hofbauer.tech/ml-blog/adam/</link>
      <pubDate>Thu, 11 Sep 2025 18:04:53 +0200</pubDate>
      <guid>https://hofbauer.tech/ml-blog/adam/</guid>
      <description>&lt;p&gt;When training deep neural networks, choosing the right optimizer can make the difference between fast, stable convergence and hours of frustration. One of the most widely used algorithms is &lt;strong&gt;Adam (Adaptive Moment Estimation)&lt;/strong&gt;, introduced by &lt;a href=&#34;https://arxiv.org/abs/1412.6980&#34;&gt;Diederik P. Kingma and Jimmy Ba in 2014&lt;/a&gt;. Adam has become a default choice in all major frameworks (PyTorch, TensorFlow, JAX) and is still at the heart of cutting-edge models like transformers.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;the-idea-behind-adam&#34;&gt;The Idea Behind Adam&lt;/h2&gt;
&lt;p&gt;Adam combines two key ideas from earlier optimizers: &lt;strong&gt;momentum&lt;/strong&gt;, which keeps an exponentially decaying average of past gradients (the first moment), and &lt;strong&gt;RMSProp&lt;/strong&gt;-style adaptive learning rates, which scale each parameter&amp;rsquo;s step by a decaying average of squared gradients (the second moment).&lt;/p&gt;
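&lt;p&gt;As a minimal sketch of the resulting update rule (illustrative NumPy code, not the exact implementation from the full post; the default hyperparameters follow the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first and second moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # Correct the bias introduced by the zero initialization of m and v
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Per-parameter step size based on the gradient history
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
&lt;/code&gt;&lt;/pre&gt;</description>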
    </item>
    <item>
      <title>LoRA from Scratch</title>
      <link>https://hofbauer.tech/ml-blog/lora/</link>
      <pubDate>Sat, 23 Aug 2025 13:03:08 +0700</pubDate>
      <guid>https://hofbauer.tech/ml-blog/lora/</guid>
      <description>&lt;h1 id=&#34;lora-low-rank-adaptation&#34;&gt;LoRA (Low-Rank Adaptation)&lt;/h1&gt;
&lt;p&gt;LoRA, short for &lt;strong&gt;Low-Rank Adaptation&lt;/strong&gt;, is one of the most popular &lt;em&gt;parameter-efficient fine-tuning&lt;/em&gt; (PEFT) methods. It was first proposed by &lt;strong&gt;&lt;a href=&#34;https://arxiv.org/pdf/2106.09685&#34;&gt;Hu et al., 2021&lt;/a&gt;&lt;/strong&gt;, and has become a go-to technique when adapting large pretrained models to new tasks.&lt;/p&gt;
&lt;p&gt;Why do we need PEFT methods in the first place?
Fine-tuning large language models in the traditional way (updating all of their billions of parameters) is simply too expensive in terms of compute, memory, and storage. Researchers realized that we don’t actually need to change every parameter of a pretrained model to make it useful for new tasks.&lt;/p&gt;
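&lt;p&gt;The core trick: keep the pretrained weight matrix frozen and learn only a low-rank update. A minimal NumPy sketch (dimensions and names are illustrative, following the initialization in the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

d_out, d_in, r = 512, 512, 8         # rank r is much smaller than d
W = np.random.randn(d_out, d_in)     # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-init
alpha = 16                           # scaling hyperparameter

def lora_forward(x):
    # Frozen path plus scaled low-rank update: (W + (alpha / r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

print(lora_forward(np.random.randn(d_in)).shape)  # (512,)
&lt;/code&gt;&lt;/pre&gt;</description>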
    </item>
    <item>
      <title>Neural Network</title>
      <link>https://hofbauer.tech/ml-blog/neural-network/</link>
      <pubDate>Sun, 27 Jul 2025 18:17:11 +0700</pubDate>
      <guid>https://hofbauer.tech/ml-blog/neural-network/</guid>
      <description>&lt;h1 id=&#34;build-a-neural-network-from-scratch&#34;&gt;Build a Neural Network from Scratch&lt;/h1&gt;
&lt;p&gt;In this post, we&amp;rsquo;ll walk through how to build a simple neural network from scratch using just NumPy. No high-level libraries like TensorFlow or PyTorch, just the fundamentals.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;what-is-a-neural-network&#34;&gt;What &lt;em&gt;is&lt;/em&gt; a Neural Network?&lt;/h2&gt;
&lt;p&gt;A neural network is a set of interconnected layers of simple computational units called &lt;strong&gt;neurons&lt;/strong&gt;. Each neuron receives inputs and returns an output value.
It does this by multiplying each input by a learned weight and adding a bias term. The resulting value is then passed through a nonlinear activation function. We&amp;rsquo;ll explain later why this last step is necessary.&lt;/p&gt;
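&lt;p&gt;In code, a single neuron is only a few lines. A minimal NumPy sketch (the names and the choice of sigmoid are illustrative, not taken from the post):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neuron(x, w, b):
    # Weighted sum of the inputs plus a bias, passed through a nonlinearity
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])  # inputs
w = np.array([0.1, 0.4, -0.2])  # learned weights
print(neuron(x, w, b=0.3))      # a single output value
&lt;/code&gt;&lt;/pre&gt;</description>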
    </item>
    <item>
      <title>Precision Recall and other Classification Metrics</title>
      <link>https://hofbauer.tech/ml-blog/precision-recall/</link>
      <pubDate>Fri, 06 Jun 2025 17:36:17 +0900</pubDate>
      <guid>https://hofbauer.tech/ml-blog/precision-recall/</guid>
      <description>&lt;p&gt;When evaluating a classification model, accuracy alone isn’t enough. To understand how well your model is really performing, we need to dig deeper into metrics like &lt;strong&gt;precision&lt;/strong&gt;, &lt;strong&gt;recall&lt;/strong&gt;, and &lt;strong&gt;F1 score&lt;/strong&gt;, and into performance curves like &lt;strong&gt;ROC&lt;/strong&gt; and &lt;strong&gt;Precision-Recall (PR)&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll start by using the same classifier as in the &lt;a href=&#34;https://hofbauer.tech/ml-blog/logistic-regression/&#34;&gt;Logistic
Regression&lt;/a&gt; post.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;sklearn&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;datasets&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;sklearn.linear_model&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LogisticRegression&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;numpy&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;plt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;iris&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;datasets&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load_iris&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;X&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;iris&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;data&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;][:,&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;reshape&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;y&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iris&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;target&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;==&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;astype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;log_reg&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LogisticRegression&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;log_reg&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;fit&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;X&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;y&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img alt=&#34;png&#34; loading=&#34;lazy&#34; src=&#34;https://hofbauer.tech/ml-blog/precision-recall/output_1_0.png&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;the-confusion-matrix&#34;&gt;The Confusion Matrix&lt;/h2&gt;
&lt;p&gt;Everything starts with the &lt;strong&gt;confusion matrix&lt;/strong&gt;, which keeps track of the four possible outcomes in binary classification: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).&lt;/p&gt;
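&lt;p&gt;Precision and recall follow directly from those four counts. A short sketch using scikit-learn, continuing with the classifier fitted above:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_pred = log_reg.predict(X)

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y, y_pred))
print(&#34;precision:&#34;, precision_score(y, y_pred))  # TP / (TP + FP)
print(&#34;recall:&#34;, recall_score(y, y_pred))        # TP / (TP + FN)
print(&#34;F1 score:&#34;, f1_score(y, y_pred))          # harmonic mean of the two
&lt;/code&gt;&lt;/pre&gt;</description>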
    </item>
    <item>
      <title>Softmax</title>
      <link>https://hofbauer.tech/ml-blog/softmax/</link>
      <pubDate>Wed, 04 Jun 2025 14:35:57 +0900</pubDate>
      <guid>https://hofbauer.tech/ml-blog/softmax/</guid>
      <description>&lt;p&gt;Yesterday, we explored how to train a binary classifier using &lt;a href=&#34;https://hofbauer.tech/ml-blog/logistic-regression/&#34;&gt;&lt;strong&gt;logistic regression&lt;/strong&gt;&lt;/a&gt;. Today, we’ll generalize that idea to handle &lt;strong&gt;multiple classes&lt;/strong&gt;. This generalization is known as &lt;strong&gt;softmax regression&lt;/strong&gt;, or &lt;strong&gt;multinomial logistic regression&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id=&#34;the-idea&#34;&gt;The Idea&lt;/h3&gt;
&lt;p&gt;In binary logistic regression, we used a single score function to compute the probability of a class. For multiclass classification, we extend this by defining one &lt;strong&gt;score function&lt;/strong&gt; per class:&lt;/p&gt;
&lt;p&gt;$$
s_k(x) = \theta_k^T x
$$&lt;/p&gt;
&lt;p&gt;Here, $s_k(x)$ is the score for class $k$, and $\theta_k$ is the parameter vector for that class.&lt;/p&gt;
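&lt;p&gt;The scores are then turned into class probabilities with the softmax function. A minimal NumPy sketch (illustrative, for a single sample):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def softmax(s):
    # Subtract the max for numerical stability; the outputs sum to 1
    e = np.exp(s - np.max(s))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # s_k(x) for three classes
print(softmax(scores))              # approximately [0.659 0.242 0.099]
&lt;/code&gt;&lt;/pre&gt;</description>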
    </item>
    <item>
      <title>Logistic Regression</title>
      <link>https://hofbauer.tech/ml-blog/logistic-regression/</link>
      <pubDate>Tue, 03 Jun 2025 19:16:22 +0900</pubDate>
      <guid>https://hofbauer.tech/ml-blog/logistic-regression/</guid>
      <description>&lt;p&gt;Regression methods aren’t just for predicting continuous values—they can also be used for classification. The simplest example of this is &lt;strong&gt;Logistic Regression&lt;/strong&gt;, where we train a linear model to separate two classes in feature space. The goal is for the model to output &lt;code&gt;1&lt;/code&gt; if an input belongs to our target class and &lt;code&gt;0&lt;/code&gt; otherwise.&lt;/p&gt;
&lt;p&gt;The formula for logistic regression should look familiar if you’ve seen the post on &lt;a href=&#34;https://hofbauer.tech/ml-blog/linear-regression/&#34;&gt;Linear Regression&lt;/a&gt;:&lt;/p&gt;
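&lt;p&gt;In its standard form, the model passes the familiar linear combination through the sigmoid function:&lt;/p&gt;
&lt;p&gt;$$ \hat p = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} $$&lt;/p&gt;
&lt;p&gt;A minimal NumPy sketch of that prediction step (names are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_proba(x, theta):
    # Linear score squashed to a probability in (0, 1)
    return sigmoid(np.dot(theta, x))

def predict(x, theta, threshold=0.5):
    return int(predict_proba(x, theta) &gt;= threshold)
&lt;/code&gt;&lt;/pre&gt;</description>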
    </item>
    <item>
      <title>Regularized Linear Models</title>
      <link>https://hofbauer.tech/ml-blog/regularized-linear-models/</link>
      <pubDate>Tue, 03 Jun 2025 01:57:00 +0900</pubDate>
      <guid>https://hofbauer.tech/ml-blog/regularized-linear-models/</guid>
      <description>&lt;p&gt;Last time we saw how &lt;a href=&#34;https://hofbauer.tech/ml-blog/polynomial-regression/&#34;&gt;Polynomial regression&lt;/a&gt; can fit complex patterns, but as we increased the degree of the polynomial, we encountered &lt;strong&gt;overfitting&lt;/strong&gt;; the model performs well on the training data but poorly on unseen test data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regularization&lt;/strong&gt; helps combat overfitting by adding a penalty term to the loss function, discouraging overly complex models.
We&amp;rsquo;ll explore three common regularization techniques: &lt;strong&gt;Ridge&lt;/strong&gt;, &lt;strong&gt;Lasso&lt;/strong&gt;, and &lt;strong&gt;Elastic Net&lt;/strong&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id=&#34;ridge-regression&#34;&gt;Ridge Regression&lt;/h3&gt;
&lt;p&gt;Ridge adds a penalty proportional to the &lt;strong&gt;squared magnitude&lt;/strong&gt; of coefficients:&lt;/p&gt;
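&lt;p&gt;In the usual formulation, the cost becomes:&lt;/p&gt;
&lt;p&gt;$$ J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2 $$&lt;/p&gt;
&lt;p&gt;All three techniques are one-liners in scikit-learn. A short sketch on synthetic data (the hyperparameter values are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0)                       # L2 penalty on the coefficients
lasso = Lasso(alpha=0.1)                       # L1 penalty, drives some weights to 0
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)  # blend of L1 and L2

for model in (ridge, lasso, elastic):
    model.fit(X, y)
    print(type(model).__name__, model.coef_.round(2))
&lt;/code&gt;&lt;/pre&gt;</description>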
    </item>
    <item>
      <title>Polynomial Regression</title>
      <link>https://hofbauer.tech/ml-blog/polynomial-regression/</link>
      <pubDate>Sun, 01 Jun 2025 20:49:07 +0900</pubDate>
      <guid>https://hofbauer.tech/ml-blog/polynomial-regression/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://hofbauer.tech/ml-blog/linear-regression/&#34;&gt;&lt;strong&gt;Linear Regression&lt;/strong&gt;&lt;/a&gt; notebook, we saw how to model relationships where the target variable depends linearly on the input features. But what if the relationship is &lt;strong&gt;non-linear&lt;/strong&gt;? Does that mean we need an entirely different type of model?&lt;/p&gt;
&lt;p&gt;Surprisingly, no. We can still use linear regression to model non-linear relationships, by transforming the input features.&lt;/p&gt;
&lt;p&gt;Imagine you&amp;rsquo;re trying to predict the price of a house based on the size of its plot. If the plot is rectangular and your dataset includes only the &lt;strong&gt;length&lt;/strong&gt; and &lt;strong&gt;width&lt;/strong&gt;, there&amp;rsquo;s no single feature that directly tells you the area. But since &lt;strong&gt;area = length $\cdot$ width&lt;/strong&gt;, we could manually create a new feature called &lt;code&gt;area&lt;/code&gt;.&lt;/p&gt;
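&lt;p&gt;scikit-learn automates exactly this kind of feature construction with &lt;code&gt;PolynomialFeatures&lt;/code&gt;. A short sketch (the sample values are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two raw features per sample: length and width of the plot
X = np.array([[20.0, 30.0],
              [15.0, 40.0]])

# degree=2 adds squares and the interaction term length * width (the area)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
&lt;/code&gt;&lt;/pre&gt;</description>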
    </item>
    <item>
      <title>Gradient Descent</title>
      <link>https://hofbauer.tech/ml-blog/gradient-descent/</link>
      <pubDate>Sat, 31 May 2025 20:56:40 +0900</pubDate>
      <guid>https://hofbauer.tech/ml-blog/gradient-descent/</guid>
      <description>&lt;p&gt;Gradient descent is a general-purpose optimization algorithm that lies at the heart of many machine learning applications. The idea is to iteratively adjust a set of parameters, $\theta$, to minimize a given cost function.&lt;/p&gt;
&lt;p&gt;Like a ball rolling downhill, gradient descent uses the local gradient of the cost function with respect to $\theta$ to guide its steps in the direction of steepest descent.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;png&#34; loading=&#34;lazy&#34; src=&#34;https://hofbauer.tech/ml-blog/gradient-descent/output_1_0.png&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;the-role-of-the-learning-rate&#34;&gt;The Role of the Learning Rate&lt;/h2&gt;
&lt;p&gt;The most critical hyperparameter in gradient descent is the &lt;strong&gt;learning rate&lt;/strong&gt;: too small and convergence crawls, too large and the steps overshoot the minimum and can diverge.&lt;/p&gt;
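&lt;p&gt;To make this concrete, here is a minimal sketch of gradient descent on a simple quadratic cost (illustrative, not the post&amp;rsquo;s exact example); try varying &lt;code&gt;lr&lt;/code&gt; to see both failure modes:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
def grad(theta):
    return 2 * (theta - 3.0)

theta = 0.0
lr = 0.1  # learning rate: try 1.1 (diverges) or 0.001 (crawls)
for step in range(50):
    theta = theta - lr * grad(theta)
print(theta)  # approaches the minimum at theta = 3
&lt;/code&gt;&lt;/pre&gt;</description>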
    </item>
    <item>
      <title>Linear Regression</title>
      <link>https://hofbauer.tech/ml-blog/linear-regression/</link>
      <pubDate>Thu, 22 May 2025 12:05:03 +0900</pubDate>
      <guid>https://hofbauer.tech/ml-blog/linear-regression/</guid>
      <description>&lt;p&gt;Linear regression is a fundamental supervised learning algorithm used to model the relationship between a dependent variable $y$ and one or more independent variables $x$. In its simplest form (univariate linear regression), it assumes that the relationship between $x$ and $y$ is linear and can be described by the equation:&lt;/p&gt;
&lt;p&gt;$$ \hat y = k \cdot x + d $$&lt;/p&gt;
&lt;p&gt;But we can have arbitrarily many input features, as long as the model remains a linear combination of the form:
$$ \hat y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n $$&lt;/p&gt;
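&lt;p&gt;A minimal NumPy sketch of fitting such a model by least squares (the data here is synthetic and illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=100)  # true k = 2.5, d = 1.0

# Least squares on a design matrix with a bias column of ones
X = np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # approximately [1.0, 2.5]
&lt;/code&gt;&lt;/pre&gt;</description>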
    </item>
  </channel>
</rss>
