Improving LLM Confidence with Step-by-Step Reasoning

In a recent paper, “Step-wise Decomposition Improves Calibration for Answering Multi-Hop Questions”, we explore a subtle but important problem in large language models: they’re often far too confident, even when the answer is wrong. This post walks through the core idea behind the paper, why calibration matters, and how a simple change to prompting (breaking reasoning into steps) can significantly improve how much we can trust a model’s stated confidence. ...

September 4, 2025 · 4 min · Lukas Hofbauer

Continual Learning in NLP: Tackling the Challenge of Catastrophic Forgetting

Machine learning models, particularly in Natural Language Processing (NLP), are becoming increasingly powerful. Yet they suffer from a critical limitation: they forget. When trained on new tasks or domains, models often lose their ability to perform previously learned tasks, a phenomenon known as catastrophic forgetting. This problem becomes more pressing as NLP systems are expected to evolve alongside the ever-changing nature of human language. In my recent bachelor’s thesis, I explored how to mitigate catastrophic forgetting in NLP through continual learning. The goal? To enable lifelong learning models that can adapt to new information while retaining past knowledge. This post summarizes the key insights and contributions of that research. ...

March 1, 2025 · 3 min · Lukas Hofbauer