Welcome to the "LLM Chronicles". In this introductory video, I'll lay out the roadmap for navigating the world of Large Language Models. Throughout this series, I'll focus on the big ideas and aim to provide an intuitive grasp of key concepts, from the basics of neural networks to how Recurrent Neural Networks and Transformers function. My goal is to help you understand how things work without the fluff.
Join me as I set the course for a clear and intuitive exploration of LLMs.
In this episode of the LLM Chronicles, we'll cover the basics of artificial neurons, see how they form multi-layer perceptrons, and learn how to model inputs and outputs for effective network processing. We'll also touch on vectorialization and its role in optimizing neural network implementations on GPUs.
Welcome to the first hands-on lab session of the LLM Chronicles. Here we first implement Multi-Layer Perceptrons (MLPs) from scratch using matrix operations with NumPy. Following that, we set up an MLP using PyTorch for digit classification on the MNIST dataset.
In this episode of the LLM Chronicles, we'll cover how to prepare datasets for effective training of neural networks. We'll then look at how the problem of training a network can be defined in terms of minimizing the loss function and we'll close by looking at how we can achieve this using gradient descent.
In this hands-on lab we look at how PyTorch uses the Autograd library to implement the concepts of derivatives and gradient descent that we saw in the previous episode.
This is an exciting episode of the LLM Chronicles as we finally cover how neural networks are trained in practice and what a training loop looks like. We also cover a training method called mini-batch updates, comparing it with full-batch and incremental updates. We end this episode by looking at common optimizations of gradient descent, such as momentum, RMSProp and ADAM.
In this lab we'll finally put everything together and using PyTorch we'll train our model to recognize the MNIST hand-written digits, using gradient descent and mini-batch updates.
In this episode we look at common metrics to evaluate our models, such as accuracy, precision and recall. We also look at overfitting and underfitting, two common issues that can arise when training models, and what we can do to mitigate them.
In this episode we'll look at how to model sequences of data, such as natural language, using Recurrent Neural Networks. We'll peak into the implementation of an RNN layer, looking at all the operations involved in the forward pass. Finally, we'll look at how good they are at remembering information from the past and at common issues that arise during training (vanishing and exploding gradients). We'll finish by looking at how Long short-term memory cells can help mitigate some of these issues.
In this short lab we'll first implement an RNN layer by hand to understand all the computations in the forward pass. after that, we'll look at PyTorch's implementation of RNNs and LSTM cells.
In this episode we'll look at how to model sequences of data, such as natural language, using Recurrent Neural Networks. We'll peak into the implementation of an RNN layer, looking at all the operations involved in the forward pass. Finally, we'll look at how good they are at remembering information from the past and at common issues that arise during training (vanishing and exploding gradients). We'll finish by looking at how Long short-term memory cells can help mitigate some of these issues.
In this lab we build a word-level language model using an RNN in PyTorch.
In this episode we explore the seminal work on Language Translation with Recurrent Neural Networks(RNNs) that sparked key innovations that power advanced language models like OpenAI's ChatGPT and Google's Gemini. Specifically, we'll look at how the encoder/decoder architecture can be used to solve the problem of alignment for seq2seq tasks such as language translation, as described by Ilya Sutskever at al. in a 2014 paper titled “Sequence to Sequence Learning with Neural Networks”.
In this lab we'll use PyTorch to build a encoder/decoder RNN for language translation, similar to the model described in the 2014 paper "Sequence to Sequence Learning with Neural Networks” by OpenAI's chief scientist Ilya Sutskever. Our RNN layers will use LSTM (Long Short-Term Memory) cells.
In this episode we'll cover the so-called "attention mechanism" for seq2seq tasks with Neural Networks. This was first introduced by Bahdanau at al. in 2014 to improve the performance of encoder/decoder RNNs used for language translation and proved fundamental to create the Transformer architecture that powers modern LLMs.
In this lab we'll add a simple attention mechanism to our language translation RNN. This will be similar to the method described in Bahdanau et al.'s seminal paper from 2015 titled "Neural Machine Translation by Jointly Learning to Align and Translate".
This episode covers the groundbreaking Transformer architecture introduced by Google researchers in 2017. We cover the introduction of self-attention and positional encoding to remove recurrence and increase parallelization, which led to greater performance and scalability.
This episode dives into how to build LLMs from the encoder component of Transformers. Specifically, we look at the similarities and differences between the encoder and decoder parts of a Transformer, we then see how Google used the encoder to build BERT via pre-training and fine-tuning.
In this lab we'll see how to fine-tune DistilBERT for analyzing the sentiment of restaurant reviews. We'll look at how to do this from scratch, adding the specific layers for classification by hand. We'll conclude by looking at how to use the HuggingFace Transformers library for this.
In this episode well see how OpenAPI built the models behind ChatGPT from the decoder part of a Transformer. We'll see how Generative Pre-Trained Transformers are fined tuned for instruction following in natural language and how hey are aligned using RLHF (Reinforcement Learning from Human Feedback).
In this lab we'll get hands-on with two open-weight LLMs, Gemma 2B and Llama-2 7B. We'll see how to run them inside Google Colab notebooks within the free T4 tier. We'll take a look at the base and the instruction tuned versions, to understand the difference and the impact of fine-tuning and RLHF on the performance of the models.
In this episode we'll cover the main limitations and challenges with current Large Language Models (LLM). This helps cut through the hype and get a clear picture of what the technology is good for and what needs improvement.
This episode introduces RAG (Retrieval Augmented Generation) with LLMs. We cover all the basic concepts, including semantic search with vector databases. We finish with a quick lab showing how to create a basic RAG system for the Huberman Lab Podcast with Langchain, GPT-3.5, BGE embeddings and ChromaDB.
This episode explores some issues with RAG, covering solutions and common strategies to mitigate them and improve performance such as self-query, parent document retrieval and HyDE (Hypothetical Document Embedding).
In this episode we look at the architecture of a multi-modal LLM, discussing the general training process of an MLLM. After that, we'll focus on vision and explore Vision Transformers and how they are trained with contrastive learning. Vision Transformers are the most commonly used building block in MLLMs with vision capabilities. Finally, we'll get hands-on and look into Google's open-weight PaliGemma, analysing its implementation to see these concepts in action within a real-world multi-modal LLM.
In this lab we look at how to use the OpenAI CLIP model for zero-shot image classification and image similarity. We will explore loading CLIP from HuggingFace and also deploying it on Azure AI Studio.
In this episode we'll cover LLM agents, focusing on the core research (that helped to improve LLMs' reasoning while allowing them to interaction with the external world via the use of tools. This includes Chain of Thought prompting, PAL (Program-aieded Language Models) and ReAct (Reason + Act) as used in Langchain and CrewAI agents.
In this video, I'll guide you through building LLM agents from scratch, starting with paradigms like PAL (Program-Aided Language models) and ReAct (Reason + Act) to enhance reasoning and agent behavior. First, you'll learn how to implement these manually. Then, I'll show you how to achieve the same using the Langchain framework, including LangChain Expression Language (LCEL). After that, I'll walk you through how to switch between different language models like GPT-4, Llama 3.1, and Mixtral on platforms like TogetherAI and Groq, giving you flexibility in agent design. Finally, I'll demonstrate how to use OpenAI's function calling as an alternative to a direct ReAct prompt, adding another layer of functionality to your agents.
This episode covers LLM hallucinations — why they happen, how to detect them, and ways to reduce them in RAG pipelines. We'll discuss key tools like RAGAS metrics for measuring faithfulness, context relevance, and answer relevance, along with techniques like using LLMs as judges and embedding models to catch hallucinations. Plus, we'll discuss the Lynx model, a fine-tuned version of Llama-3 built to identify and limit hallucinations, making responses more accurate.
In this lab we look at how to detecting the faithfulness or groundedness of LLM responses in a RAG pipeline to ensure they are rooted in the context provided. We see how to do this using an LLM-as-judge (gpt4o-mini, Lynx), and how frameworks like Llama-index and Ragas deal with this. We also look at the Groundedness API in Azure AI Content Safety and finish by comparing the different solutions on MiniHaluBench, a small dataset created from HaluBench.