LLM Fundamentals & Fine-Tuning Methods

Overview

How language models learn, and what fine-tuning does to that knowledge.

Entries

  • How LLMs Are Trained — Pretraining, tokenization, next-token prediction, and what a base model knows
  • Fine-Tuning Mechanics — What fine-tuning changes, instruction tuning vs. domain adaptation, catastrophic forgetting
  • LoRA and QLoRA — Low-rank adaptation for dense and MoE models: how they work, why MoE fine-tuning is harder, target module configs per model family, memory requirements
  • Base Models for Fine-Tuning — Survey of Llama, Mistral, Gemma, Phi, Qwen, and DeepSeek: who makes them, architecture, licensing, and suitability for specialist fine-tuning
  • RAG vs. Fine-Tuning — When to use retrieval-augmented generation vs. fine-tuning, how they compose, and the practical architecture for a network engineer specialist model
  • Config Generation vs. Troubleshooting — Why these are two different tasks with different training data, interaction patterns, and quality criteria, and the case for separate LoRA adapters
  • Model Internals: Weights, MoE, and Inference — What a model file actually contains, how transformer layers work beyond “a matrix of weights”, how MoE routing works, GGUF quantization formats, and a full trace of a query through the inference stack
  • Temperature and Sampling — What temperature actually is (logit scaling before softmax), how it reshapes probability distributions, top-p and top-k sampling, and why the right setting depends on the task
  • Evaluating a Fine-Tuned Model — Building a domain-specific eval set, exact-match and semantic scoring for CLI output, regression testing for general capability, and what “good enough” looks like before deployment
  • System Prompt Engineering — Writing the system prompt for a specialist model, the mandatory training-inference consistency requirement, few-shot examples in the system prompt, and testing before generating training data

Entries

  • Base Models for Fine-Tuning — A survey of the popular open-weight base models available for fine-tuning as of mid-2026: who makes them, their architectures, release history, licensing terms, and practical suitability for specialist domain fine-tuning on consumer hardware.
  • Config Generation vs. Troubleshooting: Two Different Models — Configuration generation and troubleshooting are different tasks with distinct training data requirements, interaction patterns, and quality criteria, and may be better served by two separate fine-tuned models than by one.
  • Evaluating a Fine-Tuned Network Expert Model — How to measure whether your fine-tuned network engineer specialist model is actually better than the base model — building an eval set, scoring command accuracy, running regression tests, and deciding when the model is good enough to deploy.
  • Fine-Tuning Mechanics — What fine-tuning modifies in a base model, how instruction tuning differs from domain adaptation, the risk of catastrophic forgetting, and how to evaluate whether a fine-tuned model has actually improved.
  • How LLMs Are Trained: Pretraining — The pretraining process that produces a base language model: data scale, tokenization, next-token prediction, transformer architecture, and what the model actually learns.
  • LoRA and QLoRA: Efficient Fine-Tuning — Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) for dense and MoE models — how the mechanics differ, why MoE fine-tuning is harder, practical target module configurations per model family, and memory requirements.
  • Model Internals: Weights, MoE, and How Inference Works — What a trained model actually is on disk, how transformer layers are structured beyond “a matrix of weights”, how Mixture of Experts builds on dense transformers, and how a model server turns a text query into output tokens, including what differs between serving MoE and dense models.
  • RAG vs. Fine-Tuning: Choosing the Right Approach — When to use retrieval-augmented generation vs. fine-tuning to adapt an LLM for a specialist domain — and how the two approaches combine for a network engineer expert model that needs both current device knowledge and deep CLI intuition.
  • System Prompt Engineering for Specialist Models — How to write the system prompt for a fine-tuned network engineer specialist model — role definition, constraint specification, output format anchoring, few-shot examples, and the interaction between system prompt design and training data quality.
  • Temperature and Sampling — What temperature actually is in LLM inference, how it reshapes the probability distribution over next tokens, how it interacts with top-p and top-k sampling, and why the right setting depends entirely on the task.