Overview
How language models learn, and what fine-tuning does to that knowledge.
Entries
- How LLMs Are Trained — Pretraining, tokenization, next-token prediction, and what a base model knows
- Fine-Tuning Mechanics — What fine-tuning changes, instruction tuning vs. domain adaptation, catastrophic forgetting
- LoRA and QLoRA — Low-rank adaptation for dense and MoE models: how they work, why MoE fine-tuning is harder, target module configs per model family, memory requirements
- Base Models for Fine-Tuning — Survey of Llama, Mistral, Gemma, Phi, Qwen, and DeepSeek: who makes them, architecture, licensing, and suitability for specialist fine-tuning
- RAG vs. Fine-Tuning — When to use retrieval-augmented generation vs. fine-tuning, how they compose, and the practical architecture for a network-engineering specialist model
- Config Generation vs. Troubleshooting — Why these are two different tasks with different training data, interaction patterns, and quality criteria, and the case for separate LoRA adapters
- Model Internals: Weights, MoE, and Inference — What a model file actually contains, how transformer layers work beyond “a matrix of weights”, how MoE routing works, GGUF quantization formats, and a full trace of a query through the inference stack
- Temperature and Sampling — What temperature actually is (logit scaling before softmax), how it reshapes probability distributions, top-p and top-k sampling, and why the right setting depends on the task
- Evaluating a Fine-Tuned Model — Building a domain-specific eval set, exact-match and semantic scoring for CLI output, regression testing for general capability, and what “good enough” looks like before deployment
- System Prompt Engineering — Writing the system prompt for a specialist model, the mandatory training-inference consistency requirement, few-shot examples in the system prompt, and testing before generating training data
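
As a small illustration of the Temperature and Sampling entry above, the following is a minimal sketch of temperature as logit scaling before softmax. The function name `softmax_with_temperature` and the example logits are hypothetical, chosen only for illustration; they are not taken from any of the entries or from a particular inference library.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply softmax.

    Hypothetical helper for illustration. Temperatures below 1 sharpen
    the distribution; temperatures above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.1]  # made-up values for three candidate tokens
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(example_logits, t)
        print(t, [round(p, 3) for p in probs])
```

Running the script prints one distribution per temperature, which makes it easy to see the top token's probability shrink as the temperature rises. Top-p and top-k sampling would then operate on the resulting probabilities; they are not shown here.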