Preface#

Why we wrote this book

Most Kubernetes literature is designed for administrators and developers. For administrators, the focus is typically on the high-level installation, configuration, and troubleshooting of nodes and clusters. For developers, the emphasis is on leveraging Kubernetes’ service abstractions to build and deploy cloud-native applications.

Existing literature primarily focuses on the interfaces and abstractions provided by Kubernetes, rather than the underlying implementation details. While many users can successfully operate a cluster without understanding the “plumbing” — such as how containers are wired within a Pod or how VXLAN tunnels facilitate multi-host communication — this “abstraction-first” approach leaves a significant gap when deep-seated issues arise.

While technical information on network implementation does exist, it is often fragmented. Engineers often have to piece together knowledge from isolated blog posts, specific Container Network Interface (CNI) plugin documentation, or technical articles. There is a lack of cohesive, pedagogical documentation that explains how the Linux kernel and its primitives are used to create networks of Pods.

The goal of this book is to provide a comprehensive guide to Kubernetes networking by exploring its implementation from the ground up. We aim to move beyond simple configuration to help you understand the Linux technologies that make orchestration possible:

Linux Primitives: You will learn how namespaces and cgroups provide resource isolation and management.
Virtual Plumbing: We demonstrate how to manually wire containers using Virtual Ethernet (veth) pairs and Linux Bridges.
Traffic Management: We recreate service discovery and load balancing using iptables, netfilter, and conntrack.
Cluster Fabric: We explore high-scale networking using Open vSwitch (OVS) and overlay networks.
Deep Observability: We utilize eBPF to attach kprobes and tracepoints directly to the kernel for real-time performance monitoring.

By manually deploying a cluster and recreating its networking features using Linux tools, you will gain the insights required to operate, optimize, and troubleshoot production-grade environments.

What will you learn

Throughout this book, we utilize a running example of a Pod specification to demonstrate how the core components of Kubernetes can be replicated using native Linux technologies. Each chapter builds upon this foundation, taking a deeper dive into the individual layers of the Kubernetes architecture and showing you how to recreate its features manually.

Pod Networking: We begin by exploring the “plumbing” of a single Pod. You will learn how to use Linux namespaces for process isolation and Virtual Ethernet pairs connected to Linux Bridges to enable intra-pod and inter-pod communication.
Service Networking: Next, we move into traffic abstraction. You will learn how Kubernetes provides stable access points through ClusterIP, NodePort, and LoadBalancer services. We will recreate these functionalities using iptables, netfilter, and conntrack, focusing on how Destination NAT (DNAT) manages load balancing and Source NAT (SNAT) ensures proper packet return paths.
Cluster Networking: We then expand to multi-host environments. You will explore how Container Network Interface plugins facilitate communication across different nodes using Open vSwitch and VXLAN tunnels. This section includes a hands-on guide to building a custom CNI plugin from scratch using Bash to manage IP address allocation and routing.
Network Observability: To ensure the health of these complex systems, we dive into modern instrumentation. You will learn to use eBPF to attach kprobes and tracepoints directly to the Linux kernel, allowing you to monitor critical metrics like TCP retransmissions with minimal overhead. We also cover the correlation of these low-level metrics within Prometheus and Grafana.
Cluster Deployment: Finally, we bridge the gap between being a user and a system engineer by manually deploying a Kubernetes cluster. Moving beyond automated tools like Minikube, you will learn to use kubeadm, Multipass, and cloud-init to provision a multi-node environment, providing you with a comprehensive understanding of the control plane and worker node lifecycle.

By the end of this book, you will not just know how to use Kubernetes networking, but you will understand the underlying Linux primitives that make it possible.

Who should ready this book

This book is specifically designed for technical professionals who want to move beyond the high-level abstractions of Kubernetes and master the underlying Linux implementation. It is particularly suitable for:

Platform Engineers: Those responsible for architecting and maintaining the cluster fabric. This book provides a deep dive into manually provisioning clusters using kubeadm and Multipass, as well as the internal mechanics of CNI plugins like Calico and Cilium.
Security Engineers: Professionals focused on ensuring robust isolation and secure communication. You will learn the mechanics of network isolation through Linux namespaces and cgroups, and how to implement traffic policies and security boundaries using the netfilter/iptables stack.
Site Reliability Engineers (SREs): Engineers tasked with diagnosing complex, intermittent performance issues. This book covers advanced network observability using eBPF to attach kprobes and tracepoints to the kernel, enabling you to monitor low-level health indicators like TCP retransmissions and packet drops.
DevOps and Cloud Engineers: Users who regularly manage Pods, Services, and Ingress but want to understand the mechanics behind them. By building a pseudo-Pod from scratch and manually recreating ClusterIP and NodePort services using NAT rules, you will gain the system-level insight required to troubleshoot networking failures.

By focusing on implementation details rather than just the interface, this book serves as a bridge for anyone looking to transition from a platform user to a system engineer.

Technical requirements

To follow the hands-on demonstrations and recreate the networking architectures described in this book, you will need a local workstation or laptop capable of running the following tools and environments:

Virtualization: We use Multipass as the primary infrastructure provider to provision and manage virtual machines. These instances run Ubuntu and serve as the foundation for both our manual “pseudo-Pod” experiments and our multi-node Kubernetes cluster.
Container Tools: You will need to install containerd as the underlying container runtime. For cluster-level operations, we utilize kubeadm to initialize the control plane and join worker nodes.
Linux Networking Primitives: Much of the book’s “bottom-up” approach relies on native Linux utilities. You should have access to a Linux environment (or the aforementioned Multipass VMs) to use iproute2 (for namespaces, veth pairs, and routing), iptables (for NAT and service simulation), and Open vSwitch.
Scripting and Automation: Several advanced exercises — including building a custom CNI plugin from scratch — are conducted using Bash. We also provide various management scripts (e.g., pod_manager.sh, ovs_manager.sh, and cluster_manager.sh) to automate complex configurations.
Observability Tools: For the chapters on network observability, you will use bpftrace and Python-based wrappers to load and interact with eBPF bytecode directly within the Linux kernel.

After completing this book, you will be equipped to handle Day-2 procedures, which involve analyzing clusters from the virtual machine layer down to individual Pods to assess system health and identify failures.

Extended reading

Several manuscripts complement this book, providing the theoretical foundations and practical knowledge essential for understanding the topics discussed throughout these chapters. The journey begins with distributed cluster management and resource orchestration, most notably Borg [VPK+15], which established the design principles for modern platforms such as Kubernetes. Building upon these origins, several works focus on Kubernetes operations and cloud-native practices, covering DevOps methodologies, Helm package management, and cluster administration [AD19] [BFD21] [BT18] [BBH19].

To bridge the gap between orchestration and infrastructure, additional references explore fundamental and advanced networking concepts. These include:

Networking Fundamentals: Layered approaches and computer networking basics [KR22] [SL21].
System Internals: Detailed examinations of Linux firewalls and container networking architectures [Ghe06] [Hau18].
Advanced Implementations: Deep dives into Kubernetes networking and service delivery [GPP19] [Hau16].

Together, these manuscripts provide foundational background and implementation-oriented knowledge that supports the technologies and concepts described throughout this book.

Who wrote this book

Jorge Cardoso is a software engineer and architect at NVIDIA, specializing in reliability, observability, and diagnostics for AI and high-performance infrastructures. His work monitoring complex systems is reflected in many techniques covered in these pages.

Before joining NVIDIA, Jorge was the Director of the Ultra-scale AIOps Lab at Huawei Cloud, leading the development of AI-driven autonomous operations globally. His extensive industry tenure—including roles at SAP and Boeing — drives the technical depth of this book’s analysis of cluster networking and Kubernetes internals.

An invited professor at the University of Coimbra with a Ph.D. from the University of Georgia, Jorge bridges the gap between research and practice through this book’s “bottom-up” pedagogical approach.