▶TCP vs UDP: when do I use each?
TCP (Transmission Control Protocol): connection-oriented, with ordered delivery, checksum-based error detection, and retransmission of lost segments. The handshake, acknowledgements, and congestion control add latency, but delivery is reliable. Use for: HTTP, email, file transfer, anything requiring guaranteed, in-order delivery. UDP (User Datagram Protocol): connectionless, best-effort delivery, no retransmits or ordering, lower latency and overhead. Fast, but the application must tolerate or handle loss itself. Use for: DNS queries, live video and voice, online gaming, IoT sensor readings where timeliness matters more than completeness. Rule: if you can afford to lose a few packets, UDP; if you need every byte to arrive, TCP.
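To make the contrast concrete, here is a minimal Python sketch that sends the same payload once as a UDP datagram and once over a TCP connection, both on loopback. The function names `udp_roundtrip` and `tcp_roundtrip` are made up for this illustration; only the standard `socket` module is used. On loopback nothing is actually lost, so the sketch shows the difference in setup (no handshake vs connect/accept), not real packet loss.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback: no handshake, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # a lost datagram would show up as a timeout
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over a loopback TCP connection; the kernel handles
    ordering, acknowledgements, and retransmission."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        # The three-way handshake happens here, before any data is sent.
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)   # signal end of stream
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:                 # empty read: peer closed its side
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

Note that the UDP receiver gets the payload as one message or not at all, while the TCP receiver reads a byte stream and has to loop until the sender closes.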
▶How does DNS resolution actually work end-to-end?
DNS = distributed phone book for the internet. Process: (1) Your browser (via the OS stub resolver) asks a recursive resolver, often your ISP's, 'where is google.com?', (2) On a cache miss, the resolver asks a root nameserver 'who knows .com?', (3) Root refers it to the .com TLD nameservers, (4) The TLD nameserver refers it to Google's authoritative nameserver, (5) Google's nameserver responds with IP 142.251.x.x, (6) Resolver caches the answer and returns it to the browser, (7) Browser connects to 142.251.x.x. Repeat lookups are answered from cache until the record's TTL (time-to-live) expires; the zone owner sets the TTL, and 300s is a common value. In production, zones, record types (A/AAAA/CNAME/MX), and propagation delay (old answers lingering in caches until their TTL runs out) all matter.
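The referral chain and the TTL cache can be sketched as a toy model in Python. Everything here is hypothetical: the `ROOT`/`TLD`/`AUTH` dictionaries stand in for real nameservers, `CachingResolver` is an invented class, and the address comes from the 192.0.2.0/24 documentation range. No real DNS traffic is sent.

```python
import time

# Hypothetical delegation data standing in for real nameservers.
ROOT = {"com": "tld-com"}                                      # root: who serves .com?
TLD = {"tld-com": {"example.com": "ns-example"}}               # TLD: who is authoritative?
AUTH = {"ns-example": {"example.com": ("192.0.2.10", 300)}}    # (A record, TTL in seconds)


class CachingResolver:
    """Toy recursive resolver: walks root -> TLD -> authoritative, caches by TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._cache = {}            # name -> (ip, expires_at)
        self.upstream_queries = 0   # counts simulated queries to other servers

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached and cached[1] > now:
            return cached[0]                      # cache hit: no upstream traffic

        tld_server = ROOT[name.rsplit(".", 1)[-1]]    # ask the root about the TLD
        self.upstream_queries += 1
        auth_server = TLD[tld_server][name]           # ask the TLD for the authority
        self.upstream_queries += 1
        ip, ttl = AUTH[auth_server][name]             # ask the authoritative server
        self.upstream_queries += 1

        self._cache[name] = (ip, now + ttl)           # valid until the TTL runs out
        return ip
```

Calling `resolve("example.com")` twice in a row costs three simulated upstream queries the first time and none the second; once the injected clock passes the TTL, the full walk happens again.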
▶What is BGP routing and why is it complex?
BGP (Border Gateway Protocol) = the routing protocol of the internet. Unlike OSPF, which runs inside a single network, BGP connects autonomous systems (ASes): the independently operated networks of ISPs, cloud providers, and large enterprises. Complexity: (1) policies: routes are selected not just by shortest path but by business relationships (for example, preferring routes through paying customers, and not carrying one peer's traffic to another peer for free), (2) path hijacking: a misconfigured or malicious announcement can attract traffic meant for others, (3) scale: 800k+ routes in the global table, (4) convergence: after a failure, routes can take minutes to settle because BGP deliberately paces its updates. Use: ISPs, large cloud providers, enterprises with multiple uplinks. Overkill for small networks.
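The "policy before path length" point can be sketched in a few lines of Python. This is a heavily simplified illustration, not the full BGP decision process: it models only two real tie-breakers (highest local preference, then shortest AS path) plus AS-path loop rejection. The `Route` class, `best_route`, and `accepts` are invented names, and the AS numbers and prefix come from documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple       # ASNs the announcement has passed through
    local_pref: int      # operator-assigned policy value; higher is preferred
    learned_from: str    # hypothetical label for the neighbor relationship


def accepts(route: Route, local_asn: int) -> bool:
    """Loop prevention: reject a route whose AS path already contains our ASN."""
    return local_asn not in route.as_path


def best_route(candidates):
    """Prefer highest local_pref (policy), then the shortest AS path."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


routes = [
    Route("203.0.113.0/24", (64501, 64510), 100, "transit"),
    Route("203.0.113.0/24", (64502, 64505, 64510), 200, "customer"),
]
print(best_route(routes).learned_from)
```

The customer route wins despite its longer AS path, because policy (local preference) is compared before path length.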
▶IPv4 vs IPv6: why haven't we switched yet?
IPv4: 4.3 billion addresses (32-bit). The central IANA pool ran out in 2011 and the regional registries have been running dry since, so address space exhaustion is real. IPv6: 340 undecillion addresses (128-bit), standardized in 1998, still only around 35-45% deployed depending on the measurement. Why slow adoption? (1) NAT masks IPv4 scarcity at the cost of complexity, (2) IPv6 support is still missing in parts of the path (some ISPs, home routers, legacy applications), (3) dual-stack (running both protocols side by side) adds operational burden, (4) enterprise inertia: 'if it works, don't touch it'. Modern deployments must support both. New greenfield systems should be IPv6-native. Many mobile networks are IPv6-first; many fixed-line ISPs are still dual-stack.
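The address-space figures can be checked with Python's standard `ipaddress` module. A small sketch; `family` is a hypothetical helper name, and the sample addresses come from documentation ranges.

```python
import ipaddress

# Total address space of each protocol, computed from the "everything" network.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses   # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses        # 2**128


def family(addr: str) -> str:
    """Return 'IPv4' or 'IPv6' for a textual address (raises ValueError if invalid)."""
    return f"IPv{ipaddress.ip_address(addr).version}"


print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:.3e} addresses")
print(family("192.0.2.1"), family("2001:db8::1"))
```

A dual-stack service has to accept both families, which is where helpers like `family` tend to show up in logging and access-control code.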
▶How do VPNs and firewalls work together?
Firewall = gatekeeper at the network edge. It examines packets by source/destination IP, port, and protocol, and allows or blocks them based on rules. Stateful firewalls also track connection state, so they know whether a packet belongs to an established connection. VPN = encrypted tunnel from your device to a VPN server. Traffic is encrypted inside the tunnel, but only as far as the VPN server; beyond it, traffic is protected only by whatever the application itself uses (e.g. TLS). Together: (1) VPN encrypts data so the ISP can't see contents, (2) firewalls at both ends enforce policy on traffic entering and leaving the tunnel, (3) in corporate networks: VPN into office → VPN gateway authenticates the user → firewall rules decide which internal resources are reachable. Home use: a VPN masks your IP but doesn't replace a firewall (the router firewall is still needed).
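What "stateful" means can be shown with a toy model in Python. This is a sketch under simplifying assumptions, not how any real firewall is implemented: `Packet` and `StatefulFirewall` are invented names, connection state is just a set of 5-tuples, and there is no timeout or TCP flag handling.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    proto: str = "tcp"


class StatefulFirewall:
    """Allow outbound traffic to listed ports; allow inbound only as replies."""

    def __init__(self, inside_net: str, allowed_outbound_ports):
        self.inside = ipaddress.ip_network(inside_net)
        self.allowed = set(allowed_outbound_ports)
        self.established = set()   # 5-tuples of connections opened from inside

    def allow(self, pkt: Packet) -> bool:
        if ipaddress.ip_address(pkt.src) in self.inside:
            # Outbound: rule check, then remember the connection.
            if pkt.dport in self.allowed:
                self.established.add((pkt.proto, pkt.src, pkt.sport, pkt.dst, pkt.dport))
                return True
            return False
        # Inbound: permitted only if it mirrors a connection we already track.
        return (pkt.proto, pkt.dst, pkt.dport, pkt.src, pkt.sport) in self.established
```

With `StatefulFirewall("10.0.0.0/24", {443})`, an outbound packet from 10.0.0.5:50000 to port 443 is allowed and recorded, the matching reply is allowed back in, and an unsolicited inbound packet to any other port is dropped without needing an explicit deny rule.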
▶Cloud networking (AWS VPC) vs on-premises: what's different?
On-prem: you own the physical cables, routers, and switches. Design is bottom-up (buy hardware, configure topology). Cloud (VPC): software-defined, with resources abstracted behind APIs. Differences: (1) a VPC is logically isolated but shares physical hardware with other tenants, (2) security groups (stateful firewalls) replace iptables for most workloads, (3) Elastic IPs replace statically assigned public IP addresses, (4) route tables are edited through the console or API; BGP appears only at the edges, such as VPN or Direct Connect links back to on-prem, (5) multi-region failover is 'just API calls', not a 3-month networking project, (6) baseline DDoS protection is built in (AWS Shield Standard), and third-party services such as Cloudflare can sit in front, (7) troubleshooting leans on VPC Flow Logs, which record connection metadata rather than packet contents; tcpdump still works on instances you control. Learning curve: VPC is easier to start, harder to troubleshoot because abstraction hides details.
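One piece of VPC behavior that is easy to model is route selection: when several routes match a destination, the most specific one (longest prefix) wins. The Python sketch below is a toy model only. It does not call any AWS API, and `ROUTE_TABLE`, its target labels, and `next_hop` are hypothetical names chosen for illustration.

```python
import ipaddress

# Hypothetical route table: (destination CIDR, target label).
ROUTE_TABLE = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "internet-gateway"),
    ("10.1.0.0/16", "peering-connection"),
]


def next_hop(dest: str, table=ROUTE_TABLE):
    """Return the target of the most specific route matching dest, or None."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for cidr, target in table:
        network = ipaddress.ip_network(cidr)
        if addr in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # Longest prefix match: the route with the most prefix bits wins.
    return max(matches, key=lambda m: m[0])[1]
```

Here 10.0.3.4 stays `local`, 10.1.2.3 goes to the peering connection, and anything else falls through to the default route, regardless of the order in which the routes were entered.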
▶How do I troubleshoot a network problem when users say 'the internet is slow'?
Start with layers: (1) Is DNS working? `dig google.com` (should return an IP, typically in <100ms). (2) Is routing working? `traceroute google.com` (typically reaches the destination within 10-15 hops; look for where latency jumps or replies stop). (3) Is the connection itself slow? `ping` (latency in ms), `iperf` (throughput in Mbps; needs an iperf server on the far end). (4) Is it application-level? curl with timing: `curl -s -o /dev/null -w '%{time_total}\n' https://google.com`. Tools in order: ping → traceroute → tcpdump → Wireshark. Tcpdump is the X-ray: it captures packets and reveals retransmits, out-of-order delivery, and resets, from which you can infer loss. Common culprits: ISP congestion (check at different times of day), DNS misconfiguration (wrong or slow resolver), firewall rules blocking or rate-limiting traffic, TCP window size too small, MTU mismatch (fragmentation or silently dropped large packets).
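The first checks (name lookup time, then connection time) can also be scripted. Below is a minimal Python sketch using only the standard library; `timed_ms`, `dns_lookup`, and `tcp_connect` are hypothetical helper names. One caveat: `socket.getaddrinfo` goes through the OS resolver, including any local cache and hosts file, so its timing is not directly comparable to `dig`.

```python
import socket
import time


def timed_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


def dns_lookup(host: str):
    """Resolve host via the OS resolver; returns a sorted list of unique addresses."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def tcp_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Open and close a TCP connection; raises OSError on failure or timeout."""
    with socket.create_connection((host, port), timeout=timeout):
        return True
```

A typical session would be `timed_ms(dns_lookup, "google.com")` followed by `timed_ms(tcp_connect, "google.com", 443)`. A slow first number points at the resolver, and a slow second number points at the network path or the remote host.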