System Design Essentials: Databases, Messaging, HTTP & DNS Explained

This week’s system design refresher covers several fundamental topics: Transformers Step-by-Step Explained (YouTube video), the database types you should know in 2025, a comparison of Apache Kafka vs. RabbitMQ, the HTTP mindmap, a detailed walkthrough of how DNS works, and whether a web server can provide real-time updates.

Transformers Step-by-Step Explained (Attention Is All You Need)

Database Types You Should Know in 2025

The idea of a single universal database no longer applies to contemporary software development. Modern applications frequently combine several database types, covering needs that range from real-time analytics to the vector search that underpins AI features. Choosing the appropriate database type is paramount, as it directly influences a system’s overall performance and scalability.


  • Relational: These are traditional row-and-column databases, well-suited for structured data and transactional operations.
  • Columnar: Optimized for analytical workloads, these databases store data by columns, facilitating rapid aggregations and reporting.
  • Key-Value: Data is stored as simple key–value pairs, enabling exceptionally fast lookups and retrieval operations.
  • In-memory: Data is maintained in RAM, providing ultra-low latency lookups, making them ideal for caching mechanisms or managing user sessions.
  • Wide-Column: Designed to manage vast quantities of semi-structured data across distributed nodes, offering significant horizontal scalability.
  • Time-series: Specifically engineered for metrics, logs, and sensor data, where time serves as a primary dimension for indexing and querying.
  • Immutable Ledger: Guarantees tamper-proof and cryptographically verifiable transaction logs, essential for auditing and integrity.
  • Graph: Excels at modeling complex relationships between data points, making it perfect for applications like social networks and advanced fraud detection systems.
  • Document: Offers flexible JSON-like storage, highly suitable for modern applications with evolving schemas and agile development cycles.
  • Geospatial: Manages location-aware data effectively, supporting functions such as maps, route planning, and complex spatial queries.
  • Text-search: Provides full-text indexing and search capabilities, complete with ranking, filtering, and powerful analytics features.
  • Blob: Designed for storing unstructured objects, including images, videos, and various file types, without imposing a schema.
  • Vector: A critical enabler for AI/ML applications, facilitating similarity search across high-dimensional embeddings.
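
As a small illustration of the vector category, similarity search over embeddings can be sketched with plain cosine similarity. This is a toy in-memory index with made-up documents and three-dimensional vectors; real vector databases use high-dimensional embeddings and approximate nearest-neighbor structures:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy in-memory "vector index": document id -> embedding
index = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.0, 1.0, 0.1],
    "doc3": [0.8, 0.2, 0.1],
}

def search(query, k=2):
    """Return the k document ids most similar to the query vector."""
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

print(search([1.0, 0.0, 0.0]))  # doc1 and doc3 are the closest matches
```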

Considering the evolving landscape of data storage, which database type is anticipated to experience the most significant growth over the next five years?

Apache Kafka vs. RabbitMQ

Both Apache Kafka and RabbitMQ are robust messaging systems; however, they address distinct challenges in distributed system architectures. A clear comprehension of their fundamental differences is crucial for effective system design.


Kafka operates as a distributed commit log. Producers append messages to specific partitions, and these messages persist based on a defined retention policy, irrespective of consumer consumption status. Consumers retrieve messages at their own pace by utilizing offsets, enabling capabilities such as message rewinding, replaying, and reprocessing. It is architected for high-throughput event streaming scenarios where multiple independent consumers require access to the same data streams.
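
The offset-based consumption model can be illustrated with a toy in-memory commit log. This is pure Python, not the Kafka client API; it only shows why messages persist independently of consumers and how offsets enable replay:

```python
class PartitionLog:
    """Toy append-only log: messages persist regardless of consumption."""
    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message

    def read_from(self, offset):
        """Consumers read from any offset they choose — replay is free."""
        return self.messages[offset:]

log = PartitionLog()
for event in ["order_created", "order_paid", "order_shipped"]:
    log.append(event)

# Two independent consumers track their own offsets.
consumer_a_offset = 0
consumer_b_offset = 2

print(log.read_from(consumer_a_offset))  # full replay: all three events
print(log.read_from(consumer_b_offset))  # only ['order_shipped']
```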

Conversely, RabbitMQ functions as a traditional message broker. Producers publish messages to exchanges, which subsequently route these messages to various queues based on predefined binding keys and patterns (e.g., direct, topic, fanout). Messages are actively pushed to consumers and are subsequently deleted upon acknowledgment. This system is primarily designed for efficient task distribution and conventional messaging workflows.
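
The exchange-and-binding model can be sketched as a toy broker in pure Python (this is not the AMQP protocol or the RabbitMQ API; it only shows direct routing by binding key and deletion on delivery):

```python
from collections import defaultdict, deque

class Exchange:
    """Toy message broker exchange that routes messages to bound queues."""
    def __init__(self, kind):
        self.kind = kind                    # 'direct' or 'fanout'
        self.bindings = defaultdict(list)   # binding key -> queue names
        self.queues = {}                    # queue name -> deque of messages

    def declare_queue(self, name):
        self.queues[name] = deque()

    def bind(self, queue_name, binding_key=""):
        self.bindings[binding_key].append(queue_name)

    def publish(self, message, routing_key=""):
        if self.kind == "fanout":
            targets = self.queues           # every queue gets a copy
        else:                               # direct: exact routing-key match
            targets = self.bindings.get(routing_key, [])
        for name in targets:
            self.queues[name].append(message)

    def consume(self, queue_name):
        # Message is removed on delivery (acknowledgment assumed immediate).
        return self.queues[queue_name].popleft()

ex = Exchange("direct")
ex.declare_queue("email")
ex.declare_queue("sms")
ex.bind("email", "notify.email")
ex.bind("sms", "notify.sms")
ex.publish("Welcome!", routing_key="notify.email")
print(ex.consume("email"))  # 'Welcome!' — and it is now gone from the queue
```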

A frequent misconception involves deploying Kafka as a simple message queue or utilizing RabbitMQ as an event log. These platforms are distinct tools, each optimized for specific use cases and operational paradigms.

For those familiar with these technologies, a discussion on scenarios where Kafka might not be the optimal choice for messaging infrastructure is encouraged.

The HTTP Mindmap

The Hypertext Transfer Protocol (HTTP) has undergone significant evolution, progressing from HTTP/1.1 to HTTP/2, and most recently to HTTP/3, which leverages the QUIC protocol over UDP for enhanced performance. Presently, HTTP serves as the foundational protocol for nearly all internet communications, encompassing web browsers, APIs, streaming services, cloud infrastructure, and AI systems.


At its core, the architecture relies on underlying protocols such as TCP/IP for carrying IPv4 and IPv6 traffic, and Unix domain sockets for efficient local communication. Notably, HTTP/3 runs over QUIC on top of UDP rather than TCP, moving loss recovery and stream multiplexing below HTTP and avoiding TCP’s head-of-line blocking.
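
The text-based message format that HTTP/1.1 sends over TCP can be seen by composing a request by hand. A minimal sketch (real clients should use an HTTP library, and the host and path here are just examples):

```python
def build_get_request(host, path="/"):
    """Compose a minimal HTTP/1.1 GET request as it appears on the wire."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",           # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",                        # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

raw = build_get_request("example.com", "/index.html")
print(raw.decode("ascii"))
```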

Security measures are integrated across the entire ecosystem. HTTPS has become an indispensable standard. WebSockets facilitate real-time connections. Web servers are responsible for managing computational workloads. Content Delivery Networks (CDNs) ensure global content distribution, while DNS resolvers translate domain names into IP addresses. Various proxies, including forward, reverse, and API gateways, are deployed to route, filter, and secure traffic flows.

Web services engage in data exchange using diverse formats, such as REST with JSON, SOAP for enterprise-grade systems, RPC for direct procedural calls, and GraphQL for flexible data querying. Web crawlers and bots index internet content, adhering to directives specified within robots.txt files that delineate access boundaries.

The broader network landscape interconnects these elements, involving Local Area Networks (LANs), Wide Area Networks (WANs), and protocols like FTP for file transfers, IMAP/POP3 for email management, and BitTorrent for peer-to-peer communication. For comprehensive observability, tools such as Wireshark, tcpdump, and OpenTelemetry enable developers to scrutinize network traffic, providing insights into performance, latency, and system behavior across the entire stack.

Considering HTTP’s continuous evolution over three decades, identifying the forthcoming major architectural shift presents an intriguing area of discussion.

How DNS Works

When a user inputs a domain name and initiates a request, the underlying processes preceding webpage loading are more intricate than commonly perceived. The Domain Name System (DNS) functions as the internet’s directory, where each request triggers a sequence of lookups across multiple servers.

The process unfolds in a series of steps:

  • Step 1: A user enters a domain name, for instance, bytebytego.com, into their web browser and presses enter.
  • Step 2: Prior to any external lookup, the browser first checks its internal cache for a corresponding IP address. The operating system’s cache is also consulted.
  • Step 3: A cache miss initiates a DNS query. The browser dispatches this query to its configured DNS resolver, which is typically provided by an Internet Service Provider (ISP) or a public service like Google DNS or Cloudflare.
  • Step 4: The DNS resolver then inspects its own cache for the requested information.
  • Step 5-6: If the resolver’s cache does not contain the answer, it queries the root servers, asking for the location of the “.com” Top-Level Domain (TLD) name server. For bytebytego.com, the root server responds with the address of the .com TLD name server.
  • Step 7-8: The resolver proceeds to query the .com TLD server, which subsequently returns the address of the authoritative name server for the specific domain.
  • Step 9-10: This authoritative server holds the actual A/AAAA record, which maps the domain name to its corresponding IP address. The resolver finally obtains the answer, for example, 172.67.21.11 for bytebytego.com.
  • Step 11-12: The acquired IP address is cached at the resolver level to expedite future lookups and is then relayed back to the browser.
  • Step 13-14: The browser stores the IP address in its own cache for future lookups and then uses it to initiate the actual HTTP request to the web server.
  • Step 15: Finally, the web server processes the request and returns the desired content to the browser.

All these intricate operations are completed within milliseconds, even before the initial page content begins to load.
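
The resolver’s walk down the hierarchy (Steps 5 through 12) can be sketched as a toy simulation. The server names here are hypothetical stand-ins; a real resolver speaks the DNS wire protocol over UDP/TCP, and the IP address is the example used above:

```python
# Toy DNS hierarchy: each level answers with a referral or a final record.
ROOT = {"com": "tld-com"}                                # root knows TLD servers
TLD = {"tld-com": {"bytebytego.com": "auth-bbg"}}        # TLD knows authoritative servers
AUTH = {"auth-bbg": {"bytebytego.com": "172.67.21.11"}}  # authoritative holds the A record

resolver_cache = {}

def resolve(domain):
    if domain in resolver_cache:                 # Step 4: resolver cache hit
        return resolver_cache[domain]
    tld_label = domain.rsplit(".", 1)[-1]        # 'com'
    tld_server = ROOT[tld_label]                 # Steps 5-6: ask a root server
    auth_server = TLD[tld_server][domain]        # Steps 7-8: ask the TLD server
    ip = AUTH[auth_server][domain]               # Steps 9-10: authoritative answer
    resolver_cache[domain] = ip                  # Steps 11-12: cache, then reply
    return ip

print(resolve("bytebytego.com"))  # 172.67.21.11, via the full chain of lookups
print(resolve("bytebytego.com"))  # same answer, served from the resolver cache
```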

Regarding DNS troubleshooting and diagnostics, which specific tools or commands, such as dig, nslookup, or others, are most frequently utilized by technical professionals?

Can a Web Server Provide Real-time Updates?

An HTTP server, by its fundamental nature, cannot automatically initiate a connection to a client’s web browser. Consequently, the web browser always acts as the initiator in HTTP communication. The challenge then becomes how to receive real-time updates from the HTTP server efficiently.

Both the web browser and the HTTP server can contribute to achieving real-time updates through different architectural patterns.

Client-side approaches put most of the work on the web browser, using techniques such as short polling or long polling. In short polling, the browser repeatedly sends requests at fixed intervals until it retrieves the latest data. With long polling, the HTTP server holds the request open and responds only once new data becomes available.
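
The difference between the two polling styles can be sketched with a stubbed data source standing in for the server. This is illustrative only; a real client would issue HTTP requests over the network:

```python
import itertools

def make_data_source(ready_after):
    """Stub server state: data becomes available after `ready_after` checks."""
    checks = itertools.count(1)
    return lambda: "fresh data" if next(checks) >= ready_after else None

def short_poll(check, max_requests=5):
    """Short polling: a new request per check; most come back empty."""
    for request_no in range(1, max_requests + 1):
        data = check()
        if data is not None:
            return data, request_no
    return None, max_requests

def long_poll(check):
    """Long polling: the server holds the one request open until data exists."""
    while True:
        data = check()
        if data is not None:
            return data  # a single client request in total

print(short_poll(make_data_source(3)))  # took three round trips to get the data
print(long_poll(make_data_source(3)))   # one held-open request, same data
```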

Collaborative solutions, where both the HTTP server and web browser interact more dynamically, include WebSocket and Server-Sent Events (SSE). In both cases, after an initial connection is established, the HTTP server can transmit the latest data directly to the browser. The key distinction lies in directionality: SSE is uni-directional, allowing the server to push updates without the browser sending new requests, whereas WebSocket provides a full-duplex channel, enabling continuous bi-directional exchange of data.
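
SSE’s wire format is plain text streamed over a held-open HTTP response: each event consists of optional `event:` and one or more `data:` lines, terminated by a blank line. A small helper to frame such messages (a sketch; the payload is a made-up example):

```python
def format_sse_event(data, event=None):
    """Frame a payload as a Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become multiple data: lines per the SSE format.
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    lines.append("")   # blank line terminates the event
    return "\n".join(lines) + "\n"

msg = format_sse_event("price=101.5", event="ticker")
print(msg)  # event: ticker / data: price=101.5, then a blank line
```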

Among the four discussed solutions (long polling, short polling, SSE, and WebSocket), an analysis of their common usage patterns and specific application use cases is valuable for architectural considerations.