Autocomplete System (Google Search)

Overview

Introduction
Requirements
Data Model
API Design
High Level Design
Deep Dive 1: Efficient Data Structures (The Trie)
Deep Dive 2: Reducing Latency (Multi-Level Caching)
Deep Dive 3: Zero-Downtime Updates (Blue/Green Deployment)
Deep Dive 4: Ranking & Real-time Trends (Lambda Architecture)
Complete Architecture
Additional Discussion Points

Introduction

An autocomplete system (also known as typeahead or search-as-you-type) predicts the rest of a word or sentence as the user types. This is a standard feature in search engines (Google), e-commerce sites (Amazon), and command-line interfaces.
The goal is to speed up user interaction by reducing keystrokes and guiding users toward likely queries.

Requirements

Functional Requirements
- Suggestions: As the user types a query, the system must return the top k (e.g., 5) most relevant suggestions.
- Ranking: Suggestions must be ranked by a combination of Historical Popularity (long-term accuracy) and Real-time Trends (breaking news).
Non-Functional Requirements
- Ultra-Low Latency: The system must return results in < 100ms (P99) to ensure a smooth typing experience.
- High Availability: The system should prioritize availability over consistency (BASE). Serving slightly stale suggestions is better than an error.
- Scalability:
  - DAU: 500 Million.
  - Daily Searches: 5 Billion.
  - Read Load (QPS): ~290k Avg / ~600k Peak
    Calculation: (5B searches * ~5 autocomplete requests per search) / 86,400s ≈ 290k
    Peak: Assumes a standard 2x traffic spike (~580k, rounded up to ~600k)
  - Write Throughput: ~58k events/sec
    Calculation: 5B searches / 86,400s ≈ 58k
    Note: Logs only the final "Search" event, not intermediate keystrokes.
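The estimates above can be reproduced with a few lines of arithmetic. This is a sketch that reuses the document's own inputs: 5B searches/day, the ~5 autocomplete requests per search implied by the read calculation, and the assumed 2x peak factor.

```python
# Back-of-envelope capacity estimate using the figures stated above.
SECONDS_PER_DAY = 86_400
DAILY_SEARCHES = 5_000_000_000
REQUESTS_PER_SEARCH = 5   # assumed autocomplete calls per search (after debouncing)
PEAK_FACTOR = 2           # assumed peak-to-average traffic ratio

avg_read_qps = DAILY_SEARCHES * REQUESTS_PER_SEARCH / SECONDS_PER_DAY
peak_read_qps = avg_read_qps * PEAK_FACTOR
write_eps = DAILY_SEARCHES / SECONDS_PER_DAY  # one "Search" event per completed search

print(f"avg read QPS   ~ {avg_read_qps:,.0f}")
print(f"peak read QPS  ~ {peak_read_qps:,.0f}")
print(f"write events/s ~ {write_eps:,.0f}")
```

This prints roughly 289k average reads/s, 579k peak reads/s (rounded up to ~600k in the requirements), and 58k write events/s.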

Data Model

We need distinct storage strategies for the Raw Data (optimized for write throughput and analytics) and the Index Data (optimized for read latency).

1. Storage / Analytics (Write-Optimized)

Purpose: Archival and offline batch processing.
Technology: Amazon S3 (storage) + Apache Parquet (columnar file format).
Schema: query_string (String), timestamp (DateTime), geo_location (String).

2. Serving Index (Read-Optimized)

Purpose: Serving autocomplete suggestions in < 100ms. We cannot scan the Raw Data or use a standard SQL DB for this. We use a Trie (Prefix Tree) structure held in memory.
Structure: A Trie (Prefix Tree) is a tree whose root represents the empty string; each child edge adds one character, so the path from the root to any node spells out a prefix.
Optimization: We flatten this logical structure into a Key-Value store (or In-Memory Map).
- Key: Prefix (e.g., "sys")
- Value: List of Top-5 Suggestions (e.g., ["system design", "sysadmin", ...])
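A minimal Python sketch of building the trie and flattening it into prefix rows, assuming aggregated query counts are already available. The names `TrieNode`, `build_trie` and `flatten` are hypothetical. Keeping the top-k list at every node is a common optimization that the document does not spell out. It trades memory for avoiding a subtree scan per prefix, and flattening then turns each lookup into a single hash-map read.

```python
from __future__ import annotations

from dataclasses import dataclass, field

K = 5  # suggestions kept per prefix


@dataclass
class TrieNode:
    children: dict[str, TrieNode] = field(default_factory=dict)
    top_k: list[tuple[int, str]] = field(default_factory=list)  # (count, query), best first


def build_trie(query_counts: dict[str, int], k: int = K) -> TrieNode:
    """Insert each query once; every node on its path keeps only the k best completions."""
    root = TrieNode()
    for query, count in query_counts.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top_k.append((count, query))
            node.top_k.sort(key=lambda cq: (-cq[0], cq[1]))  # highest count, then alphabetical
            del node.top_k[k:]
    return root


def flatten(root: TrieNode) -> dict[str, list[str]]:
    """Walk the trie and emit prefix -> top-k queries: the rows stored in Redis/DynamoDB."""
    rows: dict[str, list[str]] = {}
    stack: list[tuple[str, TrieNode]] = [("", root)]
    while stack:
        prefix, node = stack.pop()
        if prefix:  # the root (empty prefix) is not served
            rows[prefix] = [query for _, query in node.top_k]
        for ch, child in node.children.items():
            stack.append((prefix + ch, child))
    return rows


counts = {"system design": 900, "sysadmin": 400, "syntax": 300, "symbol": 50}
index = flatten(build_trie(counts, k=2))
print(index["sys"])  # ['system design', 'sysadmin']
print(index["syn"])  # ['syntax']
```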
Technology:
- Redis Cluster (Primary Cache)
  - Role: The "Hot" Store. Serves 99% of traffic.
  - Key: The Prefix (e.g., "sys").
  - Value: Serialized JSON List of Top 5 suggestions (e.g., ["system design", "sysadmin", ...]).
- Amazon DynamoDB (Persistent Store)
  - Role: The "Warm" Store. Used to rebuild Redis if it crashes and to serve "Cache Misses."
  - Partition Key: prefix (String).
  - Attributes: suggestions (List<String>), last_updated (Timestamp).
  - Why DynamoDB? It offers single-digit millisecond latency and scales horizontally to handle the "Long Tail" of billions of rare prefixes that won't fit in Redis RAM.
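To make the two storage shapes concrete, here is a sketch of one prefix row serialized for each store. The helper names are hypothetical. The DynamoDB item uses the low-level attribute-value format (`S` string, `L` list, `N` number), which is the shape boto3's `client.put_item(TableName=..., Item=...)` accepts. DynamoDB has no native timestamp type, so storing `last_updated` as epoch seconds in a Number attribute is an assumption.

```python
from __future__ import annotations

import json
import time


def to_redis_value(suggestions: list[str]) -> str:
    """Value stored under the prefix key in Redis: a serialized JSON list."""
    return json.dumps(suggestions)


def to_dynamodb_item(prefix: str, suggestions: list[str], updated_at: int) -> dict:
    """Item in DynamoDB's low-level attribute-value format (S = string, L = list, N = number)."""
    return {
        "prefix": {"S": prefix},                                # partition key
        "suggestions": {"L": [{"S": s} for s in suggestions]},  # ordered list keeps the ranking
        "last_updated": {"N": str(updated_at)},                 # epoch seconds (assumed encoding)
    }


suggestions = ["system design", "sysadmin"]
redis_value = to_redis_value(suggestions)
item = to_dynamodb_item("sys", suggestions, int(time.time()))
print(redis_value)  # ["system design", "sysadmin"]
```

A List (`L`) rather than a String Set (`SS`) is used because DynamoDB sets are unordered, and the suggestion order is the ranking.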

API Design

We need two distinct endpoints following the CQRS (Command Query Responsibility Segregation) pattern: one for reading (Autocomplete) and one for writing (Search).

1. Autocomplete Endpoint (Read)

GET /v1/autocomplete?query={prefix}&limit={k}
Response: JSON List of strings.
Protocol: HTTP/2 (a single long-lived, multiplexed connection avoids a new TCP/TLS handshake on every keystroke).
Note: This endpoint is read-only and cached heavily.
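The endpoint's contract can be sketched as a plain function, independent of any web framework. In this sketch, `handle_autocomplete` and the `lookup` callable are hypothetical stand-ins for the service's cache lookup. The default and maximum `limit`, the lowercase normalization, and the `Cache-Control` TTL are all assumptions used to illustrate "cached heavily".

```python
from __future__ import annotations

import json
from typing import Callable
from urllib.parse import parse_qs, urlsplit

DEFAULT_LIMIT = 5   # assumed default when the client omits `limit`
MAX_LIMIT = 10      # assumed cap so a single request cannot ask for huge lists


def handle_autocomplete(url: str, lookup: Callable[[str], list[str]]) -> tuple[int, dict[str, str], str]:
    """Handle GET /v1/autocomplete?query={prefix}&limit={k}; returns (status, headers, body)."""
    parts = urlsplit(url)
    if parts.path != "/v1/autocomplete":
        return 404, {}, ""
    params = parse_qs(parts.query)
    prefix = params.get("query", [""])[0].strip().lower()   # assumed normalization
    try:
        limit = int(params.get("limit", [str(DEFAULT_LIMIT)])[0])
    except ValueError:
        return 400, {"Content-Type": "application/json"}, json.dumps({"error": "limit must be an integer"})
    limit = max(1, min(limit, MAX_LIMIT))
    suggestions = lookup(prefix)[:limit] if prefix else []
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "public, max-age=60",   # assumed TTL; lets browsers/CDNs reuse answers
    }
    return 200, headers, json.dumps(suggestions)


index = {"sys": ["system design", "sysadmin", "syslog"]}
status, headers, body = handle_autocomplete(
    "/v1/autocomplete?query=Sys&limit=2", lambda p: index.get(p, [])
)
print(status, body)  # 200 ["system design", "sysadmin"]
```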

2. Search Endpoint (Write)

GET /v1/search?query={full_query}
Action: Returns search results (HTML/JSON) AND logs the query intent for analytics.
Note: This is the "Ground Truth" data source for our ranking models.
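The logging half of this endpoint must not slow down the search response. One way to do this in Python is with the standard library's `logging.handlers.QueueHandler` and `QueueListener`. Request threads only enqueue a record, and a background thread appends it to a local file that a log agent can tail. The function names and the one-JSON-object-per-line format are assumptions. The fields mirror the raw-data schema (query_string, timestamp, geo_location).

```python
from __future__ import annotations

import json
import logging
import logging.handlers
import os
import queue
import tempfile
from datetime import datetime, timezone


def start_search_logger(path: str) -> tuple[logging.Logger, logging.handlers.QueueListener]:
    """Request threads only enqueue records; a background thread writes them to `path`."""
    q: queue.Queue = queue.Queue(-1)                              # unbounded in-memory buffer
    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(logging.Formatter("%(message)s"))  # one JSON object per line
    listener = logging.handlers.QueueListener(q, file_handler)
    listener.start()
    logger = logging.getLogger("search_events")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(q))
    return logger, listener


def log_search(logger: logging.Logger, query: str, geo: str) -> None:
    """Fire-and-forget: enqueue the 'Search Commit' event and return immediately."""
    event = {
        "query_string": query,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "geo_location": geo,
    }
    logger.info(json.dumps(event))


log_path = os.path.join(tempfile.mkdtemp(), "search_events.log")
logger, listener = start_search_logger(log_path)
log_search(logger, "system design interview", "US")
listener.stop()  # drains the queue and joins the background writer thread
```

The trade-off is durability: records still in the in-memory queue are lost if the process crashes. That is consistent with treating analytics logging as best-effort.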

High Level Design

At a high level, the system implements the CQRS pattern introduced above, splitting the architecture into two distinct flows: the Read Path (latency-sensitive) and the Write Path (throughput-heavy).

1. The Read Path (Query Service)

Client: Sends a request as the user types, debounced by ~50-100ms so that a burst of keystrokes produces a single request, reducing network traffic.
Load Balancer (LB): Distributes incoming HTTP/2 requests across the stateless Autocomplete Service fleet.
Autocomplete Service: Orchestrates the lookup logic with a Multi-Level Caching strategy:
- L1 Cache (Local RAM): Checks an in-process cache first (e.g., Caffeine or Guava, Java caching libraries). This serves "Super Hot" keys (top 1%) with zero network I/O and keeps them from overloading the single Redis shard that owns each key (the "Hot Key" problem).
- L2 Cache (Redis Cluster): If L1 misses, it queries Redis.
- Fallback (DynamoDB): If Redis misses, it reads from the active DynamoDB table and writes the result back into the caches (cache-aside, often loosely called read-through).
Redis Data Structure: Redis does not store a tree. It stores a Flattened Key-Value map (e.g., Key: "sys", Value: ["system", "sysadmin"]) for O(1) retrieval.
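The lookup order above (L1, then Redis, then DynamoDB, with write-back on a miss) can be sketched as follows. This is an illustration under stated assumptions, not the production client. `LocalTTLCache` is a hand-rolled LRU-plus-TTL stand-in for Caffeine/Guava. The `redis_get`, `redis_set` and `dynamo_get` callables are hypothetical placeholders for real Redis and DynamoDB clients. The sizes and TTLs are illustrative, and caching empty results is an assumed policy.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable, Optional


class LocalTTLCache:
    """L1: bounded in-process LRU cache with a per-entry TTL (a stand-in for Caffeine/Guava)."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._data: OrderedDict[str, tuple[float, list[str]]] = OrderedDict()
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str) -> Optional[list[str]]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:      # expired: drop it and report a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return value

    def put(self, key: str, value: list[str]) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)   # evict the least recently used entry


class AutocompleteService:
    """L1 (local) -> L2 (Redis) -> DynamoDB lookup; misses are written back to the caches."""

    def __init__(self, l1: LocalTTLCache,
                 redis_get: Callable[[str], Optional[list[str]]],
                 redis_set: Callable[[str, list[str]], None],
                 dynamo_get: Callable[[str], Optional[list[str]]]) -> None:
        self.l1 = l1
        self.redis_get = redis_get
        self.redis_set = redis_set
        self.dynamo_get = dynamo_get

    def suggest(self, prefix: str) -> list[str]:
        value = self.l1.get(prefix)
        if value is not None:
            return value                            # L1 hit: no network I/O
        value = self.redis_get(prefix)
        if value is None:                           # L2 miss: fall back to the persistent store
            value = self.dynamo_get(prefix) or []   # cache empty results too (assumed policy)
            self.redis_set(prefix, value)
        self.l1.put(prefix, value)
        return value


redis_stub: dict[str, list[str]] = {}
dynamo_stub = {"sys": ["system design", "sysadmin"]}
dynamo_reads = [0]


def dynamo_get(prefix: str) -> Optional[list[str]]:
    dynamo_reads[0] += 1
    return dynamo_stub.get(prefix)


svc = AutocompleteService(LocalTTLCache(10_000, 30.0), redis_stub.get,
                          redis_stub.__setitem__, dynamo_get)
svc.suggest("sys")   # L1 and L2 miss: read DynamoDB, fill Redis and L1
svc.suggest("sys")   # served from L1
```

A short L1 TTL bounds how stale a "Super Hot" entry can get, which fits the requirement to favor availability over strict consistency.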

2. The Write Path (Data Gathering Service)

Client: When the user hits "Enter" or selects a suggestion, a "Search Commit" event is fired.
Search Service: Handles the web search logic and asynchronously logs the event to a local file (Fire & Forget), so logging adds no noticeable latency to the search response.
Log Agent (Sidecar): A background process (e.g., Fluentd/Filebeat) tails the log file and pushes events to the message queue.
Kafka: Acts as the high-throughput buffer to decouple the production services from the data processing layer.
Data Store: Historical logs are consumed from Kafka and stored in the S3 Data Lake (as Parquet files) for offline batch processing.
Batch Processor: Aggregates logs daily to rebuild the Main Trie, whose flattened prefix-to-suggestions rows are stored in DynamoDB.
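The document does not name a batch engine. The following single-process Python sketch shows a MapReduce-style version of the daily aggregation. It first counts searches per query, then emits every prefix of every query ("map") and keeps the k best queries per prefix ("reduce"). The output has the same prefix-to-top-k shape as the rows stored in DynamoDB, without materializing a trie. The lowercase normalization and the `max_prefix_len` cap are assumptions. The sample events carry only `query_string`, although real records also have timestamp and geo_location.

```python
from __future__ import annotations

import heapq
from collections import Counter, defaultdict
from typing import Iterable


def count_queries(events: Iterable[dict]) -> Counter:
    """Aggregate: number of searches per normalized query in the batch window."""
    counts: Counter = Counter()
    for event in events:
        query = event["query_string"].strip().lower()   # assumed normalization
        if query:
            counts[query] += 1
    return counts


def top_k_by_prefix(counts: Counter, k: int = 5, max_prefix_len: int = 20) -> dict[str, list[str]]:
    """Map each query to all of its prefixes, then reduce each prefix to its k best queries."""
    candidates: defaultdict[str, list[tuple[int, str]]] = defaultdict(list)
    for query, n in counts.items():
        for i in range(1, min(len(query), max_prefix_len) + 1):   # "map" step
            candidates[query[:i]].append((n, query))
    return {                                                       # "reduce" step
        prefix: [q for _, q in heapq.nsmallest(k, items, key=lambda cq: (-cq[0], cq[1]))]
        for prefix, items in candidates.items()
    }


events = (
    [{"query_string": "system design"}] * 3
    + [{"query_string": "sysadmin"}] * 2
    + [{"query_string": "Syntax "}]
)
rows = top_k_by_prefix(count_queries(events), k=2)
print(rows["sy"])   # ['system design', 'sysadmin']
print(rows["syn"])  # ['syntax']
```

In a distributed job, each prefix would become a reduce key, so the per-prefix top-k step parallelizes naturally across workers.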