Top-K Leaderboard


Introduction

  • A Top-K or Leaderboard system tracks and displays the highest-ranking entities based on a specific metric (score, points, sales volume, etc.).
  • While conceptually simple, the challenge lies in maintaining accurate, sorted rankings in real-time as millions of score updates flood the system simultaneously.
  • Common use cases include gaming leaderboards (high scores), most commented posts, trending hashtags, or top-selling products.

Requirements

  • Functional Requirements
    • Update Score: Users should be able to update their score.
    • Get Top K: Users can view the top K users (e.g., Top 10 globally).
    • Get User Rank: A user can see their specific rank and score, even if they aren't in the top K.
  • Non-Functional Requirements
    • Scalability:
      • DAU: 10 Million.
      • Write Throughput: Handle 50k updates/second at peak (assuming high-frequency game updates).
      • Read Throughput: Handle 500k requests/second (a 10:1 read-to-write ratio, as users check leaderboards frequently).
    • Latency (Performance)
      • Read Latency: < 20ms at p99 for fetching Top K.
      • Write Visibility: Updates should be reflected in the leaderboard within 1-2 seconds (Near Real-time).
    • Availability: 99.99% uptime for the Read path (users must always see some leaderboard, even if slightly stale).
    • Storage (Capacity): A Redis ZSET entry is small (~100 bytes), so 10M users need roughly 10M × 100 bytes ≈ 1GB of RAM. Allowing for overhead, the total is ~1GB - 2GB, which fits easily in memory.
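
The capacity figure above is simple arithmetic, and a short sketch makes the assumptions explicit. The 100-byte entry size and the 10M user count come from the requirements; the 2x overhead factor is an assumption added here to reach the upper end of the range, not a measured value.

```python
# Back-of-envelope RAM estimate for the leaderboard.
USERS = 10_000_000        # user count from the requirements
BYTES_PER_ENTRY = 100     # rough per-member ZSET cost quoted above
OVERHEAD_FACTOR = 2.0     # assumed headroom (hypothetical, not measured)


def estimate_ram_gb(users: int, bytes_per_entry: int, overhead: float = 1.0) -> float:
    """Return the estimated RAM footprint in decimal gigabytes."""
    return users * bytes_per_entry * overhead / 1e9


base_gb = estimate_ram_gb(USERS, BYTES_PER_ENTRY)
padded_gb = estimate_ram_gb(USERS, BYTES_PER_ENTRY, OVERHEAD_FACTOR)
print(f"{base_gb:.1f} GB to {padded_gb:.1f} GB")
```

Real per-entry cost depends on member-name length and Redis encoding, so the result is best treated as an order-of-magnitude check.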

Data Model

We need to store user profiles and their corresponding scores.

1. Relational Database (SQL)
  • Good for the "source of truth" regarding user profile data.
  • Tables:
    • Users: user_id (PK), username, metadata.
    • Scores: user_id (FK), game_id, score, timestamp.
2. In-Memory Data Structure
  • Using SQL for sorting and ranking (ORDER BY score DESC LIMIT K) performs poorly at scale: without an index on score, every query sorts all N rows in O(N log N), and with an index, every score write pays for index maintenance.
  • Instead, we typically use Redis Sorted Sets (ZSETs). A ZSET pairs a hash map with a skip list, allowing O(log N) score updates and rank lookups, and O(log N + K) retrieval of the top K.
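
To illustrate keeping rankings sorted incrementally instead of re-sorting on every read, here is a minimal pure-Python sketch. `SortedScores` is a hypothetical stand-in, not Redis: it uses a dict plus a sorted list with `bisect`. The binary search costs O(log N) comparisons, but a Python list still shifts elements on insert and delete, which is the cost a skip list avoids.

```python
import bisect


class SortedScores:
    """Toy sorted set: a dict for lookups plus a sorted list for ranking."""

    def __init__(self) -> None:
        self._score_of: dict[str, int] = {}         # member -> current score
        self._ordered: list[tuple[int, str]] = []   # (score, member), ascending

    def update(self, member: str, score: int) -> None:
        """Set a member's score while keeping the list sorted (no full re-sort)."""
        old = self._score_of.get(member)
        if old is not None:
            # Locate the stale entry by binary search and drop it.
            idx = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[idx]
        bisect.insort(self._ordered, (score, member))
        self._score_of[member] = score

    def top_k(self, k: int) -> list[tuple[str, int]]:
        """Return the k highest (member, score) pairs, best first."""
        if k <= 0:
            return []
        return [(m, s) for s, m in reversed(self._ordered[-k:])]


board = SortedScores()
for name, points in [("Alice", 100), ("Bob", 150), ("Carol", 300)]:
    board.update(name, points)
board.update("Alice", 400)   # Alice overtakes everyone
print(board.top_k(2))
```

Reads never trigger a sort here: the order is maintained at write time, which is the same trade-off the sorted set makes.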

API Design

The interface can be exposed over REST or gRPC; the endpoints below follow REST conventions.

1. Update Score
  • POST /v1/scores
  • Body: { "user_id": "u123", "score": 1500 }
2. Get Global Leaderboard
  • GET /v1/leaderboards/{lb_id}/top?k=10
  • Response: [{ "user_id": "u55", "rank": 1, "score": 9000 }, ...]
3. Get User Rank (and neighbors)
  • GET /v1/leaderboards/{lb_id}/users/{user_id}/context?range=5
  • Response: Returns the user's rank plus 5 users above and 5 below.
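
The neighbor window returned by the context endpoint can be sketched as follows. This is an illustration under assumptions: `rank_context` and the response field names (`window`, plus `rank` and `user_id` borrowed from the Top K response) are hypothetical, and the ranked list is assumed to be ordered best-first. Against Redis, the index lookup and slice would typically map to ZREVRANK and ZREVRANGE rather than a linear `list.index` scan.

```python
def rank_context(ranked_ids: list[str], user_id: str, span: int = 5) -> dict:
    """Return the user's rank plus up to `span` neighbors on each side.

    `ranked_ids` must be ordered best-first, so rank 1 sits at index 0.
    """
    idx = ranked_ids.index(user_id)               # raises ValueError if unknown
    start = max(0, idx - span)                    # clamp at the top of the board
    stop = min(len(ranked_ids), idx + span + 1)   # clamp at the bottom
    return {
        "user_id": user_id,
        "rank": idx + 1,                          # API ranks are 1-based
        "window": [
            {"user_id": uid, "rank": pos + 1}
            for pos, uid in enumerate(ranked_ids[start:stop], start=start)
        ],
    }


ranked = ["u55", "u12", "u123", "u7", "u90"]
context = rank_context(ranked, "u123", span=1)
print(context["rank"], [entry["user_id"] for entry in context["window"]])
```

Clamping matters for users near either end of the board, where fewer than `span` neighbors exist on one side.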

High Level Design

At a high level, the system is designed to separate the real-time ranking mechanics from the persistent data storage. We prioritize low latency for the end-user while ensuring data durability in the background.

The architecture consists of five core components:

  • Client:
    • The entry point is the user's device. It is responsible for sending score updates (e.g., after a game level is finished) and requesting leaderboard views.
  • Load Balancer / API Gateway:
    • Traffic Distribution: It distributes incoming requests across multiple stateless instances of our leaderboard service to prevent any single server from becoming a bottleneck.
    • Security & Auth: It handles SSL termination (decrypting requests) and validates user session tokens (JWT) before passing the request further. This ensures our internal services focus purely on business logic.
  • Leaderboard Write Service:
    • This is a stateless backend service (written in Go, Java, or Node.js) that orchestrates the data flow. It contains the business rules.
    • Validation: When a score comes in, this service verifies it is legitimate (e.g., preventing a user from scoring 1 million points in 1 second).
    • The Coordinator: It decides where data goes. It sends ranking-related data to Redis and profile-related data to the SQL database. It abstracts the complexity of the storage layer away from the client.
  • In-Memory Score Storage (Redis)
    • This is the heart of the leaderboard's performance. Since traditional databases are too slow for real-time sorting of millions of rows, we use Redis specifically for its Sorted Set (ZSET) data structure.
    • Role: It acts as a high-speed, temporary index. It maintains the list of User IDs and Scores in a pre-sorted order.
    • Performance: It allows us to perform operations like "Update Score" or "Get Rank" in O(log N) time, meaning it remains fast even as the number of users grows.
  • Persistent Database (SQL / NoSQL)
    • While Redis is fast, it keeps its data in RAM, which is volatile and costlier per gigabyte than disk. We need a reliable, long-term storage solution.
    • Role: The Source of Truth.
    • Data Stored:
      • User Metadata: Full profiles (Avatars, Display Names, Country) mapped to the User IDs stored in Redis.
      • Score History: A transactional log of every score update (e.g., "User A scored 10 points at 12:01 PM"). This allows us to rebuild the leaderboard if Redis ever crashes or data is corrupted.
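
The write path and the rebuild-from-history idea described above can be sketched as follows. Everything here is an assumption for illustration: `LeaderboardWriter`, `rebuild_ranking`, and the `MAX_POINTS_PER_SECOND` threshold are hypothetical names, plain Python containers stand in for Redis and the database, submitted scores are treated as new absolute totals, and the history is written before the ranking (an ordering the design above does not specify).

```python
import time
from dataclasses import dataclass, field
from typing import Optional

MAX_POINTS_PER_SECOND = 1_000   # hypothetical anti-cheat threshold


@dataclass
class LeaderboardWriter:
    ranking: dict[str, int] = field(default_factory=dict)                 # stand-in for the ZSET
    history: list[tuple[str, int, float]] = field(default_factory=list)  # stand-in for the score log
    last_seen: dict[str, tuple[int, float]] = field(default_factory=dict)

    def submit(self, user_id: str, score: int, now: Optional[float] = None) -> bool:
        """Validate a score, log it, then update the ranking. Returns False if rejected."""
        now = time.time() if now is None else now
        prev = self.last_seen.get(user_id)
        if prev is not None:
            prev_score, prev_time = prev
            elapsed = max(now - prev_time, 1e-3)   # avoid division by zero
            if (score - prev_score) / elapsed > MAX_POINTS_PER_SECOND:
                return False                        # implausible jump, reject
        self.history.append((user_id, score, now))  # record in the source of truth
        self.ranking[user_id] = score               # then update the fast index
        self.last_seen[user_id] = (score, now)
        return True


def rebuild_ranking(history: list[tuple[str, int, float]]) -> dict[str, int]:
    """Replay the score log in time order to recover each user's latest score."""
    ranking: dict[str, int] = {}
    for user_id, score, _ts in sorted(history, key=lambda entry: entry[2]):
        ranking[user_id] = score
    return ranking


writer = LeaderboardWriter()
writer.submit("u1", 100, now=0.0)
writer.submit("u1", 600, now=1.0)          # +500 in one second: accepted
writer.submit("u1", 1_000_600, now=2.0)    # +1,000,000 in one second: rejected
print(writer.ranking, rebuild_ranking(writer.history))
```

Because rejected updates never reach the log, replaying the history reproduces the same ranking the fast index held before a crash.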

Redis ZSET: Internal Implementation

1. THE HASH MAP (For O(1) Lookups)
   Maps Member -> Score

   [ "Alice" ] --> 100
   [ "Bob" ]   --> 150
   [ "Carol" ] --> 300
   [ "Dave" ]  --> 300

2. THE SKIP LIST (For O(log N) Ranking/Sorting)
   A multi-layer linked list. Higher levels act as "express lanes".

   Level 3: [HEAD] -----------------------------------------> [Carol:300] -> NULL
              |                                                    |
   Level 2: [HEAD] ---------------------> [Bob:150] --------> [Carol:300] -> NULL
              |                               |                    |
   Level 1: [HEAD] ------> [Alice:100] -> [Bob:150] -> ...... [Carol:300] -> [Dave:300] -> NULL

  • The Hash Map (For O(1) Point Lookups)
    • Purpose: Instant access by Member Name.
    • Scenario: You run ZSCORE Leaderboard Bob.
    • Mechanism: Redis ignores the sorting entirely. It consults the internal Hash Map, finds the key "Bob", and instantly returns 150. It does not traverse any list.
  • The Skip List (For O(log N) Range Queries)
    • Purpose: Fast traversal by Score or Rank.
    • Scenario: You run a score-range query such as ZRANGE ... (100, where the leading "(" makes the bound exclusive ("find the first member with a score > 100").
    • Mechanism: Redis cannot use the Hash Map here, because the map is keyed by member, not by score. Instead, it traverses the Skip List.
      • It uses upper-level "express lanes" to skip over millions of low-score users (like Alice).
      • It drops down to the base level only when it nears the target, landing directly on the first match.
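
The two-structure layout described above can be sketched in a few dozen lines. This is a toy model, not Redis's implementation: `MiniZSet`, its method names, and the `MAX_LEVEL` and `P` values are all choices made for this illustration. Entries are ordered by (score, member), so ties on score fall back to member name.

```python
import random


class _Node:
    def __init__(self, score, member, level):
        self.score, self.member = score, member
        self.forward = [None] * level   # forward[i]: next node on level i


class MiniZSet:
    """Toy sorted set: dict for member -> score, skip list for ordering."""
    MAX_LEVEL = 8   # arbitrary cap for this sketch
    P = 0.5         # chance of promoting a node one level up (sketch value)

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(float("-inf"), "", self.MAX_LEVEL)
        self._scores = {}

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, key):
        """Per level, the last node ordered strictly before key = (score, member)."""
        preds = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self.MAX_LEVEL - 1, -1, -1):   # start in the express lanes
            nxt = node.forward[i]
            while nxt is not None and (nxt.score, nxt.member) < key:
                node, nxt = nxt, nxt.forward[i]
            preds[i] = node
        return preds

    def add(self, member, score):
        if member in self._scores:       # a score update is remove + reinsert
            self.remove(member)
        preds = self._predecessors((score, member))
        node = _Node(score, member, self._random_level())
        for i in range(len(node.forward)):
            node.forward[i] = preds[i].forward[i]
            preds[i].forward[i] = node
        self._scores[member] = score

    def remove(self, member):
        score = self._scores.pop(member)
        preds = self._predecessors((score, member))
        target = preds[0].forward[0]
        for i in range(len(target.forward)):   # unlink from every level it occupies
            preds[i].forward[i] = target.forward[i]

    def score(self, member):
        """Point lookup through the dict only; the skip list is not touched."""
        return self._scores.get(member)

    def first_above(self, bound):
        """First (member, score) with score strictly greater than bound, or None."""
        node = self._head
        for i in range(self.MAX_LEVEL - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].score <= bound:
                node = node.forward[i]
        hit = node.forward[0]
        return None if hit is None else (hit.member, hit.score)


zset = MiniZSet(seed=7)
for name, points in [("Alice", 100), ("Bob", 150), ("Carol", 300), ("Dave", 300)]:
    zset.add(name, points)
print(zset.score("Bob"))        # answered by the dict
print(zset.first_above(100))    # answered by the skip list
```

The node levels are random, but the ordering on level 0 is deterministic, so query results do not depend on the seed.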

Deep Dive 1: Real-time Counting at Scale


Deep Dive 2: Scaling the Leaderboard (Partitioning)


Deep Dive 3: Historical Data and Time Windows


Complete Architecture


Summary of Data Flow


Additional Discussion Points

