Short-Form Video Platform (TikTok, YouTube Shorts)

Overview
- Designing a short-form video platform (TikTok, YouTube Shorts) is quickly becoming one of the most popular system design questions.
- Interviewers love it because it tests how well you balance heavy, asynchronous compute workloads against the demand for instantaneous, low-latency user experiences, which makes it an effective way to identify exceptional candidates.
Requirements
- Functional Requirements
- Video Upload: Users can seamlessly upload short videos (< 3 minutes).
- Infinite Feed: The system must generate an infinite, personalized feed of videos.
- Interactions: Users can engage with content (like, swipe, skip, comment).
- Telemetry: The system must track micro-interactions (watch time, re-watches, skips) to train the feed algorithm in real-time.
- Non-Functional Requirements
- Ultra-Low Latency: Feed generation must take < 50ms. Video playback must start instantly (zero buffering).
- High Availability: 99.99% uptime.
- Eventual Consistency: View and like counts can be slightly delayed globally to ensure system availability.
Estimates (Capacity Planning)
- Daily Active Users (DAU): 1 Billion
- Consumption: Average user watches 200 videos/day → 200 Billion views/day.
- Creation: 2% of users upload 1 video/day → 20M uploads/day.
- Storage (Uploads): Average compressed short video is 10MB. 20M * 10MB = 200 TB/day of raw video. Accounting for multiple resolutions and replication, expect ~1 PB/day in storage growth.
- Bandwidth (Egress): 200 Billion views × 10MB = 2,000 Petabytes (2 Exabytes) of egress bandwidth per day. This necessitates an aggressive, highly optimized Edge/CDN caching strategy.
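The arithmetic above can be sanity-checked with a quick back-of-envelope script (all figures are the rounded assumptions from this section):

```python
# Back-of-envelope capacity check using the rounded assumptions above.
DAU = 1_000_000_000        # daily active users
VIEWS_PER_USER = 200       # average videos watched per day
UPLOAD_RATE = 0.02         # 2% of users upload one video per day
VIDEO_SIZE_MB = 10         # average compressed short video

views_per_day = DAU * VIEWS_PER_USER                          # 200 billion views/day
uploads_per_day = int(DAU * UPLOAD_RATE)                      # 20 million uploads/day
raw_storage_tb = uploads_per_day * VIDEO_SIZE_MB / 1_000_000  # MB -> TB: 200 TB/day raw
egress_pb = views_per_day * VIDEO_SIZE_MB / 1_000_000_000     # MB -> PB: 2,000 PB/day
```

The ~200 TB/day figure is raw video only; multiplying by a handful of encoded resolutions plus replication is what pushes storage growth toward the ~1 PB/day estimate.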
Data Model
For a system at this scale, a polyglot persistence architecture is mandatory. No single database can handle these competing access patterns.
- User Metadata (PostgreSQL / CockroachDB): Handles relational data like user profiles and settings with high consistency.
- Video Metadata (MongoDB / DynamoDB): Document store containing video titles, CDN URLs, creator IDs, and tags. Highly scalable for fast, flexible reads.
- Interactions & Counters (Redis -> Cassandra): In-memory Redis clusters handle the immediate write-heavy load of "likes", which are eventually flushed to a wide-column store like Cassandra for permanent storage.
- Media Storage (Amazon S3): Object storage for raw .mp4 files, processed chunks, and thumbnails.
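To make the document-store side concrete, here is a hypothetical shape for a video-metadata item (field names are illustrative, not an actual production schema). The key property is that one read returns everything the feed needs to render a video:

```python
# Hypothetical video-metadata document (illustrative field names only).
video_item = {
    "video_id": "v-8f3a2c",          # partition key
    "creator_id": "u-1042",
    "title": "Sunset timelapse",
    "duration_s": 27,
    "tags": ["travel", "timelapse"],
    "status": "READY",               # UPLOADED -> PROCESSING -> READY
    "manifest_url": "https://cdn.example.com/v-8f3a2c/master.m3u8",
    "created_at": "2024-01-15T08:30:00Z",
}

def hydrate_for_feed(item: dict) -> dict:
    # A feed entry only needs the playback URL plus light display metadata,
    # so the document store can serve feed hydration in a single key lookup.
    return {k: item[k] for k in ("video_id", "creator_id", "title", "manifest_url")}
```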
API Design
For a short-form video platform we will use a classic RESTful API to interact with the data. RESTful APIs are simple, widely used, stateless, and support caching, which makes them a good candidate for our system.
- Upload Video:
- POST /v1/videos/upload
- Returns a presigned S3 URL for direct client-to-storage upload.
- Fetch Feed:
- GET /v1/feed?cursor={cursor}
- Returns a Feed Manifest (JSON) containing the next 10 videos' metadata and CDN URLs.
- Like Video:
- POST /v1/videos/{video_id}/interactions
- Payload: {"type": "LIKE", "timestamp": "167900123"}
- Telemetry:
- POST /v1/telemetry
- Batched background endpoint for sending watch-time and skip metrics.
High Level Design
At a high level, the system can be broken down into three core paths:
- Mobile Client: Serves as the active ingestion engine, utilizing time-bound, pre-signed URLs to perform direct-to-storage multipart video uploads, fully decoupling heavy network transfer costs from our internal compute cluster.
- API Gateway / Load Balancer: Acts as the entry point, terminating SSL, authenticating the user session, and routing the upload request.
- Video Upload Service: A lightweight service that authenticates the request and generates a Pre-Signed S3 URL, allowing the mobile client to bypass our backend servers and upload files directly to storage.
- Raw Video Storage (S3): The initial landing zone for the raw .mp4 files from the user's phone.
- Kafka (Event Bus): Decouples the upload from the processing. S3 drops an "Upload Complete" event here so the system can process it asynchronously.
- Workflow Orchestrator (Temporal): The state machine manager that consumes the Kafka event and manages the complex DAG (Directed Acyclic Graph) of video processing tasks, handling retries and parallel routing.
- GPU / CPU Worker Nodes: Separate Kubernetes deployments that listen to specific Temporal Task Queues. GPU nodes handle AI/ML tasks, while CPU nodes handle heavy FFmpeg operations.
- Encoded Storage (S3 Origin): The final storage location for the processed HLS video chunks (.ts files) and master playlists (.m3u8).
- Vector Database (Pinecone): Stores the AI-generated semantic embeddings (arrays of numbers representing visual and audio concepts) used for search and recommendations.
- Video Metadata DB (DynamoDB): A NoSQL document store that holds the video's metadata (creator, tags, duration) and the S3 Origin URL.
- Client: The device containing the infinite-scroll UI and intelligent background pre-fetching logic.
- Recommendation Service: The orchestration engine that handles the 50ms matching pipeline. It receives the feed request and delegates to the caching layers.
- ML Cache (Redis Cluster): A hyper-fast, in-memory Feature Store holding pre-computed Candidate Pools, User Profile Vectors, and Real-Time Video Multipliers. The video URLs are then fetched from the Video Metadata Database.
- Content Delivery Network (CDN): A global network of edge servers (e.g., Cloudflare, Akamai) that caches the encoded video chunks geographically close to the user to guarantee high-bandwidth, low-latency streaming.
- Event Gateway: A dedicated endpoint that receives constant, lightweight telemetry from the client (e.g., swiped at 3 seconds, liked, watched 100%).
- Kafka (Ingestion Bus): Acts as the central nervous system, durably storing billions of engagement events so no user actions are lost during downstream database outages.
- Stream Processor (Apache Flink): Reads the live Kafka stream and instantly updates the ML Cache with session context (e.g., "User is swiping fast") and video virality scores.
- Engagement Workers (The Consumers): A scalable fleet of lightweight microservices. Their sole job is to consume the raw event topics from Kafka and execute atomic increment commands (like INCR video:123:likes) against the Redis cache.
- Counter Buffer (Redis Cluster): Acts as an ultra-fast "write coalescing" layer. By absorbing the rapid INCR commands from the workers, it handles massive write spikes (e.g., a viral video getting 100,000 likes in 5 seconds) entirely in memory.
- Engagement DB (Cassandra): A massive Wide-Column NoSQL store that serves as the permanent historical record. Background cron jobs (or Flink sinks) periodically sweep the Redis Buffer and flush the aggregated counts into Cassandra, turning 100,000 cached increments into a single database write.
- Batch Processor (Apache Spark): Massive offline analytics engines that run overnight, crunching the Cassandra history and Vector DB embeddings to calculate complex Long-Term User Profiles and pre-computed Candidate Pools.
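The write-coalescing pattern behind the Engagement Workers and Counter Buffer can be sketched in plain Python. Here a dict stands in for the Redis counter buffer and a callback stands in for the Cassandra sink; the real system would use Redis `INCR` and a Flink sink or cron job:

```python
from collections import defaultdict

class CounterBuffer:
    """Plain-Python sketch of the Redis write-coalescing buffer."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, key: str, by: int = 1) -> None:
        # Equivalent of the worker's atomic `INCR video:123:likes`.
        self._counts[key] += by

    def flush(self, sink) -> int:
        """Drain buffered counts into the permanent store; returns rows written."""
        drained, self._counts = self._counts, defaultdict(int)
        for key, delta in drained.items():
            sink(key, delta)  # one write per key, not one write per event
        return len(drained)

durable = defaultdict(int)   # stand-in for the Cassandra engagement table
buf = CounterBuffer()
for _ in range(100_000):     # viral spike: 100k likes in seconds
    buf.incr("video:123:likes")
rows = buf.flush(lambda k, d: durable.__setitem__(k, durable[k] + d))
# 100,000 in-memory increments collapse into a single durable write
```

This is why the database survives viral spikes: write volume to Cassandra scales with the number of hot keys per flush interval, not with the number of user interactions.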
This is a strong high-level design. However, to separate ourselves from other candidates we will need to be able to explain the core components and techniques in depth, showing understanding that goes beyond the surface level.
Deep Dive 1: Video Processing Pipeline (DAG)
Deep Dive 2: Pre-fetching
Deep Dive 3: Algorithmic Feed Engine
Additional Discussion Points