Video Streaming and Sharing Platform (YouTube, Vimeo)

Overview
Introduction
Designing a video streaming platform like YouTube is a highly popular and relevant system design interview question. While it may not be the most complex system to design, it presents unique challenges that interviewers expect candidates to address. From handling massive amounts of video data and optimizing video delivery to managing user-generated content and implementing scalable search and recommendation systems, the scope is broad. In this solution, we'll explore the key problems that need to be solved and provide insights into how to tackle them effectively in an interview setting.
Requirements
- Functional Requirements
- Video Upload: The system should allow users to upload video files of various formats and sizes efficiently and securely. This includes handling large files, providing progress feedback, and supporting resumable uploads in case of interruptions.
- Video Encoding and Transcoding: The system needs to process and convert uploaded videos into multiple standardized formats and resolutions. This ensures compatibility across different devices and network conditions, enabling smooth playback for all users.
- Stream Video Content: The system should enable users to stream video content seamlessly. It must support adaptive bitrate streaming to adjust video quality in real-time based on the user's network speed and device capabilities, minimizing buffering and ensuring a high-quality viewing experience.
- Notifications: Users should be notified of key events in real time to improve user experience.
- Non-Functional Requirements
- Low Latency: The system must deliver video content with minimal delay, ensuring quick load times and smooth playback. This involves optimizing network routes, using content delivery networks (CDNs), and efficient data retrieval methods to enhance the user experience.
- High Availability: The system should be reliably accessible at all times, with minimal downtime. This requires implementing fault-tolerant infrastructure, redundancy, and failover mechanisms to maintain continuous service even in the event of hardware failures or high traffic volumes.
- High Scalability: The system must be capable of handling increasing numbers of users and data volumes without performance degradation.
Estimates
- Daily Active Users (DAU): 2 billion
- Average video uploaded per user per day: 0.005 videos (0.5% of DAUs upload a video daily)
- Average video size: 100 MB (at 1080p resolution, H.264 compression)
- Daily uploads: 2B × 0.005 = 10M videos/day
- Daily storage for uploads: 10M × 100MB = 1,000 TB/day = 1 PB/day
- Annual video uploads: 1 PB/day × 365 days = 365 PB/year
- Total storage over 10 years: 365 PB/year × 10 years ≈ 3,650 PB ≈ 3.65 EB (Exabytes)
- Note: the actual storage requirements would likely be 3-4x higher when accounting for several copies for redundancy and multiple resolutions.
- Average number of videos streamed per user per day: 5
- Average video duration: 5 minutes
- Streaming bitrate (1080p): 5 Mbps
- Average video size streamed per user per day: 5 videos × 5 minutes/video × 60 seconds/minute × 5 Mbps = 7,500 Mb (megabits) per user per day
- Convert to GB: 7,500 Mb × (1/8) × (1/1024) ≈ 0.9155 GB/user/day
- Total daily bandwidth: 2B users × 0.9155 GB/user/day ≈ 1.83 EB/day (Exabytes)
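As a quick sanity check, the figures above can be reproduced with a short back-of-the-envelope script; the constants simply restate the assumptions listed above (decimal units for PB/EB, matching the text):

```python
# Back-of-the-envelope check of the estimates above.
DAU = 2_000_000_000        # daily active users
UPLOAD_RATE = 0.005        # fraction of DAUs uploading a video per day
AVG_VIDEO_MB = 100         # average upload size in MB

daily_uploads = DAU * UPLOAD_RATE                        # 10M videos/day
daily_upload_pb = daily_uploads * AVG_VIDEO_MB / 1e9     # ~1 PB/day
ten_year_storage_eb = daily_upload_pb * 365 * 10 / 1000  # ~3.65 EB

VIEWS_PER_USER = 5         # videos streamed per user per day
VIDEO_MINUTES = 5          # average video duration
BITRATE_MBPS = 5           # 1080p streaming bitrate in megabits per second

megabits_per_user = VIEWS_PER_USER * VIDEO_MINUTES * 60 * BITRATE_MBPS  # 7,500 Mb
gb_per_user = megabits_per_user / 8 / 1024                              # ~0.9155 GB
daily_egress_eb = DAU * gb_per_user / 1e9                               # ~1.83 EB/day

print(daily_uploads, daily_upload_pb, ten_year_storage_eb, gb_per_user, daily_egress_eb)
```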
Data Model
This is a basic outline of some of the core tables that could be included in a video streaming and sharing platform data model.
- Users: Stores information about users.
- Notifications: Stores information about notifications sent to users (e.g. video published).
- Videos: Stores the core information related to a video (e.g. title, description)
- Video Reaction, Comments, Video Meta, & Thumbnails: Store additional information about the video
- Job Tracker: Table used to track the progress of a video as it is being processed. Discussed in depth later.
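To make the model concrete, here is a minimal sketch of a few of these entities; the field names are illustrative assumptions rather than a finalized schema:

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative core entities; field names are assumptions, not a fixed schema.

@dataclass
class User:
    user_id: str
    username: str
    email: str
    created_at: datetime

@dataclass
class Video:
    video_id: str
    uploader_id: str          # references User.user_id
    title: str
    description: str
    duration_seconds: int
    upload_date: datetime
    status: str               # e.g. "processing", "published", "failed"

@dataclass
class JobTracker:
    job_id: str
    video_id: str             # references Video.video_id
    encoding_status: str      # per-task status, e.g. "pending" / "done" / "failed"
    validation_status: str
    thumbnail_status: str
    metadata_status: str
```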
API Design
For a video streaming and sharing platform we will use a classic RESTful API to interact with the data. RESTful APIs are simple, widely used, stateless, and support caching, which makes them a good candidate for our system.
Our REST API comprises four main endpoints:
- POST: /api/videos/upload/initiate
- Description: Allows a user to initiate an upload session.
- Params: Video metadata (title, description, tags) & number of chunks and their sizes.
- Response: An upload_id (unique identifier for the upload session) & a list of presigned URLs for each chunk.
- POST: /api/videos/upload/complete
- Description: Allows a user to signal an upload has been completed.
- Params: upload_id.
- GET: /api/videos/search?query={query}
- Description: Allows a user to search for videos.
- Response: List of video metadata (e.g., titles, thumbnails, video IDs).
- GET: /api/videos/{video_id}/playback
- Description: Allows a user to start playing a video.
- Params: video_id
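The sketch below shows what these endpoints might look like as a thin service skeleton (shown here with FastAPI; the handler bodies return placeholders, and the exact request/response shapes are assumptions based on the endpoints above):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InitiateUploadRequest(BaseModel):
    title: str
    description: str
    tags: list[str]
    chunk_sizes: list[int]  # size in bytes of each chunk the client will upload

@app.post("/api/videos/upload/initiate")
def initiate_upload(req: InitiateUploadRequest) -> dict:
    # A real implementation would create a multipart upload session and presign
    # one URL per chunk; placeholders keep the sketch short.
    upload_id = "upload-123"
    urls = [f"https://storage.example.com/{upload_id}/part/{i}"
            for i, _ in enumerate(req.chunk_sizes)]
    return {"upload_id": upload_id, "presigned_urls": urls}

@app.post("/api/videos/upload/complete")
def complete_upload(upload_id: str) -> dict:
    # Signals that all chunks have been uploaded; triggers assembly and processing.
    return {"upload_id": upload_id, "status": "processing"}

@app.get("/api/videos/search")
def search_videos(query: str) -> list[dict]:
    # Returns a list of video metadata (titles, thumbnails, video IDs).
    return [{"video_id": "v1", "title": f"Results for {query}", "thumbnail_url": "..."}]

@app.get("/api/videos/{video_id}/playback")
def playback(video_id: str) -> dict:
    # Returns what the player needs to start adaptive bitrate streaming,
    # e.g. a manifest (HLS/DASH) URL served via the CDN.
    return {"video_id": video_id,
            "manifest_url": f"https://cdn.example.com/{video_id}/master.m3u8"}
```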
Upload Video Flow
- Client App
- Action: A user initiates a video upload in the UI, which sends a POST request to the /api/videos/upload/initiate endpoint to start an upload session.
- Authentication & Authorization: The request includes an authentication token (e.g., JWT) verifying the user's identity and permissions.
- API Gateway
- Routing: The API Gateway receives the request, authenticates the user, applies rate limiting, and forwards the request to the Video Upload Service.
- Additional Functionality:
- Authentication and Authorization: Verifies the JWT token.
- Rate Limiting: Prevents abuse by limiting the number of uploads per user.
- Load Balancing: Distributes requests across multiple instances of the Video Upload Service for optimal performance.
- Video Upload Service
- Presigned URLs: Using the storage service's SDK (e.g., Amazon S3), the service generates presigned URLs that are short-lived and scoped to the relevant chunks. Each URL allows the client to upload a specific chunk directly to the storage service. Presigned URLs help with security, since they ensure that only authenticated users can upload data, and with efficiency, since chunks go directly to storage without routing through the application servers (a sketch follows below).
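A minimal sketch of how the Video Upload Service might create the upload session and presign one URL per chunk using the AWS SDK for Python (boto3); the bucket name and expiry are assumptions:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "raw-video-storage"  # assumed bucket name

def initiate_upload(video_key: str, num_chunks: int, expires_in: int = 900):
    # Create the multipart upload session; S3 returns an UploadId.
    mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=video_key)
    upload_id = mpu["UploadId"]

    # Presign one short-lived URL per part; S3 part numbers start at 1.
    urls = [
        s3.generate_presigned_url(
            "upload_part",
            Params={
                "Bucket": BUCKET,
                "Key": video_key,
                "UploadId": upload_id,
                "PartNumber": part_number,
            },
            ExpiresIn=expires_in,
        )
        for part_number in range(1, num_chunks + 1)
    ]
    return upload_id, urls
```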
- Client App
- Chunk Uploads: The client uploads each chunk to its presigned URL via an HTTP PUT (the verb generally used for chunk uploads); chunks can be uploaded concurrently to optimize speed. The client also tracks the upload progress of each chunk and displays feedback to the user in the UI. If a chunk fails, most modern browser file upload APIs support retries and surface an error state if the failure persists (a simplified client sketch follows below).
- Temporary Storage: Chunks are stored as part objects associated with the upload_id in the Raw Video Storage. An optional checksum verification step can be used to verify the data integrity of each chunk.
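For illustration, a simplified client written in Python (a browser client would do the equivalent with fetch/XHR); the chunk size and concurrency level are assumptions, and a production client would add retries with backoff:

```python
from concurrent.futures import ThreadPoolExecutor
import requests

CHUNK_SIZE = 8 * 1024 * 1024  # assumed 8 MB chunks

def upload_chunk(url: str, chunk: bytes) -> str:
    # Each chunk is uploaded with an HTTP PUT; S3 returns an ETag per part,
    # which the service needs later to complete the multipart upload.
    resp = requests.put(url, data=chunk, timeout=60)
    resp.raise_for_status()
    return resp.headers["ETag"]

def upload_file(path: str, presigned_urls: list[str]) -> list[str]:
    # Split the file into one chunk per presigned URL.
    with open(path, "rb") as f:
        chunks = [f.read(CHUNK_SIZE) for _ in presigned_urls]
    # Upload chunks concurrently to improve throughput.
    with ThreadPoolExecutor(max_workers=4) as pool:
        etags = list(pool.map(upload_chunk, presigned_urls, chunks))
    return etags  # ETags in part order, reported back when completing the upload
```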
- Client App
- Upload Complete: Once the upload is complete the client sends a request to the POST /api/videos/upload/complete endpoint.
- Routing: This request is again routed via the API Gateway.
- Video File Construction: The Video Upload Service first validates that all chunks have been received. It then instructs the storage service to assemble the chunks into the final video file, which is stored in a designated bucket. This is often achieved using an API call like CompleteMultipartUpload in AWS S3.
- Message Creation: Upon successful assembly, the Video Upload Service places a message on the Job Tracker Queue (which can be implemented using AWS SQS or Apache Kafka). The message includes the video metadata and storage location. This queue is important because it decouples the services, making them easier to scale. A sketch of this completion step follows below.
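A rough sketch of the completion step, assuming S3 multipart uploads and an SQS-backed Job Tracker Queue; the bucket and queue names are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
BUCKET = "raw-video-storage"                           # assumed bucket
JOB_TRACKER_QUEUE_URL = "https://sqs.../job-tracker"   # assumed queue URL

def complete_upload(video_key: str, upload_id: str, etags: list[str], metadata: dict):
    # Ask S3 to assemble the uploaded parts into the final object.
    s3.complete_multipart_upload(
        Bucket=BUCKET,
        Key=video_key,
        UploadId=upload_id,
        MultipartUpload={
            "Parts": [
                {"PartNumber": i + 1, "ETag": etag} for i, etag in enumerate(etags)
            ]
        },
    )
    # Decouple downstream processing by publishing to the Job Tracker Queue.
    sqs.send_message(
        QueueUrl=JOB_TRACKER_QUEUE_URL,
        MessageBody=json.dumps({"video_key": video_key, "metadata": metadata}),
    )
```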
- Job Tracker Service
- Message Consumption: Pulls messages off the Job Tracker Queue.
- Job Creation: It then creates a job in the Job Database. The purpose of the Job Database is to store the state of a video's processing so that, once processing succeeds or an error occurs, the user can be notified accordingly. As for which database to use: it holds a single table with a few columns tracking each of the individual processing tasks, and the workload is write-heavy, so a NoSQL database like DynamoDB is a great fit here.
- Message Creation: Once a job has been created in the database, the Job Tracker Service places a message on the Encoding Queue (which can be implemented using AWS SQS or Apache Kafka). A sketch of this consume-create-enqueue loop follows below.
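A sketch of the Job Tracker Service's loop under the SQS + DynamoDB assumption made above; the queue URLs and table name are placeholders:

```python
import json
import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.client("dynamodb")
JOB_TRACKER_QUEUE_URL = "https://sqs.../job-tracker"   # assumed queue URL
ENCODING_QUEUE_URL = "https://sqs.../encoding"         # assumed queue URL
JOBS_TABLE = "video_processing_jobs"                   # assumed table name

def poll_job_tracker_queue():
    resp = sqs.receive_message(
        QueueUrl=JOB_TRACKER_QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        video_key = body["video_key"]

        # Create a job item tracking each downstream processing task.
        dynamodb.put_item(
            TableName=JOBS_TABLE,
            Item={
                "video_key": {"S": video_key},
                "encoding": {"S": "pending"},
                "validation": {"S": "pending"},
                "thumbnails": {"S": "pending"},
                "metadata": {"S": "pending"},
            },
        )

        # Hand the video off to the encoding workers.
        sqs.send_message(
            QueueUrl=ENCODING_QUEUE_URL,
            MessageBody=json.dumps({"video_key": video_key}),
        )
        sqs.delete_message(
            QueueUrl=JOB_TRACKER_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
        )
```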
- Encoding Workers
- Message Consumption: Pulls messages off the Encoding Queue and starts the process of encoding (which typically also includes compressing) the video.
- Raw Video Retrieval: Videos are retrieved from the Raw Video Storage.
- Process:
- Encoding: Converts the video into multiple resolutions (e.g., 1080p, 720p, 480p) and formats (e.g., MP4, WebM).
- Compression: Reduces the file size without significant loss of quality, enabling faster streaming and lower storage costs. Compression often happens as part of the encoding process, using codecs like H.264 or H.265 that encode and compress the video simultaneously.
- Encoded Video Storage: Videos are stored in the Encoded Video Storage bucket.
- Message Creation: Once complete, the worker puts a message on the Media Processing Queue. This step is decoupled because all of the downstream services rely on the videos being encoded in the correct format; the encoding process must therefore run before the other media processing tasks, which can then run in parallel since they do not depend on each other. A sketch of an encoding worker follows below.
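One way an encoding worker could produce multiple renditions is by shelling out to ffmpeg; the rendition ladder and flags below are illustrative, not a tuned production configuration:

```python
import subprocess

# Illustrative encoding ladder: (output height, target video bitrate).
RENDITIONS = [(1080, "5000k"), (720, "2500k"), (480, "1000k")]

def encode_renditions(source_path: str) -> list[str]:
    outputs = []
    for height, bitrate in RENDITIONS:
        out_path = f"{source_path.rsplit('.', 1)[0]}_{height}p.mp4"
        # H.264 encode + scale; encoding and compression happen in one pass.
        subprocess.run(
            [
                "ffmpeg", "-y", "-i", source_path,
                "-c:v", "libx264", "-b:v", bitrate,
                "-vf", f"scale=-2:{height}",
                "-c:a", "aac",
                out_path,
            ],
            check=True,
        )
        outputs.append(out_path)
    return outputs
```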
- Media Processing Workers
- Message Consumption: Each worker is subscribed to the topic and consumes messages as they arrive.
- Validation Worker: Checks for compliance with content policies by detecting inappropriate content using AI/ML models.
- Thumbnail Worker: Generates thumbnail images for video previews by selecting frames that best represent the video. The selected thumbnails are stored in object storage (e.g., AWS S3, Google Cloud Storage), which integrates easily with CDNs for fast, low-latency delivery and replicates data across multiple availability zones.
- Metadata Worker: Stores metadata in a variety of datastores (a sketch of this dual write follows below). While a two-phase commit could theoretically keep the database and Elasticsearch in sync, it is impractical due to its complexity and performance overhead. Here we are willing to accept eventual consistency, since search is not the most critical feature of the application; if a failure occurs, appropriate retry and alerting mechanisms should be in place.
- Relational Database (PostgreSQL): Stores core metadata (e.g., video_id, uploader_id, duration, upload_date). This can be useful for administrative tasks or reports.
- Elasticsearch: Indexes video metadata for fast, full-text search.
- Message Creation: Once a worker completes, it sends a message to the Job Update Queue. If a worker fails, retries are attempted with exponential backoff; if it still fails after a predefined number of retries, a failure message is placed on the Job Update Queue.
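A simplified sketch of the metadata worker's dual write, persisting to PostgreSQL first and then indexing in Elasticsearch, with eventual consistency accepted between the two; the connection details, table, and index names are assumptions:

```python
import psycopg2
from elasticsearch import Elasticsearch

pg = psycopg2.connect("dbname=videos user=app password=secret host=postgres")  # assumed DSN
es = Elasticsearch("http://elasticsearch:9200")                                # assumed host

def store_metadata(video: dict):
    # 1) Write the system of record (relational database) first.
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO videos (video_id, uploader_id, title, duration, upload_date) "
            "VALUES (%s, %s, %s, %s, %s)",
            (video["video_id"], video["uploader_id"], video["title"],
             video["duration"], video["upload_date"]),
        )
    # 2) Index for search; failures here are retried and alerted on rather than
    #    rolled back, accepting eventual consistency between the two stores.
    es.index(index="videos", id=video["video_id"], document=video)
```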
- Job Tracker Service:
- Job Update: Pulls messages from the Job Update Queue and updates the corresponding job in the Job Database accordingly.
- Notification Creation: Once all processing tasks have completed (successfully or not), a message can be placed on the Notification Queue to notify the user that processing has finished (a sketch of this update-and-notify step follows below).
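A sketch of the update-and-notify step, assuming the DynamoDB job table and SQS queues introduced earlier; the task names and terminal states are illustrative:

```python
import json
import boto3

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")
JOBS_TABLE = "video_processing_jobs"                      # assumed table name
NOTIFICATION_QUEUE_URL = "https://sqs.../notifications"   # assumed queue URL

TASKS = ("encoding", "validation", "thumbnails", "metadata")

def handle_job_update(message: dict):
    video_key, task, status = message["video_key"], message["task"], message["status"]

    # Record the outcome ("succeeded" / "failed") of this task on the job item.
    updated = dynamodb.update_item(
        TableName=JOBS_TABLE,
        Key={"video_key": {"S": video_key}},
        UpdateExpression="SET #t = :s",
        ExpressionAttributeNames={"#t": task},
        ExpressionAttributeValues={":s": {"S": status}},
        ReturnValues="ALL_NEW",
    )["Attributes"]

    # Once every task has reached a terminal state, notify the user.
    if all(updated[t]["S"] in ("succeeded", "failed") for t in TASKS):
        sqs.send_message(
            QueueUrl=NOTIFICATION_QUEUE_URL,
            MessageBody=json.dumps({"video_key": video_key, "event": "processing_complete"}),
        )
```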
Receive Notification Flow
Stream Video Flow
Complete Architecture
Additional Discussion Points