Music Streaming Service (Spotify)

Overview
Introduction
- A music streaming service provides millions of users with on-demand access to a vast library of audio content.
- Designing a robust music streaming service goes beyond simple playback. It must also address tough technical challenges: high availability, low latency (minimizing "Time to First Byte"), and support for multiple bitrates so that playback adapts to varying network conditions.
Requirements
- Functional Requirements
- Artist Upload: Creators can upload high-res audio and manage metadata.
- Audio Streaming: Users can play music with quality that adapts to network conditions.
- Search: Sub-second fuzzy search across artists, albums, and tracks.
- ML Recommendations: Recommend songs that users would like.
- Royalty Payments: Accurate tracking of "valid" listens for payment disbursement.
- Non-Functional Requirements
- Scalability: Support 500M+ DAU and 100M+ tracks.
- Low Latency: < 200ms "Time to First Byte" for audio playback globally.
- Availability: 99.99% for the playback path; 99.9% for the ingestion path.
- Consistency: Strong consistency for royalty ledgers; Eventual consistency for social signals (likes/follows).
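To make the availability targets concrete, the arithmetic below converts each percentage into a yearly downtime budget. This is only a back-of-the-envelope sketch; the function name and the 365-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the maximum downtime per year (in minutes) allowed by a target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for name, target in [("playback", 99.99), ("ingestion", 99.9)]:
        print(f"{name}: {downtime_budget_minutes(target):.1f} min/year")
```

Under these assumptions the playback path can be down for under an hour per year, while the ingestion path has roughly ten times that budget, which is why the two paths can be engineered to different standards.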
Data Model
For a system of this scale, a Polyglot Persistence strategy (using multiple types of databases simultaneously to handle different data needs) is required.
- Relational (PostgreSQL): Serves as the system’s "Source of Truth" for structured metadata. ACID compliance is non-negotiable for financial-grade royalty calculations, and relational indexing supports the joins required for large-scale playlist metadata management and artist-track relationships.
- Object Storage (S3/GCS): Acts as the immutable storage layer for high-resolution audio assets. By storing audio segments as BLOBs, we decouple storage from compute, enabling massive horizontal scalability and seamless integration with CDNs for global edge delivery. It also serves as a data lake for high-volume user activity logs (every play, skip, and like).
- Elasticsearch: A distributed search and analytics engine that powers the "Discovery" experience. It utilizes inverted indices to provide sub-second fuzzy matching and prefix searching (autocomplete), ensuring high-quality results even with user typos or incomplete artist names.
- NoSQL (DynamoDB): Provides a high-performance Key-Value store for Pre-computed Recommendations. It is used to store the "Candidate Sets" generated by Spark (e.g., user_id -> list_of_recommended_songs), offering single-digit millisecond retrieval when a user logs in.
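The search behavior described above (an inverted index with prefix and fuzzy matching) can be illustrated with a toy in-memory version. This is a minimal sketch, not how Elasticsearch is implemented internally: the `TinyIndex` class, the single-term queries, and the edit-distance threshold of 1 are all assumptions chosen to keep the example small.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class TinyIndex:
    """Hypothetical toy inverted index: token -> set of document ids."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, term: str, max_edits: int = 1) -> set:
        """Match a single term by prefix (autocomplete) or small edit distance (typos)."""
        q = term.lower()
        hits = set()
        for token, doc_ids in self.postings.items():
            if token.startswith(q) or edit_distance(q, token) <= max_edits:
                hits |= doc_ids
        return hits
```

A real engine avoids scanning every token and ranks results by relevance; the sketch only shows why an inverted index makes both prefix and typo-tolerant lookups a matter of matching tokens rather than scanning documents.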
API Design
For a music streaming service, we will use a classic RESTful API to interact with the data. RESTful APIs are simple, widely used, stateless, and cache-friendly, which makes them a good fit for our system.
- Ingest Track: POST /v1/ingest/upload (returns presigned S3 URL).
- Search: GET /v1/search?q={query}&type=track (returns ranked results).
- Get Stream: GET /v1/tracks/{id}/manifest (returns HLS manifest and signed URLs).
- Heartbeat: POST /v1/me/play-event (telemetry for royalties).
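As an illustration of what the manifest endpoint might return, the sketch below builds an HLS master playlist that lists one variant stream per bitrate. The `Variant` type, the playlist URIs, and the specific bitrates are hypothetical; only the `#EXTM3U` and `#EXT-X-STREAM-INF` tags come from the HLS format itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    bandwidth_bps: int  # peak bitrate advertised to the player
    uri: str            # location of this variant's media playlist (hypothetical)


def build_master_playlist(variants, codecs: str = "mp4a.40.2") -> str:
    """Render an HLS master playlist, lowest bitrate first.

    The default codec string "mp4a.40.2" denotes AAC-LC audio; it is an
    assumption about how the tracks were encoded.
    """
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v.bandwidth_bps):
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},CODECS="{codecs}"')
        lines.append(v.uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist([
        Variant(320_000, "high/playlist.m3u8"),
        Variant(96_000, "low/playlist.m3u8"),
        Variant(160_000, "mid/playlist.m3u8"),
    ]))
```

The player picks a variant based on measured throughput and can switch between them mid-track, which is how adaptive quality would be achieved on the client side.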
High Level Design
At a high level, the system can be broken down into two core paths:
Ingestion Path
This path focuses on getting content from the creator to the storage layer.
- API Gateway: Acts as the entry point for artists, handling identity verification, SSL termination, and routing requests to internal services.
- Ingestion Service: Orchestrates the upload process by initializing track records in the database and generating presigned URLs for direct-to-S3 uploads.
- Raw Audio S3 Bucket: Serves as the durable object store for high-resolution files. It is configured to emit Object Created events upon successful file finalization.
- Raw Audio Queue (SQS): Decouples the storage layer from the metadata layer, holding ingestion events to ensure the system can handle bursts of uploads reliably.
- Metadata Service: Consumes messages from the queue to finalize track status and ensure the Metadata DB reflects the most recent uploads.
- Metadata DB (PostgreSQL): The relational "Source of Truth" that stores artist profiles, track information, and the current state (e.g., PENDING vs PUBLISHED) of every asset.
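The consumer side of this flow can be sketched as a handler that reads an S3 "Object Created" notification from the queue and flips the track from PENDING to PUBLISHED. This is a simplified sketch under stated assumptions: the `tracks` dict stands in for the Metadata DB, tracks are keyed by their S3 object key, and the function name is hypothetical. The handler is written to be idempotent because standard SQS queues deliver messages at least once.

```python
import json
from urllib.parse import unquote_plus


def handle_upload_event(message_body: str, tracks: dict) -> list:
    """Mark uploaded tracks as PUBLISHED; return the keys that changed state.

    `tracks` is an in-memory stand-in for the Metadata DB (an assumption),
    mapping S3 object key -> {"status": "PENDING" | "PUBLISHED"}.
    """
    event = json.loads(message_body)
    published = []
    for record in event.get("Records", []):
        # Ignore anything that is not an object-creation event.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        # Object keys in S3 notifications are URL-encoded (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        track = tracks.get(key)
        # Only PENDING tracks transition, so a redelivered message is a no-op.
        if track is not None and track["status"] == "PENDING":
            track["status"] = "PUBLISHED"
            published.append(key)
    return published
```

In a real deployment the status change would be a conditional update in PostgreSQL rather than a dict mutation, but the same guard on the current state keeps duplicate deliveries harmless.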
Playback Path
This path focuses on how a consumer discovers a track and retrieves the audio data.
- API Gateway: Manages consumer traffic, providing a unified interface for search and playback requests.
- Metadata Service: Interfaces with the database to retrieve track manifests and location metadata for the requesting client.
- Metadata DB (PostgreSQL): Supplies the structured data required to populate the user's interface and the URLs for the streaming manifest.
- CDN (Content Delivery Network): A distributed network of edge servers that caches audio files from the Raw Audio S3 Bucket to serve them to users with minimal latency.
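Since the manifest endpoint hands out signed URLs, it may help to see how an expiring signature can work in principle. The sketch below uses a generic HMAC-SHA256 scheme; it is not the signing format of S3, CloudFront, or any other specific CDN, and the function names, query parameters, and shared secret are all assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    """HMAC over the path and expiry so neither can be altered by the client."""
    payload = f"{path}?expires={expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(path: str, secret: bytes, expires_at: int) -> str:
    """Return the path with an expiry timestamp and signature appended."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, secret)}"


def verify_url(path: str, expires_at: int, sig: str, secret: bytes,
               now: Optional[float] = None) -> bool:
    """Accept the request only if it has not expired and the signature matches."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(_signature(path, expires_at, secret), sig)
```

Because verification needs only the shared secret and the current time, an edge server could check each request without calling back to the Metadata Service, which keeps the playback path short.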
While this implementation technically facilitates playback, it contains significant architectural "anti-patterns" and lacks key features (e.g., efficient search, royalty payments, ML recommendations), so it does not yet fulfill the functional and non-functional requirements.
Deep Dive 1: Multiple Bitrates & Data Synchronization
Deep Dive 2: ML Recommendation Pipeline
Deep Dive 3: Royalty Payments
Additional Discussion Points