Social Media Platform (Instagram)


Introduction

Instagram is one of the most popular social media platforms in the world, so unsurprisingly it often comes up as a system design interview question at Meta.

The first thing to note is that Instagram has many features, so it's important to ask your interviewer clarifying questions to understand exactly what they want. The video focuses on the main flows in the app, but the system is designed to be very extensible so that adding more features is easy.

Requirements

  • Functional Requirements
    • Upload media (image & video)
    • Follow / unfollow users
    • User feed generation
    • Search posts by caption / hashtag
  • Non Functional Requirements
    • High scalability
    • High availability (willing to accept eventual consistency)
    • High durability
    • Low latency
  • Not covered
    • Messaging
    • Image editing

Estimates

  • Storage
    • 500 million DAU
    • Every user posts once every 5 days
    • Posts per day: (500 x 10⁶) * 0.2 = 100 x 10⁶ posts per day
    • Daily storage: (100 x 10⁶) [posts] * 1 MB [average post size] = 100 x 10⁶ MB ≈ 97,656 GB (dividing by 1024)
    • Total storage per year: 97,656 GB * 365 ≈ 35,644,440 GB ≈ 34,800 TB ≈ 34 PB / year (dividing by 1024 at each step)
  • Queries Per Second (QPS)
    • Total posts per day: 100 * 10⁶
    • Seconds per day: 24 hours * 60 minutes * 60 seconds = 86,400
    • Writes per second: (100 * 10⁶) / 86,400 ≈ 1,150 writes / second
    • Read write ratio: 100:1
    • Reads per second: 100 * 1,150 ≈ 115,000 reads / second
    • Queries per second: 115,000 + 1,150 ≈ 116,150 QPS
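
The arithmetic above can be checked with a few lines of Python. This is only a sketch that reuses the assumptions from the list (500 million DAU, one post every 5 days, 1 MB per post, a 100:1 read/write ratio); the constant and function names are made up for illustration.

```python
# Back-of-envelope estimates using the assumptions stated in the list above.
DAU = 500 * 10**6            # daily active users
DAYS_BETWEEN_POSTS = 5       # every user posts once every 5 days
AVG_POST_SIZE_MB = 1         # average post size
READ_WRITE_RATIO = 100       # 100 reads for every write
SECONDS_PER_DAY = 24 * 60 * 60


def estimate():
    """Return the storage and QPS estimates as a dictionary."""
    posts_per_day = DAU // DAYS_BETWEEN_POSTS
    daily_storage_gb = posts_per_day * AVG_POST_SIZE_MB / 1024
    # GB -> TB -> PB, dividing by 1024 at each step.
    yearly_storage_pb = daily_storage_gb * 365 / 1024 / 1024
    writes_per_second = posts_per_day / SECONDS_PER_DAY
    reads_per_second = writes_per_second * READ_WRITE_RATIO
    return {
        "posts_per_day": posts_per_day,
        "daily_storage_gb": daily_storage_gb,
        "yearly_storage_pb": yearly_storage_pb,
        "writes_per_second": writes_per_second,
        "reads_per_second": reads_per_second,
        "total_qps": writes_per_second + reads_per_second,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Without intermediate rounding the write rate comes out at roughly 1,157 per second, which the list above rounds down to 1,150. In an interview either figure is fine; the order of magnitude is what matters.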

Data Model

This is a basic outline of some of the core tables that could be included in an Instagram data model.

  • users
    • Contains information related to the user.
  • followers
    • follower_id & followed_by_id: Two foreign keys to the users table which record who follows whom.
  • media
    • media_id: Uniquely identifies each media item.
    • user_id: Links each media item to a user, indicating who uploaded it.
    • media_type, file_url: Describes the media type (e.g., image, video) and its file location.
  • posts
    • post_id: Uniquely identifies each post.
    • user_id: Links each post to a user, indicating who created it.
    • caption: Stores the text or caption of the post.
    • created_at: Records when the post was created.
  • post_media
    • post_id, media_id: Link posts to their associated media.
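
To make the relationships between these tables concrete, here is a minimal sketch of the schema. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in so that it runs anywhere. The column types, the username column, the constraints and the sample rows are all assumptions for illustration, not part of the original design.

```python
import sqlite3

# Table and key names follow the outline above; everything else is assumed.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE          -- hypothetical column
);
CREATE TABLE followers (
    follower_id    INTEGER NOT NULL REFERENCES users(user_id),
    followed_by_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followed_by_id)
);
CREATE TABLE media (
    media_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    media_type TEXT NOT NULL CHECK (media_type IN ('image', 'video')),
    file_url   TEXT NOT NULL
);
CREATE TABLE posts (
    post_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    caption    TEXT,
    created_at TEXT NOT NULL
);
CREATE TABLE post_media (
    post_id  INTEGER NOT NULL REFERENCES posts(post_id),
    media_id INTEGER NOT NULL REFERENCES media(media_id),
    PRIMARY KEY (post_id, media_id)
);
"""


def build_demo_db():
    """Create the tables in memory and insert one post with one image."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users (user_id, username) VALUES (1, 'alice')")
    conn.execute(
        "INSERT INTO media VALUES (10, 1, 'image', 'https://cdn.example.com/10.jpg')"
    )
    conn.execute(
        "INSERT INTO posts VALUES (100, 1, 'Sunset #beach', '2024-01-01T00:00:00Z')"
    )
    conn.execute("INSERT INTO post_media VALUES (100, 10)")
    return conn


def media_urls_for_post(conn, post_id):
    """Follow post_media from a post to the URLs of its media items."""
    rows = conn.execute(
        "SELECT m.file_url FROM post_media pm "
        "JOIN media m ON m.media_id = pm.media_id "
        "WHERE pm.post_id = ? ORDER BY m.media_id",
        (post_id,),
    )
    return [row[0] for row in rows]
```

Keeping post_media as a separate join table is what lets a single post carry several images or videos without duplicating the post row.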

API Design

For Instagram we will use a classic RESTful API to interact with the data. RESTful APIs are simple, widely used, stateless, and support caching, which makes them a good fit for our system.

Our REST API will comprise three main endpoints:

  • POST: /api/upload
    • Params:
      • file: binary of photo or video
      • metadata: user_id, created_at, caption etc.
    • Response code: 201 Created
  • GET: /api/posts/{id}
    • Params: id (the post_id of the post to fetch)
    • Response code: 200 OK
  • GET: /api/feed
    • Params: user_id, pagination
    • Response code: 200 OK
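
To show how the three endpoints relate, here is a small in-memory sketch of their behaviour. It is not a real HTTP server: each method stands in for one endpoint and returns a (status code, body) pair. The class name, the offset-based pagination and the 400 / 404 error responses are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class InMemoryApi:
    """Hypothetical stand-in for the REST API; no network or database involved."""

    posts: dict = field(default_factory=dict)    # post_id -> post
    follows: dict = field(default_factory=dict)  # user_id -> set of followed user_ids
    next_id: count = field(default_factory=lambda: count(1))

    def upload(self, file, metadata):
        """POST /api/upload: store the post and answer 201 Created."""
        if not file:
            return 400, {"error": "file is required"}  # assumed error handling
        post_id = next(self.next_id)
        self.posts[post_id] = {"post_id": post_id, **metadata}
        return 201, self.posts[post_id]

    def get_post(self, post_id):
        """GET /api/posts/{id}: answer 200 with the post, or 404 if unknown."""
        post = self.posts.get(post_id)
        if post is None:
            return 404, {"error": "post not found"}  # assumed error handling
        return 200, post

    def get_feed(self, user_id, offset=0, limit=20):
        """GET /api/feed: newest posts from followed users, one page at a time."""
        followed = self.follows.get(user_id, set())
        visible = sorted(
            (p for p in self.posts.values() if p["user_id"] in followed),
            key=lambda p: p["post_id"],
            reverse=True,
        )
        page = visible[offset:offset + limit]
        has_more = offset + limit < len(visible)
        return 200, {"posts": page, "next_offset": offset + limit if has_more else None}
```

A quick usage example: with `api = InMemoryApi(follows={1: {2}})`, uploading two posts as user 2 and then calling `api.get_feed(1, limit=1)` returns the newer post plus a `next_offset` of 1 for the following page.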

Uploading Media Flow

  • Client sends a POST request to the API Gateway
    • Request headers: Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
      • The boundary is a unique string that acts as a delimiter in the body of the request. This helps the server identify where the file data starts and ends.
    • Request body contains the binary file data
  • API Gateway forwards request to Load Balancer
    • API Gateway handles routing, rate limiting, authentication and authorization.
  • Load Balancer routes the request to an instance of the Post Service
    • The use of a Load Balancer ensures that the incoming requests from clients are evenly distributed across instances of the Post Service, which helps in handling high traffic and ensures high availability.
    • The Post service has been horizontally scaled to handle large traffic volumes and to prevent a single point of failure.
  • Post Service uploads the image to Object Storage
    • When a file exceeds a certain size threshold (e.g., 5MB), the system employs a multi-part upload strategy. This approach divides the file into smaller chunks and uploads each chunk individually.
    • This method helps to manage large file uploads more efficiently, reduces the risk of upload failures, and allows for the resumption of uploads if an interruption occurs.
  • Object Storage triggers a CDN cache update or invalidation
    • Integration with a CDN like CloudFront improves the delivery speed of images to users by caching the images closer to the users geographically.
  • Post Service receives the image URL from Object Storage
  • Post Service uploads metadata to main Postgres storage
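    • To ensure data consistency, the metadata storage process is tightly integrated with the image upload workflow.
    • Object Storage and Postgres cannot share a single database transaction, so the Post Service treats the two writes as one logical unit: if the metadata write fails, the uploaded file is removed. This way the image upload and the metadata storage either both complete successfully or both fail, preventing any inconsistencies.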
  • Post Service returns the image URL to the client
  • Post Service sends a denormalized message to Kafka
    • Using Kafka for asynchronous message processing to update various services allows decoupling of services and enhances overall system responsiveness and scalability.
    • Ensure that Kafka is configured to handle high volumes of messages efficiently. This includes partitioning topics appropriately and managing consumer groups to optimize throughput.
    • The message is denormalized to provide all necessary data in a single message, reducing the need for downstream services to make additional queries or joins, thereby simplifying processing and improving efficiency.
  • Kafka consumers (Neo4j, Postgres, Indexes, User Feed Service) process the message asynchronously
    • Neo4j: Updates the graph database with new post relationships, which supports social network analysis and relationship-based queries.
    • Indexes: Updates the search indexes with new post content, which enables quick searching and retrieval of posts. Indexes could be implemented using Elasticsearch, a distributed, RESTful search and analytics engine built on top of the open-source Apache Lucene library.
    • User Feed Service: Updates user feeds with the new post information (detailed flow outlined below)
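
The core of the Post Service steps above can be sketched in a few functions. Everything here is a simplified assumption: plain dictionaries stand in for Object Storage and Postgres, a list stands in for the Kafka topic, the 5 MB threshold is reused as the part size, the cdn.example.com URL is a placeholder, and "complete or fail together" is approximated by deleting the uploaded file when the metadata write fails.

```python
PART_SIZE = 5 * 1024 * 1024  # reuse the 5 MB threshold as the part size (assumption)


class MetadataWriteError(Exception):
    """Raised when the (simulated) metadata write fails."""


def split_into_parts(data, part_size=PART_SIZE):
    """Return the file as one part, or as fixed-size chunks if it is too large."""
    if len(data) <= part_size:
        return [data]
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def create_post(data, metadata, object_store, metadata_db, events, fail_metadata=False):
    """Upload the file, store metadata, then publish a denormalized event.

    object_store and metadata_db are dicts and events is a list; they are
    hypothetical stand-ins for Object Storage, Postgres and a Kafka topic.
    """
    key = f"media/{metadata['post_id']}"
    # A real multi-part upload sends each part separately and can retry or
    # resume individual parts; here the parts are simply joined again.
    object_store[key] = b"".join(split_into_parts(data))
    file_url = f"https://cdn.example.com/{key}"  # placeholder URL

    try:
        if fail_metadata:  # test hook to simulate a database failure
            raise MetadataWriteError("simulated metadata failure")
        metadata_db[metadata["post_id"]] = {**metadata, "file_url": file_url}
    except MetadataWriteError:
        # Compensate: remove the uploaded file so no orphan is left behind.
        del object_store[key]
        raise

    # Denormalized message: consumers get everything they need in one payload
    # and do not have to query the metadata store again.
    events.append({"type": "post_created", **metadata, "file_url": file_url})
    return file_url
```

The event is only appended after both writes have succeeded, so downstream consumers never see a post whose file or metadata is missing.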

Pre-Generating User Feed Flow


Requesting User Feed Flow


Additional Discussion Points

