News Feed (Twitter, X)

Overview

Requirements
Estimates
Data Model
API Design
Feed Publishing
Architecture Overview
Additional Discussion Points

Requirements

Functional Requirements
- Feed publishing
- Feed retrieval
- Notification and analytics services, etc.
Non-functional Requirements
- High availability
- Minimal latency
Not covered
- Machine-learning-generated newsfeeds (we assume a reverse chronological newsfeed instead)

Estimates

20 million DAU (daily active users)
Each user posts an average of 5 tweets per day
5 * 20 million = 100 million tweets / day
100 million / (60 secs * 60 mins * 24 hrs) = 100 million / 86,400 ≈ 1,160, or roughly 1,000 tweets / second
Assuming a 100:1 read-to-write ratio: ≈ 100,000 read requests / second
Assuming ~100 bytes per tweet: 100 million * 100 bytes = 10 GB / day
Over 10 years: 10 GB * 365 days * 10 years = 36.5 TB
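The same back-of-the-envelope arithmetic can be checked in a few lines. This is a minimal sketch that only restates the assumptions above (20 million DAU, 5 tweets per user per day, a 100:1 read-to-write ratio, ~100 bytes per tweet, 10 years of retention); the variable names are illustrative.

```python
# Back-of-the-envelope capacity estimates for the newsfeed system.
# All inputs are the assumptions stated in the Estimates section.
SECONDS_PER_DAY = 60 * 60 * 24  # 86,400

daily_active_users = 20_000_000
tweets_per_user_per_day = 5
read_write_ratio = 100      # assumed 100 reads for every write
bytes_per_tweet = 100       # assumed average tweet size (text only)
retention_years = 10

tweets_per_day = daily_active_users * tweets_per_user_per_day  # 100 million
write_qps = tweets_per_day / SECONDS_PER_DAY                  # ~1,157 writes/s
read_qps = write_qps * read_write_ratio                       # ~115,741 reads/s
storage_per_day_gb = tweets_per_day * bytes_per_tweet / 1e9   # 10 GB/day
storage_total_tb = storage_per_day_gb * 365 * retention_years / 1e3  # 36.5 TB

print(f"tweets/day:      {tweets_per_day:,}")
print(f"write QPS:       {write_qps:,.0f}")
print(f"read QPS:        {read_qps:,.0f}")
print(f"storage/day:     {storage_per_day_gb:.1f} GB")
print(f"10-year storage: {storage_total_tb:.1f} TB")
```

The unrounded results (about 1,160 writes/s and 116,000 reads/s) confirm that the rounded figures are the right order of magnitude.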

Data Model

This is a basic outline of some of the core tables that could be included in a newsfeed data model.

users
- Contains information related to the user.
tweets
- user_id: Foreign key which is used to identify which user created the tweet.
- type: Enum used to determine the type of tweet (retweet, etc.).
- content: Actual content of the tweet (only text allowed in this model).
followers
- follower_id & followee_id: Two foreign keys that record which user follows which.
feeds
- user_id: Foreign key which is used to identify which user the feed belongs to.
- Note: users can have several feeds (e.g. Instagram's home and explore feeds).
feeds_tweets
- feed_id: Foreign key which is used to identify which feed the tweet belongs to.
- tweet_id: Foreign key which is used to identify the specific tweet.
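The tables above can be sketched as Python dataclasses to make the foreign-key relationships explicit. This is an illustration, not a schema definition: the `id` primary keys, `username`, the feed `name`, the `TweetType` values other than retweet, and the `created_at` timestamp (needed to order a reverse chronological feed) are all hypothetical additions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class TweetType(Enum):
    # Hypothetical values; the data model only mentions "retweet etc."
    ORIGINAL = "original"
    RETWEET = "retweet"


@dataclass
class User:
    id: int
    username: str  # hypothetical profile field


@dataclass
class Tweet:
    id: int
    user_id: int        # FK -> User.id (the author)
    type: TweetType
    content: str        # text only in this model
    # Hypothetical column used to sort a reverse chronological feed.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass(frozen=True)
class Follower:
    follower_id: int  # FK -> User.id (the user who follows)
    followee_id: int  # FK -> User.id (the user being followed)


@dataclass
class Feed:
    id: int
    user_id: int  # FK -> User.id (the feed owner)
    name: str     # hypothetical, e.g. "home" or "explore"


@dataclass(frozen=True)
class FeedTweet:
    feed_id: int   # FK -> Feed.id
    tweet_id: int  # FK -> Tweet.id


# Example rows: bob follows alice, and alice's tweet lands in bob's home feed.
alice = User(id=1, username="alice")
bob = User(id=2, username="bob")
follows = [Follower(follower_id=bob.id, followee_id=alice.id)]
home = Feed(id=10, user_id=bob.id, name="home")
tweet = Tweet(id=100, user_id=alice.id, type=TweetType.ORIGINAL, content="hello")
feed_rows = [FeedTweet(feed_id=home.id, tweet_id=tweet.id)]
```

The `feeds_tweets` join table is what lets one tweet appear in many feeds without copying its content.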

API Design

Given the interrelated nature of the data, a GraphQL API could be a good fit for this system. In particular, GraphQL lets the client specify exactly which fields to fetch, which prevents over-fetching; over-fetching can slow down response times and lead to a worse overall UX.

Here is an example GraphQL query for a feed:
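A minimal sketch of such a query, assuming a hypothetical schema: the `HomeFeed` operation, the `feed` field with its `type`, `first`, and `after` arguments, and the `tweets`, `author`, and `nextCursor` fields are illustrative, not a documented API. The query is wrapped in Python to show the JSON body (`query` plus `variables`) that a client conventionally POSTs to a GraphQL endpoint.

```python
import json
from typing import Optional

# Hypothetical schema; field and argument names are for illustration only.
FEED_QUERY = """
query HomeFeed($first: Int!, $after: String) {
  feed(type: HOME, first: $first, after: $after) {
    tweets {
      id
      content
      createdAt
      author {
        username
      }
    }
    nextCursor
  }
}
"""


def build_feed_request(page_size: int, cursor: Optional[str] = None) -> str:
    """Return the JSON request body a client would POST to the GraphQL endpoint."""
    return json.dumps({
        "query": FEED_QUERY,
        "variables": {"first": page_size, "after": cursor},
    })


body = build_feed_request(page_size=20)
```

Because the client lists only `id`, `content`, `createdAt`, and the author's `username`, the server returns nothing else, which is the over-fetching benefit described above.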

Alternatively, we could use a classic RESTful API to interact with the data. RESTful APIs are simple, widely used, stateless, and support caching, which makes them a good candidate for our system.

Our REST API will consist of two main endpoints:

POST: /tweet
- Params: content and auth_token
- Status code: 201 Created
GET: /news-feed
- Params: auth_token
- Status code: 200 OK
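The two endpoints can be sketched as framework-free handler functions that return the status codes listed above. This is a minimal sketch, assuming hypothetical in-memory stores (`TOKENS`, `FOLLOWING`, `TWEETS`) in place of the real auth, user, and tweet services; the 401 response for a bad token is also an assumption, and a production `/news-feed` would read from the feed cache rather than scan all tweets.

```python
import itertools
from http import HTTPStatus

# Hypothetical in-memory stand-ins for the auth, follow, and tweet stores.
TOKENS = {"token-alice": 1, "token-bob": 2}  # auth_token -> user_id
FOLLOWING = {2: {1}}                         # user_id -> ids of users they follow
TWEETS: list[dict] = []                      # all tweets, oldest first
_next_id = itertools.count(1)


def post_tweet(content: str, auth_token: str) -> tuple[int, dict]:
    """POST /tweet: store a new tweet and return 201 Created."""
    user_id = TOKENS.get(auth_token)
    if user_id is None:
        return HTTPStatus.UNAUTHORIZED, {"error": "invalid auth_token"}
    tweet = {"id": next(_next_id), "user_id": user_id, "content": content}
    TWEETS.append(tweet)
    return HTTPStatus.CREATED, tweet


def get_news_feed(auth_token: str) -> tuple[int, dict]:
    """GET /news-feed: return followees' tweets, newest first, with 200 OK."""
    user_id = TOKENS.get(auth_token)
    if user_id is None:
        return HTTPStatus.UNAUTHORIZED, {"error": "invalid auth_token"}
    followees = FOLLOWING.get(user_id, set())
    # Naive scan for illustration only; see the fanout strategies below.
    feed = [t for t in reversed(TWEETS) if t["user_id"] in followees]
    return HTTPStatus.OK, {"tweets": feed}
```

`HTTPStatus` members are integers, so `post_tweet(...)[0] == 201` holds; a web framework would map these return values onto real HTTP responses.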

Feed Publishing

In our system we are going to use the fanout method to construct the newsfeeds. Fanout is the process of distributing a message or content update to all the subscribers of a particular feed.

The two main fanout strategies are:

Fanout on write (push model)
Fanout on read (pull model)

1. Fanout on write (push model)

In this approach, when a new piece of content (e.g. a tweet) is published, it is pushed to the newsfeed cache of each of the author's followers, so every follower's newsfeed is precomputed. This makes reads very fast, as a user's newsfeed is already built before they request it.

Pros:
- Reads are faster as the feed is precomputed at write.
- Feed is generated in real time.
Cons:
- Precomputing feeds for inactive users wastes resources.
- HotKey problem: a post from a user with lots of followers triggers a huge number of feed writes.
  - E.g. if a user has millions of followers, updating millions of newsfeeds is very resource intensive.
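Fanout on write can be sketched with in-memory dictionaries standing in for the follower store and the per-user feed cache. This is a minimal sketch, not a production design; `FEED_CACHE_SIZE` is a hypothetical cap on cached entries per user. The return value of `publish` shows why the HotKey problem arises: the write cost grows with the author's follower count.

```python
from collections import defaultdict, deque
from itertools import islice

FEED_CACHE_SIZE = 800  # hypothetical cap on cached tweet ids per user

followers: defaultdict[int, set[int]] = defaultdict(set)  # author -> follower ids
# Per-user feed cache of tweet ids, newest first; old entries fall off the end.
feed_cache: defaultdict[int, deque[int]] = defaultdict(
    lambda: deque(maxlen=FEED_CACHE_SIZE)
)


def follow(follower_id: int, followee_id: int) -> None:
    followers[followee_id].add(follower_id)


def publish(author_id: int, tweet_id: int) -> int:
    """Fan out on write: push the tweet id into every follower's cached feed.

    Returns the number of feed writes, which equals the follower count.
    """
    writes = 0
    for follower_id in followers[author_id]:
        feed_cache[follower_id].appendleft(tweet_id)  # newest first
        writes += 1
    return writes


def read_feed(user_id: int, limit: int = 20) -> list[int]:
    """Reads are a cheap cache lookup because the feed was built at write time."""
    return list(islice(feed_cache[user_id], limit))
```

For example, after `follow(2, 1)` and `follow(3, 1)`, `publish(1, 100)` performs two feed writes, and both users 2 and 3 see tweet 100 without any work at read time.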

2. Fanout on read (pull model)

In this approach, instead of pushing content to followers' newsfeed caches on write, the system waits until a user requests (pulls) their newsfeed and then computes it on read.

Pros:
- No resources are wasted on inactive users.
- HotKey problem is avoided.
Cons:
- Reads are slower as feeds are generated when a request is made.
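Fanout on read can be sketched as a k-way merge of the followees' timelines at request time. This is a minimal sketch with in-memory stores; the integer `timestamp` and the assumption that each author's tweets are appended in timestamp order are illustrative. Writing is cheap, while `read_feed` does the work, which is why reads are slower.

```python
import heapq
from collections import defaultdict
from itertools import islice

following: defaultdict[int, set[int]] = defaultdict(set)  # user -> followee ids
# Each author's own timeline of (timestamp, tweet_id), oldest first.
user_tweets: defaultdict[int, list[tuple[int, int]]] = defaultdict(list)


def follow(follower_id: int, followee_id: int) -> None:
    following[follower_id].add(followee_id)


def post(author_id: int, tweet_id: int, timestamp: int) -> None:
    # Cheap write: nothing is fanned out, the tweet only joins the author's timeline.
    user_tweets[author_id].append((timestamp, tweet_id))


def read_feed(user_id: int, limit: int = 20) -> list[int]:
    """Fan out on read: merge followees' timelines on request, newest first."""
    # reversed() makes each timeline newest first, as heapq.merge(reverse=True) requires.
    timelines = [reversed(user_tweets[f]) for f in following[user_id]]
    merged = heapq.merge(*timelines, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

`heapq.merge` is lazy, so only about `limit` entries are pulled from the merged stream, but every followee's timeline still has to be fetched on each request.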

Hybrid Approach (recommended approach)

Use a combination of the push and pull models.
Use the push model for the majority of users.
- i.e. when most people post, the newsfeed caches of their followers are updated.
Use the pull model for celebrities (people with large followings).
- To avoid overloading the system, celebrities' posts are not fanned out; instead, each user fetches the latest posts from the celebrities they follow on read.
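The hybrid approach can be sketched by combining the two previous ideas: regular authors are pushed into followers' caches, while celebrities' tweets are merged in at read time. This is a minimal sketch; `CELEBRITY_THRESHOLD` is a hypothetical cutoff (kept tiny here so the example is easy to follow), and the code assumes tweets arrive in timestamp order and that a user's celebrity status does not change (otherwise already-pushed tweets could be pulled again as duplicates).

```python
import heapq
from collections import defaultdict, deque
from itertools import islice

CELEBRITY_THRESHOLD = 2  # hypothetical; a real system would use a far larger cutoff

followers: defaultdict[int, set[int]] = defaultdict(set)  # author -> follower ids
following: defaultdict[int, set[int]] = defaultdict(set)  # user -> followee ids
# Author timelines of (timestamp, tweet_id), oldest first.
user_tweets: defaultdict[int, list[tuple[int, int]]] = defaultdict(list)
# Per-user push cache of (timestamp, tweet_id), newest first.
feed_cache: defaultdict[int, deque[tuple[int, int]]] = defaultdict(deque)


def follow(follower_id: int, followee_id: int) -> None:
    followers[followee_id].add(follower_id)
    following[follower_id].add(followee_id)


def is_celebrity(user_id: int) -> bool:
    return len(followers[user_id]) >= CELEBRITY_THRESHOLD


def publish(author_id: int, tweet_id: int, timestamp: int) -> None:
    entry = (timestamp, tweet_id)
    user_tweets[author_id].append(entry)
    if not is_celebrity(author_id):
        # Push model for regular users: update followers' caches now.
        for follower_id in followers[author_id]:
            feed_cache[follower_id].appendleft(entry)
    # Celebrities: no fanout; followers pull these tweets on read.


def read_feed(user_id: int, limit: int = 20) -> list[int]:
    pushed = iter(feed_cache[user_id])  # already newest first
    pulled = [reversed(user_tweets[c]) for c in following[user_id] if is_celebrity(c)]
    merged = heapq.merge(pushed, *pulled, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

The design choice is that write cost is bounded for everyone (no single post triggers millions of cache writes), while read cost grows only with the number of celebrities a user follows.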

Architecture Overview

User service
- Handles user-related functionality, such as following other users.
- Given the interconnected nature of the user data, a graph database like Neo4j would be a good choice.
Newsfeed service
- Handles the publishing and retrieval of newsfeeds.
Tweet service
- Handles all functionality related to tweets including posting and favoriting.
- Tweet events can also be published to Kafka and consumed by the notification service, which sends push notifications to users via Firebase Cloud Messaging (Android) and the Apple Push Notification service (iOS).
Notification service
- Handles sending push notifications to users.
Analytics service
- Tracks usage metrics, which can be used to analyse user behaviour and system performance.
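The event flow between the tweet service and its downstream consumers can be sketched in-process. This is a minimal sketch, not a Kafka client: the `Topic` class is a hypothetical stand-in that gives each subscribing service (like a Kafka consumer group) its own `queue.Queue`, so both the notification and analytics services see every tweet event. The event fields and the notification strings are also illustrative.

```python
import json
import queue


class Topic:
    """Stand-in for a Kafka topic: every subscriber group receives every event."""

    def __init__(self) -> None:
        self.groups: dict[str, queue.Queue] = {}

    def subscribe(self, group: str) -> queue.Queue:
        return self.groups.setdefault(group, queue.Queue())

    def publish(self, event: dict) -> None:
        payload = json.dumps(event)  # events travel as serialized messages
        for q in self.groups.values():
            q.put(payload)


tweets_topic = Topic()
notifications_in = tweets_topic.subscribe("notification-service")
analytics_in = tweets_topic.subscribe("analytics-service")


def tweet_service_post(user_id: int, content: str) -> None:
    # The tweet service stores the tweet (omitted here) and emits an event.
    tweets_topic.publish({"type": "tweet_created", "user_id": user_id, "content": content})


def drain(q: queue.Queue) -> list[dict]:
    """Consume all currently queued events without blocking."""
    events = []
    while True:
        try:
            events.append(json.loads(q.get_nowait()))
        except queue.Empty:
            return events


def notification_service_step() -> list[str]:
    # A real service would look up followers' devices and call FCM or APNs.
    return [f"notify followers of user {e['user_id']}" for e in drain(notifications_in)]
```

Decoupling through a topic means the tweet service does not wait on notification delivery, and new consumers such as analytics can subscribe without changing the publisher.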

Additional Discussion Points

Keep services stateless.
Horizontally scale each service to prevent single points of failure.
Spread services across multiple data centres.
Use many database read replicas to handle the large read load.
Cache as much data as possible.
Monitor usage metrics in order to predict peak queries per second (QPS).
Support media content (images, video) and serve it via a CDN.
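Caching and read replicas work together on the read path. This is a minimal sketch of a cache-aside read, assuming an in-process LRU cache (built on `collections.OrderedDict`) in place of a real cache cluster, and hypothetical replica query functions chosen round-robin; `LRUCache` and `FeedReader` are illustrative names.

```python
from collections import OrderedDict
from itertools import cycle
from typing import Callable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key: int) -> Optional[list[int]]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: int, value: list[int]) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class FeedReader:
    """Cache-aside reads that fall back to read replicas chosen round-robin."""

    def __init__(self, replicas: list[Callable[[int], list[int]]], cache: LRUCache) -> None:
        self._replicas = cycle(replicas)
        self._cache = cache

    def get_feed(self, user_id: int) -> list[int]:
        feed = self._cache.get(user_id)
        if feed is None:
            # Cache miss: query the next replica, then populate the cache.
            feed = next(self._replicas)(user_id)
            self._cache.put(user_id, feed)
        return feed
```

Round-robin spreads misses evenly across replicas; the cache absorbs repeated reads, which matters most at the estimated ~100,000 reads per second.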
