- Overview
- Step 1: Understand the problem and establish design scope
- Step 2: Propose high-level design and get buy-in
- Step 3: Design deep dive
Facebook: "News feed is the constantly updating list of stories in the middle of your home page. News Feed includes status updates, photos, videos, links, app activity, and likes from people, pages, and groups that you follow on Facebook."
Candidate: Is this a mobile app? Or a web app? Or both? Interviewer: Both.
Candidate: What are the important features? Interviewer: A user can publish a post and see her friends' posts on the news feed page.
Candidate: Is the news feed sorted by reverse chronological order or any particular order such as topic scores? For instance, posts from your close friends have higher scores. Interviewer: To keep things simple, let us assume the feed is sorted by reverse chronological order.
Candidate: How many friends can a user have? Interviewer: 5000
Candidate: What is the traffic volume? Interview: 10 million DAU
Candidate: Can feed contain images, videos, or just text? Interviewer: It can contain media files, including both images and videos.
The design is divided into two flos: feed publishing and news feed building.
-
Feed publishing: When a user publishes a post corresponding data is written into cache and database. A post is populated to his friends' new feed.
-
Newsfeed building: We assume the news feed is built by aggregating friends' posts in reverse chronological order.
The news feed APIs are the primary ways for clients to communicate with servers. Those APIs are HTTP based and allow clients to perform actions, which include posting a status, retrieving news feed, adding friends, etc.
To publish a post, a HTTP POST request will be sent to the server: POST /v1/me/feed
.
Params:
content
: Text of the post.auth_token
: Used to authenticate API requests.
The API to retrieve news feed is GET /v1/me/feed
and receives as param auth_token
for to authenticate API requests.
-
User: A user can view news feeds on a browser or mobile app. A user makes a post with content "Hello" through API:
POST /v1/me/feed?content=Hello&auth_token={auth_token}
. -
Load balancer: Distribute traffic to web servers.
-
Web servers: Web servers redirect traffic to different internal services.
-
Post service: Persist post in the database and cache.
-
Fanout service: Push new content to friends' news feed. Newsfeed data is stored in the cache for fast retrieval.
-
Notification service: Inform friends that new content is available and send out push notifications.
-
User: A user sends a request to retrieve his news feed:
GET /v1/me/feed
. -
Newsfeed service: Fetches news feed from the cache.
-
Newsfeed cache: Store news feed IDs needed to render the news feed.
We will focus on two components: web servers and fanout service.
Besides communicating with clients, web servers enforce authentication and rate-limiting. Only users signed in with valid auth_token are allowed to make posts. The system limits the number of posts a user can make within a certain period, vital to prevent spam and abusive content.
Fanout is the process of delivering a post to all friends. Two types of fanout models are: fanout on write (also called push model) and fanout on read (also called pull model). Both models have pros and cons.
-
Fanout on write: News feed is pre-computed during write time. A new post is delivered to friends' cache immediately after it is published.
- (+) News feed is generated in real-time and can be pushed to friends immediately.
- (+) Fetching news feed is fast because the news feed is pre-computed during write time.
- (-) If a user has many friends, fetching the friend list and generating news feeds for all of them are slow and time consuming. It is called hotkey problem.
- (-) For inactive users or those rarely log in, pre-computing news feeds waste computing resources.
-
Fanout on read: The news feed is generated during read time. This is an on-demand model. Recent posts are pulled when a user loads her home page.
- (+) For inactive users or those who rarely log in, fanout on read works better because it will not waste computing resources on them.
- (+) Data is not pushed to friends so there is no hotkey problem.
- (-) Fetching the news feed is slow as the news feed is not pre-computed.
We adopt a hybrid approach: Since fetching the news feed fast is crucial, we use a push model for the majority of users. For celebrities or users who have many friends/followers, we let followers pull news content on-demand to avoid system overload. Consistent hashing is a useful technique to mitigate the hoyket problem as it helps to distribute requests/data more evenly.
The fanout service works as follows:
-
Fetch friend IDs from the graph database. Graph databases are suited for managing friend relationship and friend recommendations.
-
Get friends info from the user cache. The system then filters out friends based on user settings. For example, if you mute someone, his posts will not show up on your news feed even though you are still friends. Also, a user could selectively share information with specific friends or hide it from other people.
-
Send friends list and new post ID to the message queue.
-
Fanout workers fetch data from the message queue and store news feed data in the news feed cache (
<post_id, user_id>
mapping table). To keep the memory size small, we set a configurable limit. The chance of a user scrolling through thousands of posts in news feed is slim, so the cache miss rate is low. -
Write
<post_id, user_id>
to cache.
Media content are stored in CDN for fast retrieval.
-
A user sends a request to retrieve his news feed. The request looks like
GET /v1/me/feed
. -
The load balancer redistributes requests to web servers.
-
Web servers call the news feed service to fetch news feed.
-
News feed service gets a list of post IDs from the news feed cache.
-
A user's news feed also contains username, profile picture, post content, post image, etc. Thus, the news feed service fetches the complete user and post objects from caches (user cache and post cache) to construct the fully hydrated news feed.
-
The fully hydrated news feed is returned in JSON format back to the client for rendering.
Cache is extremely important for a news feed system. We device the cache tier into 5 layers:
- News feed: Stores IDs of news feed.
- Content: Every post data. Popular content is stored in hot cache.
- Social Graph: User relationship data.
- Action: Info about whether a user liked a post, replied a post, or took other actions on a post.
- Counters: Counters for like, reply, followers, following, etc.
To avoid duplicated discussion, only high-level talking points are listed below:
- Scaling the database:
- Vertical scaling vs Horizontal scaling
- SQL vs NoSQL
- Master-slave replication
- Read replicas
- Consistency models
- Database sharding
- Keep web tier stateless
- Cache data as much as you can
- Support multiple data centers
- Lose couple components with message queues
- Monitor key metrics. For instance, QPS during peak hours and latency while users refreshing their news feed are interesting to monitor.