X’s Recommendation System: Architecture of the Grok-Based Algorithm

When users open X (formerly Twitter) and scroll through the “For You” tab, a recommendation system decides which posts to display and in what order. It does so in real time, and latency matters: a slow feed quickly frustrates users. Previously, the internal mechanisms of this system were largely undisclosed. Recently, however, the xAI engineering team open-sourced the algorithm that drives the feed, releasing it on GitHub under an Apache-2.0 license. The release reveals a system built around a Grok-based transformer model that has largely replaced hand-crafted rules with machine learning. This article examines how the algorithm works, how its components fit together, and the rationale behind the xAI engineering team’s design decisions.

The Big Picture

When a user requests the For You feed on X, the algorithm retrieves content from two sources.

The first source is in-network content: posts from accounts the user already follows. For instance, if a user follows 200 accounts, the system evaluates recent posts from those accounts as candidates for the user’s feed.

The second source is out-of-network content: posts from accounts the user does not follow. The algorithm finds these posts by searching a global corpus with a machine learning technique known as similarity search. The underlying principle is that if a user’s past engagement suggests they would be interested in a post, that post becomes a candidate even if the user does not follow its author.

The two sets of candidates are then merged into a single list, which is filtered, scored, and ranked. The highest-ranked posts are shown to the user when they open the app.

The Four Core Components

The diagram below illustrates the overall architecture of the system developed by the xAI engineering team:

image 0 (URL: https://substackcdn.com/image/fetch/$s_!14Oq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F290026e6-5e0b-4a09-b202-45eb0b9a6f3b_2372x4150.png)

The codebase is structured into four primary directories, each corresponding to a unique segment of the system. The complete codebase is implemented in Rust (62.9%) and Python (37.1%).

Home Mixer

Home Mixer functions as the orchestration layer, coordinating the sequential invocation of other components to assemble the final feed. Its role involves managing the pipeline rather than performing intensive machine learning computations.

Upon receiving a request, Home Mixer initiates several stages in sequence:

  • Fetching user context
  • Retrieving candidate posts
  • Enriching those posts with metadata
  • Filtering out ineligible candidates and scoring the remaining ones
  • Selecting the top results and executing final validations

The server provides a gRPC endpoint, ScoredPostsService, which delivers the ranked list of posts for a specified user.

Thunder

Thunder serves as an in-memory post store and a real-time ingestion pipeline. It processes post creation and deletion events from Kafka, maintaining distinct per-user stores for original posts, replies, reposts, and video posts.

When the algorithm needs in-network candidates, it queries Thunder. Because Thunder keeps everything in memory and never has to call an external database, it returns results in under a millisecond. Thunder also automatically purges posts older than a configured retention period, so the dataset stays current.
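The behavior described above can be sketched in a few lines. This is a minimal illustration, not Thunder's actual code (which is written in Rust): the class name `InMemoryPostStore`, its methods, and the single per-author queue are all assumptions made for the example, and it assumes create events for an author arrive in timestamp order.

```python
from collections import defaultdict, deque


class InMemoryPostStore:
    """Hypothetical in-memory store: one queue of (timestamp, post_id) per author."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.posts_by_author = defaultdict(deque)
        self.deleted = set()

    def on_create(self, author_id, post_id, created_at):
        # Called for each post-creation event (e.g. consumed from Kafka).
        self.posts_by_author[author_id].append((created_at, post_id))

    def on_delete(self, post_id):
        # Called for each post-deletion event.
        self.deleted.add(post_id)

    def purge(self, now):
        # Drop posts older than the retention window from the front of each queue.
        cutoff = now - self.retention_seconds
        for posts in self.posts_by_author.values():
            while posts and posts[0][0] < cutoff:
                posts.popleft()

    def in_network_candidates(self, following, now):
        # Return live post ids from followed authors, newest first.
        self.purge(now)
        found = [
            (ts, pid)
            for author in following
            for ts, pid in self.posts_by_author.get(author, ())
            if pid not in self.deleted
        ]
        return [pid for _, pid in sorted(found, reverse=True)]


store = InMemoryPostStore(retention_seconds=3600)
store.on_create(author_id=1, post_id=100, created_at=0)      # too old at now=6000
store.on_create(author_id=1, post_id=101, created_at=5000)
store.on_create(author_id=2, post_id=200, created_at=5500)
store.on_create(author_id=3, post_id=300, created_at=5600)
store.on_delete(200)
print(store.in_network_candidates(following=[1, 2], now=6000))  # [101]
```

Every lookup touches only in-process data structures, which is the property that makes very low latencies plausible for this kind of store.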

Phoenix

Phoenix is the machine learning core of the system and has two responsibilities: retrieval and ranking.

For retrieval, Phoenix uses a two-tower model to identify out-of-network posts. One tower, the User Tower, takes a user’s features and engagement history and encodes them into a numerical vector known as an embedding. The other tower, the Candidate Tower, encodes each post into its own embedding.

Finding relevant posts then becomes a similarity search. The system computes a dot product between the user’s embedding and each candidate embedding and retrieves the top-K most similar posts. For those unfamiliar with dot products, the key idea is that two embeddings pointing in similar directions in a high-dimensional space yield a high score, which indicates that the post is likely relevant to the user.
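To make the scoring rule concrete, here is a brute-force sketch of dot-product retrieval. The function names and the tiny two-dimensional embeddings are invented for illustration; a scan over every candidate would not be practical over a global corpus, so treat this only as a demonstration of the math, not of how Phoenix performs the search.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_by_dot_product(user_embedding, candidate_embeddings, k):
    """Return the k post ids whose embeddings score highest against the user."""
    scored = (
        (dot(user_embedding, emb), post_id)
        for post_id, emb in candidate_embeddings.items()
    )
    return [post_id for _, post_id in heapq.nlargest(k, scored)]


user = [1.0, 0.0]
candidates = {
    "a": [0.9, 0.1],    # points almost the same way as the user: high score
    "b": [-1.0, 0.0],   # points the opposite way: negative score
    "c": [0.5, 0.5],    # partially aligned
}
print(top_k_by_dot_product(user, candidates, k=2))  # ['a', 'c']
```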

The diagram below illustrates the concept of embeddings:

image 1 (URL: https://substackcdn.com/image/fetch/$s_!Y2uF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffebf555a-fdee-42d5-96b8-67121e00d08e_3160x1786.png)

For ranking, once candidates have been retrieved from both Thunder and Phoenix’s retrieval stage, Phoenix runs a Grok-based transformer model to predict how likely the user is to engage with each post.

The diagram below demonstrates the concept of a transformer model:

image 2 (URL: https://substackcdn.com/image/fetch/$s_!BnyU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed0a89bf-489f-4107-acd3-6f06ab73a4b4_2682x1232.png)

The transformer implementation is adapted from xAI’s Grok-1 open-source release and tailored for recommendation use cases. It takes a user’s engagement history and a batch of candidate posts as input and outputs, for each candidate, a probability for each possible engagement action.

Candidate Pipeline

The Candidate Pipeline is a reusable framework that defines the structure of the entire recommendation pipeline.

It provides traits (Rust’s equivalent of interfaces) for each stage of the pipeline:

  • Source (responsible for fetching candidates)
  • Hydrator (tasked with enriching candidates with supplementary data)
  • Filter (designed to remove ineligible candidates)
  • Scorer (for computing relevance scores)
  • Selector (for sorting and choosing the top candidates)
  • SideEffect (for executing asynchronous operations such as caching and logging)

The framework runs independent stages concurrently where possible and includes configurable error handling. This modular design makes it easier for the xAI engineering team to add new data sources or scoring models without rewriting the core pipeline logic.
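The following sketch shows the general shape of such a framework. The real one is written in Rust with traits; here Python `Protocol` classes stand in for them, only three of the six stage types are modeled, and every name and signature is an assumption made for the example. Sources are fetched concurrently because they do not depend on each other, while filters and scorers run in order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class Source(Protocol):
    def fetch(self, query: dict) -> list: ...


class Filter(Protocol):
    def keep(self, query: dict, candidate: dict) -> bool: ...


class Scorer(Protocol):
    def score(self, query: dict, candidate: dict) -> float: ...


def run_pipeline(query, sources, filters, scorers, top_k):
    # Independent sources are fetched concurrently.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda s: s.fetch(query), sources))
    candidates = [c for batch in batches for c in batch]

    for f in filters:
        candidates = [c for c in candidates if f.keep(query, c)]

    # Scorers run sequentially; each may read and replace the current score.
    for c in candidates:
        c["score"] = 0.0
    for s in scorers:
        for c in candidates:
            c["score"] = s.score(query, c)

    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:top_k]


class StaticSource:
    def __init__(self, posts):
        self.posts = posts

    def fetch(self, query):
        return [dict(p) for p in self.posts]


class NotSelfAuthored:
    def keep(self, query, candidate):
        return candidate["author"] != query["user"]


class LikeScorer:
    def score(self, query, candidate):
        return candidate["p_like"]


feed = run_pipeline(
    query={"user": "u1"},
    sources=[
        StaticSource([{"id": 1, "author": "u2", "p_like": 0.2}]),
        StaticSource([{"id": 2, "author": "u1", "p_like": 0.9},
                      {"id": 3, "author": "u3", "p_like": 0.7}]),
    ],
    filters=[NotSelfAuthored()],
    scorers=[LikeScorer()],
    top_k=2,
)
print([c["id"] for c in feed])  # [3, 1]
```

Because `run_pipeline` knows only the stage interfaces, adding a new source or filter means writing one small class and passing it in, without touching the execution logic.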

The Pipeline Step by Step

The complete sequence executed each time a user accesses the For You feed is as follows:

  • Query Hydration: The system retrieves a user’s recent engagement history (including likes, replies, and reposts) and associated metadata, such as their following list.
  • Candidate Sourcing: Thunder supplies recent posts from accounts the user follows. Phoenix Retrieval contributes machine learning-discovered posts from the global corpus.
  • Candidate Hydration: Each candidate post is enriched with additional details, including its text and media content, the author’s username and verification status, video duration if applicable, and subscription status.
  • Pre-Scoring Filters: Before any scoring, the system removes posts that are duplicates, too old, written by the user themselves, from blocked or muted accounts, matching muted keywords, already seen, or subscription content the user is not eligible to view.
  • Scoring: The remaining candidates pass through several scorers in sequence. First, the Phoenix Scorer obtains machine learning predictions from the transformer. Next, the Weighted Scorer combines these predictions into a single relevance score. An Author Diversity Scorer then lowers the score of posts from authors who already appear repeatedly, so that no single author dominates the user’s feed. Finally, an OON (out-of-network) Scorer adjusts scores for posts from accounts the user does not follow.
  • Selection: Posts are arranged by their final score, and the top K are chosen.
  • Post-Selection Filters: A concluding series of checks removes posts that have been deleted, marked as spam, or identified as containing violent or graphic content. A conversation deduplication filter also prevents the display of multiple branches from the same reply thread.
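As one example from the steps above, the idea behind the Author Diversity Scorer can be sketched as follows. The source describes only the effect (repeat authors are scored lower); the geometric decay rule, the `decay` parameter, and the tuple layout here are assumptions chosen for illustration, and the sketch assumes non-negative scores.

```python
from collections import defaultdict


def apply_author_diversity(candidates, decay=0.5):
    """Down-weight repeat authors.

    candidates: list of (post_id, author, score) tuples.
    The n-th post by the same author (n = 0, 1, 2, ...), counted in
    descending score order, keeps decay**n of its score.
    """
    seen = defaultdict(int)
    adjusted = []
    for post_id, author, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        adjusted.append((post_id, author, score * decay ** seen[author]))
        seen[author] += 1
    return sorted(adjusted, key=lambda c: c[2], reverse=True)


ranked = apply_author_diversity([
    ("p1", "alice", 0.9),
    ("p2", "alice", 0.8),   # second post by alice: 0.8 * 0.5 = 0.4
    ("p3", "bob", 0.6),
])
print([post_id for post_id, _, _ in ranked])  # ['p1', 'p3', 'p2']
```

Without the adjustment, both of alice's posts would outrank bob's; with it, bob's post moves into second place.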

How Scoring Works

The Phoenix transformer forecasts probabilities for a diverse array of user actions, including liking, replying, reposting, quoting, clicking, visiting the author’s profile, watching a video, expanding a photo, sharing, dwelling (spending time reading), following the author, marking “not interested,” blocking the author, muting the author, and reporting the post.

Each predicted probability is multiplied by a corresponding weight and then summed to yield a final score. Positive actions, such as liking, reposting, and sharing, are assigned positive weights. Conversely, negative actions like blocking, muting, and reporting carry negative weights. Consequently, if the model predicts a high likelihood of a user blocking a post’s author, that post’s score is substantially reduced. The formula is straightforward:

Final Score = sum of (weight for action * predicted probability of that action)
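A small sketch shows how negative weights pull a score down. The weight values below are made up for the example and are not the values used by X.

```python
def weighted_score(probabilities, weights):
    """Sum of weight * predicted probability over all predicted actions."""
    return sum(weights[action] * p for action, p in probabilities.items())


# Hypothetical weights: positive actions add to the score, negative ones subtract.
weights = {"like": 1.0, "reply": 2.0, "block": -8.0}

post_a = {"like": 0.5, "reply": 0.25, "block": 0.0}
post_b = {"like": 0.5, "reply": 0.25, "block": 0.125}

print(weighted_score(post_a, weights))  # 1.0
print(weighted_score(post_b, weights))  # 0.0
```

The two posts have identical predicted like and reply probabilities, yet a 12.5% predicted chance of a block is enough to cancel out the positive signal for the second post.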

This multi-action prediction approach is more nuanced than a single “relevance” score: it lets the system distinguish between content a user would enjoy and content they might find annoying or harmful.

Conclusion

Five key architectural decisions are noteworthy within xAI’s recommendation system.

Firstly, instead of having humans decide which signals matter (such as post length, hashtag count, or time of day), the Grok-based transformer learns relevance directly from user engagement sequences. Removing hand-engineered features also simplifies the data pipelines and serving infrastructure.

Secondly, when the transformer scores a batch of candidate posts, each post can “attend to” only the user’s context, not to other candidates in the same batch. As a result, a post’s score does not depend on which other posts happen to be in the batch, which makes scores consistent and cacheable. That matters at X’s scale.
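One way to picture this restriction is as an attention mask. The sketch below is an assumption-laden illustration rather than the model's actual masking code: it assumes the context tokens come first in the sequence, that each candidate occupies a single position, that context tokens may attend to all context tokens, and that a candidate may also attend to itself.

```python
def candidate_isolation_mask(num_context, num_candidates):
    """Build mask[i][j], True when position i may attend to position j.

    Positions 0..num_context-1 hold the user's context; the rest are candidates.
    """
    n = num_context + num_candidates
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(num_context):
            mask[i][j] = True      # every position can see the user context
        if i >= num_context:
            mask[i][i] = True      # a candidate also sees itself, but no other candidate
    return mask


for row in candidate_isolation_mask(num_context=2, num_candidates=2):
    print(["x" if allowed else "." for allowed in row])
# ['x', 'x', '.', '.']
# ['x', 'x', '.', '.']
# ['x', 'x', 'x', '.']
# ['x', 'x', '.', 'x']
```

Since no candidate row has a True entry in another candidate's column, adding or removing candidates from the batch cannot change any individual candidate's output.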

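Thirdly, both the retrieval and ranking stages use multiple hash functions for embedding lookup. Hashing IDs into fixed-size embedding tables keeps memory bounded even when the ID space is very large, and using more than one hash function reduces the impact of collisions.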
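The general technique can be sketched as below. The source states only that multiple hash functions are used; the class name `HashedEmbedding`, the SHA-256-based bucket function, the random table initialization, and the choice to sum the looked-up rows are all assumptions made for this illustration.

```python
import hashlib
import random


class HashedEmbedding:
    """Hypothetical embedding table addressed by several hash functions."""

    def __init__(self, num_buckets, dim, num_hashes=2, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes
        # In a trained model these rows would be learned parameters.
        self.table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_buckets)
        ]

    def _bucket(self, entity_id, hash_index):
        # Salting the input with hash_index yields a different hash function per index.
        digest = hashlib.sha256(f"{hash_index}:{entity_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def lookup(self, entity_id):
        # Combine one row per hash function; summing is an assumed choice.
        rows = [
            self.table[self._bucket(entity_id, h)]
            for h in range(self.num_hashes)
        ]
        return [sum(values) for values in zip(*rows)]


embeddings = HashedEmbedding(num_buckets=16, dim=4)
vector = embeddings.lookup(1234567890)
print(len(vector))  # 4
```

The table has only 16 rows no matter how many distinct IDs exist, and two IDs end up with the same embedding only if they collide under every hash function at once.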

Fourthly, instead of collapsing everything into a single relevance metric, the model predicts probabilities for many distinct actions. This gives the Weighted Scorer fine-grained control over what the feed optimizes for.

Finally, the Candidate Pipeline framework cleanly separates the pipeline’s execution logic from the business logic of individual stages. This lets the xAI engineering team add new data sources, swap in different scoring models, or introduce new filters without changing the rest of the system.

References:

X For You Feed Algorithm (URL: https://github.com/xai-org/x-algorithm)

Transformers Architecture (URL: https://en.wikipedia.org/wiki/Transformer_(deep_learning))

Introduction to Grok (URL: https://en.wikipedia.org/wiki/Grok_(chatbot))