Facebook’s User DB — Is it SQL or NoSQL?
Updated March 2019.
Ever wondered which database Facebook (FB) uses to store the profiles of its 2.3B+ users? Is it SQL or NoSQL? How has FB’s database architecture evolved over the last 15+ years? As an engineer on the FB database infrastructure team from 2007 to 2013, I had a front-row seat to this evolution. Even though most of us won’t face exactly the same challenges anytime soon, there are invaluable lessons in understanding the database evolution at the world’s largest social network: the fundamental tenets that underpin FB’s internet-scale, globally-distributed architecture today apply to many business-critical enterprise apps such as multi-tenant SaaS, retail product catalog/checkout, travel bookings and gaming leaderboards.
Initial Architecture
As any FB user can attest, a profile is not simply a list of attributes such as name, email, interests and so on. It is in fact a rich social graph that stores all friend/family relationships, groups, check-ins, likes, shares and more. Given the data modeling flexibility of SQL and the ubiquity of MySQL when FB started, this social graph was initially built as a PHP application powered by MySQL as the persistent database and memcache as a “lookaside” cache.

Facebook’s original database architecture
In the lookaside caching pattern, the application first requests data from the cache instead of the database. If the data is not cached, the application gets the data from the backing database and puts it into the cache for subsequent reads. Note that the PHP application was accessing MySQL and memcache directly without any intermediate data abstraction layer.
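For illustration, here is a minimal sketch of the lookaside (cache-aside) pattern in Python, assuming hypothetical `cache` and `db` client objects; the method names are placeholders, not FB’s actual code:

```python
# Minimal sketch of the lookaside (cache-aside) caching pattern.
# `cache` and `db` are hypothetical memcache/MySQL clients; names are illustrative.

def get_profile(user_id, cache, db):
    key = f"profile:{user_id}"
    profile = cache.get(key)            # 1. try the cache first
    if profile is None:                 # 2. cache miss: fall back to the database
        profile = db.query(
            "SELECT * FROM profiles WHERE user_id = %s", (user_id,)
        )
        cache.set(key, profile)         # 3. populate the cache for subsequent reads
    return profile

def update_profile(user_id, new_data, cache, db):
    db.execute(
        "UPDATE profiles SET data = %s WHERE user_id = %s", (new_data, user_id)
    )
    cache.delete(f"profile:{user_id}")  # invalidate so the next read refills from the DB
```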
Growing Pains
FB’s meteoric success from 2005 onwards put enormous strain on the simplistic database architecture highlighted in the previous section. The following were some of the growing pains FB engineers had to solve in a short period of time.
Loss of Developer Agility
Engineers had to work with two data stores with two very different data models: a large collection of MySQL master-slave pairs for storing data persistently in relational tables, and an equally large collection of memcache servers for storing and serving flat key-value pairs derived (some indirectly) from the results of SQL queries. Working with the database tier now required intricate knowledge of how the two stores worked in conjunction with each other. The net result was a loss in developer agility.
Application-Level Database Sharding
MySQL’s inability to scale write requests beyond one node became a killer problem as data volumes grew by leaps and bounds. MySQL’s monolithic architecture essentially forced application-level sharding very early on: the application now tracked which MySQL instance was responsible for storing which user’s profile. Development and operational complexity grows exponentially when the number of such instances grows from one to hundreds and thereafter explodes into thousands. Note that adhering to such an architecture meant the application could no longer use the database to perform any cross-shard JOINs and transactions, thereby giving up the full power of SQL (as a flexible query language) in order to scale horizontally.
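To make the burden concrete, here is a rough sketch of application-level shard routing using a simple hash-modulo scheme; the hosts and the `connect` factory are hypothetical, and FB’s actual shard-assignment logic is not public:

```python
import hashlib

# Hypothetical list of MySQL master endpoints, one per shard.
SHARDS = [
    {"host": "mysql-shard-0.example.com", "port": 3306},
    {"host": "mysql-shard-1.example.com", "port": 3306},
    # ...hundreds more in a real deployment
]

def shard_for_user(user_id):
    """The application, not the database, decides which MySQL instance owns a user."""
    h = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

def load_profile(user_id, connect):
    # `connect` is a hypothetical factory returning a MySQL connection.
    shard = shard_for_user(user_id)
    conn = connect(shard["host"], shard["port"])
    return conn.query("SELECT * FROM profiles WHERE user_id = %s", (user_id,))

# A JOIN across two users on different shards can no longer be pushed down to MySQL;
# the application must fetch from each shard separately and combine the results itself.
```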
Multi-Datacenter, Geo-Redundant Replication
Handling datacenter failures also became a critical concern, which meant storing MySQL slaves (and corresponding memcache instances) in multiple geo-redundant datacenters. Perfecting and operationalizing failovers was no easy feat in itself, and given the asynchronous master-slave replication, recently committed data would still be missing whenever such a failover was undertaken.
Loss of Consistency Between Cache & DB
The memcache in front of a remote-region MySQL slave cannot immediately serve strongly consistent (aka read-after-write) reads because of the asynchronous replication between the master and the slave. The resulting stale reads in the remote region can easily confuse users: e.g. a friend request can show up as accepted to one friend while still showing up as pending to the other.
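A toy timeline of this stale-read problem, using in-memory dicts to stand in for the master, a remote-region slave and its memcache (purely illustrative):

```python
# Toy illustration of read-after-write inconsistency with async replication.
master = {}        # region A: MySQL master
remote_slave = {}  # region B: MySQL slave (replicates asynchronously)
remote_cache = {}  # region B: memcache in front of the slave

def accept_friend_request(request_id):
    master[request_id] = "accepted"   # write commits in region A
    # ...asynchronous replication to region B has not happened yet...

def read_request_status(request_id):
    if request_id in remote_cache:
        return remote_cache[request_id]
    status = remote_slave.get(request_id, "pending")  # slave is still behind
    remote_cache[request_id] = status                 # and the stale value is now cached
    return status

accept_friend_request("req-42")
print(read_request_status("req-42"))  # "pending" -- stale, despite the committed write
```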
Enter TAO, a NoSQL Graph API on Sharded SQL
In early 2009, FB started building TAO, an FB-specific NoSQL graph API built to run on sharded MySQL. The goal was to solve the problems highlighted in the previous section. TAO stands for “The Associations and Objects”. Even though the design of TAO was first published as a paper in 2013, its implementation was never open sourced given the proprietary nature of the FB social graph.
TAO represented data items as nodes (objects) and relationships between them as edges (associations). The FB application developers loved the API because they could now easily manage database updates and queries necessary for their application logic with no direct knowledge of MySQL or even memcache.
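For a feel of what that looks like, here is a rough sketch of application code against a TAO-style API; the call names follow the published paper, but the client handle and exact signatures below are assumptions:

```python
# Illustrative use of a TAO-style graph API. The call names (obj_add, assoc_add,
# assoc_range, assoc_count) follow the 2013 TAO paper, but `tao` is an assumed
# client handle and the exact signatures here are illustrative, not FB's code.

def publish_post(tao, author_id, text):
    post_id = tao.obj_add(otype="post", data={"text": text})

    # Associations are typed, directed edges; the inverse edge is typically added too.
    tao.assoc_add(id1=author_id, atype="AUTHORED", id2=post_id)
    tao.assoc_add(id1=post_id, atype="AUTHORED_BY", id2=author_id)
    return post_id

def recent_posts_and_likes(tao, author_id, post_id):
    # Point and range queries replace hand-written SQL plus memcache lookups.
    posts = tao.assoc_range(id1=author_id, atype="AUTHORED", pos=0, limit=10)
    likes = tao.assoc_count(id1=post_id, atype="LIKED_BY")
    return posts, likes
```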
Architecture
As shown in the figure below, TAO essentially converted FB’s existing 1000s of manually-sharded MySQL master-slave pairs to a highly-scalable, auto-sharded, geo-distributed database cluster. All objects and associations in the same shard are stored persistently in the same MySQL instance, and are cached on the same set of servers in each caching cluster. Placement of individual objects and associations can be directed to specific shards at creation time when needed. Controlling the degree of data collocation proved to be an important optimization technique for providing low latency data access.
SQL-based access patterns such as cross-shard ACID transactions and JOINs were disallowed in TAO as a means to preserve these low latency guarantees. However, TAO supported non-atomic two-shard writes in the context of an association update (whose two objects may be in two different shards). In case of a failure after one shard update but before the second, an asynchronous repair job would clean up the “hanging” association at a later time.
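A hedged sketch of such a two-shard association write and the repair hook, with hypothetical routing and queue objects:

```python
# Sketch of a non-atomic two-shard association write with asynchronous repair.
# `shard_for` (routing function), `repair_queue`, and the shard clients are
# hypothetical stand-ins, not FB's implementation.

def add_bidirectional_assoc(id1, atype, id2, inverse_atype, shard_for, repair_queue):
    shard1, shard2 = shard_for(id1), shard_for(id2)

    shard1.assoc_add(id1, atype, id2)                # first shard update succeeds
    try:
        shard2.assoc_add(id2, inverse_atype, id1)    # second update is not atomic with the first
    except Exception:
        # The forward edge is now "hanging" without its inverse. Record it so an
        # asynchronous repair job can later add the missing edge or remove the hanging one.
        repair_queue.put((id1, atype, id2, inverse_atype))
        raise
```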
Shards can be migrated or cloned to different servers in the same cluster to balance the load and to smooth out load spikes. Load spikes were common and happened when a handful of objects or associations became extremely popular, appearing in the News Feeds of tens of millions of users at the same time.

TAO Architecture (Source: FB Engineering Blog)