Set theory forms the basis for relational algebra and relational databases, and SQL is the lingua franca of modern RDBMS’s. Even with all the attention given to NoSQL in recent years, the lion share of database usage remains relational. But until recently, nearly all relational database solutions have been limited to the resources of a single node. Not anymore.
This talk is about my team’s journey tackling the challenges of distributing SQL. Specifically in the context of my favorite (open source) database: Postgres. I believe that too many developers spend too much time worrying about scaling their databases. So at Citus Data, we created an extension to Postgres that enables developers to scale out compute, memory, and storage by distributing queries across a cluster of nodes.
This talk describes the distributed systems challenges we faced at Citus in scaling out Postgres—and how we addressed them. I’ll talk about how we use PostgreSQL’s extension APIs to parallelize queries in a distributed cluster. I’ll cover the architecture of a distributed query planner and specifically how the join order planner has to choose between broadcast, co-located, and repartition joins in order to minimize network I/O. And if there’s time, I’ll walk through the dynamic executor logic that we built. The end result: a distributed database and a lot less time spent worrying about scale.
Ozgun is a co-founder and the CTO at Citus Data. Prior to Citus, Ozgun worked as a software developer in the Distributed Systems Engineering team at Amazon. There, he proposed, designed, and implemented novel algorithms on distributed caching and consistency; and also worked on building systems for scalable data analytics. Ozgun earned his M.S. in Computer Science from Stanford University, and his B.S. from Galatasaray University. He also holds patents on distributed cache consistency and load balancing.