Data Council Blog

Data Council Blog

To Shard or Not to Shard (PostgreSQL)

 psql-shard-300.png

Wouldn't the world be a simpler place if we could easily scale our RDBMS? (gasp!)

What do you do when you find yourself in a situation where you need to scale out your RDBMS to support greater data volumes than you originally anticipated? Traditionally, one would either need to vertically scale their infrastructure by putting their database on more powerful (costlier) machines or sharding their data across multiple workers.

Over the past 10 or so years, another solution arose - migrating your data to a NoSQL database and rewriting your application to support denormalized data models and achieve horizontal scalability by default.

Re-writing your application obviously comes at a significant cost. Moreover, there are are many other costs of switching your database. Two additional costs immediately come to mind: 1) training/re-hiring costs of ramping-up your development and ops teams on a new technology and 2) each NoSQL database has its own query language causing additional debugging and ops headaches in the future.

What if there were a way for enterprises to get the value of horizontal scaling that NoSQL stores allow while retaining the familiarity of a RDBMS?

Ozgun Erdogan, CTO of Citus Data, and his team are tackling this exact problem.


Meet Ozgun Erdogan of Citus Data

ozgun-bw-round.png

Ozgun is a co-founder and the CTO at Citus Data. Prior to Citus, Ozgun worked as a software developer in the Distributed Systems Engineering team at Amazon. Ozgun earned his M.S. in Computer Science from Stanford University, and his B.S. from Galatasaray University. He also holds patents on distributed cache consistency and load balancing.

 


Ozgun started his career as a software engineer at Amazon. During his time there he worked on distributed systems, implementing algorithms on distributed caching and consistency that extended his research from his Master’s program at Stanford. After implementing some prototypes at Amazon in order to achieve a better model for horizontally scaling an RDBMS, Citus Data was born.

Citus Data is one of several companies that has recognized the incredible flexibility of the open-source PostgreSQL database. In fact, when we asked Ozgun what surprised him most about his work at Citus Data he said, “I was surprised by the amount of functionality built into RDBMS’s.”

Typically, achieving horizontal scalability with an RDBMS means sharding data across the database. But, in order to achieve this, a data engineer needs to carefully develop a sharding strategy - often more of an art than a science. Unfortunately, these strategies are also very specific to their applications, preventing knowledge transfer of learnings between teams and organizations.

However, in his talk at our upcoming DataEngConf NYC, Ozgun will share some of the ‘magic’ they built into their custom PostgreSQL extension that replaces the default query planner and makes sharding much easier. Once a data engineer identifies the sharding key, Citus takes care of the lower level query distribution and optimization.

I'm also happy to report that Citus does a lot more than just sharding! To hear more, don't miss Ozgun's talk, The Challenges of Distributing Postgres. Ozgun will share with us the some of the distributed systems challenges they faced at Citus in scaling out Postgres and how they addressed them.

Perhaps you'll discover that there's still hope for your old PostgreSQL implementation - and you can ditch some of those other NoSQL systems you've been wrangling for the past decade. 

 

New Call-to-action

Data Engineering, Event Updates, Databases, sharding, nosql, postgresql

Pete Soderling

Written by Pete Soderling

Pete Soderling is the founder of Data Council & Data Community Fund. He helps engineers start companies.