
DataEngConf Data Science & Engineering Blog

Pete Soderling

Pete is a software engineer, 3x founder and angel investor. As the founder of Hakka Labs and DataEngConf he loves to build community for software engineers and has some bumps and bruises to prove it. Previously, he was the founder of Stratus Security (a cloud-based API platform) and mechanikal (a software development agency in NYC). Pete has spoken across the globe at conferences like RSA Security and O'Reilly Strata, been an organizer of the QCon conference series, and had his moment of fame as a TEDx speaker. He's currently a mentor at 500 Startups in San Francisco, even though he lives in Jackson Hole, WY, where the snow is far better.

Recent Posts

A Day in the Life: What's it like Being a Machine Learning Engineer at Stripe?

By Pete Soderling

Alyssa Frazee tells us about the unicorn data skills she's honed on the job.

One thing that Alyssa Frazee loves about her work at Stripe is that, like someone with traditional data science skills, she gets to build machine learning models. "Oh, the rapture," cries Alyssa the data scientist!

Rebuilding Open Source Analytics @ Airbnb

By Pete Soderling

How open source allowed Airbnb to rebuild their expensive BI tool in less than one developer-year


Pushing Kafka to the Limit at Heroku

By Pete Soderling

How Everyone's Favorite PaaS Operates Kafka at Scale

Scale presents unique challenges for engineers, particularly at companies that have the largest number of users throwing off the most data exhaust, resulting in the fattest data pipelines with the gnarliest problems. For example, Heroku, arguably the most popular platform as a service (PaaS), decided last year to offer Apache Kafka to its customers as a hosted service and quickly realized it would need to support a large number of distinct users, each with varying use cases. This put the team on a challenging path: minimizing the operational headaches inherent in running this kind of infrastructure at scale.
 

How to Gain the Community Edge in your Data Recruiting

By Pete Soderling

In my years as an engineer, founder and developer-wrangler, I've learned that data recruiting is crazy hard. I've also learned that the best, and perhaps the only, way to build meaningful relationships with the greatest engineers is to be present and active in their communities.


Fighting Fraud in Cryptocurrency using Machine Learning

By Pete Soderling

Coinbase is on the front lines of discovering advanced cryptocurrency and payment fraud techniques. Hear how they use machine learning to help them fight the war.


Building a Column-Oriented, Distributed Data Store for Analytics - The Story of Druid

By Pete Soderling

 

Druid is a modern data store built for analytics use cases. As the volume of data has exploded and companies have sought deeper insights from their data, ad hoc analytics has become difficult because more data is buried in distributed systems like Hadoop and Spark. The query model for these systems can result in long latencies, making them suboptimal for interactive analytics applications.


How to Build a Data Pipeline That Handles Hundreds of Different Inputs

By Pete Soderling

How many different file formats does your ETL system need to parse? For many data pipelines, several well-defined formats will suffice. Things break, and at times require manual intervention, but not so often that a couple of engineers can't keep tabs on the system and keep things running relatively smoothly.
