Data Council Blog


PipelineAI - Featured Startup SF '18

In this blog series leading up to our SF '18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with PipelineAI, a startup helping you continuously train, optimize, and host deep learning models at scale.

Q: What surprised you most as an engineer about the work you'll be telling us about in your talk?

 

Chris Fregly: As a production-focused engineer, I'm surprised at how little attention is paid to productionizing ML and AI models. This focus on high-performance model serving is a key difference between Scikit-Learn and TensorFlow, for example.


Q: What do you think a listener will get out of this talk vs. other talks on distributed data processing and data versioning that they've previously heard?

Chris Fregly: The talk will highlight our recent R&D at PipelineAI. We're laser-focused on ML/AI model prediction performance, including model transformations (e.g., weight quantization) and runtime optimizations (e.g., CPU vs. GPU). We'll show how decisions made during the model-training phase (e.g., hyper-parameters like learning rate and decision tree depth) can affect performance and accuracy during the model-prediction phase in live production.
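For readers unfamiliar with post-training weight quantization, here is a minimal, illustrative sketch using TensorFlow's TF Lite converter. This is not PipelineAI's implementation, and the SavedModel path and file names are hypothetical.

```python
import tensorflow as tf

# Load a trained model from a SavedModel directory (path is hypothetical).
converter = tf.lite.TFLiteConverter.from_saved_model("./my_trained_model")

# Apply the converter's default post-training optimizations,
# which include quantizing float32 weights down to 8-bit integers.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Convert; the resulting model is smaller and typically faster to serve.
quantized_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(quantized_model)
```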

 

Q: Any additional background you would like our audience to know?

 

Chris Fregly: In essence, we've seen the same problems over and over since 2010. We realized that nobody is focused on the "post-training" phases of the ML/AI pipeline. By focusing on these phases, we're surfacing a lot of optimizations that were previously left on the table. Even small optimizations can lead to enormous performance and cost benefits due to the scale of enterprise predictions (millions of predictions per second) relative to training (hundreds of training jobs per day).
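To make that scale argument concrete, here is a hedged back-of-envelope sketch. Only the prediction-volume figure comes from the interview; the per-prediction latencies and compute cost rate are hypothetical placeholders.

```python
# Back-of-envelope: why small per-prediction savings compound at serving scale.
predictions_per_second = 1_000_000   # from the interview
seconds_per_day = 24 * 60 * 60

baseline_ms = 10.0            # hypothetical latency per prediction, unoptimized
optimized_ms = 9.0            # hypothetical 10% improvement from quantization, etc.
cost_per_compute_hour = 1.00  # hypothetical dollars per compute-hour

daily_predictions = predictions_per_second * seconds_per_day
saved_hours = daily_predictions * (baseline_ms - optimized_ms) / 1000 / 3600

print(f"Compute hours saved per day: {saved_hours:,.0f}")
print(f"Approximate daily savings: ${saved_hours * cost_per_compute_hour:,.0f}")
```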

 


About the Startups Track

The data-oriented Startups Track at DataEngConf features dozens of startups forging ahead with innovative approaches to data and new data technologies. We find the most interesting startups at the intersection of ML, AI, data infrastructure, and new applications of data science, and highlight them in technical talks from the CTOs and lead engineers building their platforms.

Data Engineering, Data Warehouse, Data Strategy


Written by Robert Winslow

Robert is a seasoned software consultant with a decade of experience shipping great products. He thrives in early-stage startup environments, and works primarily in Go, Python, and Rust. He has led backend development at companies like RankScience and Spot.com; created a rigorous, open-source time-series benchmarking suite for InfluxData; and rapidly prototyped software in a skunkworks-type product lab. He's taught graduate statistics at GalvanizeU and mentored at the Stanford d.school. He helps maintain Google's FlatBuffers project, one of the world's fastest serialization libraries. A colleague once described him as "the developer equivalent of 'The Wolf' from Pulp Fiction."