Thank You NYC '18, See You Next Year!

Hosted at: Columbia University
New York City, USA / November 8-9th

DataEngConf is the first community-powered data platforms, data science, and analytics event for software engineers, data scientists, CTOs, and technical founders who want to discover tools & insights to build products.


Talks that keep you Awake!

Our talks are backed by our 100% No Bullshit Guarantee* and delivered by leading data scientists and engineers from top teams at companies like Facebook, Netflix, Lyft, Salesforce, WeWork, MIT & many more.

4 Unique Tracks @ Columbia University

Dedicated data science, data engineering, and AI products tracks, plus our all-new hero engineering track, coupled with our brand-new Founders Panel of top founders in the data space.


Likeminded Attendees

Tired of suits sitting next to you at a "data" conference? We are too. That's why our attendees are highly skilled data scientists, engineers, analysts and engineering managers from top companies and startups.

Born to be Open Source

We believe in Open Source: code, content and mentality. We feature top open source contributors and tools at our talks. Afterwards, we publish all content for the community, for free.

Community Powered

We are a diverse group of geeks, coders, scientists, analysts (& a former astronaut serving your lunch). We care deeply about local data communities. No big $/€ sponsors here, just data nerds like you & me.

Networking that works

Experience our popular "Office Hours" format plus our favored Data Community Party. Through our extensive networking opportunities you'll meet the people who will positively impact your career.

Talks by speakers at these top companies & others:

Netflix
BuzzFeed
MIT
Facebook AI Research
Stitch Fix
Columbia University Data Science Institute
Lyft
Dia&Co

Our Sponsors

Datadog
Beeswax
Stitch Data
Starburst
Segment.io
WeWork
Facebook
Datacoral
Columbia University
Capital One
Unravel Data
New York Times
7Park Data
Mode Analytics
Blackstone
Dia&Co
Qubole
Criteo Labs
NYU Center for Data Science

NYC '18 Speakers Include:


Technical Founders Panel:

 

 

 

NYC '18 Schedule
Location: Alfred Lerner Hall @ Columbia University, NYC, USA

Location: Talks at: Roone Auditorium
Office Hours at: Room 302
8:00 - 9:00AM Registration and Breakfast
9:00 - 9:15AM Welcome, announcements & track host intros (Pete Soderling)
9:15 - 9:50AM Keynote #1: Scalability is Quantifiable: The Universal Scalability Law - Baron Schwartz (Vivid Cortex)
10:00 - 10:40AM

Extract - Tiered Transform - Load (ETTL): A pipeline for a modular, scalable, and observable Internal Analytics platform - Jean-Mathieu Saponaro (Datadog)

Office Hours: Baron Schwartz - Vivid Cortex

10:45 - 11:25AM

Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers - Willy Lulciuc (WeWork)

Office Hours: Jean-Mathieu Saponaro - Datadog

11:30 - 12:10PM

Oops I did it Again -- Adapting a Pop Music Identifier to Find Syndicated Content in Talk Radio - Allison King (Cortico)

Office Hours: Willy Lulciuc - WeWork

12:15 - 1:15PM Lunch
1:15 - 1:55PM

Building a Modern Machine Learning Platform on Kubernetes - Saurabh Bajaj (Lyft)

Office Hours: Allison King - Cortico

2:00 - 2:40PM

Automating Modeling Pipelines - William Nelson (Intent Media)

Office Hours: Saurabh Bajaj - Lyft

2:45 - 3:15PM

Coffee Break

3:15 - 3:55PM

Presto: Fast SQL-on-Anything - Kamil Bajda-Pawlikowski (Starburst Data)

Office Hours: William Nelson - Intent Media

4:00 - 4:45PM

Fast Data apps with Alpakka Kafka connector and Akka Streams - Sean Glover (Lightbend)

Office Hours: Kamil Bajda-Pawlikowski - Starburst Data

5:00 - 7:00PM DATA COMMUNITY PARTY
Location: Talks at: Roone Cinema (Floor 2)
Office Hours at: Room 467 A
8:00 - 9:00AM Registration and Breakfast
9:00 - 9:15AM Welcome, announcements & track host intros (Pete Soderling)
9:15 - 9:50AM Keynote #1: Scalability is Quantifiable: The Universal Scalability Law - Baron Schwartz (Vivid Cortex)
10:00 - 10:40AM

Active Learning: Why Smart Labeling is the Future of Data Annotation - Jennifer Prendki (Figure-Eight)

Office Hours: Baron Schwartz - Vivid Cortex

10:45 - 11:25AM

Scaling Personalization via Machine-Learned Assortment Optimization - Ethan Rosenthal (Dia&Co)

Office Hours: Jennifer Prendki - Figure-Eight

11:30 - 12:10PM

The Customer as The Unit of Analysis: Models, Metrics and a Multitude of Uses - Brian Bloniarz (Second Measure)

Office Hours: Ethan Rosenthal - Dia&Co

12:15 - 1:15PM Lunch
1:15 - 1:55PM

An Update on Scikit-learn - Andreas Mueller (Columbia University)

Office Hours: Brian Bloniarz - Second Measure

2:00 - 2:40PM

Using Embeddings to Understand the Evolution of Data Science Skill Sets - Maryam Jahanshahi (Tap Recruit)

Office Hours: Andreas Mueller - Columbia University

2:45 - 3:15PM

Coffee Break

3:15 - 3:55PM

Building Data Tools that Work - Benn Stancil (Mode Analytics)

Office Hours: Maryam Jahanshahi - Tap Recruit

4:00 - 4:45PM

Content Based Recommendations: Using Word Embeddings to Automate Related Content Generation at BuzzFeed - Carolyn Huangci (BuzzFeed)

Office Hours: Benn Stancil - Mode Analytics

5:00 - 7:00PM DATA COMMUNITY PARTY
Location: Talks at: Room 555
Office Hours at: Room 568
8:00 - 9:00AM Registration and Breakfast
9:00 - 9:15AM Welcome, announcements & track host intros (Pete Soderling)
9:15 - 9:50AM Keynote #1: Scalability is Quantifiable: The Universal Scalability Law - Baron Schwartz (Vivid Cortex)
10:00 - 10:40AM

Building a Research Platform using AI - Aditya Jami (Meltwater)

Office Hours: Baron Schwartz - Vivid Cortex

10:45 - 11:25AM

AI Challenges in Customer Care Automation - Sameer Yami (Linc Global)

Office Hours: Aditya Jami - Meltwater

11:30 - 12:10PM

PyTorch 1.0 - The Platform for Accelerating AI Research to Production - Jeff Smith (Facebook AI Research)

Office Hours: Sameer Yami - Linc Global

12:15 - 1:15PM Lunch
1:15 - 1:55PM

Running effective Machine Learning teams: common issues, challenges and solutions - Gideon Mendels (Comet.ml)

Office Hours: Jeff Smith - Facebook AI Research

2:00 - 2:40PM

Optimizing Time to Data through Streams and Data Abstraction - Nicolas Joseph (Datalogue)

Office Hours: Gideon Mendels - Comet.ml

2:45 - 3:15PM

Coffee Break

3:15 - 3:55PM

Computer Vision AI to Disrupt Digital Advertising - Joy Tang (Markable AI)

Office Hours: Nicolas Joseph - Datalogue

4:00 - 4:45PM

Technical Founders Panel (Falcon, Azari, Kucukelbir & Soderling)

Office Hours: Joy Tang - Markable AI

5:00 - 7:00PM DATA COMMUNITY PARTY
Location: Talks at: Roone Auditorium
Office Hours at: Room 302
8:00 - 9:00AM Registration and Breakfast
9:00 - 9:15AM Welcome, announcements & track host intros (Pete Soderling)
9:15 - 9:50AM Keynote #2: Artwork Personalization at Netflix - Tony Jebara (Netflix)
10:00 - 10:40AM

Data Pipeline Frameworks: The Dream and the Reality - Mark Weiss (Beeswax)

Office Hours: Tony Jebara - Netflix

10:45 - 11:25AM

Analyzing Data in the Cloud: Is True Privacy and Security Possible? - Raghu Murthy (Datacoral)

Office Hours: Mark Weiss - Beeswax

11:30 - 12:10PM

Fixing the Big Data Development Cycle with SQL - Justin Coffey (Criteo Labs)

Office Hours: Raghu Murthy - Datacoral

12:15 - 1:15PM Lunch
1:15 - 1:55PM

Stream Processing Design Patterns - Andreas Markmann (Capital One)

Office Hours: Justin Coffey - Criteo Labs

2:00 - 2:40PM

Evolving Stitch Fix's Data Platform for Data Lineage - Neelesh Salian (Stitch Fix)

Office Hours: Andreas Markmann - Capital One

2:45 - 3:15PM

Coffee Break

3:15 - 3:55PM

Building a Music Analytics Pipeline at Pandora - Brian Femiano (Pandora)

Office Hours: Neelesh Salian - Stitch Fix

4:00 - 4:45PM

Closing Keynote: The Literate Programmer: Cargo Cult Open Source - Wes Chow (MIT Media Lab)

Office Hours: Brian Femiano - Pandora

5:00 Conference END :(
Location: Talks at: Roone Cinema (Floor 2)
Office Hours at: Room 467 A
8:00 - 9:00AM Registration and Breakfast
9:00 - 9:15AM Welcome, announcements & track host intros (Pete Soderling)
9:15 - 9:50AM Keynote #2: Artwork Personalization at Netflix - Tony Jebara (Netflix)
10:00 - 10:40AM

Causal Data Science - Adam Kelleher (Barclays Investment Bank)

Office Hours: Tony Jebara - Netflix

10:45 - 11:25AM

Hindsight Bias: How to Deal with Label Leakage at Scale - Till Bergmann (Salesforce)

Office Hours: Adam Kelleher - Barclays Investment Bank

11:30 - 12:10PM

The Difficulty in Choosing Prior in Potentially Explosive Models (Vector Autoregressions, Discrete Choice Models, RNNs) - James Savage (Lendable)

Office Hours: Till Bergmann - Salesforce

12:15 - 1:15PM Lunch
1:15 - 1:55PM

The Unreasonable Deceptiveness of Bad Data - Rigel Swavely (Clarifai)

Office Hours: James Savage - Lendable

2:00 - 2:40PM

Three Tips for Better Predictive Modeling - Stephanie Yang (Foursquare)

Office Hours: Rigel Swavely - Clarifai

2:45 - 3:15PM

Coffee Break

3:15 - 3:55PM

Predictive Modeling On Its Head: A Pipeline That Finds Cancer, Asthma and Hemophilia - Marlene Guraieb (Oscar)

Office Hours: Stephanie Yang - Foursquare

4:00 - 4:45PM

Closing Keynote: The Literate Programmer: Cargo Cult Open Source - Wes Chow (MIT Media Lab)

Office Hours: Marlene Guraieb - Oscar

5:00 Conference END :(
Location: Talks at: Room 555
Office Hours at: Room 568
8:00 - 9:00AM Registration and Breakfast
9:00 - 9:15AM Welcome, announcements & track host intros (Pete Soderling)
9:15 - 9:50AM Keynote #2: Artwork Personalization at Netflix - Tony Jebara (Netflix)
10:00 - 10:40AM

The Software Architecture of WayUp's Job Recommender System - Harlan Harris (WayUp)

Office Hours: Tony Jebara - Netflix

10:45 - 11:25AM

AI farming: 100x the yield with a data team of 1 - Sam Swift (Bowery Farming)

Office Hours: Harlan Harris - WayUp

11:30 - 12:10PM

Scale Processes, Not People: How Data Teams Do More With Less By Adopting Software Engineering Best Practices - Thomas La Piana (GitLab)

Office Hours: Sam Swift - Bowery Farming

12:15 - 1:15PM Lunch
1:15 - 1:55PM

The Highs and Lows of Building an Adtech Data Pipeline - Dan Goldin (TripleLift)

Office Hours: Thomas La Piana - GitLab

2:00 - 2:40PM

Accelerating Single-cell Bioinformatics with N-dimensional Arrays in the Cloud - Ryan Williams (Mt. Sinai)

Office Hours: Dan Goldin - TripleLift

2:45 - 3:15PM

Coffee Break

3:15 - 3:55PM

Engineering Lessons Learned by Data Scientists in Growing MalwareScore from Kaggle Competition to Trusted Antivirus Solution - Phil Roth (Endgame)

Office Hours: Ryan Williams - Mt. Sinai

4:00 - 4:45PM

Closing Keynote: The Literate Programmer: Cargo Cult Open Source - Wes Chow (MIT Media Lab)

Office Hours: Phil Roth - Endgame

5:00 Conference END :(

 

 

Trending From Our Blog

 

How Dremio Uses Apache Arrow to Increase the Performance

What if all the best open-source data platforms could easily share (ahem) data with each other?

To Shard or Not to Shard (PostgreSQL)

Wouldn't the world be a simpler place if we could easily scale our RDBMS? (gasp!)

A Day in the Life: What's it like Being an Engineer at Stripe?

Alyssa Frazee tells us about the unicorn data skills she's honed on the job.

Functional Data Engineering — a modern paradigm for batch data processing

In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. 

Rebuilding Open Source Analytics @ Airbnb

How open source allowed Airbnb to rebuild their expensive BI tool in less than one developer year. 

Fighting Fraud in Cryptocurrency using Machine Learning

Coinbase is on the front lines of discovering advanced cryptocurrency and payment fraud techniques. Hear about how they use machine learning to help them fight the war.

 

Pricing

Location: Alfred Lerner Hall @ Columbia University, New York City, United States

 

Early Bird / The Ticket: SOLD OUT
Regular / The Ticket: SOLD OUT
Last Minute / Late Bird: $899

Access to the Entire Event + All 4 Tracks
Free Breakfast, Lunch, Snacks & Drinks
Access to Breakout Sessions
Access to Speaker Office Hours
Free Event After-Party
Early Access to Conference Recordings
No Bullshit Guarantee* (See below)

DataEngConf is a data engineering event right in the sweet spot. Large enough to attract reputable speakers, knowledgeable engineers and growing companies. Small enough to keep the event productive, intimate and community-driven.

Andrew Staller

Timescale

DataEngConf is a great conference for technical talks with real insights into data engineering challenges and solutions. As an engineer who works on open-source software, I found it helpful to hear firsthand from users.

Maximilian Michels

Google, PMC member of Apache Flink and Apache Beam


DataEngConf was a wonderful way [...] to converse with engineers, project managers, and analytics professionals from a variety of tech-driven companies throughout Europe.

 

Waleed Murad

Segment


Why Community Matters

You might know us as Hakka Labs, the community platform behind DataEngConf. We're engineers ourselves, and are committed to building the most useful network to connect engineers globally around deeply technical events, training programs and career opportunities.

That's why DataEngConf has been re-designed to be uniquely different - we start by connecting the local data community with our global network, then layer on vetted speakers plus amazing attendees (like you!).

We believe everyone should be able to attend, regardless of whether you speak Klingon, collect shoes, sport dolphin hats, have freckles, wear glasses (even if you don't have to), or simply prefer to speak in code.

To learn more about the steps we take to include communities and protect diversity at our events, check out our Diversity Statement.

 

Remember, this is not the watered-down “big data” conference you may have attended previously - instead DataEngConf is built and run by software engineers and data geeks, like yourself!

We have teams from across Europe, the US and Asia participating every year along with many cool organizations (you may have heard of them: Facebook, Lyft, Netflix, Apache Foundation, Datadog, BuzzFeed, etc.) and lots of smart speakers.

Our event talks are hand-picked by our community. Expect real-world architectures of data pipelines and platforms, applied, practical examples of data science, and cutting-edge AI implementations from the best startups we discover.

See you at DataEngConf in New York!

Pete Soderling

Need Manager Approval?

Want to get your company to pay for your DataEngConf ticket? Just fill out this form and we'll send them an email.
We will show them the benefits of sending you to the event (we will of course BCC you on the email) - See example

Start Now

When you submit this form, we will send your manager a one-time email. You will also automatically get updates about future events, competitions, offers and community news from DataEngConf and Hakka Labs. We will never sell your data, and you can unsubscribe at any time; see our privacy policy for details.


Location & Venue

(Alfred Lerner Hall @ Columbia University)


Details on our *No Bullshit Guarantee

This might just be our Marketing Team daydreaming, but we actually, truly, honestly, (okay you get it) believe in authentic and deeply technical community events and we want you to enjoy them as much as we do. We want you to learn, grow, build, engage, launch, laugh and maybe cry (depending on what kind of repositories are in your GitHub) but most of all we want you to feel good about attending DataEngConf.

We want you to feel good about buying a ticket for DataEngConf, so we offer a full refund up until 30 days prior to the event. (After that, we offer to transfer your ticket to a colleague or a future DataEngConf event.) Additionally, we unfortunately cannot retroactively apply any special rates, coupons or discounts once the ticket has been purchased.

We are good at detecting bullshit, but in case we missed it, just let us know at community@hakkalabs.co and we will make it right!