
SF '17 Schedule

The schedule for each day is listed below. Speakers and talks are being added daily, so be sure to check back!

 

 

WORKSHOP DAY

* Schedule is subject to change by organizer.

 

TIME & TRACKS

TRACK 1: SPARK 2.0.1
TRACK 2: SPARK INTERNALS
TRACK 3: PANDAS
TRACK 4: SCIKIT-LEARN
TRACK 5: APACHE AIRFLOW

8:30am - 9:00am  CHECK-IN & COFFEE/TEA BREAK

9:00am - 12:30pm

TRACK 1
Workshop: Intro to Spark
Level: Beginner - Intermediate
Trainer: Austin Ouyang, Insight Data Science

TRACK 2
Workshop: Spark Internals
Level: Intermediate - Advanced
Trainer: Ronak Nathani, Insight Data Science

TRACK 3
Workshop: Intro to Pandas
Level: Beginner - Intermediate
Trainer: Ivan Corneillet, Galvanize

TRACK 4
Workshop: Intro to Scikit-learn
Level: Beginner - Intermediate
Trainer: Francesco Mosconi, DataWeekends

TRACK 5
Workshop: Up & Running with Airflow
Level: Beginner - Intermediate
Trainer: Arthur Wiedmer, Airbnb

12:30pm - 1:30pm  LUNCH BREAK
1:30pm - 5:00pm

TRACK 1
Workshop: Using Spark APIs
Level: Intermediate - Advanced
Trainer: Austin Ouyang, Insight Data Science

TRACK 2
Workshop: Spark Streaming & Kafka
Level: Intermediate - Advanced
Trainer: Ronak Nathani, Insight Data Science

TRACK 3
Workshop: Intermediate Pandas
Level: Intermediate - Advanced
Trainer: Ivan Corneillet, Galvanize

TRACK 4
Workshop: Intermediate Scikit-learn
Level: Intermediate - Advanced
Trainer: Francesco Mosconi, DataWeekends

TRACK 5
Workshop: Airflow Use Cases
Level: Intermediate - Advanced
Trainer: Arthur Wiedmer, Airbnb

 

CONFERENCE DAY 1 & AFTER-PARTY


 

 

DATA ENGINEERING TRACK | DATA SCIENCE TRACK | OFFICE HOURS: DATA ENGINEERING | OFFICE HOURS: DATA SCIENCE

8:00 - 9:00am REGISTRATION & BREAKFAST
9:00 - 9:15am WELCOME, ANNOUNCEMENTS & TRACK HOST INTROS
9:15 - 9:50am
[OPENING KEYNOTE] Mohammad Shahangian (Pinterest), "Data Science @ Pinterest"

10:00 - 10:40am
Data Engineering Track: Sid Anand (Agari), "Cloud Native Data Pipelines"
Data Science Track: Soups Ranjan (Coinbase), "Payment Fraud in Digital Currency"
Office Hours: Mohammad Shahangian
10:45 - 11:25am
Data Engineering Track: Maxime Beauchemin (Airbnb), "How Superset and Druid Power Real-time Analytics at Airbnb"
Data Science Track: Daniel Galron (eBay), "Why, When, How: Lessons Learned in Applying Deep Learning to Real-world Problems"
Office Hours: Sid Anand (Data Engineering), Soups Ranjan (Data Science)
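11:30 - 12:10pm
Data Engineering Track: Chris Hartfield (Clover Health), "How Healthcare Data Pushed Us to the Limit"
Data Science Track: Laura Pruitt (Netflix), "Anomaly Detection for Data Quality and Metric Shifts at Netflix"
Office Hours: Maxime Beauchemin (Data Engineering), Daniel Galron (Data Science)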
12:15 - 1:15pm LUNCH BREAK
1:15 - 1:55pm
Data Engineering Track: Holden Karau (IBM), "Beyond Spark RDDs: Dataframes & Datasets for Mixing Functional & Relational Code (also vroom vroom)"
Data Science Track: Sharath Rao (Instacart), "Practical Lessons for Building Machine Learning Models in Production"
Office Hours: Chris Hartfield (Data Engineering), Laura Pruitt (Data Science)
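2:00 - 2:40pm
Data Engineering Track: Jeff Chao (Heroku), "Beyond 50,000 Partitions: How Heroku Operates and Pushes the Limits of Kafka at Scale"
Data Science Track: Sean Anderson (Cloudera), "Data Science in the Enterprise"
Office Hours: Holden Karau (Data Engineering), Sharath Rao (Data Science)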
2:45 - 3:15pm COFFEE BREAK
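3:15 - 3:55pm
Data Engineering Track: Paul Dix (InfluxData), "InfluxDB Storage Engine Internals"
Data Science Track: Jennifer Prendki (@WalmartLabs), "The Limitations of Big Data in Predictive Analytics"
Office Hours: Jeff Chao (Data Engineering), Sean Anderson (Data Science)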
4:00 - 4:55pm
Data Engineering Track: Vinoth Chandar (Uber), "Hoodie: An Open Source Incremental Processing Framework From Uber"
Data Science Track: [INVESTOR PANEL] "How Engineer-Angels Evaluate Data-Backed Startups"
Panelists: Parker Thompson (AngelList) | Shruti Gandhi (Array Ventures) | Jocelyn Goldfein (Zetta Venture Partners) | Itamar Novick (Life360) | Pete Soderling (Hakka Labs)
Office Hours: Paul Dix (Data Engineering), Jennifer Prendki (Data Science)
5:00 - 5:45pm
[KEYNOTE PANEL] "The Right Stuff: Lessons Learned from a Decade of Data Engineering"
Panelists: Mike Driscoll (Metamarkets) | Vijay Gill (Salesforce) | Sam Shah (SkipFlag) | Ben Hamner (Kaggle)
Office Hours: Vinoth Chandar (Data Engineering), N/A (Data Science)

6:00 - 8:00pm CONFERENCE & COMMUNITY AFTER-PARTY

 

CONFERENCE DAY 2


 

 

DATA ENGINEERING TRACK | DATA SCIENCE TRACK | OFFICE HOURS: DATA ENGINEERING | OFFICE HOURS: DATA SCIENCE

8:00 - 9:00am REGISTRATION & BREAKFAST
9:00 - 9:15am WELCOME, ANNOUNCEMENTS & TRACK HOST INTROS
9:15 - 9:50am
[DAY 2 KEYNOTE] John Myles White (Facebook), "Writing Correct Data Analysis Code"

10:00 - 10:40am
Data Engineering Track: Fangjin Yang (Imply), "Interactive Exploratory Analytics with Druid"
Data Science Track: Nelson Ray (Opendoor), "Simulation-based Inference: Advantages Over A/B Testing in Real Estate"
Office Hours: John Myles White
10:45 - 11:25am
Data Engineering Track: Silvia Oliveros-Torres & Stephen O'Sullivan (Silicon Valley Data Science), "Format Wars: From VHS and Beta to Avro and Parquet"
Data Science Track: Alyssa Frazee (Stripe), "Practical Solutions for Annoying Machine Learning Problems"
Office Hours: Fangjin Yang (Data Engineering), Nelson Ray (Data Science)
11:30 - 12:10pm
Data Engineering Track: Avrilia Floratou & Ashvin Agrawal (Microsoft), "Twitter Heron: The Path Towards Elastic Streaming"
Data Science Track: Liz Bennett (Stitch Fix), "An Introduction to Big Data's Unsung Hero: The Log"
Office Hours: Silvia Oliveros-Torres & Stephen O'Sullivan (Data Engineering), Alyssa Frazee (Data Science)
12:15 - 1:15pm LUNCH BREAK
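1:15 - 1:55pm
Data Engineering Track: Tony Givargis (Levyx), "Real-time System Computing Engines"
Data Science Track: Kenneth Sanford (Dataiku), "A Nation of Immigrants: The Data Sciences"
Office Hours: Ashvin Agrawal & Avrilia Floratou (Data Engineering), Liz Bennett (Data Science)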
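2:00 - 2:40pm
Data Engineering Track: Shuojie Wang (Facebook), "Scaling Up Spark at Facebook: A 60TB Production Use Case"
Data Science Track: Dave Deriso (SimpleHealth), "Deep Representations of Time Series"
Office Hours: Tony Givargis (Data Engineering), Kenneth Sanford (Data Science)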
2:45 - 3:15pm COFFEE BREAK
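3:15 - 3:55pm
Data Engineering Track: Polong Lin (IBM), "Building 21st Century Data Science and Data Engineering Teams"
Data Science Track: Benn Stancil (Mode), "Data for the 99%"
Office Hours: Shuojie Wang (Data Engineering), Dave Deriso (Data Science)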
4:00 - 4:55pm
[CLOSING KEYNOTE] Matei Zaharia (Databricks), "Composable Parallel Processing in Apache Spark and Weld"
Office Hours: Polong Lin (Data Engineering), Benn Stancil (Data Science)

5:00pm CONFERENCE ENDS :(