Big Data companies (e.g. LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines over bare metal in custom-designed data centers. While this affords greater control over performance and cost, it often creates a division between operations and development that leads to decreased agility and velocity. Operations teams see their clients (developers) as internal and hence often don’t invest in self-service tooling, preferring archaic human-backed ticketing systems with human-scale turnaround SLAs (e.g. hours, days, or even months to get new machines). The public cloud is a game changer because all users of infrastructure are deemed to be external. Hence, all resources, from Kinesis or Pub/Sub streams to S3 or Cloud Storage buckets to DynamoDB or Bigtable tables, can be requested and provisioned on the fly!
Thanks to recent cloud improvements, data infrastructure (i.e. databases, data pipelines, search engines, blob stores) can now not only be provisioned on the fly but also autoscaled and auto-healed without the developer being aware. Autoscaling of EC2, introduced ~8 years ago, has recently been replaced by serverless (e.g. AWS Lambda) and fully-hosted (e.g. AWS ElastiCache, Elasticsearch, DynamoDB) approaches. Provisioning automation such as Chef, Puppet, and Ansible is increasingly being obsoleted by Terraform. Developers of data pipelines, both predictive and ETL, can focus more on the differentiated aspects of their work, leaving the management of data infrastructure to AWS for the most part.
Agari, a leading email security company, is applying big & fast data best practices to both the security industry and the cloud in order to secure the world against email-borne threats. We do this by building near-real-time, stream-processing predictive data pipelines & control systems in the AWS cloud that are infinitely scalable, highly available, low latency, and easy to manage. Come to this talk to learn more.
Sid Anand is a hands-on software architect with over 15 years of experience building and scaling websites that millions of people visit every day. He currently serves as the Data Architect for Agari, a rising email security company. Prior to joining Agari, Sid held several technical and leadership positions, including LinkedIn’s Search Architect, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. Outside of work, Sid co-chairs QCon SF & London, is an active committer/PPMC member on Apache Airflow, and provides advisory services to startups in the area of Big Data. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. When not working, Sid enjoys spending time with his lovely wife and two kids.