What is Data Products Bootcamp?
Data Products Bootcamp is a special 2-day bootcamp immediately following DataEngConf BCN. The bootcamp is designed for people building data-driven products: technical product managers, developers and data engineers. The course takes a hands-on approach to building a data-driven product. Attendees will use the TensorFlow and BigARTM machine learning libraries to power their data pipelines, along with the well-established Python development stack to build the complete product.
The course walks participants through all the steps of building a data-driven product: collecting and storing data, processing and analyzing it, and extracting value. The app built during the class is powered by a rich set of raw data such as unstructured text with metadata, geotags, web logs and photos. The final solution relies on a number of state-of-the-art techniques from the domains of computer vision and natural language processing. Participants will also implement a Telegram bot that acts as a frontend for their application.
The course also presents a real-world case study: a collaboration between a startup and an insurance company to introduce a new type of insurance product for the airline industry, based solely on data the startup already possessed.
This bootcamp is designed around building an app prototype which implements data-oriented features in the following ways:
- Using computer vision, capture an image, parse it, and recognize and store relevant data. Machines with dedicated GPUs will be provided to each participant.
- Enrich in-app user data by obtaining and parsing additional information from Internet sources.
- Using text-mining algorithms, build and train a clustering model to implement a predictive analysis feature.
- Learn the full cycle of obtaining value from in-app data: from data capture to powering specific business insights that directly drive monetization.
- Get valuable case studies from the experts who build real-world data products.
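As a rough illustration of the data-enrichment step in the list above, the sketch below pulls a page title and meta description out of an HTML document and attaches them to an in-app record. It is not taken from the course materials: it uses Python's standard-library `html.parser` instead of Beautiful Soup so that it runs without extra installs, and the names `PageMetadataParser`, `enrich_record`, the sample record and the sample HTML are all hypothetical. In a real pipeline the HTML would come from a fetched web page rather than a hard-coded string.

```python
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            content = attrs.get("content")
            # Skip meta tags that lack a usable name/content pair.
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def enrich_record(record, html):
    """Return a copy of `record` with fields extracted from `html` added."""
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()
    enriched = dict(record)  # leave the original record untouched
    enriched["page_title"] = parser.title.strip()
    enriched["page_description"] = parser.meta.get("description", "")
    return enriched


# Hypothetical in-app record, with a hard-coded page standing in for a fetched one.
SAMPLE_HTML = """
<html><head>
<title> Barcelona Airport Guide </title>
<meta name="description" content="Terminals, transport and tips.">
</head><body><p>Welcome.</p></body></html>
"""

record = {"user_id": 42, "url": "https://example.com/bcn"}
enriched = enrich_record(record, SAMPLE_HTML)
print(enriched)
```

Returning a copy rather than mutating the input keeps the raw in-app data separate from the enriched view, which makes it easier to re-run the enrichment when the source pages change.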
Which Tools Will I Use?
The main programming language for the course is Python 3, and the tools below assume Python as the working language.
- Computer Vision: TensorFlow (GPU) and PyTorch.
- NLP: BigARTM, NLTK, gensim, Beautiful Soup, pymorphy.
- Server: Flask, Jinja2, PostgreSQL, Docker.
Prerequisites: Basic knowledge of Python and familiarity with software prototyping in general. The program focuses primarily on working with data and algorithms and on designing product features, rather than on making a beautiful app with perfect design and UX.
- Setting the stage: building products and services with data as the core value driver
- Enriching data by parsing additional information from web sources
- Prototyping mobile image capture and parsing using OCR libraries and Swift templates
- Selecting and applying machine learning algorithms to clustering, topic modeling and data classification
- Data preprocessing and feature extraction, preparation of a training dataset
- Prototyping a demo monetization webpage using the output of a trained model
- Supervised prototyping, Q&A
- Closing session: individual/group demos, feedback, Q&A