Data Council Blog

Open Source Highlight: Apache Superset

Apache Superset is a very popular open-source project that provides users with an exploration and visualization platform for their (big or not-so-big) data. For instance, it can be used to create line charts, but also advanced geospatial charts and dashboards that support queries via SQL Lab.

Open Source Highlight: Cube.js

Cube.js is an open-source analytics framework meant to answer the "lack of tools for software engineers who are building production, customer-facing applications and need to embed analytics features into these applications," as its co-founder and CEO Artyom Keydunov explained in a blog post.

Open Source Highlight: Streamlit

Streamlit officially launched out of beta on October 1st, 2019 with the promise to "turn Python scripts into beautiful ML tools." On the same day, Google's AI-focused venture fund Gradient Ventures announced its investment in the startup, which has since attracted considerable attention despite its young age.

PyTorch Lightning, ksqlDB and More: Top 10 Links from Across the Web

Here are 10 recent relevant links for data professionals, from blog posts and tutorials to podcast episodes:

1. PyTorch Lightning: a gentle introduction

Former Data Council speaker Will Falcon published an interesting post on PyTorch Lightning, the lightweight PyTorch wrapper born out of his Ph.D. AI research at NYU CILVR and Facebook AI Research (FAIR). Framed as "a gentle introduction", it includes a side-by-side comparison of building a simple MNIST classifier in PyTorch and in PyTorch Lightning, in order to illustrate how to refactor one into the other. This is highly recommended reading if you are working on AI/ML, whether in professional research, as a student, or in production.

How Histograms Can Help Improve Your Ops Monitoring

Life comes at you fast. Data even more so ...

When the engineering team at Circonus began to feel the pain of systems at scale, there were some common observability tools that provided them with a firehose of operational time series telemetry. However, managing all that data, let alone making sense of it, was extremely difficult. And the existing tools they tried for managing time series metrics either didn't give mathematical insight or fell over at modest workloads. They needed a better solution. So they decided to look into other statistical tooling options that had proven themselves for decades in other industries.

How Big Data Can Help Improve the Meteorological Risk Models That Are Out of Date

According to a recent article published in The New York Times, water damage from Hurricane Harvey extended far beyond flood zones. Now that the rescue efforts are underway, it’s clear that much of the damage occurred outside of the typical boundaries drawn on official FEMA flood maps.