Data Council Blog

Halo Tech - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Halo Tech, an early-stage startup that analyzes complex data to accelerate medical advancements.

 
Q: What surprised you most as an engineer about the work you did that you'll be telling us about in your talk?

Austen Head: Halo's business logic requires the predictions we make to be highly interpretable. The functionality to explain which features influenced individual predictions was originally designed for customer-facing audit and explanation purposes. But we now find ourselves using it internally to easily review the impact of models before deploying them.
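For intuition, per-prediction feature attribution is especially simple for linear models, where each prediction decomposes exactly into per-feature contributions. The following is a minimal, hypothetical sketch of that idea; all names and values are illustrative and are not Halo's actual API or data:

```python
# Minimal sketch of per-prediction feature attribution for a linear model.
# For a linear model, a feature's contribution to one prediction is exactly
# coefficient * feature_value, so the prediction decomposes additively.
# All names here are illustrative, not Halo's implementation.

def explain_prediction(coefs, intercept, features):
    """Return (prediction, {feature_name: contribution}) for one example."""
    contributions = {name: coefs[name] * value for name, value in features.items()}
    prediction = intercept + sum(contributions.values())
    return prediction, contributions

# Hypothetical coefficients and one example customer.
coefs = {"grant_funding_musd": 0.5, "lab_headcount": 0.25}
example = {"grant_funding_musd": 2.0, "lab_headcount": 4}
pred, contribs = explain_prediction(coefs, intercept=0.5, features=example)
# pred == 0.5 + 0.5*2.0 + 0.25*4 == 2.5
```

Nonlinear models need heavier machinery (e.g. Shapley-value-style attributions), but the output shape is the same: a prediction plus a per-feature breakdown that a human can review.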

 

Q: Halo sounds like a cool company! What do you do?

 

Austen Head: Halo accelerates the adoption of medical advancements by improving the sales and marketing effectiveness of biotech and medtech manufacturing companies. We recommend who a client should reach out to, what the client should teach them about, why that prospect would benefit from that instruction, and which specific product from the client addresses that prospect's need.

We are able to provide these insights because we gather and synthesize data to understand the market ground truth of all organizations that might purchase a bio/medtech device (academic researchers, clinical labs, consumer biotech companies, and others).

Halo is a SaaS company of 9 people (4 on data eng/sci, 7 have PhDs). We're led by our CEO, a former Google exec who launched AdSense for content and grew it from $0 to $2B annual revenue.

 

Q: Can you explain your basic product model for data processing?

Austen Head: The core internal products for data eng/sci at Halo are:
(1) data ingestion from a wide variety of third party sources
(2) entity resolution among those sources using low-signal keys
(3) text analysis dimension reduction for human interpretability
(4) prediction of actionable revenue opportunities for our clients
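To make step (2) concrete, entity resolution on low-signal keys is often sketched as blocking on a normalized join key, so records from different sources group together despite formatting noise. This is a hypothetical illustration, not Halo's pipeline:

```python
# Hypothetical illustration of entity resolution via a normalized blocking key.
# Records from different sources are grouped when their low-signal key
# (here, an organization name) normalizes to the same value.
import re

def normalize_key(name):
    """Lowercase, strip punctuation and common suffixes to form a weak join key."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\b(inc|llc|labs?|university)\b", "", key)
    return " ".join(key.split())

def resolve(records):
    """Group records from multiple sources by their normalized key."""
    groups = {}
    for rec in records:
        groups.setdefault(normalize_key(rec["name"]), []).append(rec)
    return groups

records = [
    {"source": "grants", "name": "Acme Labs, Inc."},
    {"source": "papers", "name": "ACME labs"},
    {"source": "crm", "name": "Beta University"},
]
groups = resolve(records)  # "Acme Labs, Inc." and "ACME labs" collapse together
```

Real systems add fuzzy matching and pairwise scoring on top of blocking, but the weak-key grouping above is the usual first pass.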
 
My talk will be an overview of (4).
 
Q: What do you think a listener will get out of this talk vs. other talks on distributed data processing and data versioning that they've previously heard?

Austen Head: I'll present an example of a two-layer stacked ensemble model in which the outputs at each layer are interpretable and useful. In order to understand the model, the listener first needs to understand that Halo's clients (enterprises) also have their own customers.

The first layer of models predicts a client's customer response based on features intrinsic to the customer (such as the amount of funding from government grants). The second layer of models predicts how much the client can change the response of customers based on features that are actionable and related to client-customer interactions (such as whether the client invited the customer to a technical webinar on a particular topic).

This framework allows us to recommend that clients reach out to customers where the predicted response is much greater than the observed response in the first model and where the features explain a large amount of the local variability in the second model.
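The first half of that recommendation rule can be sketched as: rank customers by the gap between the layer-one (intrinsic-features) prediction and the observed response, and flag those where the gap is large. Names, numbers, and the threshold below are all illustrative assumptions, not Halo's system:

```python
# Hypothetical sketch of the recommendation rule described above: flag customers
# whose intrinsic-feature prediction (layer 1) far exceeds their observed
# response, i.e. likely untapped opportunity the client can act on.

def recommend_outreach(customers, gap_threshold=1.0):
    """Return customer ids with gap > threshold, ranked by descending gap."""
    gaps = {c["id"]: c["predicted_response"] - c["observed_response"]
            for c in customers}
    ranked = sorted(gaps.items(), key=lambda kv: -kv[1])
    return [cid for cid, gap in ranked if gap > gap_threshold]

customers = [
    {"id": "lab_a", "predicted_response": 5.0, "observed_response": 1.0},  # big gap
    {"id": "lab_b", "predicted_response": 2.0, "observed_response": 1.8},  # small gap
    {"id": "lab_c", "predicted_response": 4.0, "observed_response": 1.5},  # medium gap
]
ranked = recommend_outreach(customers)  # → ['lab_a', 'lab_c']
```

The second condition in the interview (that the actionable features explain much of the local variability in the layer-two model) would then filter this list down to customers the client can actually influence.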

  


 

About the Startups Track

The data-oriented Startups Track at DataEngConf features dozens of startups forging ahead with innovative approaches to data and new data technologies. We find the most interesting startups at the intersection of ML, AI, data infrastructure, and new applications of data science, and highlight them in technical talks from the CTOs and lead engineers building their platforms.

Data Engineering, Data Warehouse, Data Strategy

Robert Winslow

Written by Robert Winslow

Robert is a seasoned software consultant with a decade of experience shipping great products. He thrives in early-stage startup environments, and works primarily in Go, Python, and Rust. He has led backend development at companies like RankScience and Spot.com; created a rigorous, open-source time-series benchmarking suite for InfluxData; and rapidly prototyped software in a skunkworks-type product lab. He’s taught graduate statistics at GalvanizeU and mentored at the Stanford d.school. He helps maintain Google’s FlatBuffers project, one of the world’s fastest serialization libraries. A colleague once described him as “the developer equivalent of ‘The Wolf’ from Pulp Fiction.”