Enrich
 

Build, Understand and Consume Data Assets


Build
Features that are:

Robust

Enrich’s feature engineering lets teams build versioned feature pipelines with high auditability, checkpointing and reproducibility. Data teams can deploy more models, faster, and also sleep easier while Enrich handles the continuous computation of thousands of features.

 

Nimble 

Data teams iterate faster on features and downstream use cases: they start from a lightweight built-in data catalog, reuse transforms to build features faster, and reuse features themselves from an internal feature marketplace.

Understand
Features that are:

Lineage

See which transforms were stitched together using which sources of data. Figure out pipeline dependencies at a glance.

Down to each Pipeline Run

Trace any run of any pipeline, say from three months ago, when the current production model was trained, down to the git commit of the code that ran, and the parameters chosen for that run. Debugging made 10x faster.
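
As a rough illustration of the kind of record that makes this tracing possible, the sketch below stores a git commit and the chosen parameters for each pipeline run, then looks up the run that was current on a given date. `RunRecord`, `run_as_of`, and every value shown are hypothetical names and data made up for this example; they are not Enrich’s API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical metadata captured for one pipeline run."""
    pipeline: str
    git_commit: str           # commit of the code that ran
    params: Dict[str, Any]    # parameters chosen for the run
    started_at: datetime


def run_as_of(runs: List[RunRecord], pipeline: str, when: datetime) -> Optional[RunRecord]:
    """Return the most recent run of `pipeline` that started at or before `when`."""
    candidates = [r for r in runs if r.pipeline == pipeline and r.started_at <= when]
    return max(candidates, key=lambda r: r.started_at, default=None)


# Made-up run history for illustration only.
runs = [
    RunRecord("churn_features", "a1b2c3d", {"window_days": 30}, datetime(2024, 1, 5)),
    RunRecord("churn_features", "e4f5a6b", {"window_days": 60}, datetime(2024, 4, 2)),
]

found = run_as_of(runs, "churn_features", datetime(2024, 2, 1))
print(found.git_commit, found.params)
```

Keeping the commit and parameters next to the run timestamp is what lets a question like "which code produced the data this model was trained on?" be answered with a lookup rather than an investigation.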

Knowledge Base

Wikis and Confluence pages are incomplete ways to figure out what a developer was thinking when they ran a pipeline, or how they troubleshot an issue. Enrich lets you integrate short videos right alongside the dataset so you have as full a picture as possible. We call this “just-in-place”.


Consume
Features by:

Feature marketplace

Data folks can find features that others in the organization have already built, and get a sense for how production grade they are by how long they’ve run and the downstream models they’re feeding. Reuse, don’t reinvent.

 

For data analysts

Easy visualization of datasets using Enrich’s extensible app-like structure. While Enrich does not replace your BI, it’s a convenient way to consume data within a user’s usual workflow.

At-a-glance

Statistical distribution of individual features, so you have a sense of how the data you care about looks at any point in time.

Extensible search - Enrich allows deep search through unified datasets 

Enrich In Action

We’re continuously deepening Enrich’s functionality. Part of that is how it fits into your context: integrations with input data sources, a cloud-agnostic implementation, support for multiple compute engines such as Pandas and Spark clusters, and output to a host of stores, from S3 to Hive, Redis and Cassandra. Part of it is ease of use, from consuming features (through our built-in feature marketplace, as well as our search and visualization capabilities) to understanding how these features were built in the first place (metadata, detailed logs, data lineage). Here’s a snippet of how Enrich can work at your organization.

Components and Architecture


USE CASES :

Track utilization of the features along with ownership

IMPLEMENT :

SDK and other services to rapidly implement feature engineering modules

OPERATE :

Administer versioned, auditable, parameterized pipelines, each generating multiple data sets.

AUDIT :

Check provenance of datasets by name or other attributes, and compare runs

ACCESS :

Discover datasets via a marketplace for features, along with a search interface to build cohorts for analysis

MONITOR :

Check drift and access other custom usage monitoring services

Enrich handles the complexity of computation and data semantics by providing a Python SDK to develop, document and test feature engineering modules (transforms, pipelines, scheduling, etc.), with controlled execution on the server side.
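
To make the idea of transforms and pipelines concrete, here is a minimal sketch in plain Python. It does not use the Enrich SDK: `Pipeline`, `add_spend_per_visit`, `flag_high_spend`, the column names, and the threshold of 20 are all assumptions invented for this example. It only shows the general pattern of small reusable transforms stitched into a named, versioned pipeline.

```python
import hashlib
import json
from typing import Callable, Dict, List

Row = Dict[str, float]
Transform = Callable[[List[Row]], List[Row]]


def add_spend_per_visit(rows: List[Row]) -> List[Row]:
    """Transform: derive a ratio feature from two input columns."""
    return [
        {**r, "spend_per_visit": r["spend"] / r["visits"] if r["visits"] else 0.0}
        for r in rows
    ]


def flag_high_spend(rows: List[Row]) -> List[Row]:
    """Transform: derive a binary feature from the ratio feature (threshold is illustrative)."""
    return [{**r, "high_spend": 1.0 if r["spend_per_visit"] > 20 else 0.0} for r in rows]


class Pipeline:
    """Hypothetical pipeline: an ordered, named, versioned list of transforms."""

    def __init__(self, name: str, version: str, transforms: List[Transform]):
        self.name = name
        self.version = version
        self.transforms = transforms

    def run(self, rows: List[Row]) -> List[Row]:
        # Apply each transform in order, feeding its output to the next one.
        for transform in self.transforms:
            rows = transform(rows)
        return rows

    def fingerprint(self) -> str:
        """Short stable identifier for this pipeline definition, usable in audit records."""
        spec = {
            "name": self.name,
            "version": self.version,
            "transforms": [t.__name__ for t in self.transforms],
        }
        return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()[:12]


pipeline = Pipeline("customer_features", "1.0.0", [add_spend_per_visit, flag_high_spend])
features = pipeline.run([{"spend": 120.0, "visits": 4.0}, {"spend": 15.0, "visits": 3.0}])
print(pipeline.fingerprint(), features)
```

Because each transform is an ordinary function, it can be tested on its own and reused across pipelines, which is the reuse pattern described above.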

 

The server provides an interface to discover, operate and audit the resulting features or datasets.

Hooks at either end of Enrich allow for understanding (cataloging) input data stores, and surfacing features at any frequency through APIs for downstream consumption, by defining data contracts and integration points. 

 

For data scientists, the Enrich feature store experience simplifies, standardizes, and speeds up the machine learning model deployment pipeline, while maintaining confidence in model performance.

How It Works


Enrich - the modular feature store

The Enrich feature store is built modularly, as a collection of apps atop a platform, each of which helps data teams nimbly address specific challenges. This eases the implementation of Enrich into any customer’s existing data stack, plugging in the missing pieces of the jigsaw, while also allowing these teams to build their own apps that best meet their business needs. Out of the box, Enrich provides the following apps:

Build
  • Data Catalog - Catalog of input data with the ability to crawl data sources

  • Knowledge base App - Ability to document datasets with audio/video etc. (integrated)

  • Metadata App - Metadata API service integrated with Enrich service (Beta)

  • Annotation App - Lightweight labeling service to annotate data quickly and securely (Beta)

Understand
  • Profiler App - Automatic profiling and generation of visualization of datasets (integrated)

  • Audit App - Search interface to surface metadata including lineage

  • Drift App - Compare how runs have changed over time (Alpha)

  • Log Processor - Build context from third-party logs such as Airflow
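
The profiling and drift ideas in the list above can be sketched with the Python standard library alone. This is not how the Profiler App or Drift App is implemented; `profile`, `mean_shift`, the sample values, and the threshold of 2.0 are assumptions chosen for illustration. The sketch summarizes one feature per run and measures how far the mean has moved between two runs, in units of the baseline standard deviation.

```python
import statistics
from typing import Dict, List


def profile(values: List[float]) -> Dict[str, float]:
    """Summary statistics for one feature in one run."""
    return {
        "count": float(len(values)),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def mean_shift(baseline: List[float], current: List[float]) -> float:
    """Distance of the current mean from the baseline mean, in baseline standard deviations."""
    base = profile(baseline)
    current_mean = statistics.fmean(current)
    if base["stdev"] == 0:
        # A constant baseline: any change in the mean counts as an unbounded shift.
        return 0.0 if current_mean == base["mean"] else float("inf")
    return abs(current_mean - base["mean"]) / base["stdev"]


# Made-up feature values from two runs of the same pipeline.
run_january = [10.0, 12.0, 11.0, 13.0, 9.0]
run_april = [15.0, 17.0, 16.0, 18.0, 14.0]

THRESHOLD = 2.0  # hypothetical alert threshold, chosen only for this example
shift = mean_shift(run_january, run_april)
print(f"shift={shift:.2f}", "drift" if shift > THRESHOLD else "stable")
```

A mean shift is one of the simplest possible drift signals; a production check would likely also compare the shape of the distributions.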

Consume
  • Feature Marketplace App - Surface datasets, features and their statistical profiles

  • Compliance App - Ability to evaluate classified datasets and associate policies (Beta)

  • Metrics App - Search, visualization and annotation service for metrics datasets

  • Persona App - Search (free text, form-based), sharing, and downloading of profiles
