and Consume Data Assets
Enrich’s feature engineering lets teams build versioned feature pipelines with strong auditability, checkpointing, and reproducibility. Data teams can deploy more models, faster, and sleep that much easier while Enrich handles the continuous computation of thousands of features.
Data teams iterate that much faster on features and downstream use cases: a lightweight built-in data catalog provides the starting point, transforms can be reused to build features faster, and features themselves can be reused from an internal feature marketplace.
Features that are:
See which transforms were stitched together using which sources of data. Figure out pipeline dependencies at a glance.
Down to each Pipeline Run
Trace any run of any pipeline, say one from three months ago when the current production model was trained, down to the git commit of the code that ran and the parameters chosen for that run. Debugging made 10x faster.
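To make this concrete, here is a minimal sketch of the kind of per-run metadata such tracing implies. The field names and record shape are illustrative assumptions, not Enrich's actual schema or API:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical sketch: the minimal metadata a pipeline run would need to
# carry for provenance (git commit, parameters, timestamp). Field names
# are assumptions, not Enrich's actual schema.
@dataclass
class PipelineRun:
    pipeline: str
    run_id: str
    git_commit: str   # exact code version that produced this run
    parameters: dict  # parameters chosen for this run
    started_at: str   # ISO-8601 timestamp

run = PipelineRun(
    pipeline="churn_features",
    run_id="run-0042",
    git_commit="9f1c2ab",
    parameters={"lookback_days": 90, "min_events": 5},
    started_at="2024-03-01T02:00:00Z",
)

# Serializing the record is what makes a run auditable months later.
print(json.dumps(asdict(run), indent=2))
```

With a record like this stored per run, "which commit trained the production model" becomes a lookup rather than an archaeology exercise.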
Wikis and Confluence pages are incomplete ways to figure out what a developer was thinking when they ran a pipeline, or how they troubleshot an issue. Enrich lets you integrate short videos right alongside the dataset so you have as full a picture as possible. We call this “just-in-place”.
Data folks can find features that others in the organization have already built, and get a sense for how production-grade they are from how long they’ve run and the downstream models they’re feeding. Reuse, don’t reinvent.
For data analysts
Easy visualization of datasets using Enrich’s extensible app-like structure. While Enrich does not replace your BI, it’s a convenient way to consume data within a user’s usual workflow.
Statistical distributions of individual features, so you have a sense of how the data you care about looks at any point in time
Extensible search - Enrich allows deep search through unified datasets
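As an illustration of the per-feature statistical profiles described above, here is a small sketch using plain pandas (this is not Enrich's API; the dataset and feature names are made up):

```python
import pandas as pd

# Illustrative only: the kind of per-feature statistical profile a feature
# store can surface. Computed here with plain pandas on toy data.
df = pd.DataFrame({
    "days_since_last_purchase": [1, 3, 3, 7, 14, 30, 45, 90],
    "orders_last_90d": [12, 8, 8, 5, 3, 2, 1, 0],
})

# One row per feature, with the summary statistics an analyst scans first.
profile = df.describe().T[["mean", "std", "min", "50%", "max"]]
print(profile)
```

Snapshotting a profile like this at each pipeline run is what lets you compare how a feature's distribution looks "at any point in time".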
Enrich In Action
We’re continuously deepening Enrich’s functionality, both in how it fits into your context (integrations with input data sources, a cloud-agnostic implementation, wrappers around multiple compute engines such as Pandas and Spark clusters, and output to a host of stores, from S3 to Hive, Redis, and Cassandra) and in how easy it is to use, from consuming features (via the built-in feature marketplace and our search and visualization capabilities) to understanding how those features were built in the first place (metadata, detailed logs, data lineage). Here’s a snippet of how Enrich can work at your organization.
Components and Architecture
Use Cases:
Track utilization of the features along with ownership
SDK and other services to rapidly implement feature engineering modules
Administer versioned, auditable, parameterized pipelines, each generating multiple data sets.
Check provenance of datasets by name or other attributes, and compare runs
Discover datasets via a marketplace for features, along with a search interface to build cohorts for analysis
Check drift and access other custom usage monitoring services
Enrich handles the complexity of computation and data semantics by providing a Python SDK to develop, document, and test feature engineering modules (transforms, pipelines, scheduling, etc.), with controlled execution on the server side.
The server provides an interface to discover, operate and audit the resulting features or datasets.
Hooks at either end of Enrich allow for understanding (cataloging) input data stores, and surfacing features at any frequency through APIs for downstream consumption, by defining data contracts and integration points.
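The shape such a Python feature-engineering module can take is sketched below. The decorator and registry here are stand-ins for illustration, not Enrich's actual SDK; in a real deployment the server side would handle versioning, scheduling, and auditing of the execution:

```python
# Hypothetical sketch: a documented, versioned transform registered and
# composed into a pipeline. Names are illustrative, not Enrich's SDK.
TRANSFORMS = {}

def transform(name, version):
    """Register a documented, versioned transform function."""
    def wrap(fn):
        TRANSFORMS[(name, version)] = fn
        return fn
    return wrap

@transform("recency", version="1.0.0")
def recency(events, as_of):
    """Days since each user's most recent event."""
    return {user: as_of - max(ts) for user, ts in events.items()}

def run_pipeline(events, as_of, params):
    """Look up a transform by name and version, then apply it.
    Server-side, this step would be scheduled, logged, and audited."""
    fn = TRANSFORMS[("recency", params["transform_version"])]
    return fn(events, as_of)

features = run_pipeline(
    {"u1": [100, 140], "u2": [90]},
    as_of=150,
    params={"transform_version": "1.0.0"},
)
print(features)  # {'u1': 10, 'u2': 60}
```

Keying the registry on (name, version) is what makes a pipeline run reproducible: re-running with the same parameters resolves to exactly the same transform code.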
For data scientists, the Enrich feature store experience simplifies, standardizes, and speeds up the machine learning model deployment pipeline, so they can ship models with confidence in their performance.
How It Works
Enrich - the modular feature store
The Enrich feature store is built modularly, as a collection of apps atop a platform, each of which helps data teams nimbly address specific challenges. This eases the implementation of Enrich into any customer’s existing data stack, plugging in the missing pieces of the jigsaw, while also allowing teams to build their own apps that best meet their business needs. Out of the box, Enrich provides the following apps:
Data Catalog - Catalog of input data with the ability to crawl data sources
Knowledge base App - Ability to document datasets with audio/video etc. (integrated)
Metadata App - Metadata API service integrated with Enrich service (Beta)
Annotation App - Lightweight labeling service to annotate data quickly and securely (Beta)
Profiler App - Automatic profiling and generation of visualization of datasets (integrated)
Audit App - Search interface to surface metadata including lineage
Drift App - Compare how runs have changed over time (Alpha)
Log Processor - Build context from third-party logs such as Airflow’s
Feature Marketplace App - Surface datasets, features and their statistical profiles
Compliance App - Ability to evaluate classified datasets and associate policies (Beta)
Metrics App - Search, visualization and annotation service for metrics datasets
Persona App - Search (free text, form-based), sharing, and downloading of profiles