Feature Engineering, Data Pipelines
& Nothing More.
What is Scribble Enrich?
Enrich is a customizable Feature Store, built for data science teams that need efficiency and trust in their datasets.
Enrich ensures that each feature, and so each dataset built using these features, is reproducible, versioned, quality-checked, and searchable.
As a result, data science teams can deploy models faster and with more confidence in the underlying data, and can retrain, debug, and ship each new model version more quickly.
Highly customizable, with an SDK for ML engineers
Streamlines data transforms and manages complexity using versioned pipelines to ease retraining and debugging
Eases collaboration across data teams via its feature marketplace
Provides target datasets that feed APIs for applications
Helps address emerging requirements of explainability, provenance, and auditability.
CAPABILITIES & BENEFITS
Stress-tested, reusable modules
Modules with parameter validation, input/output validation, and documentation
Reuse of features, better models
App to discover datasets by any of their computed statistical attributes
No dataset without metadata
Build lineage & other applications
Data accesses and writes, processes, quality metrics, and state changes
Needs across the lifecycle addressed
Multiple apps, including a lightweight catalog, lineage search, and a simple labeller
Link all datasets to code commits
Deploy from GitHub and upgrade online
Compute over data and metadata such as lineage, drift, and expectations
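To make "No dataset without metadata" and "Link all datasets to code commits" concrete, here is a minimal sketch in plain Python. It does not use the Enrich SDK; `DatasetRecord`, `fingerprint`, and `register` are hypothetical names chosen for illustration, and the record layout is an assumption, not Enrich's schema.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; not Enrich's actual schema."""
    name: str
    code_commit: str      # commit of the pipeline code that produced the data
    content_sha256: str   # fingerprint of the data itself
    row_count: int


def fingerprint(rows):
    """Hash rows in a canonical JSON form so equal data gives an equal digest."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register(name, rows, code_commit):
    """Build the metadata record that would accompany a written dataset."""
    if not code_commit:
        raise ValueError("refusing to register a dataset without a code commit")
    return DatasetRecord(name, code_commit, fingerprint(rows), len(rows))


rows = [
    {"customer_id": 1, "spend_30d": 42.5},
    {"customer_id": 2, "spend_30d": 0.0},
]
record = register("customer_spend", rows, code_commit="abc1234")
print(record.name, record.code_commit, record.row_count)
```

Because the fingerprint depends only on the data and the commit identifies the code, two runs that produce the same record can be treated as reproductions of each other, which is the property that lineage and audit tooling can build on.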
Components & Architecture
Track utilization of the features along with ownership
SDK and other services to rapidly implement feature engineering modules
Administer versioned, auditable, parameterized pipelines, each generating multiple datasets.
Check provenance of datasets by name or other attributes, and compare runs
Discover datasets via a feature marketplace, along with a search interface to build cohorts for analysis
Check drift and access other custom usage monitoring services
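As an illustration of what a drift check can compute, the sketch below implements the Population Stability Index (PSI) in plain Python. PSI is one common drift measure and is swapped in here as an example; the source does not say which metrics Enrich uses. The function name `psi`, the bin count, and the sample data are all assumptions.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a new sample.

    Bins are equal-width over the baseline's range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in baseline]

print(psi(baseline, baseline))
print(psi(baseline, shifted) > 0.2)
```

A PSI of zero means the two binned distributions match. The 0.2 cut-off in the usage lines is a widely quoted rule of thumb for "significant shift" and is used here only as an assumed threshold; a real deployment would tune it per feature.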
How it Works
Enrich handles the complexity of computation and data semantics by providing a Python SDK to develop, document, and test feature engineering modules (transforms, pipelines, scheduling, etc.), with controlled execution on the server side.
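The shape of such a module can be sketched in plain Python. This is not the Enrich SDK interface: the class `SpendPerVisit`, its attributes, and its method names are hypothetical, chosen only to show a documented transform that validates its parameters, its input, and its output.

```python
class SpendPerVisit:
    """Hypothetical transform: adds spend_per_visit = spend / visits.

    Illustrative only; this is not the Enrich SDK interface.
    """

    name = "spend_per_visit"
    version = "1.0.0"
    required_inputs = {"customer_id", "spend", "visits"}

    def __init__(self, min_visits=1):
        # Parameter validation: reject configurations that cannot run safely.
        if not isinstance(min_visits, int) or min_visits < 1:
            raise ValueError("min_visits must be a positive integer")
        self.min_visits = min_visits

    def validate_input(self, rows):
        for row in rows:
            missing = self.required_inputs - set(row)
            if missing:
                raise ValueError(f"missing input columns: {sorted(missing)}")

    def validate_output(self, rows):
        for row in rows:
            if row["spend_per_visit"] < 0:
                raise ValueError("spend_per_visit must be non-negative")

    def run(self, rows):
        self.validate_input(rows)
        # Rows below min_visits are dropped before the division happens.
        out = [
            {**row, "spend_per_visit": row["spend"] / row["visits"]}
            for row in rows
            if row["visits"] >= self.min_visits
        ]
        self.validate_output(out)
        return out


rows = [
    {"customer_id": 1, "spend": 100.0, "visits": 4},
    {"customer_id": 2, "spend": 0.0, "visits": 0},
]
result = SpendPerVisit(min_visits=1).run(rows)
print(result)
```

Keeping the name, version, and required inputs on the class is one way to make a module self-describing, so that a server could list, document, and test it without running it.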
The server provides an interface to discover, operate and audit the resulting features or datasets.
Hooks at either end of Enrich allow for understanding (cataloguing) input data stores, and surfacing features at any frequency through APIs for downstream consumption, by defining data contracts and integration points.
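One way to picture a data contract at the output end is a declared set of fields and types that every record served to a downstream consumer must satisfy. The sketch below is plain Python under that assumption; `CONTRACT`, its fields, and the `violations` helper are hypothetical and do not describe how Enrich defines contracts.

```python
# Hypothetical contract for records served to a downstream API consumer.
CONTRACT = {
    "customer_id": int,
    "spend_per_visit": float,
    "as_of": str,
}


def violations(record, contract):
    """Return human-readable contract violations for one record."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"{field}: missing")
        elif not isinstance(record[field], expected_type):
            got = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {got}")
    # Fields the contract does not declare are flagged as well.
    for field in sorted(record.keys() - contract.keys()):
        problems.append(f"{field}: not in contract")
    return problems


good = {"customer_id": 7, "spend_per_visit": 25.0, "as_of": "2024-01-31"}
bad = {"customer_id": "7", "spend_per_visit": 25.0}

print(violations(good, CONTRACT))
print(violations(bad, CONTRACT))
```

Returning a list of problems instead of raising on the first one lets the producer report every breach in a single pass, which tends to be more useful when a contract is checked at an integration point.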
For data scientists, the Enrich feature store simplifies, standardizes, and speeds up model development, with greater confidence in model performance.