

Scribble Data at the Feature Store Summit 2022


Feature Stores become the key catalyst for production-ready Machine Learning

Over the past three years, we’ve heard a lot about feature stores. They might not sound like much, but over time they’ve become table stakes for enterprises building their offerings on ML.

Feature stores are moving from a niche restricted to big tech into the mainstream. Much of the credit for this rapid adoption goes to forums like Featurestore.org, an international community of users and developers of feature store platforms for machine learning.

One of the initiatives of Featurestore.org is their annual summit, Feature Store Summit (FSS), catering to anyone interested in the discipline of feature engineering, working with feature stores, or just curious about all that’s new in the space. 

With the evolution of MLOps, there’s a lot of emphasis on productionization of data to achieve outcomes. In keeping with the times, this year’s event revolved around the theme of ‘Accelerating Production Machine Learning with Feature Stores’. Scribble Data had the good fortune of sharing the stage with an all-star lineup of speakers at FSS 2022 from companies like Uber, LinkedIn, Airbnb, DoorDash, Disney Streaming, and many more. It was great to hear about the good, the bad and the ugly sides of feature stores and to learn from the speakers’ experiences.


Our takeaways from the event:


Feature stores are no longer niche. EVERYONE knows about them

Big tech building their own feature stores isn’t exactly breaking news. For companies building in-house feature engineering capabilities, the components considered important are: pipeline orchestration, feature and model engineering, the storage layer, model observability, and metrics.

The fact that feature stores are now table stakes was made clear when we observed that, of the 18 talks, speakers from 10 companies spoke about their own feature stores (it was interesting to note that four of them are feature store vendors):

  1. Representative companies that have chosen to build their feature store in-house:
    1. Uber
    2. DoorDash
    3. Airbnb
    4. LinkedIn (on Azure)
    5. Disney
    6. Stitch Fix

  2. Feature store companies / vendors:
    1. Hopsworks
    2. Featureform
    3. dotData
    4. Scribble Data (Enrich)


There are different flavors of ML that suit different needs of users

FSS also acknowledged the different flavors of Machine Learning. With its widespread adoption, different types of personnel are now responsible for the productionization of data in organizations. Who they are depends on the needs of the organization, as well as the volume and complexity of its data.

  1. Artisanal ML: This is a creative and exploratory approach to Machine Learning, usually led by citizen scientists. The focus is on building experimental models before they are scaled / industrialized into something that’s more repeatable. Most of this work is typically done on the scientists’ laptops and in Jupyter notebooks.
  2. Analytical ML: This approach to ML is followed in companies which have teams of data scientists, but where MLOps isn’t exactly high on the priority list. There is modeling effort involved, but most models are “built offline and thrown over the wall” to ML engineers. At Scribble Data, we’ve observed that most organizations are unsure about whether they require a feature store at this stage.
  3. Operational ML: Operational ML uses ML models to autonomously make mission-critical business decisions. These models run “online” in production on a company’s operational data stack. Operational ML depends heavily on MLOps tools, which is why it requires serious infrastructure and ML talent. It’s impossible to get to this stage without a feature store.
  4. Operational ML (in real time): This approach builds on operational ML but adds complexity, since real-time constraints and guarantees need to be met. That is why it becomes important to maintain parity between offline and online features.
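To make the offline/online parity point concrete, here is a minimal sketch in plain Python. It does not use any particular feature store’s API. The `Order` type, the `avg_order_value` feature, the `OnlineStore` class and the `parity_report` helper are all hypothetical names introduced for illustration. The idea is that the batch path (used for training) and the incremental serving path should agree on every feature value, and a simple check can flag the entities where they drift apart.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class Order:
    """Hypothetical event type used only for this illustration."""
    customer_id: str
    amount: float


def avg_order_value(orders):
    """Feature definition used by the offline (batch) path."""
    if not orders:
        return 0.0
    return sum(o.amount for o in orders) / len(orders)


def offline_features(history):
    """Batch path: group the full history by customer, then compute the feature."""
    by_customer = {}
    for order in history:
        by_customer.setdefault(order.customer_id, []).append(order)
    return {cid: avg_order_value(orders) for cid, orders in by_customer.items()}


class OnlineStore:
    """Serving path: keeps a running sum and count, updated one event at a time."""

    def __init__(self):
        self._sum = {}
        self._count = {}

    def ingest(self, order):
        cid = order.customer_id
        self._sum[cid] = self._sum.get(cid, 0.0) + order.amount
        self._count[cid] = self._count.get(cid, 0) + 1

    def get(self, customer_id):
        count = self._count.get(customer_id, 0)
        return self._sum[customer_id] / count if count else 0.0


def parity_report(offline, store, tolerance=1e-9):
    """Return the customer ids whose offline and online values disagree."""
    return [
        cid
        for cid, value in offline.items()
        if not isclose(value, store.get(cid), abs_tol=tolerance)
    ]


# Assumed sample data, not taken from any talk at the summit.
history = [Order("a", 10.0), Order("a", 20.0), Order("b", 5.0)]

offline = offline_features(history)

store = OnlineStore()
for event in history:
    store.ingest(event)

mismatches = parity_report(offline, store)
print(mismatches)
```

In this sketch the two paths are implemented differently on purpose (full recomputation versus running aggregates), which is the situation where parity tends to break in practice. An empty `mismatches` list means the values served online match the ones a model would have been trained on.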

While most of these approaches are categorized based on the type of modeling and the effort, at Scribble Data we believe that not all approaches to ML require modeling. To this end, Achint Thomas, Scribble Data’s Data Architect, spoke about Sub-ML (more on this in a bit) – which focuses on the long tail of use cases in an organization and can put data to work, irrespective of the size and complexity.


Machine Learning at Reasonable Scale, Postmodern stack, ML without the MLOps or Sub-ML …

… whatever you choose to call it, and much like what we observed at the TLMS MLOps World Summit, the long-tail use cases in ML are here to stay! Most of the talks focused on traditional Machine Learning, with feature engineering aimed specifically at ML model building and serving. We feel there’s a largely unaddressed set of users and use cases that can benefit from a feature engineering approach more focused on faster outcomes. At Scribble Data, we call this Sub-ML.

One company we found particularly interesting, because it isn’t focusing solely on data engineers and data scientists, was AtScale. They’ve built a semantic layer that makes data actionable for business insights and is aimed more at business users. We’re looking forward to seeing how an increased focus on business users can help drive ML adoption in the enterprise.



Based on what we observed at FSS 2022, there are some interesting directions that feature stores might take, and we’re looking forward to seeing how they work out in the near future. But one thing’s for certain – build or buy, feature stores are here to stay! We can’t wait for the next edition of FSS, and hope to showcase more updates on Sub-ML and our Enrich feature store.

If you’d like to view a recording of Achint’s talk on fast Sub-ML use case development using feature stores, you can now watch it on demand here (Slides available here). If you’d also like to watch his panel discussion on the challenges of making the feature store disappear and become part of the workflow of data science and data engineering, you can watch it here.
