Articles & Blog Publications

Growing Data Infrastructure Complexities: Cost Implications and the Way Forward

The nature of data has changed. Data and its corresponding infrastructure are more complex than ever before, with new tools and techniques emerging and new uses being found for old ones.


While these advancements can help manage that complexity, they come at a cost. In this blog post, we'll explore some of the ways in which we see businesses and the data community responding to growing infrastructure complexity.

Read more here

Establishing Organizational Digital Trust:
Why Your Data Products Are Only as Good as Your Data

Despite data’s pervasiveness in decision-making across industries and organizations, our relationship with it remains complex. 60% of enterprise executives don't trust it, preferring to go with their gut instead.


In this blog post, we delve deeper into what it takes for organizations to build trust in their data, as well as some of the frameworks and tools they can use on this journey.

Read more here

Scribble Data at TMLS MLOps World Summit 2022

The MLOps World Summit 2022 in Toronto promised to be the ultimate ML operations and strategy conference and expo, and it totally delivered. We gathered with experts and industry practitioners to discuss the state of machine learning (ML) in production.


Read our key takeaways from the event.

Read more here

What is the modern data stack, and how do we see it evolving?

The success of a modern business is increasingly tied to how it implements its data infrastructure. The old-world data stack comprised data warehouses that centralized and consolidated large amounts of data. Evolving organizational needs have given rise to the modern data stack, whose tools lead primary data through a processing pipeline, transforming raw data into cleaned, well-organized, aggregated data that can be used for reporting, analytics, feature generation, and modeling. Learn more about the role Scribble Data plays in the modern data stack, and how it enables data teams to quickly go from raw data to fit-for-purpose rich datasets.

Read more here

How to design a Feature Store for Sub-ML?

The data science landscape is evolving, with an increasing number of Sub-ML use cases. Our CEO, Venkata Pingali, recently spoke with the community about how feature stores need to evolve with changing organizational requirements. In this blog, we've shared some of the highlights of the discussion.

Read more here

Why Feature Stores Need to be Designed for Sub-ML

Data Science as a discipline has seen the kind of evolution that only a few others have. What started as a combination of applied statistics and computer science has enabled powerful data-driven decision making and predictions to solve real-world problems. To illustrate this further, think of all the complex Data Warehousing and Business Intelligence solutions companies use to mine data and to analyze and visualize results for decision-makers. With time, the size and number of data sets increased, complexities grew, and practitioners shifted to Big Data and Machine Learning.

Read more here

Welcome to the Age of Sub-ML Use Cases

If you work at a modern data-driven company, chances are you've thought of multiple use cases that could be enriched with an ML-based approach, only to see them shelved for a wide variety of reasons. That's no longer the case with a Sub-ML approach, which results in accelerated solutions and faster time to value, all in real time. Find out more about the what, how, and why of Sub-ML, and why organizations can no longer choose to ignore it.

Read more here

Scaling Entity Matching at The Room with Scribble Enrich and Redis

At the core of The Room's technology challenge is a mathematically difficult entity-matching problem. Each entity in the system (individuals, organizations, opportunities, and content) must be matched to other entities with high accuracy and relevance, context-sensitivity, and timeliness. Here's how the Enrich platform solved it using Redis.

Read more about it on Redis Labs' own blog.

Hierarchical Features and their Importance in Feature Engineering

Feature engineering is both a central task in machine learning engineering and arguably the most complex one. In this post, we will explore how hierarchical features can add value to the data science lifecycle, and how feature hierarchies can improve MLOps productivity.
Read more here.

Right to Forget: Implementation Overview

Any organization that collects and stores EU resident data is subject to the General Data Protection Regulation (GDPR). Examples of such organizations include Google, Facebook, and Amazon. The regulation places the obligation for responsible data handling on such organizations, and gives individuals a number of rights.

Read more here.

The Brave New World of Data Privacy and Accountability - A CEO's Guide

The compliance landscape involving data is changing significantly in 2020, and it is necessary to understand these changes as soon as possible so you can chart your path, and that of your organization, over the next few years. Read more here.

Feature Stores - The CEO's Guide

A guide for CEOs to help think through choosing the best feature store to complement the Machine Learning practice within your organisation.
Read more here.

Scribble Data raises the first round of funding to scale feature store

We are thrilled to announce that we've just closed our first round of funding to help us scale and deliver our Feature Store product, Enrich, in international markets for enterprise-grade Machine Learning products. Our investors are data-driven leaders in the US and India, from companies like Google and Amazon.

Read more here.

Image by Leo Wieling
ML Productionization — The CEO's Guide

The COVID world will result in organizations being keener than ever on RoI from their Machine Learning. Productionization of these models has proven to be more difficult than people expected, with a key challenge being robustness. This primer is geared towards helping CEOs think through the productionization of ML models within their organizations.

Read more here.

Should data scientists be excited or worried about the new privacy laws?

The General Data Protection Regulation (GDPR) legislated and passed by the European Union has sent ripples around the world, and depending on who you ask, this could either spell apocalypse, the workings of a nanny state, or a very positive step towards consumer privacy. 

Read more here.

Why your business doesn’t have to wait to start giving back

“If you’re in the luckiest 1% of humanity, you owe it to the rest of humanity to think about the other 99%.” — Warren Buffett.

Buffett has given away more than he has left. In fact, he has pledged to give away 99% of his wealth. It gives us pause. When talk of CSR and philanthropy is brought up in most offices, there’s a general lull in the air: the process seems like an imposition to most, who consider it just another thing to tick off a sundry list. But the act of giving is powerful. Read more here.

How to get the most out of your organization’s data: The mindset

Every business is a data business. While this aphorism has been around for some time, what does it actually mean to enterprise stakeholders? What should key decision makers value and be excited about as they start to invest in analytics tools and ML/AI?

Here’s what we think are the most important aspects to embrace when it comes to data and enterprise.

Read more here.

Reducing Organizational Data Costs

We speak to a number of organizations that are in the process of building and deploying data infrastructure and analytical processes. These organizations face a number of challenges that prevent them from meeting their analytical business objectives. The idea of this note is to share our thoughts on one specific challenge: high cost. Specifically:
1. Cost model - Deconstruction of cost
2. Drivers - Drivers of each cost dimension
3. Recommendations - Actions to address each driver

Read more here.

How to turn your startup into a data-informed business

This post offers a useful way to think about how to start on a data journey, whether you're a young startup that has pushed data to the back burner (say, until you had 'enough' traction) or part of a more mature company that's used to making decisions on instinct and experience but now wants to complement them by building data capabilities.

Read it here.

The Pitfalls of Data Science (and how you can avoid them)

Depending on who you ask, you’re going to hear data science described as being sexy by some, and decidedly not so by others.

Sexy, I suspect, because in today's geekdom-loving world, we imagine the lab coats have finally turned their laser-like, academic precision to the final economic frontier, data, and the dam holding back all those dollar-laden insights from that data is about to burst.

Read more here.

The Metadata Economy - The Future of Trusted Data Sharing

This post talks about symbiotic businesses buying and selling data from and to each other. Consider an ecommerce business and a property developer unearthing new areas in the city with higher disposable incomes: the former can ramp up its delivery capacity in those areas, while the latter can consider fancier apartment construction there. The participating businesses can be orthogonal in the markets they target, the industries they’re in, and even their revenue models, but the key to building value lies in organizations discovering the right, trusted external data to grow their business. Read it here.

How to Architect for Data Consumption

This is my pet peeve - technical architects are building systems and applications that make data analysis complicated, error-prone, and inefficient. We need enablement of data consumption as a first-class requirement of any system that is built. I explain here how we could architect differently.

Technical systems architects, including myself until recently, are used to building systems with considerations such as development time, robustness, and evolvability. Analytics was an afterthought. Having spent a few years crunching data at various scales, I see the world differently. I see barriers to analytics all around in systems. Here are a few thoughts on the nature of these barriers and how to address them. Read more here.

What Can We Do With Metadata?

As the complexity of data and systems that hold data grows, the cost of analysis increases due to time and effort spent in figuring out the feasibility, appropriateness, access, and management of data. We believe that a number of new low-risk and valuable applications can be built through creative application of metadata that can help cope with growing complexity and reduce cost of analysis.

Our pure metadata-based cloud product, Scribble Assist, is the first of many applications that will be built by the larger community. Our experience with customers has convinced us of the value of the approach and that a lot more will come. We discuss how we see the landscape of metadata applications. Read more here.

Data Shifts Power Within Organizations

A major challenge in making an organization more data-driven has less to do with the data itself, and more to do with the ability to manage the dynamics that emerge as decision makers look at data as an input to the decision process.

I have a particular kind of power shift in mind. I am not referring to the democratization of data, which we will discuss in a future article, but rather to a shift caused by the emergence of a new and powerful organizational entity, The Scorer. The Scorer derives its power from data and its ability to judge both past and future decisions. The Scorer provides alternative decision paths, taking away some agency and control from existing decision makers. In this article I explain why and how this happens, and how the process could be managed. Read it here.

Available but Unusable Data - 2: Semantic Gaps

At Scribble Data we are thinking deeply about why decision makers are not able to get to the data when they need it, even when relevant data is available in their own databases. This question matters because we find that decision makers routinely make high-risk decisions involving products, marketing, and operations with very limited and ambiguous data, and the absolute cost of an incorrect decision, or the opportunity cost of a delayed one, is significant. In my previous article, I elaborated on why available data is unusable. There are systems and organizational issues, and hard technical problems. The focus of this article is the core technical problem of semantic gaps: the gap between the meaning of the question and the meaning of the data. Read it here.

Available but Unusable Data: Emerging Organizational Challenge

This post was motivated by a recent article on the inability of organizations to apply data. This is something I am thinking deeply about these days. In this and future articles, I wish to explain in simple English what is happening, why, and what to expect over time.

Read it here.