Blog posts

November 24, 2022

What is the Metadata Economy?

We live in a hyper-digital world, and due to the nearly infinite number of data sources that surround us, the volume of data generated collectively by individuals, applications and corporations is larger than ever. With such a monumental amount of data to sift through, two core principles have become increasingly important: Metadata – Make it […]

November 10, 2022

Data Science Teams are Doing it Wrong: Putting Technology Ahead of People

Despite $200+ billion spent on ML tools, data science teams still struggle to productionize their data and ML models. We decided to do a deep dive and find out why. Back in 1991, former U.S. Air Force pilot and noted strategist John Boyd called for U.S. military reforms after Operation Desert Storm. He noted that […]

November 3, 2022

MLOps – The CEO’s Guide to Productionization of Data [Part 2]

With data being touted as the oil of digital transformation in the 21st century, organizations are increasingly looking to extract insights from their data by building and deploying their own custom ML models. In our previous article (MLOps – The CEO’s Guide to Productionization of Data, Part 1), we learned why and how embedding ML models […]

November 1, 2022

MLOps – The CEO’s Guide to Productionization of Data [Part 1]

MLOps (or Machine Learning Operations) is a core function of Machine Learning engineering that focuses on streamlining the process of taking ML models to production, and maintaining and monitoring them. But before we get into more details about MLOps, it’s important to understand what operationalization of machine learning is, why it’s important, and how it […]

October 25, 2022

Scribble Data at the Feature Store Summit 2022

Over the past 3 years, we’ve heard a lot about Feature Stores. While they might not sound like much, over time they’ve become table stakes for enterprises building their offerings on ML. The rapid adoption of feature stores, which are starting to become mainstream instead of being a niche restricted to big tech, can largely be […]

October 20, 2022

What is Anomaly Detection?

Anomaly detection refers to the process of analysing data sets to detect unusual patterns and outliers that do not conform to expectations. It takes on even more importance in a world where enterprises depend heavily on an intricate web of distributed systems. With thousands of potentially important data items to monitor every second, it is […]

October 13, 2022

Feature Stores: The CEO’s Guide

As industries across the globe attempt to adapt to big data architectures, expensive and ineffective feature engineering practices mean that businesses are very likely to “hit a wall” when it comes to organizing their machine learning operations (MLOps). A lot of time is consumed in data ingestion, and lackluster machine outputs indicate that stakeholders […]

October 11, 2022

How the Postmodern Data Stack helps Fintech companies make faster decisions

The Fintech market was valued at $110.57 billion in 2020 and is projected to reach $698.48 billion by 2030, making it one of the fastest-growing industries, with a CAGR of 20.3%. Fintech companies faced a surge in demand as customer practices and banking habits changed during the COVID-19 era. The industry overall saw an increase in user […]

October 4, 2022

Map Contextual Data as an input to Business Outcomes as an output

Machine learning and data science today are in a unique position where access to capital is often not the biggest barrier to success. Companies globally continue to invest in artificial intelligence to the tune of $140 billion, either to develop AI-native products or solutions or as a way to solve business problems and improve […]

September 29, 2022

The Horizontal and Long Tail Impact of Data

We had the good fortune of speaking at ValleyML’s AI Expo 2022 earlier this month. This annual event brings together AI technology, researchers, industry thought leaders and prospective buyers of AI/ML technologies under one roof. The 2022 edition promised even more interesting talks and networking opportunities as it spanned four […]

September 27, 2022

2023: The Brave New World of Data Privacy and Accountability

The data privacy and compliance landscape continues to change significantly in 2022, and it is necessary to understand these changes as soon as possible so you can chart your path, and that of your organization, over the next few years. EMERGING MEGATRENDS IN THE WORLD OF DATA: 01. Increased regulatory activity. In the last couple […]

September 20, 2022

A Primer on Feature Engineering

Feature engineering is the process of selecting, interpreting, and transforming structured or unstructured raw data into attributes (features) that more accurately represent the problem at hand and can be used to build effective machine learning models. In this context, a “feature” refers to any quantifiable input that may be used in a predictive model, […]

September 16, 2022

Is the postmodern data stack the future of faster and more accurate decision-making?

The adoption of artificial intelligence and machine learning has paved the way for drastic changes in data-driven enterprises. To optimize business operations, several companies started embracing what came to be known as the modern data stack. Although this approach helps big tech companies make better business decisions, a majority of companies (which operate at […]

August 3, 2022

Growing Data Infrastructure Complexities

The world of data, and data infrastructure, has changed dramatically over the past decade. Traditional databases, which were designed to store information in a structured format, have evolved into massive warehouses of unstructured data that sit on multiple servers across different locations. Not too long ago, we were used to seeing monolithic systems dominated by […]

July 7, 2022

Trust in Data: The Rise of Adversarial Machine Learning

Increased dependence on data and machine learning, and a lack of understanding of complex ML models, are giving rise to a new category of cyber attacks called Adversarial Machine Learning attacks. Machine learning impacts our everyday lives – it determines what we see on eCommerce websites, social media platforms, and search engines. Since machine learning […]

June 22, 2022

Establishing Organizational Digital Trust

With big data powering business decision-making in this century, organizations need to establish trust in their data sources, which otherwise become a source of risk. Data is now ubiquitous: according to Statista, the aggregate data volume generated was 64.2 zettabytes in 2020, and it is only predicted to shoot upwards […]

June 15, 2022

Scribble Data at TMLS MLOps World Summit 2022

It’s an exciting time for the MLOps ecosystem, and there’s no better place to be than in Toronto! The MLOps World Summit 2022 happened last week in Toronto and truly lived up to its promise of being the ultimate ML Operations & strategy conference & Expo. It saw a number of MLOps companies and practitioners, including our […]

June 1, 2022

What is the modern data stack?

The success of a modern business, ranging from small and medium-sized enterprises to Fortune 500 conglomerates, is now increasingly tied to how firms implement their data infrastructure. We’ve all heard the trope – “data is the new oil” of the digital economy (source). One thing is clear: information is power, and data analytics can be utilized […]

April 21, 2022

How to design a Feature Store for Sub-ML

Let’s assume you want to leverage data to improve one of your processes, such as partner benchmarking. Even though it’s one of your top priorities for the year, you have limited resources to spend on partner data collection, segregation, and overall data preparation to do any sort of analysis. And even if you find a […]

April 5, 2022

Why Feature Stores Need to be Designed for Sub-ML

Data Science as a discipline has seen the kind of evolution that few others have. What started as a combination of applied statistics and computer science has enabled powerful data-driven decision making and predictions to solve real world problems. To illustrate this further, think of all the complex Data Warehousing and Business Intelligence solutions […]

March 15, 2022

Scribble Data Raises $2.2 M to Scale Their Modularized, Cloud-Native Feature Store

TORONTO, March 15, 2022: Scribble Data, an ML feature engineering startup, today announced that it has raised $2.2 million in seed funding led by Blume Ventures. The round also saw participation from Log X Ventures and Sprout Venture Partners, in addition to participation from Vivek N. Gour (former CFO, Genpact) and Ganesh Rao (Partner, […]

January 11, 2022

Welcome to the age of Sub-ML use cases

Let’s say you work at a modern data-driven company and you want to find a way to enhance one of your processes, like partner management. You have limited resources to invest in partner development, but it ranks high among your growth goals for the year. The first step would be to […]

June 29, 2021

Scaling Entity Matching at The Room with Scribble Enrich and Redis

The Room’s mission is to connect top talent from around the world to meaningful opportunities. Envisioned as a technology-driven, community-centric platform to help organizations quickly find high-quality, vetted talent at scale, The Room will host tens of millions of members in its system and have a worldwide presence. At the core of the technology challenge […]

April 7, 2021

Hierarchical Features and their Importance in Feature Engineering

Feature engineering is both a central task in machine learning engineering and arguably its most complex one. Data scientists who build models that need to be deployed at large scale, across functional, technical, geographic, demographic and other categories, have to reason about how they choose the features for their models. Despite the divergent […]

October 28, 2020

Right to Forget

General Data Protection Regulation (GDPR). Any organization that collects and stores EU resident data is subject to the General Data Protection Regulation (GDPR). Examples of such organizations include Google, Facebook, and Amazon. The regulation places the obligation for responsible data handling with such organizations, and gives individuals a number of rights. All major geographies now have GDPR-like […]

May 14, 2020

Scribble Data raises funding to scale feature store

We are thrilled to announce that we’ve just closed our first round of funding to help us scale and deliver our Feature Store product, Enrich, in international markets for enterprise-grade Machine Learning products. Our investors are data-driven leaders from the US and India, from companies like Google and Amazon. Scribble Enrich, our feature store […]

October 1, 2018

Should Data Scientists Be Excited Or Worried About The New Privacy Laws?

The General Data Protection Regulation (GDPR), legislated and passed by the European Union, has sent ripples around the world. Depending on who you ask, it could spell apocalypse, the workings of a nanny state, or a very positive step towards consumer privacy. The direct objective of such a ruling is to give control […]

September 5, 2018

Why your business doesn’t have to wait to start giving back

“If you’re in the luckiest 1% of humanity, you owe it to the rest of humanity to think about the other 99%.” – Warren Buffett. W.B. has given away more than he has left. In fact, he has pledged to give away 99% of his wealth. It gives us pause. When talk of CSR and philanthropy is […]

September 3, 2018

How to get the most out of your organization’s data: The mindset

Every business is a data business. And while this aphorism has been around for some time, what does it actually mean to enterprise stakeholders? What should key decision makers value and be excited about as they start to invest in analytics tools and ML/AI? Here’s what we think are the most important aspects to embrace […]

June 11, 2018

Reducing Organizational Data Costs

We speak to a number of organizations who are in the process of building and deploying data infrastructure and analytical processes. Organizations face a number of challenges that prevent them from meeting their analytical business objectives. The idea of this note is to share our thoughts on one specific challenge – high cost. Specifically: Cost […]

July 17, 2017

The Pitfalls of Data Science (and how you can avoid them)

[Update]: This article is getting a good bit of engagement. If it resonates with you, I’d love it if you could answer a short 2-minute survey on your data journey here. I will add the same survey link at the end of this post as well. Depending on who you ask, you’re going to hear data […]

July 12, 2017

How to Architect for Data Consumption

This is my pet peeve – technical architects are building systems and applications that make data analysis complicated, error-prone, and inefficient. We need to treat enabling data consumption as a first-class requirement of any system we build. I explain here how we could architect differently. Technical systems architects, including myself until recently, are […]

June 6, 2017

What Can We Do With Metadata?

As the complexity of data and systems that hold data grows, the cost of analysis increases due to time and effort spent in figuring out the feasibility, appropriateness, access, and management of data. We believe that a number of new low-risk and valuable applications can be built through creative application of metadata that can help […]

February 20, 2017

Data Shifts Power Within Organizations

A major challenge in becoming a more data-driven organization has less to do with data itself, and more to do with the ability to manage the dynamics that emerge as decision makers look at data as an input to the decision process. I have a particular kind of power shift in mind. I am not referring […]

September 15, 2016

Available But Unusable Data – Part II – Semantic Gaps

At Scribble Data we are thinking deeply about why decision makers are not able to get to the data when they need it, even when relevant data is available in their own databases. This question matters because we find that decision makers routinely make high-risk decisions involving products, marketing, and operations with […]
