Blog
All you need to know about all that’s latest and greatest at Scribble Data's labs. Read on to
learn about how we’re reducing friction in the consumption of data.
Blog Posts
The Future of Data Product Development: Exploring Key Trends
The year is 2023, and Sarah, a data analyst at a leading tech firm, no longer spends hours writing complex SQL queries or sifting through vast datasets. Instead, she simply asks her data product, powered by a Large Language Model (LLM), “What were the sales trends last quarter?” and receives a comprehensive, human-like response. This […]
Mastering Generative AI: A comprehensive guide
The year was 2018. Art enthusiasts, collectors, and critics from around the world gathered at Christie’s, one of the most prestigious auction houses. The spotlight was on a unique portrait titled “Edmond de Belamy.” At first glance, it bore the hallmarks of classical artistry: a mysterious figure, blurred features reminiscent of an old master’s touch, […]
Navigating the Data Landscape: A Deep Dive into Warehouses, Lakes, Meshes, and Fabrics
It’s your first day at “TechTonic Innovations,” a (fictional) startup that’s been making waves in the tech industry. As you enter their modern office, you’re greeted with smiles, handshakes, and the subtle hum of servers in the background. You’ve been brought in as the new Data Strategist, and you’re eager to dive into the heart […]
From Data to Decisions: How Generative AI is Transforming Enterprise Analytics
It’s the 24th century aboard the Starship Enterprise. Captain Jean-Luc Picard, in need of a break from the rigors of interstellar diplomacy, steps into the Holodeck. This isn’t just any room; it’s a technological marvel, a space where any scenario can be simulated, any world, any reality can come to life. Picard chooses a 1940s […]
Deploying Responsible AI: Big Picture Questions and Strategies
At Scribble Data, our goal is to help organizations make better decisions with data. Over the last year, rapid advancements in Generative AI (GenAI), large language models (LLMs) and natural language processing (NLP) have been a shot in the arm for us. These innovations inspired us to launch Hasper, our machine learning and LLM-based data […]
Data Fabric: Unraveling the Future of Integrated Data Management
Scene 1: Picture waking up to the soft strumming of the acoustic guitar on Bon Iver’s “Holocene”, a song recommendation from Spotify based on your recent obsession with indie folk. Scene 2: As you sip your morning coffee, you scroll through your Amazon app, noticing a recommendation for a book on “Modern Folklore and Music.” […]
From Raw Data to Revolutionary Insights: A Deep Dive into Data Product Architecture
The Oakland Coliseum was abuzz, the air thick with anticipation. In the dimly lit back office, Billy Beane, the General Manager of the Oakland Athletics, sat hunched over a cluttered desk. Papers were strewn everywhere, but Billy’s focus was on a single sheet filled with numbers, statistics, and player names. The Athletics, with one of […]
Overfitting and Underfitting in ML: Introduction, Techniques, and Future
In 2016, the tech world was all ears and eyes. Microsoft was gearing up to introduce Tay, an AI chatbot designed to chit-chat and learn from users on Twitter. The hype was real: this was supposed to be a glimpse into the future where AI and humans would be best buddies. But, in a plot […]
Zero Shot Learning: A complete guide
In the realm of the big screen, there’s a man who needs no introduction. A man of resourcefulness, a man of ingenuity, a man who could turn a paperclip into a key to conquer the most impossible of missions. His name? Ethan Hunt. He is the embodiment of the idea that necessity is the mother […]
Synthetic Data in Machine Learning: Introduction, Applications, and Future
Picture this: You’re in the world of “Inception,” Christopher Nolan’s cinematic masterpiece. Dream architects are crafting intricate labyrinths within dreams, creating realities so convincing that the dreamer can’t tell they’re asleep. They are bending the fabric of the dream, shaping it to their will, whether it’s a heart-pounding chase through a bustling market or a […]
Mastering Inference in AI: Introduction, Use Cases, and Future Trends
Imagine Sherlock Holmes, the iconic detective, in the midst of a confounding crime scene. He’s encircled by a constellation of clues—a peculiarly bent poker pipe, a singular set of footprints, and a unique brand of cigarette ash. Each piece of evidence is a fragment of a larger narrative, and it is Holmes’s task to weave […]
Transfer learning in AI: A complete guide
Picture yourself as a culinary maestro. You have dedicated countless hours in the kitchen, mastering the nuances of French cuisine, perfecting the art of sourdough, and orchestrating symphonies of flavor in a well-risen chocolate soufflé. Each culinary expedition has bestowed upon you a wealth of knowledge—harmonizing tastes, kneading the dough with finesse, and deftly tempering […]
Multimodal Learning In AI: Introduction, Current Trends, and Future
As a conductor stands poised on the podium, baton aloft, they survey the orchestra before them. Each musician holds a different instrument, a unique voice in the grand symphony they are about to perform. Violins, their strings humming with anticipation, are primed to sing the melody. Cellos stand ready to resonate with harmony, the percussion […]
Introducing Hasper: LLM-powered Engine For Advanced Analytics
Over the last year, we have evolved from an MLops platform company that gave enterprises the ability to build and deploy machine learning for analytics teams, to an applied AI data products platform. Throughout this journey, our mission has remained consistent: to help organizations make better decisions using data. We’ve reached a pivotal moment in […]
Driving Innovation through ML: Scribble Data’s learnings from Toronto Machine Learning Summit 2023
The recently concluded Toronto Machine Learning Summit 2023 (TMLS 2023) brought together researchers, academics, and practitioners in the machine learning (ML) space. With an agenda including talks, roundtable discussions, and poster presentations, there was much to soak in on the latest trends and advancements in ML and MLOps. Scribble Data was a sponsor of the […]
Word Vectorization 101: The Journey from Text to Numbers
Navigating through the labyrinthine streets of ancient Rome without a map or GPS, you would quickly realize how every landmark, road, and destination forms part of a larger, intricate whole. A wrong turn at the Pantheon could lead you away from the Colosseum, or a shortcut through Piazza Navona could help you stumble upon the grandeur […]
LLMs for data classification: How Scribble built SADL for achieving breakthrough accuracy
Modern-day organizations are generating vast amounts of data that hold immense potential for making informed decisions. However, with the ever-growing volume of data, the greater challenge lies in how these organizations can generate actionable insights. Data classification plays a vital role in addressing this challenge. Until now, organizations have relied on traditional methods for data […]
Fine-tuning Large Language Models: Complete Optimization Guide
Let’s say you buy a high-performance sports car, fresh off the production line. It’s capable, versatile, and ready to take on most driving conditions with ease. But what if you have a specific goal in mind – let’s say, winning a championship in off-road rally racing? The sports car, for all its inherent capabilities, would […]
Understanding Prompt Engineering: Introduction, Techniques and Future Perspective
Prompt engineering is a fascinating new frontier in the world of AI that is rapidly gaining momentum as the world at large awakens to the potential of LLMs. Research in the field of prompt engineering has exponentially ramped up in the last couple of years since consumer applications such as ChatGPT have taken the Internet […]
Large Language Models 101: History, Evolution and Future
Imagine walking into the Library of Alexandria, one of the largest and most important libraries of the ancient world, filled with countless scrolls and books representing the accumulated knowledge of the entire human race. It’s like being transported into a world of endless learning, where you could spend entire lifetimes poring over the insights of […]
Foundation Models: A step-by-step guide for beginners
The emergence of foundation models represents a seismic shift in the world of artificial intelligence. Foundation models are like digital polymaths, capable of mastering everything from language to vision to creativity. Have you ever wanted to know what a refined, gentlemanly Shiba Inu might look like on a European vacation? Of course you have. Well, […]
Managing The Organizational Impact of Bad Data
Big data is an indispensable part of our modern existence, powering several real-world applications such as personalized marketing, healthcare diagnostics, fraud prevention and many more that have transformed the way we live, work, and communicate with each other. However, since big data has become such a critical component of organizational decision-making, it is imperative to […]
How Data Products Can Help Overcome Data Consumption Challenges
Data has grown in importance as a commercial asset, with many companies investing considerably in data collection and transformation. Nevertheless, data collection is not the biggest challenge; what businesses do with it is. In the age of big data, another crucial difficulty is guaranteeing quality. Moreover, firms frequently face data management difficulties such as inefficient […]
Data Product Lifecycle: Evolution and Best Practices
Data products have exploded in popularity over the last few years. As an industry, we are where the automobile industry was around the turn of the 20th century. We are slowly transitioning from building hand-crafted, exclusive products for Big Tech customers to widespread commoditization. Soon, efficiency, maintenance, standards, and assembly lines are going to be […]
4 Advanced Analytics Techniques to Improve Decision-Making
In today’s data-driven business landscape, organizations are constantly pressured to make faster, more informed decisions that drive better outcomes. According to Forbes, 53% of companies use big data analytics to inform business decisions. An HBR study points out that companies that use data-driven decision-making are 6% more profitable than those that don’t. However, with […]
What are Data Products?
“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.” – Eric Schmidt, Google Human beings are now, on a semi-daily basis, generating and collecting data that equals the volume of the total collective knowledge of our species till around the […]
5 Advanced Analytics Benefits For Your Organization
Advanced data analytics is a powerful tool for businesses that want to gain insights from their data. Advanced data analytics can provide unprecedented visibility into customer trends and preferences through sophisticated algorithms and technologies. Organizations can use these insights to identify new opportunities or better understand customer behavior. According to a McKinsey study, organizations that […]
Harnessing the Power of Big Data and Advanced Analytics
International Data Corporation (IDC) predicts that by 2025, the amount of data generated worldwide will reach 163 zettabytes, growing at a CAGR of 44%. Not just that, Gartner predicts that by 2025, AI-driven automation will reduce data preparation time by 95%, enabling organizations to analyze vast amounts of data in real-time. Walmart, the world’s largest […]
The Path To Ubiquitous Machine Learning
Imagine a world where a confluence of intelligent systems anticipate and cater to every want and need, seamlessly enhancing your day-to-day existence. A world where machine learning trickles into every cog that makes our world work, making it as essential and widespread as electricity. There is a lot of optimism about machine learning (ML) in […]
Understanding the Advanced Data Analytics Lifecycle
Businesses around the world generate massive quantities of data daily in the form of server logs, web analytics, transactional information, and customer data. To effectively process this much information and derive actual value from it, businesses need to consider advanced analytics techniques for decision-making. We already discussed its applications across industries in our previous article. […]
Advanced Analytics: Techniques, Examples, and Benefits
Data is the most important asset for any modern organization, backing most business-critical decisions today. However, fully capturing the potential of the company’s data sources, so that they start yielding impactful business insights, is not a straightforward task and the traditional BI and analytics stack is just not at the level to handle the complex […]
2023: A Critical Year for ML’s Rapid Growth
As 2022 draws to a close, it is time to reflect on the year gone by and welcome 2023! I’d like to take this opportunity to talk about some of the highs, the lows, the opportunities and learnings in 2022, how we’ve seen the market evolving, how it’s impacted some of the choices we’ve made […]
Security in ML Systems using Feature Stores
With the transformational early successes in value creation, AI/ML is set to become ubiquitous. By 2030, AI could potentially contribute up to $15.7tr to the global economy. As more and more organizations are depending on data and Machine Learning (ML) models for their crucial decision-making, the security of data and these ML systems is business […]
Our Learnings in Getting SOC 2 Type II Certified as a Startup
The SOC 2 certification process is considered to be painstaking, but it doesn’t need to be. We share our experience in this one-stop guide for other startups that are considering becoming SOC 2 Type II certified. Every day, Analytics and Data Science teams across the globe trust Scribble Data to solve persistent business problems with […]
Scribble Data Earns SOC 2 Type II Compliance Certification
The SOC 2 certification validates the makers of Enrich full-stack feature engineering platform as a reliable data partner that ensures the safety and privacy of customer data. TORONTO, DECEMBER 5, 2022 Scribble Data, maker of Enrich, a full-stack feature engineering platform for analytics, has successfully achieved SOC 2 Type II certification after completing a third-party […]
What is the Metadata Economy?
We live in a hyper-digital world, and due to the nearly infinite number of data sources that surround us, the volume of data generated collectively by individuals, applications and corporations is larger than ever. With such a monumental amount of data to sift through, two core principles have become increasingly important: Metadata – Make it […]
Data Science Teams are Doing it Wrong: Putting Technology Ahead of People
Despite $200+ billion spent on ML tools, data science teams still struggle to productionize their data and ML models. We decided to do a deep dive and find out why. Back in 1991, former US Air Force pilot and noted strategist John Boyd called for U.S. Military reforms after Operation Desert Storm. He noted that […]
MLOps – The CEO’s Guide to Productionization of Data [Part 2]
With data being touted as the oil for digital transformation in the 21st century, organizations are increasingly looking to extract insights from their data by building and deploying their custom-built ML models. In our previous article (MLOps – The CEO’s Guide to Productionization of Data, Part 1), we learned why and how embedding ML models […]
MLOps – The CEO’s Guide to Productionization of Data [Part 1]
MLOps (or Machine Learning Operations) is a core function of Machine Learning engineering that focuses on streamlining the process of taking ML models to production, and maintaining and monitoring them. But before we get into more details about MLOps, it’s important to understand what operationalization of machine learning is, why it’s important, and how it […]
Scribble Data at the Feature Store Summit 2022
Over the past 3 years, we’ve heard a lot about Feature Stores. While they might not sound like much, over time, they’ve become table stakes for enterprises building their offerings on ML. The rapid adoption of feature stores, where they’re starting to become mainstream instead of being a niche restricted to big-tech, can largely be […]
What Is Anomaly Detection? Importance, Methods, Challenges, and Use Cases
Anomaly detection refers to the process of analysing data sets to detect unusual patterns and outliers that do not conform to expectations. It takes on even more importance in a world where enterprises depend heavily on an intricate web of distributed systems. With thousands of potentially important data items to monitor every second, it is […]
Feature Stores: The CEO’s Guide
As industries across the globe attempt to adapt to the big data architecture, expensive and ineffective feature engineering practices mean that businesses are very likely to “hit a wall” when it comes to organizing their machine learning operations (MLOps). A lot of time is consumed in data ingestion, and lackluster machine outputs indicate that stakeholders […]
How Postmodern Data Stack helps Fintech companies make faster decisions
The Fintech market was valued at $110.57 billion in 2020 and will reach $698.48 billion by 2030. It is one of the fastest-growing industries with a CAGR of 20.3%. Fintech companies faced a surge in demand as customer practices and banking habits changed during the COVID-19 era. The industry overall saw an increase in user […]
Map Business Context as an Input to Build an Outcome-Focused Data Strategy
Machine learning and data science today are in a unique position where access to capital is often not the biggest barrier to success. Companies globally are continuing to invest into artificial intelligence to the tune of $140 billion, either to develop AI-native products or solutions or as a way to solve business problems and improve […]
The Horizontal and Long Tail Impact of Data
We had the good fortune of speaking at ValleyML’s AI Expo 2022 earlier this month. This is an annual event that presents a unique combination of AI Technology, researchers, industry thought leaders and prospective buyers of AI/ML technologies in a single event. The 2022 edition promised even more interesting talks and networking opportunities as it spanned four […]
2023: The Brave New World of Data Privacy and Accountability
The data privacy and compliance landscape continues to significantly change in 2022, and it is necessary to understand these changes as soon as possible so you can chart your path, and that of your organization, over the next few years. EMERGING MEGATRENDS IN THE WORLD OF DATA 01. Increased regulatory activity. In the last couple […]
A Primer on Feature Engineering
Feature engineering is the process of selecting, interpreting, and transforming structured or unstructured raw data into attributes (features) that can be used to build effective machine learning models which more accurately represent the problem at hand. In this context, a “feature” refers to any quantifiable unique input that may be used in a predictive model, […]
What is the postmodern data stack?
The adoption of artificial intelligence and machine learning has paved the way for drastic changes in data-driven enterprises. To optimize business operations, several companies started embracing what came to be known as the modern data stack. Although this approach benefits big tech companies in making superior business decisions, a majority of companies (which operate at […]
Growing Data Infrastructure Complexities
The world of data, and data infrastructure, has changed dramatically over the past decade. Traditional databases, which were designed to store information in a structured format, have evolved into massive warehouses of unstructured data that sit on multiple servers across different locations. Not too long ago, we were used to seeing monolithic systems dominated by […]
Trust in Data: The Rise of Adversarial Machine Learning
Increased dependence on data and Machine Learning, and a lack of understanding of complex ML models are giving rise to a new category of cyber attacks called Adversarial Machine Learning attacks. Machine learning impacts our everyday lives – it determines what we see on eCommerce websites, social media platforms, and search engines. Since machine learning […]
Establishing Organizational Digital Trust in Data
With big data powering the optimum business decision-making in this century, organizations need to generate trust in their data sources which otherwise proves to be a source of risk. Data is now ubiquitous — according to Statista, the aggregate data volume generated was 64.2 zettabytes in 2020, and it is only predicted to shoot upwards […]
Scribble Data at TMLS MLOps World Summit 2022
It’s an exciting time for the MLOps ecosystem, and there’s no better place to be than in Toronto! The MLOps World Summit 2022 happened last week in Toronto and truly lived up to its promise of being the ultimate ML Operations & strategy conference & Expo. It saw a number of MLOps companies and practitioners, including our […]
What is the modern data stack?
The success of a modern business, ranging from small and medium-sized enterprises to Fortune 500 conglomerates, is now increasingly tied to how firms implement their data infrastructure. We’ve all heard the trope – “data is the new oil” of the digital economy (source). One thing is clear: information is power, and data analytics can be utilized […]
How to design a Feature Store for Sub-ML?
Let’s assume you want to leverage data to improve one of your processes, such as partner benchmarking. Even though it’s one of your top priorities for the year, you have limited resources to spend on partner data collection, segregation, and overall data preparation to do any sort of analysis. And even if you find a […]
Why Feature Stores Need to be Designed for Sub-ML Use Cases
In our last article, we introduced Sub-ML use cases and discussed how their number is growing. In this article, we’ll try and understand how purpose-built feature stores for solving Sub-ML use cases can help drive more value with data. Data Science as a discipline has seen the kind of evolution that only a few others […]
Scribble Data Raises $2.2 M to Scale Their Modularized, Cloud-Native Feature Store
TORONTO, March 15, 2022: Scribble Data, an ML feature engineering startup today announced that it has raised $2.2 million in seed funding led by Blume Ventures. The round also saw participation from Log X Ventures and Sprout Venture Partners, in addition to participation from Vivek N. Gour (former CFO, Genpact) and Ganesh Rao (Partner, Trilegal). […]
Welcome to the age of Sub-ML use cases
Let’s say you work at a modern data-driven company and you want to find a way to enhance one of your processes, like partner management. It makes sense considering you have limited resources to invest in partner development, but it ranks high on your growth goals for the year. The first step would be to […]
Scaling Entity Matching at The Room with Scribble Enrich and Redis
The Room’s mission is to connect top talent from around the world to meaningful opportunities. Envisioned as a technology-driven, community-centric platform to help organizations quickly find high-quality, vetted talent at scale, The Room will host tens of millions of members in its system and have a worldwide presence. At the core of the technology challenge […]
Hierarchical Features and their Importance in Feature Engineering
Feature engineering is both a central task in machine learning engineering and arguably the most complex one. Data scientists who build models that need to be deployed at large scales, across functional, technical, geographic, demographic and other categories have to reason about how they choose the features for the models. Despite the divergent […]
Right to Forget
General Data Protection Regulation (GDPR) Any organization that collects and stores EU resident data is subject to General Data Protection Regulation (GDPR). Examples of such organizations include Google, Facebook, and Amazon. The regulation places the obligation for responsible data handling with such organizations, and gives individuals a number of rights. All major geographies now have GDPR-like regulation […]
Scribble Data raises funding to scale feature store
We are thrilled to announce that we’ve just closed our first round of funding to help us scale and deliver our Feature Store product, Enrich, in international markets for enterprise-grade Machine Learning products. Our investors are data-driven leaders from companies like Google and Amazon, from the US and India. Scribble Enrich, our feature store […]
Should Data Scientists Be Excited Or Worried About The New Privacy Laws?
The General Data Protection Regulation (GDPR) legislated and passed by the European Union has sent ripples around the world, and depending on who you ask, this could either spell apocalypse, the workings of a nanny state, or a very positive step towards consumer privacy. The direct objective of such a ruling is to give control […]
Why your business doesn’t have to wait, to start giving back
“If you’re in the luckiest 1% of humanity, you owe it to the rest of humanity to think about the other 99%.” — Warren Buffett W.B. has given away more than he has left. In fact, he has pledged to give 99% of his wealth. It gives us pause. When talk of CSR and philanthropy are […]
How to get the most out of your organization’s data: The mindset
Every business is a data business. And while this aphorism has been around for some time, what does this actually mean to enterprise stakeholders? What should key decision makers be valuing and excited about as they start to invest in analytics tools and ML/AI? Here’s what we think are the most important aspects to embrace […]
Reducing Organizational Data Infrastructure Costs
We speak to a number of organizations who are in the process of building and deploying data infrastructure and analytical processes. Organizations face a number of challenges that prevent them from meeting their analytical business objectives. The idea of this note is to share our thoughts on one specific challenge – high cost. Specifically: Cost […]
How to Turn Your Startup Into a Data Informed Business
This post is a useful way to think about how to start on a data journey if you’re a young startup that’s just pushed data to the back burner (say until you had ‘enough’ traction) or even if you’re part of a more mature company that’s used to making decisions more on instinct and experience, […]
The Pitfalls of Data Science and how you can avoid them
[Update]: This article is getting a good bit of engagement. If it resonates with you, I’d love it if you could answer a short 2 minute survey on your data journey here. I will add the same survey link at the end of this post as well. Depending on who you ask, you’re going to hear data […]
How to Architect for Data Consumption
This is my pet peeve – technical architects are building systems and applications that make data analysis complicated, error-prone, and inefficient. We need enablement of data consumption as a first-class requirement of any system that is built. I explain here how we could architect differently to improve data consumption. Technical systems architects, including myself until […]
What Can We Do With Metadata?
As the complexity of data and systems that hold data grows, the cost of analysis increases due to time and effort spent in figuring out the feasibility, appropriateness, access, and management of data. We believe that a number of new low-risk and valuable applications can be built through creative application of metadata that can help […]
Data Shifts Power Within Organizations
A major challenge in becoming a more data-driven organization has less to do with data itself, and more to do with the ability to manage the dynamics that emerge as decision makers look at data as an input to the decision process. I have a particular kind of power shift in mind. I am not referring […]
Available But Unusable Data – Part II – Semantic Gaps
At Scribble Data we are thinking deeply about why decision makers are not able to get to the data when they need it, even when relevant data is available in their own databases. This question matters because we find that decision makers routinely make high risk decisions involving products, marketing, and operations with […]