In our last article, we introduced Sub-ML use cases and discussed how their number is growing. In this article, we'll try to understand how purpose-built feature stores for solving Sub-ML use cases can help drive more value from data.
Data Science as a discipline has evolved in a way that few others have. What started as a combination of applied statistics and computer science now enables powerful data-driven decision-making and predictions that solve real-world problems. To illustrate, think of all the complex Data Warehousing and Business Intelligence solutions companies use to mine data, analyze it, and visualize the results for decision-makers. Over time, data sets grew in size and number, complexity increased, and practitioners shifted to Big Data and Machine Learning.
Today, Business Intelligence (BI) and Machine Learning (ML) coexist as functions across data-driven organizations. Although the end goal of each function is to give users actionable inputs for decision-making from their data, their end consumers, implementation complexity, and design practices are remarkably different. Furthermore, each function comes with its own set of challenges:
BI tools can handle only a limited degree of complexity and a limited size of datasets. Their scope falls short as use cases grow more complex, particularly those focused on decision-making (e.g., credit risk, benchmarking, recommendations, RFM).
ML engineers and data scientists are expected to solve the challenge of decision-making. But due to a lack of bandwidth (and a shortage of data science talent, not to mention that preparing datasets takes up over 80% of the time), these use cases are often put on the back burner.
These challenges result in a backlog of questions, fatigue among data scientists, and reluctance among business stakeholders to adapt to technological change. Although ML and BI sit on opposite ends of the spectrum and can both deliver incredible gains, they can also be highly inefficient. The question becomes: is there a solution to overcome these inefficiencies?