Feature Stores Emerging as Must-Have Tech for Machine Learning
Machine learning may be eating software, but it looks as though feature stores may be eating machine learning. In the rush to develop and roll machine learning applications into production, organizations are finding feature stores to be the missing link between dreaming about succeeding with machine learning at scale and actually achieving it.
The feature store is a critical piece of data engineering middleware that links the past (i.e., training machine learning models on historical data) with the present (i.e., running inference on live data). Feature stores typically hold both the code that defines each feature and the feature values themselves. Put another way, it’s where data science models and real-world data actually meet.
Feature stores are where a machine learning model’s all-important features (the input signals that data scientists spend hours identifying and meticulously honing for recommendation, fraud detection, or personalization systems) are housed before they are served to the model alongside fresh, incoming data during inference. A feature store combines elements of several disciplines, including data science and data engineering, and typically involves deploying several pieces of technology, such as ETL pipelines, key-value stores, and MLOps control planes.
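To make the past/present split concrete, here is a minimal in-memory sketch, not any vendor’s implementation. It assumes a feature is defined as a name plus a transform function (the hypothetical `FeatureDefinition`), that an "offline" history serves training with point-in-time lookups, and that an "online" map holds only the latest value for inference. The class, feature, and entity names are all illustrative.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FeatureDefinition:
    """Hypothetical 'code that describes the feature': a name plus a transform."""
    name: str
    transform: Callable[[dict], float]


@dataclass
class ToyFeatureStore:
    # Offline store: full (timestamp, value) history per (feature, entity), used for training.
    offline: Dict[Tuple[str, str], List[Tuple[int, float]]] = field(default_factory=dict)
    # Online store: latest value per (feature, entity), used for low-latency inference.
    online: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def ingest(self, fdef: FeatureDefinition, entity: str, ts: int, raw: dict) -> None:
        """Compute the feature value from raw data and write it to both stores."""
        value = fdef.transform(raw)
        history = self.offline.setdefault((fdef.name, entity), [])
        history.append((ts, value))
        history.sort()  # keep history ordered by timestamp
        # The online store always reflects the most recent value by timestamp.
        self.online[(fdef.name, entity)] = history[-1][1]

    def get_historical(self, name: str, entity: str, as_of: int) -> Optional[float]:
        """Point-in-time lookup: the value known at `as_of`, so training never sees future data."""
        history = self.offline.get((name, entity), [])
        timestamps = [ts for ts, _ in history]
        i = bisect_right(timestamps, as_of)
        return history[i - 1][1] if i > 0 else None

    def get_online(self, name: str, entity: str) -> Optional[float]:
        """Latest value, as a model would fetch it at inference time."""
        return self.online.get((name, entity))


# Usage: one hypothetical feature for one hypothetical user.
txn_amount = FeatureDefinition("txn_amount_usd", lambda raw: raw["cents"] / 100.0)
store = ToyFeatureStore()
store.ingest(txn_amount, "user_42", ts=100, raw={"cents": 1250})
store.ingest(txn_amount, "user_42", ts=200, raw={"cents": 9900})

print(store.get_historical("txn_amount_usd", "user_42", as_of=150))  # 12.5
print(store.get_online("txn_amount_usd", "user_42"))  # 99.0
```

Production systems split these two roles across different technologies (for example, a data lake or warehouse offline and a key-value store online), which is one reason the pipeline grows complex as feature counts rise.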
As the number of features in a machine learning model goes up, so does the complexity of managing the overall feature store pipeline. Many companies are finding that complexity too great to handle with homegrown systems, which is driving a mini-boom in shrink-wrapped feature store solutions. Those are being developed almost entirely by cloud startups, although the cloud bigwigs are getting into the act, too.
One of those feature store startups is Tecton, which was founded by the engineers who developed the feature store for Michelangelo, Uber’s machine learning platform. Earlier this year, the company’s co-founder and CEO, Mike Del Balso, predicted to Datanami that 2021 would “be the year of the feature store.” So far, it appears that he’s right.
Feature store vendors will be participating in next week’s inaugural Feature Store Summit. The two-day event, slated for October 12 and 13, is being hosted by Hopsworks, a feature store vendor that bills its product as “the industry leading feature store.” Hopsworks’ offering, like many in the emerging space, is open source.
Feature Store Summit will feature talks by some of the industry’s top feature store developers. For instance, Spotify engineers will talk about the feature store they developed for their music streaming service. Salesforce engineers will also talk about how they created a feature store for their Salesforce ML platform. Other talks will be given by engineers from Twitter, Iguazio, Amazon, Databricks, Kaskada, Redis, and Microsoft.
Amazon subsidiary Amazon Web Services got into the feature store business in late 2020 with SageMaker Feature Store, a complement to SageMaker. As an end-to-end machine learning environment, Amazon SageMaker has proven extremely popular with data scientists building and deploying machine learning models.
One of the companies hoping to parlay SageMaker’s popularity with data scientists into sales for its own machine learning feature store is Cloudian. While the company is primarily a developer of a highly scalable object storage system used to store petabytes of unstructured data, its support for Amazon’s S3 storage API is providing an entry into the world of feature stores, explains Gary Ogasawara, Cloudian’s chief technology officer.
“We keep a close eye on AWS and where they’re getting a lot of interest in and business, and how that can apply to us, and where we want to apply our software across the edge, the data center, and cloud,” Ogasawara tells Datanami.
The company is currently beta testing its new Streaming Feature Store, which combines Apache Flink for streaming data ingest with the Redis key-value store for high-speed data lookups. Through its support for the S3 API, it is hoping to attract SageMaker Feature Store customers who want a more responsive, on-premises experience than AWS can offer.
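The streaming pattern described here (a stream processor computing features as events arrive and writing them to a key-value store for fast lookups) can be sketched in plain Python. This is an illustration of the general idea only, not Cloudian’s design: a Python loop stands in for Apache Flink, a `dict` stands in for Redis, and the feature (a 60-second transaction count per card), the key format, and the class name are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingCountFeature:
    """Hypothetical streaming feature: events per entity over the last `window` seconds.

    Each incoming event updates the count and immediately writes it to a
    key-value "online store" (a dict standing in for Redis).
    """

    def __init__(self, online_store: Dict[str, int], window: int = 60):
        self.window = window
        self.online_store = online_store
        self.key_prefix = f"txn_count_{window}s"
        self.events: Dict[str, Deque[int]] = defaultdict(deque)

    def process(self, entity: str, ts: int) -> None:
        q = self.events[entity]
        q.append(ts)
        # Keep only events in (ts - window, ts]; assumes events arrive in timestamp order.
        while q and q[0] <= ts - self.window:
            q.popleft()
        # Write-through to the key-value store so inference reads are a single lookup.
        self.online_store[f"{self.key_prefix}:{entity}"] = len(q)


# Usage: a small, made-up stream of (card, timestamp-in-seconds) events.
online: Dict[str, int] = {}
feature = SlidingCountFeature(online)
for entity, ts in [("card_1", 0), ("card_1", 30), ("card_2", 45), ("card_1", 70)]:
    feature.process(entity, ts)

print(online)  # {'txn_count_60s:card_1': 2, 'txn_count_60s:card_2': 1}
```

A real stream processor such as Flink adds what this sketch omits: out-of-order event handling, fault-tolerant state, and parallelism across many machines.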
“If they want to use AWS SageMaker [Feature Store], they have to go all the way up to the cloud and back,” Ogasawara says. “If you have something on premises, it can be much quicker and cheaper.”
Cloudian clearly senses an opportunity to sell feature stores to its 700-odd customers. In the current feature store boom, many open source offerings are being created, but they lack a common standard. Cloudian is throwing in its lot with AWS and betting that S3 will become that standard.
“We haven’t seen that consolidation in the market or in the API around that,” says Ogasawara, who wrote a recent blog post on the company’s Streaming Feature Store. “Now for us, we feel strongly that AWS SageMaker Feature Store API is the way to go. There is a built-in customer base that have been using the API, vetting it, using it under stress. And then as that API develops, we could continue to develop along with it and build more and more features.”
Editor’s note: Tecton will not be presenting at the upcoming Feature Store Summit.