As AI Races Ahead, KNIME Ensures You Can Still Look Back
As the AI Act goes into effect in Europe, companies around the world are waking up to the reality that they must get serious about AI governance. The good news for organizations that rely on data science software from KNIME is that AI governance is baked right into the open source suite.
August 1 marked the start of enforcement for the European Union’s AI Act, a landmark law that imposes wide-ranging regulations on the use of AI across the continent. In addition to banning some types of AI and requiring companies to seek government permission for others, the AI Act also mandates that organizations adopt AI governance principles.
“Compliance with the AI Act will require a holistic end-to-end AI governance framework that allows you to triage AI projects by risk category,” PwC writes on its website, “implement the appropriate steps for compliance at the right time, record compliance steps in an auditable manner and continuously review and improve governance to reflect new regulatory developments.”
While some people may find these requirements onerous for data science, others welcome the newfound focus on governance, tracking your work, and repeatability. You can count Michael Berthold, the co-founder and CEO of KNIME, in the latter category.
“What surprises me is, honestly, I thought that was always important,” Berthold told Datanami last week. “We’ve always had these governance kinds of features as part of KNIME, because the moment you start doing anything with data, you need to make sure it doesn’t get sent all over the place.”
Compatible Origins
Berthold started developing KNIME back in 2006 to support the development of data science applications at the South San Francisco pharmaceutical software company where he worked. Instead of developing a monolithic data science product, he envisioned a flexible workbench that would let users visually design data science jobs that require multiple data science tools, and then execute those jobs as part of an integrated workflow.
Today, KNIME is composed of more than 4,000 open-source tools, many developed by the community. The KNIME interface is standardized and well tested, so if a change to one of the many KNIME components breaks compatibility, it will throw errors during validation testing.
While Berthold doesn’t control what people build with KNIME, he wants to be sure they can trust that the software is reliable and gives consistent answers.
“Our testing environment still has workflows that were developed with KNIME version .10,” Berthold said. “They still run. Even better, they not only run, but they also produce the same results.
“It’s better that it crashes than producing different results,” the German computer scientist added.
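The principle Berthold describes, failing loudly rather than quietly returning a different answer, can be illustrated with a small baseline check. What follows is a minimal sketch and not KNIME's actual test harness: the names `fingerprint`, `check_against_baseline`, `ResultDriftError`, and `old_workflow` are hypothetical, and a "workflow" is assumed to be a plain Python function that returns a list of rows.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a stable SHA-256 digest of a list of result rows (dicts)."""
    # Sorting keys and fixing separators keeps the digest independent of dict order.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResultDriftError(RuntimeError):
    """Raised when a workflow's output no longer matches its recorded baseline."""


def check_against_baseline(workflow, inputs, baseline_digest):
    """Run the workflow and refuse to return results that differ from the baseline."""
    rows = workflow(inputs)
    digest = fingerprint(rows)
    if digest != baseline_digest:
        # Crash instead of handing back a silently different answer.
        raise ResultDriftError(
            f"expected {baseline_digest[:12]}, got {digest[:12]}"
        )
    return rows


def old_workflow(values):
    """Hypothetical stand-in for a workflow built on an early release."""
    return [{"value": v, "squared": v * v} for v in values]


# Record the baseline once, then re-check it after every upgrade.
baseline = fingerprint(old_workflow([1, 2, 3]))
check_against_baseline(old_workflow, [1, 2, 3], baseline)
```

If a later version of the workflow produced even one different value, the check would raise `ResultDriftError` instead of returning the new numbers.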
Rise of GenAI
KNIME just released version 5.3, which brought, among other features, new generative AI capabilities. Like other data science toolmakers, KNIME is embracing large language models (LLMs) not only as a computational tool for data scientists building GenAI applications, but also to improve the product experience directly within KNIME.
“There are more modules around nodes that allow you to reach out to LLMs, use them, customize them,” Berthold said. “So you can connect to OpenAI. You can connect to Hugging Face models. You can use that to either augment…your analytical workflow with GenAI methods, or you can really build safeguarding wrappers around GenAI.”
The product also sports copilots that help data scientists develop KNIME workflows faster by giving them recommendations on what to do, he said. “We also have, for our Python and R and SQL integration, AI assistants that help you write that code,” he said.
While GenAI will undoubtedly provide a productivity boost to KNIME users and to the customers who use the products those users build, they can rest assured that KNIME won’t be cutting corners when it comes to AI governance requirements, such as traceability and explainability.
“Especially if your government regulations essentially require you to be able to explain what you did years later, then you need an environment that is 100% backwards compatible,” Berthold said. “Otherwise, you simply can’t use it for business-sensitive applications.”
Trusted Environments
Ensuring backwards compatibility with more than 4,000 open source tools that you don’t fully control is no easy task. It’s also not 100% foolproof.
If the data scientist is developing an AI application that is in the higher risk category of the AI Act, for instance, they probably want to use KNIME’s trusted extensions, which meet more stringent requirements than the less-trusted extensions available from the community.
“We don’t call them untrusted,” Berthold said, “but they’re not part of the trusted category. And those are the ones that we also tell our users ‘Hey, play with this, this is great stuff.’ But if somebody wants to move into production, we recommend that you stay away from those extensions because we can’t guarantee that a year later they’re still supported.”
Python is the most widely used language for data science at the moment, and it’s one of two languages that KNIME allows you to build extensions with (Java being the other). But there is also the potential in Python to go off the governance track and into dangerous waters. Specifically, custom KNIME extensions written in Python can reach out and load arbitrary libraries from the Internet, Berthold said, which is a recipe for unknown disasters.
“I don’t mind Python as a language,” he said. “I mind Python as trying to build reproducible data science workflows. That’s where it scares me. If you can keep it under control [it’s fine]. But if you let people do what they want to do, it’s scary.”
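One common way to keep Python "under control" in this sense is to pin exact library versions and refuse to run when the environment has drifted. The sketch below shows the general technique using the standard library's `importlib.metadata`; it is not a description of how KNIME handles Python extensions, and the function name `find_unpinned` and the example pins are hypothetical.

```python
from importlib import metadata


def find_unpinned(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match exactly.

    `pins` maps package names to the exact version a workflow was validated with.
    `get_version` defaults to importlib.metadata.version and can be swapped in tests.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # Package is not installed at all.
        if found != expected:
            problems[name] = (expected, found)
    return problems


# Hypothetical pins recorded when the workflow was validated.
validated_pins = {"numpy": "1.26.4", "pandas": "2.2.2"}

drift = find_unpinned(validated_pins)
if drift:
    print("Environment differs from the validated one:", drift)
```

A production workflow could call such a check at startup and stop if `drift` is non-empty, so that a result is never computed with libraries other than the ones it was validated against.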
Governance Galore
KNIME 5.3 introduced more governance capabilities in the KNIME Hub, the enterprise version of its software that is used by more than 400 organizations around the world. Governance features aren’t sexy, but they are starting to move the needle among paying customers.
“We had these validation and governance features for a long time as part of the Hub,” Berthold said. “Now suddenly people are starting to pay attention.”
This week, the Zurich, Switzerland-based company announced a $30 million round of funding from Invus, its longtime investor. The company says it will use the funding to fuel innovation and to “double down” on enterprise-grade AI governance and ModelOps.
“KNIME enables fast, widespread data science and AI adoption, with centralized governance and ModelOps capabilities,” Invus Managing Director Mario Kaloustian said in a press release. “This model answers the key question for the C-suite of every company: How do we accelerate innovation with GenAI while managing associated risks?”
AI and data governance will feature prominently in the KNIME 5.4 release, which is slated for December 6 (Saint Nikolaus Day in Germany). The company is working on anonymization and de-anonymization features to safeguard GenAI calls, as well as more governance capabilities to make sure the wrong group doesn’t use the wrong LLM, Berthold said.
“We’re working quite a bit on the governance piece of AI,” he added.
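The general idea behind anonymizing and de-anonymizing a GenAI call can be sketched in a few lines: sensitive values are swapped for placeholders before the prompt leaves the organization, and swapped back in the response. This is an illustrative sketch only, since KNIME has not published how its feature will work. The function names are hypothetical, only email addresses are handled, and `stand_in_llm` is a local placeholder rather than a real model call.

```python
import re

# Deliberately simple pattern; real systems would cover many more identifier types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize(text):
    """Replace each distinct email address with a placeholder; return text and mapping."""
    mapping = {}

    def swap(match):
        value = match.group(0)
        # Reuse the same placeholder when an address appears more than once.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    return EMAIL.sub(swap, text), mapping


def deanonymize(text, mapping):
    """Put the original values back into a model response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def stand_in_llm(prompt):
    """Hypothetical placeholder for an external LLM call; it only echoes the prompt."""
    return "Summary: " + prompt


prompt = "Contact anna@example.com and bob@example.org, then cc anna@example.com."
safe_prompt, mapping = anonymize(prompt)
reply = deanonymize(stand_in_llm(safe_prompt), mapping)
```

In this sketch the external model only ever sees the placeholders, while the mapping needed to restore the addresses stays on the caller's side.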