April 18, 2022

Anaconda’s Commercial Fee Is Paying Off, CEO Says

Anaconda created a stir over a year ago when it began charging large commercial users a fee for access to its popular collection of Python tools. The change, it said, was necessary to offset the costs of maintaining a large and increasingly complex body of open source software. While it upset some users, the move is paying off, Anaconda CEO Peter Wang tells Datanami.

Since 2012, Anaconda has been providing one of the most popular Python packages in the data science community. Millions of users around the world have downloaded and used the Anaconda Distribution, which contains hundreds of individual Python tools, such as NumPy, Pandas, SciPy, and more (R is also included in the distro, but Anaconda is closely associated with Python).

Getting all of these open source packages to play nicely together is not an easy task, especially as the packages and their underlying dependencies change over time. The Austin, Texas company addresses this challenge with its Conda package manager, as well as through the efforts of dozens of Anaconda engineers and upstream volunteers at the open source projects themselves.

For years, Anaconda’s business model relied on profits from its commercial data science platform business to help sustain the work it did to maintain the free and open source Anaconda Distribution. However, thanks to skyrocketing adoption of Python–not to mention the additional costs of certifying Anaconda on new hardware platforms such as Arm and GPUs and addressing the demands of cloud platforms–the financial balance became untenable, according to Wang, a 2020 Datanami Person to Watch.

The old business model changed in April 2020, when Anaconda tweaked its terms of service to ask “heavy commercial users” to pay $15 per user per month for access to the package.

“We said, look, we’re at a point now where the adoption of Python is so massive, the cost of supporting package builds and integration of all these things–it actually is significant,” Wang says. “It costs us real money. We need to make a little bit of money back on it so we can support it.”

The Anaconda Distribution contains about 250 open source packages, while the Anaconda Repository has access to about 8,000 more.

However, Anaconda did something interesting with that April 2020 change: It didn’t specify what “heavy commercial use” actually meant. The company decided to rely on the honor system because it was sensitive to its community and didn’t want to upset it, Wang says. A few companies paid what Anaconda asked for, but most didn’t.

Anaconda called some of the heavy users of the free and open source software, and tried a slightly more direct approach. “We call them, and the data scientists say ‘Oh yeah, of course, we love your stuff. We need you to make our deployment sane. This is great,’” Wang says.

But when the sales conversations went up the ladder to the IT managers and eventually the legal team, things got a little less rosy.

“Legal is like ‘We don’t have to pay, right?’ ‘Nope?’” came the response from the Anaconda representative. “‘OK, we’re done. We’re not making heavy commercial use. That’s a fuzzy term. Our lawyer will tell you why it’s unenforceable.’”

In October 2020, the company decided to add some teeth to its terms of service. Heavy commercial use was defined as a commercial entity with 200 or more users using the software on a regular basis. That left a large number of users unaffected, Wang says.

“If you’re a small business, if you’re a non-profit, if you’re an academic research facility, it doesn’t apply to you. If you’re a startup and you have 150 people and every single one of them is actively doing work in Jupyter notebooks, doing Dask things using open source, using our repo–it doesn’t matter,” he says.

“What it really affects is the really giant big companies–the massive banks, the massive industrial companies,” he continues. “They have thousands of users. They’re hitting our repositories all the time. Sometimes they have deployments go wrong and they slam and DOS [denial of service] our repositories. We just want all of those guys to actually shift over to using our commercial repository.”

Not everybody is happy with the change to the terms of service. Some accuse Anaconda of using bait-and-switch techniques, of trying to monetize the hard work of other open source developers. Others have threatened to abandon Anaconda and use other Python packages, including Conda Forge, a GitHub-based community that distributes individual components in the Conda package manager.

Anaconda co-founder and CEO Peter Wang

That last one gets a chuckle out of Wang. “They don’t realize Conda Forge is hosted on our infrastructure,” he says. “I pay $80,000 to $100,000 [per month] to support the download volume of Conda Forge and the software infrastructure.”

Besides, Wang says, Conda Forge is focused on the individual “recipes,” not necessarily putting out a distribution where hundreds of statistical products work together.

“They don’t really make a strong opinion on a whole collection,” he says of Conda Forge. “What we actually step up and do is a curation aspect around a collection of packages, making sure they work together, making sure we feel good about the release, that it goes out to millions of students and actually works. That is nontrivial work.”

For the same reason that companies pay for a distribution of Linux, which ostensibly is a “free” operating system, companies should be willing to pay for a collection of statistical software that works well together.

“Why doesn’t everybody just build Linux from scratch? It’s all open source?” Wang asks. “Well, the reason is because there’s a lot of work to make hundreds of pieces open source work together. And so when we do the same thing, when we build a distribution, there’s value there. A lot of people really get value from that. And so that takes time, that takes energy, and money. So yeah, it’s millions and millions and millions of dollars every single year that we spend to do all this stuff.”

Wang says he pays a team of 30 software engineers and CI/CD experts to work on the recipes, the Conda package manager, and interacting with the upstream project manager to keep the software running. (A similar size team works on the commercial data science platform side, and there’s another 25 people devoted to “raw open source innovation.”) A lot of work these days is done around security and creating a software bill of materials (BOM), which is a new industry push designed to thwart supply chain software hacks, such as the Heartbleed vulnerability in OpenSSL in 2014. “It’s very significant,” he said. “It’s a lot of people.”

Maintaining important open source projects can be a thankless job (Image courtesy XKCD)

Wang recalls a conversation he had with what he calls “a major cloud vendor” about their heavy usage. Upon being asked to pay for using Anaconda Distribution, a product manager at the cloud vendor told Wang:

“Oh, well, we have engineers. Why would we need to use you? This is all open source. We’ll just take your stuff, which we’ll fork. We’ll take the Conda Forge recipes. And we’re going to go and build a [fork] to the Conda distribution and we’ll do it for, like, $2 million or $3 million a year,” Wang says.

“And I literally looked at the PM in the eye and I said ‘If you can do that for $2 million or $3 million a year, I will license it from you,’” he says. “I will OEM your thing. Because you’re doing it cheaper than I am! So let me know when you get that stood up and I’ll kill off the Anaconda team and just license your thing. Because that’s laughable to think that you can just do that for a couple of million per year.”

The good news, at least for Anaconda and the future of open source software, is that the large industrial users are starting to pay for Anaconda. “We are starting to actually make the money back,” Wang says. “We actually are. And it’s been really, really great.”

In addition to paying for the increased costs of maintaining the Anaconda Distribution, the fee for heavy users is allowing Anaconda to give tens of thousands of dollars in dividends to NumFOCUS, which in turn is distributing the funds to various upstream open source projects, Wang says.

“So even off the pure commercial licensing stuff alone, we’ve been able to provide a net dividend for the broader open source community, and I’m hoping that that will continue to grow over time,” he says.

The long-term goal is to broaden the user base for the Anaconda products. Instead of having an all-or-nothing approach–either use the open source tools or buy the full data science platform–the company will give individual customers more options for getting enterprise-type features, such as security and access to cloud compute.

While Anaconda’s new fee represented an end to free statistical software for large commercial users, it’s a win for the folks who put in long hours to keep it working.

“The Python community is coming at this from I think maybe too limited of a perspective in terms of what it takes to sustain the ecosystem for the long run,” Wang says. “People just don’t think about it. So every time someone PyPI installs, Conda installs something, if they actually think about how did this free value show up? There’s a lot of maintainers working on nights and weekends. And that’s not sustainable. And I don’t think anyone really wants that either.”

BigDATAwire