
July 8, 2024

Cloudflare Rolls Out New Feature For Blocking AI Bots

With the rise of GenAI, the demand for data has increased dramatically, making it more valuable than ever. In the current digital era, website owners face the significant challenge of keeping their data safe from AI bots scraping their content without permission. 

AI companies often use content from public websites to train their large language models (LLMs). While some larger companies such as Google and OpenAI allow website operators to opt out of scraping, not all LLM developers are that transparent. The issue of web scraping was highlighted a few months ago when Reddit struck a $60 million deal with Google to allow the search giant to train its AI models on Reddit posts.
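The opt-out that Google and OpenAI offer is typically expressed through a site's robots.txt file, using the crawler tokens those companies publish (`GPTBot` for OpenAI, `Google-Extended` for Google's AI training). The sketch below is illustrative only: the robots.txt contents, the `can_fetch` helper, and the example.com URL are assumptions made for the example, and it relies on Python's standard `urllib.robotparser` to show how a well-behaved crawler would interpret such rules. Compliance is voluntary, so this does nothing against bots that ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of two AI training crawlers
# while leaving the site open to everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Google-Extended", "Mozilla/5.0"):
        print(agent, can_fetch(agent, "https://example.com/post"))
```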

To address this challenge, Cloudflare, one of the leading web infrastructure and security firms, has introduced a new no-code feature that protects website content from data-harvesting bots. With the new tool, web hosting customers can block AI bots, also known as AI scrapers or crawlers, with a single click.

To activate the new tool, users can navigate to the Security section and toggle the “AI Scrapers and Crawlers” switch. The feature is available on both the free and paid versions of Cloudflare’s content delivery network (CDN).

The launch comes at a time when opinions in the industry are divided about what counts as “fair use” of publicly available website content.

During a recent interview at the Aspen Ideas Festival, Mustafa Suleyman, the CEO of Microsoft’s AI division, sparked controversy by suggesting that all public website content should be considered freeware for AI training purposes. 

Media publishers and content hosting platforms tend to disagree with Suleyman. They now have a defensive weapon in the form of Cloudflare’s new tool, which can detect and block automated content extraction attempts by AI bots.

AI bots often scrape websites in a manner that makes them appear like regular user traffic. Cloudflare claims that its new feature has advanced capabilities to identify bots designed to avoid detection. 

“Sadly, we’ve observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent,” shared Cloudflare engineers in a blog post. “We’ve monitored this activity over time, and we’re proud to say that our global machine learning model has always recognized this activity as a bot, even when operators lie about their user agent.” 
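To see why a spoofed user agent is a problem, consider a naive filter that blocks requests only when the User-Agent header names a known AI crawler. The sketch below is not Cloudflare's method; the token list and the `is_declared_ai_bot` helper are hypothetical, chosen for illustration. It catches a crawler that identifies itself honestly but misses the same crawler once it claims to be a desktop browser, which is the gap that behavior-based detection is meant to close.

```python
# Illustrative, non-exhaustive list of tokens that self-identifying
# AI crawlers include in their User-Agent header.
AI_BOT_TOKENS = ("gptbot", "ccbot", "claudebot")


def is_declared_ai_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string names a listed AI crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_BOT_TOKENS)


# A crawler that announces itself is caught by the string match.
honest = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"

# The same crawler sending a browser-like header is not.
spoofed = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(is_declared_ai_bot(honest))
    print(is_declared_ai_bot(spoofed))
```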


Cloudflare is aware that AI companies can develop new methods to scrape websites, so the company plans to update the new feature regularly. In addition, Cloudflare uses its ML model to “fingerprint” bots attempting to scrape or crawl websites, allowing it to flag traffic from evasive AI bots.

Powering nearly 20% of all web traffic, Cloudflare holds a significant market share in the web performance and security industry. The company also entered the observability market earlier this year with the acquisition of Baselime, a cloud-native observability platform.

The roll-out of the new AI bot-blocking feature marks a significant step forward for Cloudflare in its battle against unauthorized web scraping by AI developers. It enhances Cloudflare’s appeal to customers seeking greater control over access to their website’s data. 


 

 

 
