Perplexity Caught Using Stealth Bots to Bypass Website Blocks

Perplexity’s stealth activity occurred across tens of thousands of domains, involving millions of requests per day.

The Left Shift Bureau

05 Aug 2025

Cloudflare has published evidence that Perplexity AI, a conversational search engine, engaged in stealth crawling, bypassing the standard controls websites use to block automated access.

"Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences," Cloudflare said in a blog post.

Although websites explicitly disallowed PerplexityBot and Perplexity-User via robots.txt and WAF rules, Perplexity reportedly deployed undisclosed crawlers that rotated IP addresses and Autonomous System Numbers (ASNs) and masqueraded as legitimate browsers, such as Chrome on macOS, to access restricted content.
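To illustrate why a disguised user agent slips past name-based robots.txt rules, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the example.com URL, and the Chrome user-agent string are illustrative assumptions, not taken from Cloudflare's report, and the sketch does not model WAF rules or Cloudflare's detection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks Perplexity's two declared crawlers
# by name and allows everything else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative browser-style user agent (an assumption for this sketch).
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

URL = "https://example.com/page"  # placeholder URL


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Perplexity-User", CHROME_UA):
        print(agent, "->", is_allowed(ROBOTS_TXT, agent, URL))
```

In this sketch the two declared crawler names are refused, while the browser-style string falls through to the wildcard rule and is allowed. Rules keyed on a crawler's name only work when the crawler identifies itself honestly, and robots.txt in any form is advisory: it relies on the crawler choosing to comply.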

Cloudflare tested this by querying Perplexity AI about freshly registered domains that were neither indexed nor publicly exposed. Even though the robots.txt files on those domains instructed all crawlers to stay out, Perplexity still returned detailed information about their content, which Cloudflare took as evidence that the otherwise inaccessible pages had been scraped.

According to the report, Perplexity’s stealth activity occurred across tens of thousands of domains, involving millions of requests per day. Cloudflare removed Perplexity from its list of verified bots and implemented heuristic-based blocking to prevent further stealth scraping.

The incident highlights ongoing controversy over AI companies’ compliance with the Robots Exclusion Protocol. Earlier investigations from Wired, Forbes, and others accused Perplexity of ignoring these directives and scraping content from paywalled or blocked sites, prompting legal scrutiny from publishers like Dow Jones and The New York Times.

Earlier this year, Cloudflare became the first internet infrastructure provider to block AI crawlers by default unless content owners give explicit permission or receive compensation.

This move marks a significant step toward a more controlled and fair digital ecosystem for publishers and creators.

Previously, AI crawlers could scrape vast amounts of online content without prior consent. Now, new Cloudflare customers will start with AI training crawlers blocked by default, shifting the model from opt-out to opt-in. Existing users can enable this feature with a single click on their dashboard.
