Reddit Sues Perplexity: What the AI Data Scraping Lawsuit Means for Publishers

A federal lawsuit filed in October 2025 represents a pivotal moment in the debate over AI training data and could reshape how content is valued on the web.

A Pivotal Moment for Content and AI

On October 22, 2025, Reddit filed a federal lawsuit against AI search startup Perplexity and three data-scraping companies. The case represents a watershed moment in the ongoing debate over how AI companies access and use web content for training their systems.

While Reddit has signed licensing agreements with OpenAI and Google, Perplexity allegedly chose a different path: using intermediaries to scrape content after receiving a cease-and-desist letter. This lawsuit could establish precedents that affect every content creator, publisher, and SEO professional navigating the evolving search landscape.

Understanding these dynamics is essential for anyone involved in content marketing or search engine optimization, as the outcomes may fundamentally reshape how content is valued and monetized in the AI era.

The Lawsuit: Reddit Takes on Perplexity and Data Scrapers

Key Players in the Case

Reddit filed the lawsuit in New York federal court, naming four defendants:

Defendant	Description	Role in Alleged Scheme
Perplexity	AI search startup valued at $9 billion	Primary defendant, allegedly using scraped data
SerpApi	Texas-based data scraping company	Sold access to scraped search results
Oxylabs UAB	Lithuania-based proxy service	Provided residential proxies to mask scraping
AWM Proxy	Former Russian botnet	Facilitated automated content extraction

According to Reuters reporting on the lawsuit, Reddit's legal team drew a compelling analogy: the defendants were "like would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead."

The Core Allegations

According to the complaint:

Perplexity received a cease-and-desist letter in May 2024 but continued scraping
The company allegedly circumvented Google's anti-scraping protections by going through search results
Instead of accessing Reddit directly, Perplexity allegedly pulled content from Google's cached pages
Reddit documented a 40-fold increase in its content citations after the cease-and-desist

This case follows a broader pattern of publishers taking action against unauthorized AI data usage. For more context on how AI is reshaping search, see our guide on how AI-powered search is reshaping SEO.

How Reddit Proved the Scraping

The Honeypot Test

To demonstrate that Perplexity was accessing its content through circumvention, Reddit employed a clever detection method:

Created invisible content: Reddit posted information that was only visible to Google's crawlers
Monitored Perplexity results: Within hours, this Google-exclusive content appeared in Perplexity search results
Inferred the source: According to Reddit, this shows Perplexity was pulling data from Google's index rather than accessing Reddit directly
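
The detection steps above can be sketched in a few lines. This is only an illustration of the general canary (honeypot) idea, not Reddit's actual method, which the complaint does not describe at code level. The function names `make_canary` and `canary_leaked` and the sample answer text are hypothetical.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique, unguessable marker to embed in a test page."""
    # 8 random bytes -> 16 hex characters; collisions are not a practical concern.
    return f"{prefix}-{secrets.token_hex(8)}"


def canary_leaked(canary: str, response_text: str) -> bool:
    """True if the marker appears in text returned by a third-party service."""
    return canary.lower() in response_text.lower()


marker = make_canary()

# Hypothetical: text copied from an AI search answer some hours later.
answer = f"According to one thread, the reference code is {marker}."

print(canary_leaked(marker, answer))              # True
print(canary_leaked(marker, "unrelated answer"))  # False
```

If the marker was only ever served to one crawler, finding it elsewhere points to that crawler's index as the source. It does not by itself show who performed the extraction, which is why the intermediaries are central to the complaint.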

The 40-Fold Citation Increase

Perhaps most damning was Reddit's documentation of its citation trends:

Before May 2024: Baseline level of Reddit citations on Perplexity
After cease-and-desist: 40x increase in Reddit content appearing in Perplexity responses
Implication: Rather than complying with the legal demand to stop, Perplexity allegedly doubled down on extraction

If accurate, this evidence suggests willful rather than accidental or incidental scraping. The case highlights why technical SEO measures like robots.txt and proper site architecture matter, not just for search rankings but for protecting content assets.

The Data Licensing Economics

Reddit's Strategic Moves

Reddit recognized the value of its data early:

2023 realization: Reddit understood its 20-year archive of user conversations was a goldmine for AI training
Licensing pivot: The company began blocking unauthorized access and brokering deals with major AI players
OpenAI agreement: First major licensing deal established terms for data access
Google partnership: Additional deal further validated Reddit's data valuation

As analyzed by Built In's coverage of the lawsuit, these deals established an industry precedent: AI companies should pay for content access.

Why Perplexity Became a Target

Unlike OpenAI and Google, Perplexity allegedly refused to negotiate:

Perplexity's defense: The company claims it only summarizes and cites Reddit discussions, noting that this content is publicly available and that it doesn't train foundation models.

Reddit's counter: The company argues that even summarization requires licensing, and that Perplexity's indirect scraping through Google constitutes circumvention of access controls.

Industry-Wide Precedent

This lawsuit is part of a broader pattern of legal action:

The New York Times sued OpenAI and Microsoft over copyright
Dow Jones and News Corp filed similar claims against Perplexity
Getty Images pursued legal action over image generation training data
Meta, X, and LinkedIn have all taken action against data scrapers

Reddit's case could establish important precedents for all of these disputes, potentially reshaping how content creators and publishers are compensated for AI training data. To stay ahead of these changes, publishers should build comprehensive SEO strategies that adapt to evolving search technologies.

Impact on Publishers and the Search Ecosystem

AI Overviews Traffic Decline

Google's AI Overviews have fundamentally changed search traffic patterns. According to Similarweb's research on generative AI's impact on publishers:

Metric	Before AI Overviews	After AI Overviews	Change
No-click searches	56%	69%	+13 percentage points
Monthly organic visits	2.3 billion	1.7 billion	-26%
Publisher referral traffic	Baseline	Significantly reduced	Declining
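
The two numeric changes in the table are different kinds of figures: the no-click shift is a difference in percentage points, while the visits drop is a relative change. A short worked check, using only the table's own numbers (the helper names are illustrative):

```python
def point_change(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, as a percentage."""
    return (after - before) / before * 100


# No-click searches: 56% -> 69%
print(f"{point_change(56, 69):+.0f} points")   # +13 points
print(f"{relative_change(56, 69):+.1f}%")      # +23.2%

# Monthly organic visits: 2.3 billion -> 1.7 billion
print(f"{relative_change(2.3, 1.7):+.1f}%")    # -26.1%
```

So the share of no-click searches rose 13 points, which is about a 23% relative increase, and organic visits fell by roughly 26%.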

Reddit's Unique Position

Reddit has emerged as the most-cited source in AI-generated content:

Top source on Perplexity: Reddit conversations appear most frequently in AI search responses
Top source on Google AI Overviews: Reddit content is heavily featured in Google's AI summaries
Second-most cited on ChatGPT: Behind only Wikipedia among the sources ChatGPT cites

This prominence makes Reddit both a target for scrapers and a powerful advocate for content licensing. The shift in how users discover content through AI interfaces underscores the importance of building brand authority that transcends any single traffic source.

The Broader Trend

Publishers are seeing their content:

Summarized by AI: Users get answers without visiting source pages
Repackaged as AI output: Original attribution often minimized or lost
Removed from click funnels: Traffic flows through AI intermediaries instead

The economic model that sustained digital publishing for two decades is under unprecedented pressure, making it essential for content creators to adapt their content strategies accordingly.

What This Means for Content Professionals

Key insights for SEO professionals, content creators, and publishers navigating this evolving landscape

Data Licensing Is the New Standard

Content platforms are increasingly pursuing licensing deals with AI companies. Expect more platforms to follow Reddit's lead in monetizing their data assets.

Anti-Scraping Protections Matter

The case highlights the importance of robots.txt, rate limiting, and other technical measures. Platforms that don't protect their content may be in a weaker legal position when they object to scraping.
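
As a minimal sketch of the robots.txt part, the snippet below checks a sample rule set with Python's standard `urllib.robotparser` before it is deployed. The rules, the `example.com` host, and the `allowed` helper are assumptions for illustration. The crawler names shown are commonly published user-agent tokens, but verify them against each vendor's current documentation.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (an assumption, not taken from any real site):
# block two AI crawlers entirely, and keep /private/ off-limits for everyone.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Would a compliant crawler with this user agent fetch this path?"""
    return parser.can_fetch(agent, f"https://example.com{path}")


print(allowed("PerplexityBot", "/blog/post"))  # False
print(allowed("Googlebot", "/blog/post"))      # True
print(allowed("Googlebot", "/private/notes"))  # False
```

robots.txt is advisory: it only binds crawlers that choose to honor it, and the lawsuit's core allegation is that content was obtained indirectly. Rate limiting and server-side bot detection are separate layers not shown here.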

Attribution Standards Are Evolving

As AI-generated summaries become more common, the industry may develop new standards for citing and linking to original sources.

SEO Strategy Must Adapt

With AI Overviews changing click dynamics, traditional SEO metrics need reevaluation. Brand authority and direct traffic become more valuable.

Frequently Asked Questions

Why is Reddit suing Perplexity specifically?

Reddit alleges that Perplexity used intermediaries to scrape its content after receiving a cease-and-desist letter in May 2024. Unlike OpenAI and Google, which negotiated licensing agreements, Perplexity allegedly continued accessing Reddit data through circumvention methods.

What is SerpApi's role in the lawsuit?

SerpApi is named as a defendant for allegedly providing Perplexity with access to scraped search data. The lawsuit claims SerpApi helped circumvent Google's anti-scraping protections by selling API access to search results that contained Reddit content.

How did Reddit prove the alleged scraping?

Reddit created a "honeypot" test by posting content visible only to Google's crawlers. When that content appeared in Perplexity search results within hours, Reddit says, it showed that Perplexity was pulling data from Google's index rather than accessing Reddit directly.

What is the potential impact on publishers?

If Reddit succeeds, it could establish a legal precedent requiring AI companies to pay for content access. This could fundamentally reshape the economics of content creation and force AI companies to negotiate with publishers rather than scraping freely.

How does this affect my SEO strategy?

AI Overviews are changing how users interact with search results, with fewer clicks to publisher sites. Content professionals should focus on building brand authority, direct audience relationships, and understanding how AI systems cite and summarize content.

Will this stop AI companies from using web content?

Unlikely. More probable is a shift toward licensed data partnerships. Companies like OpenAI and Google have already negotiated deals. The lawsuit may accelerate this trend by establishing that unauthorized scraping carries significant legal risk.
