A Pivotal Moment for Content and AI
On October 22, 2025, Reddit filed a federal lawsuit against AI search startup Perplexity and three data-scraping companies. The case represents a watershed moment in the ongoing debate over how AI companies access and use web content for training their systems.
While Reddit has signed licensing agreements with OpenAI and Google, Perplexity allegedly chose a different path--using intermediaries to scrape content after receiving a cease-and-desist order. This lawsuit could establish precedents that affect every content creator, publisher, and SEO professional navigating the evolving search landscape.
Understanding these dynamics is essential for anyone involved in content marketing or search engine optimization, as the outcomes may fundamentally reshape how content is valued and monetized in the AI era.
The Lawsuit: Reddit Takes on Perplexity and Data Scrapers
Key Players in the Case
Reddit filed the lawsuit in New York federal court, naming four defendants:
| Defendant | Description | Role in Alleged Scheme |
|---|---|---|
| Perplexity | AI search startup valued at $9 billion | Primary defendant, allegedly using scraped data |
| SerpApi | Texas-based data scraping company | Sold access to scraped search results |
| Oxylabs UAB | Lithuania-based proxy service | Provided residential proxies to mask scraping |
| AWM Proxy | Former Russian botnet | Facilitated automated content extraction |
According to Reuters reporting on the lawsuit, Reddit's legal team drew a compelling analogy: the defendants were "like would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead."
The Core Allegations
According to the complaint:
- Perplexity received a cease-and-desist letter in May 2024 but continued scraping
- The scraping companies allegedly circumvented Google's anti-scraping protections to harvest Google search results, which Perplexity then used
- Instead of accessing Reddit directly, Perplexity allegedly obtained Reddit content from those Google search results
- Reddit documented a 40-fold increase in Perplexity's citations of Reddit content after the cease-and-desist
This case follows a broader pattern of publishers taking action against unauthorized AI data usage. For more context on how AI is reshaping search, see our guide on how AI-powered search is reshaping SEO.
How Reddit Proved the Scraping
The Honeypot Test
To demonstrate that Perplexity was accessing its content through circumvention, Reddit employed a clever detection method:
- Created invisible content: Reddit posted information that was only visible to Google's crawlers
- Monitored Perplexity results: Within hours, this Google-exclusive content appeared in Perplexity search results
- Established causation: Reddit argues this showed that Perplexity was pulling data from Google's index rather than accessing Reddit directly
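Reddit has not published the mechanics of its test, so what follows is only a hypothetical sketch of how a canary-token honeypot of this kind could work. It assumes the test content is served only to requests verified as Google's crawler (a reverse DNS lookup, a domain check, then a forward lookup, which is the general approach Google documents for verifying Googlebot), and that a unique token embedded in that content is later searched for in another service's output. The function names and the crawler domain list are illustrative assumptions, not Reddit's implementation.

```python
import secrets
import socket

# Assumed list of reverse-DNS domains for Google's crawlers; confirm against
# Google's current crawler-verification documentation before relying on it.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".googleusercontent.com")


def make_canary() -> str:
    """Create an unguessable marker that appears nowhere else on the web."""
    return "canary-" + secrets.token_hex(8)


def is_google_crawler_hostname(hostname: str) -> bool:
    """Check that a reverse-DNS hostname sits under an assumed Google crawler domain."""
    host = hostname.rstrip(".").lower()
    return host.endswith(GOOGLE_CRAWLER_DOMAINS)


def verify_google_crawler(ip: str) -> bool:
    """Reverse DNS lookup, domain check, then a forward lookup that must return the same IP.

    Requires network access; gethostbyname_ex only covers IPv4 addresses.
    """
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not is_google_crawler_hostname(hostname):
            return False
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    return ip in forward_ips


def render_post(body: str, canary: str, requester_is_google: bool) -> str:
    """Serve the canary only to verified Google crawlers; everyone else sees the plain post."""
    return f"{body}\n{canary}" if requester_is_google else body


def canary_leaked(third_party_text: str, canary: str) -> bool:
    """True if text from another service (e.g. an AI answer) contains the canary."""
    return canary in third_party_text
```

In use, a server would call `render_post(body, canary, verify_google_crawler(client_ip))` for each request, then periodically check AI search answers with `canary_leaked`. The unguessable token is what makes the inference work: if it surfaces elsewhere, it can only have travelled through the crawler that was allowed to see it, or through something downstream of that crawler's index.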
The 40-Fold Citation Increase
Perhaps most damning was Reddit's documentation of its citation trends:
- Before May 2024: Baseline level of Reddit citations on Perplexity
- After cease-and-desist: 40x increase in Reddit content appearing in Perplexity responses
- Implication: Rather than complying with the legal demand to stop, Perplexity allegedly doubled down on extraction
This evidence suggests willful infringement rather than accidental or incidental scraping. The case highlights why technical SEO measures like robots.txt and proper site architecture matter--not just for search rankings, but for protecting content assets.
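As a hedged illustration of the robots.txt point, the sketch below uses Python's standard-library `urllib.robotparser` to check a hypothetical robots.txt that blocks two AI crawler user-agent tokens (`PerplexityBot` and `GPTBot`, which Perplexity and OpenAI publish for their crawlers) while allowing everything else. The file contents and URL are placeholders. Keep in mind that robots.txt is advisory: it only constrains crawlers that choose to honor it, and it would not block the route alleged here, where Reddit content was obtained through Google rather than fetched from Reddit directly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawlers by user-agent token,
# allow every other crawler (including Googlebot) everywhere.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Placeholder URL for illustration only.
EXAMPLE_URL = "https://www.example.com/r/example/comments/abc123/"


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from a string instead of fetching
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "GPTBot", "Googlebot"):
        verdict = "allowed" if crawler_allowed(ROBOTS_TXT, agent, EXAMPLE_URL) else "blocked"
        print(f"{agent}: {verdict}")
```

Running it prints `blocked` for the two AI crawlers and `allowed` for Googlebot, which is the trade-off publishers face: staying visible in Google search also keeps content inside the index that the defendants allegedly mined.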
The Data Licensing Economics
Reddit's Strategic Moves
Reddit recognized the value of its data early:
- 2023 realization: Reddit understood its 20-year archive of user conversations was a goldmine for AI training
- Licensing pivot: The company began blocking unauthorized access and striking deals with major AI players
- Google partnership: The first major licensing deal, which established terms for data access
- OpenAI agreement: A second deal that further validated Reddit's data valuation
As analyzed by Built In's coverage of the lawsuit, these deals established an industry precedent: AI companies should pay for content access.
Why Perplexity Became a Target
Unlike OpenAI and Google, Perplexity allegedly refused to negotiate:
Perplexity's defense: The company claims it only summarizes and cites Reddit discussions, noting that this content is publicly available and that it doesn't train foundation models.
Reddit's counter: The company argues that even summarization requires licensing, and that Perplexity's indirect scraping through Google constitutes circumvention of access controls.
Industry-Wide Precedent
This lawsuit is part of a broader pattern of legal action:
- The New York Times sued OpenAI and Microsoft over copyright
- Dow Jones and News Corp filed similar claims against Perplexity
- Getty Images pursued legal action over image generation training data
- Meta, X, and LinkedIn have all taken action against data scrapers
Reddit's case could establish important precedents for all of these disputes, potentially reshaping how content creators and publishers are compensated for AI training data. To stay ahead of these changes, publishers should build comprehensive SEO strategies that adapt to evolving search technologies.
Impact on Publishers and the Search Ecosystem
AI Overviews Traffic Decline
Google's AI Overviews have fundamentally changed search traffic patterns. According to Similarweb's research on generative AI's impact on publishers:
| Metric | Before AI Overviews | After AI Overviews | Change |
|---|---|---|---|
| No-click searches | 56% | 69% | +13 percentage points |
| Monthly organic visits to news sites | 2.3 billion | 1.7 billion | -26% |
| Publisher referral traffic | Baseline | Significantly reduced | Declining |
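The two numeric rows in this table measure change in different units. The no-click row compares two shares of all searches, so its change is naturally expressed in percentage points, while the visits row is a relative change. A short worked calculation, using only the figures in the table (the helper names are illustrative):

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Difference between two shares, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before: float, after: float) -> float:
    """Relative change from before to after, as a percentage of before."""
    return (after - before) / before * 100


if __name__ == "__main__":
    # No-click searches: 56% -> 69% of searches
    print(round(percentage_point_change(56, 69), 1))  # 13 points
    print(round(relative_change_pct(56, 69), 1))      # 23.2 (% relative increase)
    # Monthly organic visits: 2.3 billion -> 1.7 billion
    print(round(relative_change_pct(2.3, 1.7), 1))    # -26.1 (%)
```

So the drop in visits is a relative decline of about 26%, while the no-click share rose by 13 points, which is roughly a 23% relative increase.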
Reddit's Unique Position
Reddit has emerged as the most-cited source in AI-generated content:
- Top source on Perplexity: Reddit conversations appear most frequently in AI search responses
- Top source on Google AI Overviews: Reddit content is heavily featured in Google's AI summaries
- Second-most cited on ChatGPT: Behind only Wikipedia among sources cited in ChatGPT responses
This prominence makes Reddit both a target for scrapers and a powerful advocate for content licensing. The shift in how users discover content through AI interfaces underscores the importance of building brand authority that transcends any single traffic source.
The Broader Trend
Publishers are seeing their content:
- Summarized by AI: Users get answers without visiting source pages
- Repackaged as AI output: Original attribution often minimized or lost
- Removed from click funnels: Traffic funnels through AI intermediaries instead
The economic model that sustained digital publishing for two decades is under unprecedented pressure, making it essential for content creators to adapt their content strategies accordingly.
Key Insights for SEO Professionals, Content Creators, and Publishers
Data Licensing Is the New Standard
Content platforms are increasingly pursuing licensing deals with AI companies. Expect more platforms to follow Reddit's lead in monetizing their data assets.
Anti-Scraping Protections Matter
The case highlights the importance of robots.txt, rate limiting, and other technical measures. Platforms that don't take such measures may find it harder to argue that their content was accessed without authorization.
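Rate limiting can take many forms. As a hypothetical sketch (the class names and limits are illustrative, not any platform's actual setup), a per-client token bucket lets each client make a short burst of requests and then throttles it to a steady refill rate. One caveat tied to this case: a limiter keyed by IP address is exactly what residential proxy networks, like the service Oxylabs allegedly provided, are designed to dilute by spreading requests across many addresses.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts up to `capacity`, then refills at `refill_per_sec` tokens per second."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full so a new client can burst
        self.last = now

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key (e.g. an IP address or API key)."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)
```

With `PerClientLimiter(capacity=3, refill_per_sec=1.0)`, a client can make three requests at once, is refused on the fourth, and regains one request per second after that. A scraper rotating through thousands of proxy IPs gets a fresh bucket for each address, which is why platforms typically combine limits like this with other signals.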
Attribution Standards Are Evolving
As AI-generated summaries become more common, the industry may develop new standards for citing and linking to original sources.
SEO Strategy Must Adapt
With Google's AI Overviews changing click dynamics, traditional SEO metrics such as organic click-through rate need reevaluation. Brand authority and direct traffic become more valuable.
Frequently Asked Questions
Why is Reddit suing Perplexity specifically?
Reddit alleges that Perplexity used intermediaries to scrape its content after receiving a cease-and-desist order in May 2024. Unlike OpenAI and Google, which negotiated licensing agreements, Perplexity allegedly continued accessing Reddit data through circumvention methods.
What is SerpApi's role in the lawsuit?
SerpApi is named as a defendant for allegedly providing Perplexity with access to scraped search data. The lawsuit claims SerpApi helped circumvent Google's anti-scraping protections by selling API access to search results that contained Reddit content.
How did Reddit prove the alleged scraping?
Reddit created a 'honeypot' test by posting content only visible to Google's crawlers. When that content appeared in Perplexity search results within hours, Reddit argues, it showed that Perplexity was pulling data from Google's index rather than accessing Reddit directly.
What is the potential impact on publishers?
If Reddit succeeds, it could establish a legal precedent requiring AI companies to pay for content access. This could fundamentally reshape the economics of content creation and force AI companies to negotiate with publishers rather than scraping freely.
How does this affect my SEO strategy?
AI overviews are changing how users interact with search results, with fewer clicks to publisher sites. Content professionals should focus on building brand authority, direct audience relationships, and understanding how AI systems cite and summarize content.
Will this stop AI companies from using web content?
Unlikely. More probable is a shift toward licensed data partnerships. Companies like OpenAI and Google have already negotiated deals. The lawsuit may accelerate this trend by establishing that unauthorized scraping carries significant legal risk.