HTML Tidy

The essential tool for clean, standards-compliant HTML that catches errors, formats markup, and improves accessibility.

What HTML Tidy Does

HTML Tidy serves as both a validator and a formatter for HTML documents. When you pass a file through Tidy, it performs several critical operations that transform problematic markup into clean, standards-compliant code.

Originally created by Dave Raggett at W3C and now maintained by the HTML Tidy Advocacy Group (HTACG), this tool reads HTML, XHTML, and XML files, detects and corrects common coding errors, and produces cleaned-up markup that follows web standards. With support for HTML5 and ongoing updates, Tidy remains a reliable companion for developers who care about markup quality.

In modern web development with Next.js and similar frameworks, clean HTML still matters: browsers and search engine crawlers parse well-formed markup predictably, accessibility tools work better with properly structured HTML, and code that follows consistent standards is easier to maintain.

Key Capabilities

Error Correction

Automatically fixes missing end tags, incorrect nesting, unclosed elements, and malformed attributes.

Accessibility Analysis

Provides recommendations for improving accessibility, including table summaries and proper heading structure.

Document Cleanup

Strips surplus markup from Word exports and removes presentational tags in favor of CSS styles.

Pretty Printing

Formats output with consistent indentation, proper line wrapping, and readable structure.

Error Detection and Correction

HTML Tidy corrects a wide range of markup problems that developers commonly encounter.

Tag Structure Fixes

Missing or mismatched end tags are detected and automatically corrected. Tidy recognizes when an opening tag lacks its corresponding closing tag and inserts the missing end tag in the correct position.

End tags in the wrong order are also fixed. If your markup has improperly nested elements like <b>bold <i>italic</b> italic</i>, Tidy closes the inner <i> before the outer <b> ends so that the elements nest properly.

Heading emphasis issues that cause browser rendering problems are corrected. When <h1><i>italic heading</h1> is encountered, browsers may continue displaying subsequent content in heading font size. Tidy corrects this by properly nesting the elements as <h1><i>italic heading</i></h1>.

Tidy also recovers from mixed-up tags, such as a block-level element placed inside an inline element (for example, <i><h1>heading</h1></i> becomes <h1><i>heading</i></h1>), and it repairs list structure by adding missing <ul>, <ol>, and <li> tags.

Attribute Handling

Tidy adds missing quotes around attribute values and reports when closing quotes are missing. Consistent quoting is required in XHTML and, although HTML5 permits some unquoted values, it avoids parsing surprises when a value contains spaces or special characters.

For comprehensive examples of Tidy's error correction capabilities, refer to the W3C HTML Tidy documentation.

Before: Problematic HTML

<h1>heading
<h2>subheading</h3>
<p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?

After: Tidy Output

<h1>heading</h1>
<h2>subheading</h2>
<p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?</p>
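
To make the idea of detecting mismatched end tags concrete, here is a minimal Python sketch that tracks open elements on a stack. It is an illustration only and not Tidy's actual algorithm: the names NestingChecker and check are hypothetical, it reports problems without repairing them, and it ignores HTML's rules for optional end tags (so an unclosed <p> is flagged even though HTML allows it).

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Collects nesting problems; it reports them but does not repair them."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open elements
        self.problems = []    # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # e.g. <br/> arrives as a start tag plus an end tag
        if tag not in self.open_tags:
            self.problems.append(f"stray </{tag}>")
            return
        # Everything opened after `tag` is still open: report and pop it.
        while self.open_tags[-1] != tag:
            inner = self.open_tags.pop()
            self.problems.append(f"<{inner}> not closed before </{tag}>")
        self.open_tags.pop()


def check(markup):
    """Return a list of nesting problems found in the markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Anything left on the stack never received an end tag.
    leftovers = [f"<{t}> never closed" for t in reversed(checker.open_tags)]
    return checker.problems + leftovers
```

Calling check("<h1><i>italic heading</h1>") returns a single finding for the unclosed <i>, which is the same situation Tidy repairs by inserting the missing end tag.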

Word Document Cleanup

One of Tidy's most powerful features is its ability to clean up HTML exported from Microsoft Word. When you save a Word document as a Web page, Word inserts substantial markup designed for round-tripping between Word and HTML. This export includes unnecessary styles, metadata, and structural elements that bloat file size and cause rendering inconsistencies.

Tidy's word-2000 configuration option strips this surplus markup, producing clean HTML suitable for web publishing. Much of a typical Word export is markup the content does not need. As an illustration, a 50KB Word export that shrinks to 15-20KB after Tidy processing has shed 60-70% of its size, which improves page load times and reduces bandwidth consumption.

This cleanup is particularly valuable when migrating content from legacy systems or when non-technical team members contribute content through familiar tools. By running Tidy as a preprocessing step in your content pipeline, you ensure imported content meets your standards before publication.

Installing and Running HTML Tidy

Installation

HTML Tidy is available across multiple platforms:

macOS via Homebrew:

brew install tidy

Linux (Ubuntu/Debian):

sudo apt install tidy

Windows: Download from HTACG project or install via Chocolatey

Node.js:

npm install tidy-html5

For developers working with Node.js, programmatic access through tidy-html5 allows integration into build workflows and tools.

Basic Command-Line Usage

The basic syntax for running HTML Tidy:

tidy [options] filename

Key options:

  • -m (modify): Updates the original file in place
  • -f filename: Redirects error messages to a file
  • -config file: Uses a specific configuration file
  • -q: Quiet mode, suppresses welcome message

Example workflow:

tidy -f errs.txt -m index.html

This processes index.html, updates it in place, and writes warnings or errors to errs.txt.
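
For scripted use, the same invocation can be wrapped in a few lines of Python. This is a minimal sketch, assuming tidy is on the PATH; the helper names build_command, describe, and run_tidy are hypothetical. It relies on Tidy's documented exit statuses: 0 for success, 1 for warnings, 2 for errors.

```python
import subprocess

# Tidy's documented exit statuses.
STATUS = {0: "clean", 1: "warnings", 2: "errors"}


def build_command(path, report="errs.txt", config=None, modify=False):
    """Assemble a tidy command line from the options described above."""
    cmd = ["tidy", "-q", "-f", report]
    if config:
        cmd += ["-config", config]
    if modify:
        cmd.append("-m")  # rewrite the input file in place
    return cmd + [path]


def describe(returncode):
    """Translate an exit status into a one-word summary."""
    return STATUS.get(returncode, f"unexpected status {returncode}")


def run_tidy(path, **options):
    """Run tidy (must be installed) and summarize the outcome."""
    result = subprocess.run(build_command(path, **options), capture_output=True)
    return describe(result.returncode)
```

With these helpers, run_tidy("index.html", modify=True) mirrors the command above and returns "clean", "warnings", or "errors" depending on what Tidy found.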

For complete documentation on all available options, see the HTML Tidy API Documentation.

Configuration Files

Configuration files provide the most convenient way to manage Tidy's behavior across multiple files or projects.

Sample Configuration

# Sample HTML Tidy configuration
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
char-encoding: utf8
clean: yes
show-warnings: yes
quote-ampersand: yes

Key options:

  • indent: Controls block-level content indentation (auto, yes, or no)
  • char-encoding: Character encoding (utf8 recommended for modern applications)
  • clean: Replaces presentational markup with CSS styles
  • wrap: Line length for wrapping (the default is 68; the sample above sets 72)

Applying Configuration

tidy -config tidy.config index.html

Place configuration files in your project repository so all developers use consistent settings.

Custom Tags for Frameworks

Templating languages and component frameworks introduce tags that standard HTML doesn't define. The following example declares ColdFusion and MathML tags:

new-inline-tags: cfif, cfelse, math, mrow
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse

This configuration declares nonstandard tags that Tidy should recognize without reporting errors. Note that cfelse appears twice: a tag listed under new-empty-tags must also be declared as inline or block-level. Similar declarations work for custom elements (web components) or any other nonstandard tags that appear in your HTML files.
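
If you generate these declarations from a list of tags, a small helper can enforce the rule from Tidy's documentation that every empty tag must also be declared as inline or block-level. The following Python sketch is one way to do that; render_tag_config is a hypothetical name, not part of Tidy.

```python
def render_tag_config(inline=(), blocklevel=(), empty=()):
    """Render new-*-tags lines for a Tidy configuration file.

    Raises ValueError if an empty tag is not also declared as
    inline or block-level.
    """
    undeclared = [t for t in empty if t not in inline and t not in blocklevel]
    if undeclared:
        raise ValueError(
            "also declare as inline or blocklevel: " + ", ".join(undeclared)
        )
    lines = []
    for option, tags in (
        ("new-inline-tags", inline),
        ("new-blocklevel-tags", blocklevel),
        ("new-empty-tags", empty),
    ):
        if tags:  # skip options with nothing to declare
            lines.append(f"{option}: {', '.join(tags)}")
    return "\n".join(lines)
```

Passing the ColdFusion and MathML tags from the example reproduces the three configuration lines shown above.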

Integration with Modern Development Workflows

Build System Integration

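Add Tidy to your npm scripts. Tidy does not expand glob patterns itself, and a quoted pattern such as 'src/**/*.html' reaches it as a literal string, so on a POSIX shell it is safer to let find collect the files:

{
  "scripts": {
    "tidy": "find src -name '*.html' -exec tidy -config .tidyrc -q -m {} +",
    "tidy:check": "find src -name '*.html' -exec tidy -config .tidyrc -q -e {} +"
  }
}

The tidy:check script uses -e to report warnings and errors without modifying files or printing the tidied markup, which makes it suitable for CI pipelines. Tidy exits with status 1 when it reports warnings and 2 when it reports errors, and find passes a non-zero status through, so the script fails the build in either case.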
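
When a pipeline needs finer control, for example failing on errors but tolerating warnings, the check can be scripted. Below is a minimal Python sketch under a few assumptions: tidy is on the PATH, the configuration file is named .tidyrc, and the function names tidy_status and check_tree are hypothetical. The runner is passed in as a parameter so the directory walk can be exercised without Tidy installed.

```python
import subprocess
from pathlib import Path


def tidy_status(path, config=".tidyrc"):
    """Run tidy in report-only mode (-e) and return its exit status."""
    cmd = ["tidy", "-config", config, "-q", "-e", str(path)]
    return subprocess.run(cmd).returncode


def check_tree(root, run=tidy_status, fail_on=2):
    """Return the HTML files under root whose status is >= fail_on.

    fail_on=2 tolerates warnings (status 1); fail_on=1 rejects them too.
    """
    failures = []
    for path in sorted(Path(root).rglob("*.html")):
        if run(path) >= fail_on:
            failures.append(path)
    return failures
```

A CI step could call check_tree("src") and exit with a non-zero status when the returned list is not empty.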

Pre-commit Hooks

Integrate with Git's pre-commit hooks using Husky:

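#!/bin/bash
# Check each staged HTML file; -e reports problems without printing the tidied markup.
status=0
while IFS= read -r file; do
  tidy -config .tidyrc -q -e "$file" || status=1
done < <(git diff --cached --name-only --diff-filter=ACM -- '*.html')
exit $status

This hook runs before each commit and blocks the commit if any staged HTML file produces Tidy warnings or errors.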

Editor Integration

  • VS Code: "HTML Tidy" extension provides on-save formatting
  • Vim/Neovim: Configure as external formatter with plugins
  • JetBrains: Built-in formatting framework supports Tidy-style rules

For editors without native support, configure file watchers to invoke Tidy as an external command.

Incorporating HTML validation into your CI/CD pipeline ensures consistent code quality across all deployments.

HTML Tidy vs Modern Alternatives

The Evolving Landscape

Prettier and Biome have emerged as popular code formatters that handle multiple languages including HTML. These tools offer faster execution and broader language support than traditional Tidy.

However, HTML Tidy remains valuable for specific use cases:

Feature                         | HTML Tidy      | Prettier/Biome
HTML-specific error correction  | Extensive      | Limited
Detailed error reporting        | Comprehensive  | Basic
Word document cleanup           | Yes            | No
Custom tag declaration          | Yes            | No
Execution speed                 | Moderate       | Fast
Language support                | HTML-focused   | Multi-language

When to Use Each Tool

Use Prettier or Biome for everyday formatting across multiple languages in JavaScript-centric projects.

Use HTML Tidy for dedicated HTML validation, cleaning imported content, and comprehensive error reports.

Many teams use both tools together--Prettier for general formatting and Tidy for detailed HTML validation of critical files. This layered approach combines the best of both tools while respecting their respective strengths.

The choice between tools often depends on your technology stack. JavaScript-heavy projects may prefer Prettier's integrated approach, while projects with significant legacy HTML or strict validation requirements benefit from Tidy's specialized focus.

Best Practices for Clean HTML

Validation as Part of Development

Make HTML validation a regular part of your development process rather than an afterthought. Run Tidy on new HTML as you create it, not just when problems arise. This proactive approach catches issues early.

Configure Tidy to match your project's requirements and team standards. Document these settings in your project repository so all contributors understand the expected formatting rules.

Combining with Other Quality Tools

HTML Tidy works well alongside other development tools:

  • W3C Validator: Comprehensive HTML validation (different from Tidy)
  • axe/Lighthouse: Accessibility testing
  • ESLint/TypeScript: JavaScript validation

Each tool specializes in its domain, and the combination provides comprehensive coverage without overloading any single tool.

Handling Third-Party Content

Run Tidy on imported HTML before incorporating it into your project. The word-2000 cleanup option is valuable for content from Microsoft Office applications.

Establish a preprocessing step in your content pipeline that runs Tidy on imported HTML. This ensures external content meets your standards before becoming part of your site.