What HTML Tidy Does
HTML Tidy serves as both a validator and a formatter for HTML documents. When you pass a file through Tidy, it performs several critical operations that transform problematic markup into clean, standards-compliant code.
Originally created by Dave Raggett at the W3C and now maintained by the HTML Tidy Advocacy Community Group (HTACG), this tool reads HTML, XHTML, and XML files, detects and corrects common coding errors, and produces cleaned-up markup that follows web standards. With support for HTML5 and ongoing updates, Tidy remains a reliable companion for developers who care about markup quality.
Clean HTML still matters in modern web development with Next.js and similar frameworks. Well-formed markup gives browsers and search engine crawlers less to recover from, which can support performance metrics like Core Web Vitals. Accessibility tools work better with properly structured HTML, and code that follows consistent standards is easier to maintain.
Error Correction
Automatically fixes missing end tags, incorrect nesting, unclosed elements, and malformed attributes.
Accessibility Analysis
Provides recommendations for improving accessibility, including table summaries and proper heading structure.
Document Cleanup
Strips surplus markup from Word exports and removes presentational tags in favor of CSS styles.
Pretty Printing
Formats output with consistent indentation, proper line wrapping, and readable structure.
Error Detection and Correction
HTML Tidy corrects a wide range of markup problems that developers commonly encounter.
Tag Structure Fixes
Missing or mismatched end tags are detected and automatically corrected. Tidy recognizes when an opening tag lacks its corresponding closing tag and inserts the missing end tag in the correct position.
End tags in the wrong order are also fixed. If your markup has improperly nested elements like <b>bold <i>italic</b> italic</i>, Tidy closes the inner element first and restructures the markup to <b>bold <i>italic</i> italic</b>. No text is dropped in the process.
Heading emphasis issues that cause browser rendering problems are corrected. When <h1><i>italic heading</h1> is encountered, browsers may continue displaying subsequent content in heading font size. Tidy corrects this by properly nesting the elements as <h1><i>italic heading</i></h1>.
Tidy also recovers from mixed-up tags, such as block-level elements incorrectly placed inside inline elements (for example <i><h1>heading</h1></i>), and ensures proper list structure by adding missing <ul>, <ol>, and <li> tags.
Attribute Handling
Tidy adds missing quotes around attribute values and reports when closing quotes are missing. This ensures all attribute values are properly delimited, which XHTML requires and which avoids parsing ambiguities in HTML when a value contains spaces or special characters.
For comprehensive examples of Tidy's error correction capabilities, refer to the W3C HTML Tidy documentation.
Input with errors:
<h1>heading
<h2>subheading</h3>
<p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?
Output after Tidy:
<h1>heading</h1>
<h2>subheading</h2>
<p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?</p>
Word Document Cleanup
One of Tidy's most powerful features is its ability to clean up HTML exported from Microsoft Word. When you save a Word document as a Web page, Word inserts substantial markup designed for round-tripping between Word and HTML. This export includes unnecessary styles, metadata, and structural elements that bloat file size and cause rendering inconsistencies.
Tidy's word-2000 configuration option strips this surplus markup, producing clean HTML suitable for web publishing. Typical Word exports can contain 60-80% more markup than the actual content requires. A 50KB Word export might reduce to 15-20KB after Tidy processing, significantly improving page load times and reducing bandwidth consumption.
This cleanup is particularly valuable when migrating content from legacy systems or when non-technical team members contribute content through familiar tools. By running Tidy as a preprocessing step in your content pipeline, you ensure imported content meets your standards before publication.
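As a rough sketch of such a preprocessing step, the Python helper below builds a Tidy command with the word-2000 option enabled and reports how much smaller the cleaned file is. The function names and file paths are hypothetical, and it assumes the tidy binary is installed and on PATH. Options are passed in Tidy's `--name value` command-line form.

```python
import subprocess
from pathlib import Path


def word_cleanup_command(src, dest):
    """Build a Tidy command that applies the word-2000 cleanup.

    -o writes the cleaned markup to dest instead of stdout.
    """
    return ["tidy", "-q", "--word-2000", "yes", "--clean", "yes", "-o", dest, src]


def reduction_percent(before_bytes, after_bytes):
    """Percentage by which the file shrank."""
    return 100 * (before_bytes - after_bytes) / before_bytes


def clean_word_export(src, dest):
    """Run the cleanup and return the size reduction in percent.

    Tidy exits with 1 on warnings (common for Word exports) and 2 on errors.
    On errors it may not write any output, so that case is treated as a failure.
    """
    result = subprocess.run(word_cleanup_command(src, dest))
    if result.returncode >= 2:
        raise RuntimeError(f"Tidy reported errors in {src}")
    return reduction_percent(Path(src).stat().st_size, Path(dest).stat().st_size)
```

For the figures quoted above, a 50KB export that shrinks to 20KB is a 60% reduction, and one that shrinks to 15KB is a 70% reduction.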
Installing and Running HTML Tidy
Installation
HTML Tidy is available across multiple platforms:
macOS via Homebrew:
brew install tidy
Linux (Ubuntu/Debian):
sudo apt install tidy
Windows: Download from the HTACG project site or install via Chocolatey
Node.js:
npm install tidy-html5
For developers working with Node.js, programmatic access through tidy-html5 allows integration into build workflows and tools.
Basic Command-Line Usage
The basic syntax for running HTML Tidy:
tidy [options] filename
Key options:
- -m (modify): Updates the original file in place
- -f filename: Redirects error messages to a file
- -config file: Uses a specific configuration file
- -q: Quiet mode, suppresses the welcome message
Example workflow:
tidy -f errs.txt -m index.html
This processes index.html, updates it in place, and writes warnings or errors to errs.txt.
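When the same workflow is driven from a script, the exit status is the most useful signal: Tidy exits with 0 when the document is clean, 1 when there were warnings, and 2 when there were errors. The following Python sketch wraps the command above; the function names are hypothetical and it assumes tidy is on PATH.

```python
import subprocess

# Exit statuses used by HTML Tidy: 0 = clean, 1 = warnings, 2 = errors.
EXIT_MEANINGS = {0: "ok", 1: "warnings", 2: "errors"}


def build_command(path, report_path, in_place=True):
    """Build the argument list for `tidy -f <report> -m <file>`."""
    cmd = ["tidy", "-f", report_path]
    if in_place:
        cmd.append("-m")  # rewrite the input file instead of printing to stdout
    cmd.append(path)
    return cmd


def describe_exit(code):
    """Translate a Tidy exit status into a short label."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_tidy(path, report_path="errs.txt", in_place=True):
    """Run Tidy on one file and return 'ok', 'warnings', or 'errors'."""
    result = subprocess.run(build_command(path, report_path, in_place))
    return describe_exit(result.returncode)
```

Calling run_tidy("index.html") mirrors the command-line example and leaves the diagnostics in errs.txt.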
For complete documentation on all available options, see the HTML Tidy API Documentation.
Configuration Files
Configuration files provide the most convenient way to manage Tidy's behavior across multiple files or projects.
Sample Configuration
# Sample HTML Tidy configuration
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
char-encoding: utf8
clean: yes
show-warnings: yes
quote-ampersand: yes
Key options:
- indent: Controls block-level content indentation (auto, yes, or no)
- char-encoding: Character encoding (utf8 recommended for modern applications)
- clean: Replaces presentational markup with CSS styles
- wrap: Line length for wrapping (Tidy's default is 68; this sample uses 72)
Applying Configuration
tidy -config tidy.config index.html
Place configuration files in your project repository so all developers use consistent settings.
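One way to keep a shared configuration honest is to read it from a script, for example to confirm that a required option is set before a build runs. The Python sketch below is a deliberately simplified parser that only handles `name: value` lines and comment lines; it is not a full implementation of Tidy's configuration grammar, and both function names are hypothetical. It also shows that each setting can be passed on the command line as `--name value`.

```python
def parse_tidy_config(text):
    """Parse simple `name: value` lines, skipping blanks and comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a `name: value` line; ignored in this sketch
        options[name.strip()] = value.strip()
    return options


def to_cli_args(options):
    """Convert parsed options to Tidy's --name value command-line form."""
    args = []
    for name, value in options.items():
        args += [f"--{name}", value]
    return args
```

A check such as `parse_tidy_config(text).get("char-encoding") == "utf8"` could then run in CI to catch accidental changes to the shared file.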
Custom Tags for Frameworks
Modern frameworks introduce custom elements that standard HTML doesn't recognize:
new-inline-tags: cfif, cfelse, math, mrow
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
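This configuration declares tags (here ColdFusion and MathML names) that Tidy should recognize without reporting errors. A tag listed under new-empty-tags must also be declared as inline or block-level, which is why cfelse appears twice. Similar declarations work for custom elements used by web components, Vue templates, or any other frontend technology whose tags appear in your HTML.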
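If the list of custom tags lives elsewhere in a project, the declarations can be generated rather than maintained by hand. The Python sketch below renders the three options shown above and flags empty tags that were not also declared as inline or block-level, which Tidy's documentation asks for. The function names are hypothetical.

```python
def custom_tag_config(inline=(), blocklevel=(), empty=()):
    """Render new-*-tags lines for a Tidy configuration file."""
    groups = (
        ("new-inline-tags", inline),
        ("new-blocklevel-tags", blocklevel),
        ("new-empty-tags", empty),
    )
    # Options with no tags are left out of the output entirely.
    return "\n".join(f"{name}: {', '.join(tags)}" for name, tags in groups if tags)


def undeclared_empty_tags(inline, blocklevel, empty):
    """Return empty tags that are missing an inline or block-level declaration."""
    declared = set(inline) | set(blocklevel)
    return [tag for tag in empty if tag not in declared]
```

Writing the result of custom_tag_config to the shared configuration file keeps the tag list in one place.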
Integration with Modern Development Workflows
Build System Integration
Add Tidy to your npm scripts:
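{
  "scripts": {
    "tidy": "find src -name '*.html' -exec tidy -config .tidyrc -q -m {} +",
    "tidy:check": "find src -name '*.html' -exec tidy -config .tidyrc -q -e {} +"
  }
}
Both scripts use find to collect the files because Tidy does not expand glob patterns itself, and a quoted pattern such as 'src/**/*.html' would reach it as a literal filename. The tidy:check script adds -e, which reports errors and warnings without writing cleaned markup, so no files are modified and it is suitable for CI pipelines. Tidy exits with status 1 when it finds warnings and 2 when it finds errors, so any reported issue fails the build.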
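Teams that want warnings to pass and only errors to fail the build need a little more logic than a one-line script. The Python sketch below is one possible shape for such a check: it collects HTML files recursively, runs Tidy in report-only mode with -e, and returns the files whose exit status reaches a chosen threshold. The function names and the .tidyrc default are assumptions, and tidy must be on PATH.

```python
import subprocess
from pathlib import Path


def find_html_files(root):
    """Return all .html files under root, sorted for stable output."""
    return sorted(Path(root).rglob("*.html"))


def tidy_status(path, config=".tidyrc"):
    """Run Tidy with -e (report only, no markup output) and return its exit status."""
    cmd = ["tidy", "-config", config, "-q", "-e", str(path)]
    return subprocess.run(cmd).returncode


def check_files(paths, status_of=tidy_status, fail_on=2):
    """Return the files whose status is at or above fail_on.

    fail_on=2 fails on errors only; fail_on=1 also fails on warnings.
    status_of is injectable so the logic can be tested without Tidy installed.
    """
    return [path for path in paths if status_of(path) >= fail_on]
```

A CI entry point could call `check_files(find_html_files("src"))` and exit with a non-zero status when the returned list is not empty.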
Pre-commit Hooks
Integrate with Git's pre-commit hooks using Husky:
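#!/bin/bash
# Check staged HTML files (added, copied, or modified); -e reports without printing markup
git diff --cached --name-only --diff-filter=ACM | grep '\.html$' | while read -r file; do
  tidy -config .tidyrc -q -e "$file" || exit 1
done
This hook runs before each commit and blocks it when Tidy reports errors or warnings in a staged HTML file, since both produce a non-zero exit status.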
Editor Integration
- VS Code: "HTML Tidy" extension provides on-save formatting
- Vim/Neovim: Configure as external formatter with plugins
- JetBrains: Built-in formatting framework supports Tidy-style rules
For editors without native support, configure file watchers to invoke Tidy as an external command.
Incorporating HTML validation into your CI/CD pipeline ensures consistent code quality across all deployments.
HTML Tidy vs Modern Alternatives
The Evolving Landscape
Prettier and Biome have emerged as popular code formatters that handle multiple languages including HTML. These tools offer faster execution and broader language support than traditional Tidy.
However, HTML Tidy remains valuable for specific use cases:
| Feature | HTML Tidy | Prettier/Biome |
|---|---|---|
| HTML-specific error correction | Extensive | Limited |
| Detailed error reporting | Comprehensive | Basic |
| Word document cleanup | Yes | No |
| Custom tag declaration | Yes | No |
| Execution speed | Moderate | Fast |
| Language support | HTML-focused | Multi-language |
When to Use Each Tool
Use Prettier or Biome for everyday formatting across multiple languages in JavaScript-centric projects.
Use HTML Tidy for dedicated HTML validation, cleaning imported content, and comprehensive error reports.
Many teams use both tools together: Prettier for general formatting and Tidy for detailed HTML validation of critical files. This layered approach plays to the strengths of each tool.
The choice between tools often depends on your technology stack. JavaScript-heavy projects may prefer Prettier's integrated approach, while projects with significant legacy HTML or strict validation requirements benefit from Tidy's specialized focus.
Best Practices for Clean HTML
Validation as Part of Development
Make HTML validation a regular part of your development process rather than an afterthought. Run Tidy on new HTML as you create it, not just when problems arise. This proactive approach catches issues early.
Configure Tidy to match your project's requirements and team standards. Document these settings in your project repository so all contributors understand the expected formatting rules.
Combining with Other Quality Tools
HTML Tidy works well alongside other development tools:
- W3C Validator: Checks conformance against the HTML specification (it reports problems, whereas Tidy also repairs them)
- axe/Lighthouse: Accessibility testing
- ESLint/TypeScript: JavaScript validation
Each tool specializes in its domain, and the combination provides comprehensive coverage without overloading any single tool.
Handling Third-Party Content
Run Tidy on imported HTML before incorporating it into your project. The word-2000 cleanup option is valuable for content exported from Microsoft Word.
Establish a preprocessing step in your content pipeline that runs Tidy on imported HTML. This ensures external content meets your standards before becoming part of your site.