What Are Cues?
Cues are individual time-aligned text segments within a text track, each tied to a start and end time during media playback. Each cue represents a discrete piece of synchronized text, from a simple subtitle line to a chapter marker or an interactive annotation. In the WebVTT (Web Video Text Tracks) ecosystem, cues are the fundamental building blocks of timed text content, letting developers make video accessible and engaging for diverse audiences.
As MDN's documentation of the WebVTT format describes, cues support a range of use cases: subtitles that translate dialogue, captions that also describe sound effects for deaf and hard-of-hearing viewers, chapters for navigation, text descriptions of visual content, karaoke lyrics, and interactive annotations. Knowing how to work with cues programmatically lets you build media applications that are both standards-compliant and user-friendly.
For developers building modern web applications, mastering the Text Track Cue API is essential for creating media experiences that meet accessibility standards while delivering engaging content to global audiences.
The VTTCue Interface
The VTTCue interface is the programmatic representation of a single WebVTT cue. It inherits from the TextTrackCue base interface and adds WebVTT-specific properties for positioning and display. Its constructor takes three arguments: startTime (when the cue becomes active, in seconds), endTime (when it stops being active, in seconds), and the text to display.
// Create a basic cue with timing and text
const cue = new VTTCue(0, 5, "Welcome to our video!");
// Access core properties
console.log(cue.startTime); // 0
console.log(cue.endTime); // 5
console.log(cue.text); // "Welcome to our video!"
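Because the constructor takes times in seconds while .vtt files write them as timestamps such as 00:01:02.500, a small conversion helper is often handy when building cues from file-style data. The sketch below assumes the WebVTT timestamp shape (optional hours of two or more digits, then two-digit minutes and seconds and three-digit milliseconds); parseTimestamp is a hypothetical helper, not part of the Text Track API.

```javascript
// Hypothetical helper (not part of the Text Track API): convert a WebVTT
// timestamp such as "01:02.500" or "01:01:02.500" into seconds, the unit
// VTTCue's constructor expects.
function parseTimestamp(ts) {
  // Optional hours (two or more digits), then mm:ss.ttt
  const match = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(ts.trim());
  if (!match) {
    throw new Error(`Invalid WebVTT timestamp: ${ts}`);
  }
  const [, hours = "0", minutes, seconds, millis] = match;
  return (
    Number(hours) * 3600 +
    Number(minutes) * 60 +
    Number(seconds) +
    Number(millis) / 1000
  );
}

// In a browser: new VTTCue(parseTimestamp("00:05.000"), parseTimestamp("00:08.250"), "Hello");
console.log(parseTimestamp("01:02.500")); // 62.5
console.log(parseTimestamp("01:00:00.000")); // 3600
```

Rejecting malformed input with an error, rather than returning NaN, keeps a bad timestamp from silently producing a cue that never appears.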
Beyond these core properties, VTTCue provides display controls including vertical (for vertical text layout), line-based positioning with line and snapToLines, percentage-based position and size, and alignment settings. Together, these properties give precise control over cue placement across different video dimensions and aspect ratios.
Creating Cues Programmatically
There are two primary approaches for adding cues to media elements. The first uses the HTML track element with a src attribute pointing to a .vtt file containing cue data in the WebVTT format. This declarative approach works well for static caption files that don't change at runtime.
<video controls>
<track kind="captions" label="English" src="captions.vtt" srclang="en" default>
</video>
The second approach uses the media element's addTextTrack() method for dynamic cue creation. It takes a required kind (captions, subtitles, descriptions, chapters, or metadata) followed by optional label and language arguments. Once the track exists, add individual cues with addCue() and remove them with removeCue().
const video = document.querySelector('video');
// Create a text track dynamically
const track = video.addTextTrack('captions', 'English Captions', 'en');
track.mode = 'showing'; // Make cues visible
// Create and add cues programmatically
track.addCue(new VTTCue(0, 3, 'Welcome to our comprehensive guide.'));
track.addCue(new VTTCue(3, 6, 'Today we will explore the Text Track Cue API.'));
track.addCue(new VTTCue(6, 10, 'Cues enable synchronized text overlays for video.'));
// Remove a cue when no longer needed
const cueToRemove = track.cues[0];
track.removeCue(cueToRemove);
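Removing many cues by walking track.cues with an index can skip entries, because the cue list reflects each removal immediately. A safer pattern is to copy the list first. The sketch below assumes only that the track exposes cues (which is null while the track is disabled) and removeCue(); clearCues and makeFakeTrack are hypothetical helpers, and the fake track merely stands in for a real TextTrack so the logic can run outside a browser.

```javascript
// Hypothetical helper: remove every cue from a TextTrack-like object.
// track.cues updates as cues are removed, so copy it into an array first;
// removing while indexing forward through the live list would skip cues.
function clearCues(track) {
  if (!track.cues) {
    return 0; // cues is null while the track's mode is "disabled"
  }
  const snapshot = Array.from(track.cues);
  for (const cue of snapshot) {
    track.removeCue(cue);
  }
  return snapshot.length;
}

// Hypothetical stand-in for a TextTrack so the sketch runs outside a browser.
// The same array is mutated in place, imitating a live cue list.
function makeFakeTrack(texts) {
  const cues = texts.map((text) => ({ text }));
  return {
    cues,
    removeCue(cue) {
      const i = cues.indexOf(cue);
      if (i !== -1) cues.splice(i, 1);
    },
  };
}

const fakeTrack = makeFakeTrack(["one", "two", "three"]);
console.log(clearCues(fakeTrack)); // 3
console.log(fakeTrack.cues.length); // 0
```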
Track Mode Control
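Control visibility with the disabled, hidden, and showing modes. Only cues on a showing track are rendered; a hidden track still updates activeCues and fires cuechange events. A track element's track starts disabled unless it is selected (for example, through the default attribute), while a track created with addTextTrack() starts hidden.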
Active Cues Tracking
Query the activeCues property to get the cues that are active at the current playback position (it is null while the track is disabled). Use it to drive reactive UI updates.
Cue Management
Add, remove, and manipulate cues dynamically with the addCue() and removeCue() methods of the TextTrack interface.
Event Handling
Listen for cuechange events to respond when active cues change, enabling synchronized content and analytics.
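The activeCues list follows a simple timing rule: a cue is current when its start time is at or before the playback position and its end time is after it. The pure function below mirrors that rule for plain { startTime, endTime, text } objects, which can help when building a custom transcript panel or unit-testing cue data. It is a simplified model, not the browser's full "time marches on" algorithm, and getActiveCues is a hypothetical name.

```javascript
// Hypothetical helper: return the cues active at `time` (seconds), using the
// rule startTime <= time < endTime. Results are ordered by start time, then
// by end time with the latest first; sort() is stable, so ties keep input order.
function getActiveCues(cues, time) {
  return cues
    .filter((cue) => cue.startTime <= time && time < cue.endTime)
    .sort((a, b) => a.startTime - b.startTime || b.endTime - a.endTime);
}

const cues = [
  { startTime: 0, endTime: 3, text: "Welcome to our comprehensive guide." },
  { startTime: 3, endTime: 6, text: "Today we will explore the Text Track Cue API." },
  { startTime: 2, endTime: 10, text: "[music playing]" },
];

// At exactly 3 seconds the first cue has ended (end times are exclusive),
// so this logs the music cue followed by the second dialogue cue.
console.log(getActiveCues(cues, 3).map((cue) => cue.text));
```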
Cue Positioning and Display Settings
The VTTCue interface provides comprehensive control over how cues appear on screen, enabling precise positioning that adapts to different screen sizes and viewing contexts. Understanding how these properties interact is essential for creating professional caption displays that enhance rather than distract from the viewing experience.
The vertical property controls whether a cue is laid out vertically, as in some East Asian typography. Acceptable values are an empty string (horizontal, the default), "rl" for vertical text whose lines progress from right to left, and "lr" for vertical text whose lines progress from left to right. This setting changes which axis the line and position values refer to, so cues display correctly for international audiences.
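The line-based positioning system uses two key properties: snapToLines and line. When snapToLines is true (the default), line is an integer line number; for horizontal text, 0 is the top line and negative values count up from the bottom, so -1 is the bottom line. When snapToLines is false, line is a percentage of the video's height (for horizontal text), measured from the top. The lineAlign property (start, center, or end) sets which part of the cue box sits at that line position; it does not control horizontal text alignment.
Along the other axis, for horizontal text, position places the cue box as a percentage of the video's width, positionAlign chooses whether that point marks the box's left edge ("line-left"), center, or right edge ("line-right"), size sets the box's width as a percentage, and align controls how the text is aligned within the box.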
const cue = new VTTCue(10, 15, "Positioned text");
cue.vertical = ''; // Horizontal text
cue.snapToLines = true;
cue.line = 0; // Top line
cue.lineAlign = 'center';
cue.position = 50; // Center horizontally
cue.positionAlign = 'center';
cue.size = 80; // 80% of video width
cue.align = 'center';
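To see how position, positionAlign, and size interact for horizontal cues, the function below follows the WebVTT rendering steps for that case: the requested size is capped by a maximum that depends on the alignment, so the box never runs past the video's edges, and the box's left edge is then derived from the anchor point. It is a simplified sketch that assumes horizontal text and an explicit positionAlign of "line-left", "center", or "line-right" (it does not resolve "auto"); cueBoxExtent is a hypothetical name.

```javascript
// Hypothetical helper: horizontal extent of a cue box, in percent of the
// video width, for a horizontal cue with an explicit positionAlign.
function cueBoxExtent(position, positionAlign, size) {
  // Cap the size so the box cannot extend past either edge of the video.
  let maxSize;
  if (positionAlign === "line-left") {
    maxSize = 100 - position;
  } else if (positionAlign === "line-right") {
    maxSize = position;
  } else if (positionAlign === "center") {
    maxSize = position <= 50 ? position * 2 : (100 - position) * 2;
  } else {
    throw new Error(`Unsupported positionAlign: ${positionAlign}`);
  }
  const width = Math.min(size, maxSize);

  // `position` marks the box's left edge, center, or right edge.
  let left;
  if (positionAlign === "line-left") left = position;
  else if (positionAlign === "center") left = position - width / 2;
  else left = position - width;

  return { left, width };
}

// The settings from the example above: position 50, center, size 80.
console.log(cueBoxExtent(50, "center", 80)); // { left: 10, width: 80 }
// Centered near the left edge, the box shrinks to fit.
console.log(cueBoxExtent(20, "center", 80)); // { left: 0, width: 40 }
```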
Styling Cues with CSS
The ::cue pseudo-element provides browser-native styling for text track cues, letting you customize their appearance with CSS. It accepts a limited set of properties, including color, background-color, text-shadow, and the font properties. For more granular control, target a cue by its identifier with ::cue(#cue-id), or class spans within the cue text with ::cue(.classname).
/* Base cue styling */
::cue {
color: #ffffff;
background-color: rgba(0, 0, 0, 0.7);
font-family: Arial, sans-serif;
font-size: 1.1em;
}
/* Highlight specific cues with IDs */
::cue(#chapter-marker) {
color: #ffd700;
font-weight: bold;
background-color: rgba(0, 0, 0, 0.9);
}
/* Style cues with custom classes */
::cue(.speaker-different) {
color: #90ee90;
}
/* High contrast mode */
@media (prefers-contrast: more) {
::cue {
color: #ffff00;
background-color: #000000;
text-shadow: none;
}
}
When styling cues, remember that ::cue honors only a restricted set of properties. Layout properties such as display, position, and margin are ignored. Focus on text appearance properties that improve readability, and let the cue keep its natural placement within the video overlay.
Accessible video with well-styled captions can also support your broader SEO strategy, since search engines reward accessible, well-structured media content.
// Create a video element reference
const video = document.querySelector('video');
// Add a new text track for captions
const track = video.addTextTrack('captions', 'English Captions', 'en');
track.mode = 'showing';
// Create and add cues programmatically
track.addCue(new VTTCue(0, 3, 'Welcome to our comprehensive guide.'));
track.addCue(new VTTCue(3, 6, 'Today we will explore the Text Track Cue API.'));
track.addCue(new VTTCue(6, 10, 'Cues enable synchronized text overlays for video.'));
// Listen for cue changes to sync external content
track.oncuechange = () => {
  const activeCues = track.activeCues;
  if (activeCues && activeCues.length > 0) {
    const currentCue = activeCues[0];
    console.log('Current cue:', currentCue.text);
  }
};
Frequently Asked Questions
What is the difference between captions and subtitles?
Captions include transcriptions of dialogue plus descriptions of sound effects and music, intended for viewers who cannot hear the audio. Subtitles provide translations of dialogue for viewers who can hear but don't understand the language.
How do I create a VTT file?
A VTT file is plain text. It starts with WEBVTT on the first line, followed by a blank line and then cue blocks separated by blank lines. Each cue block has an optional identifier line, a timing line (00:00:00.000 --> 00:00:05.000), and then the cue text. See the WebVTT specification for the full syntax.
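When captions come from a database or an API rather than a hand-written file, generating the .vtt text in code can be convenient. The sketch below assumes plain { id?, startTime, endTime, text } objects with times in seconds; formatTimestamp and toWebVTT are hypothetical helpers. It writes only the header, optional identifiers, timing lines, and text (no cue settings), and it assumes each text is already valid cue text with no blank lines and no "-->".

```javascript
// Hypothetical helper: format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Hypothetical helper: serialize cue objects into the text of a .vtt file.
function toWebVTT(cues) {
  const blocks = cues.map((cue) => {
    const lines = [];
    if (cue.id) lines.push(cue.id); // optional cue identifier
    lines.push(`${formatTimestamp(cue.startTime)} --> ${formatTimestamp(cue.endTime)}`);
    lines.push(cue.text);
    return lines.join("\n");
  });
  // "WEBVTT" header, then each cue block preceded by a blank line.
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

console.log(
  toWebVTT([
    { startTime: 0, endTime: 5, text: "Welcome to our video!" },
    { id: "chapter-marker", startTime: 65.5, endTime: 70, text: "Chapter 2" },
  ])
);
// WEBVTT
//
// 00:00:00.000 --> 00:00:05.000
// Welcome to our video!
//
// chapter-marker
// 00:01:05.500 --> 00:01:10.000
// Chapter 2
```

Rounding to whole milliseconds first keeps floating-point noise (for example, 65.49999...) from producing a timestamp one millisecond off.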
Can I style individual cues differently?
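Yes. You can mark up parts of the cue text with class spans (<c.classname>text</c>) and target them in CSS with ::cue(.classname) or ::cue(c.classname). You can also give cues identifiers (on the line before the timing line in a .vtt file, or through the cue's id property in JavaScript) and target them with ::cue(#cue-id).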
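Cue text is markup, so literal &, <, and > should be escaped before class spans are wrapped around it. The sketch below assumes you are building cue text in JavaScript, for example before passing it to new VTTCue() or writing it into a .vtt file; escapeCueText and withClass are hypothetical helpers. The resulting span is what a rule such as ::cue(.speaker-different) targets.

```javascript
// Hypothetical helper: escape characters that have special meaning in
// WebVTT cue text. Escaping ">" also keeps "-->" out of the text.
function escapeCueText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap escaped text in a WebVTT class span.
// Assumes className is a single class name with no spaces or periods.
function withClass(className, text) {
  return `<c.${className}>${escapeCueText(text)}</c>`;
}

const cueText = `Host: Hi! ${withClass("speaker-different", "Guest: Q&A time <3")}`;
console.log(cueText);
// Host: Hi! <c.speaker-different>Guest: Q&amp;A time &lt;3</c>
```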
How do I handle cue events in React or other frameworks?
Add event listeners in a useEffect hook (React) or component lifecycle method. Store the track reference and properly clean up listeners when the component unmounts to prevent memory leaks.
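The attach-on-mount, detach-on-unmount pattern can be captured in a framework-neutral helper that returns its own cleanup function, which is the shape a React useEffect callback returns. The sketch assumes only that the track is an EventTarget that fires cuechange events, as TextTrack is; onCueChange is a hypothetical name, setCaptionLines in the comment is a hypothetical state setter, and the usage lines use a plain EventTarget (available in Node 15+) as a stand-in track.

```javascript
// Hypothetical helper: call `handler` with the active cues' text whenever
// the track fires "cuechange"; returns a function that removes the listener.
function onCueChange(track, handler) {
  const listener = () => {
    const active = track.activeCues ? Array.from(track.activeCues) : [];
    handler(active.map((cue) => cue.text));
  };
  track.addEventListener("cuechange", listener);
  return () => track.removeEventListener("cuechange", listener);
}

// In React this could be used as (setCaptionLines is hypothetical):
//   useEffect(() => onCueChange(track, setCaptionLines), [track]);

// Stand-in track for running outside a browser.
const stubTrack = new EventTarget();
stubTrack.activeCues = [{ text: "Welcome to our video!" }];

const received = [];
const stop = onCueChange(stubTrack, (lines) => received.push(lines));
stubTrack.dispatchEvent(new Event("cuechange"));
stop(); // after cleanup, further events are ignored
stubTrack.dispatchEvent(new Event("cuechange"));
console.log(received.length); // 1
```

Keeping the listener function in a closure guarantees that removeEventListener receives the exact same function reference that was added, which is what makes the cleanup effective.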
Conclusion
The Text Track Cue API provides a standardized, browser-native approach to adding synchronized text to media content. From simple subtitle display to sophisticated interactive annotations, understanding VTTCue properties, TextTrack management, and CSS styling enables developers to create robust captioning solutions that enhance both accessibility and user engagement in modern browsers.
Cues serve as the fundamental building blocks of timed text in web video, enabling diverse use cases from accessibility-focused captions to interactive chapter navigation. The VTTCue interface offers comprehensive control over cue timing, positioning, and appearance, while the TextTrack API manages cue collections and handles mode transitions between hidden, disabled, and showing states.
For production implementations, always prioritize accessibility by including captions for spoken content, descriptions for visual information, and user controls for visibility preferences. Performance measures such as lazy loading, proper cue cleanup, and debounced event handlers help keep playback smooth, even with large caption files.
By mastering these APIs, you can build media experiences that reach wider audiences, meet accessibility requirements such as the WCAG guidelines and the ADA, and deliver engaging content that adapts to diverse viewing contexts.
As video continues to play a larger role in digital experiences, combining the Text Track Cue API with AI-powered automation opens new possibilities for real-time caption generation, translation, and personalized content delivery at scale.