People's Names Break Websites

When Christopher Null can't create an account or Luke O'Sullivan has to fly under a different name, it's not a security feature--it's broken software. Learn why names break systems and how to handle them correctly.

The Problem: When Your Name Isn't Welcome

Imagine being told your legal name is "invalid" by a website. For millions of users worldwide whose names contain apostrophes, hyphens, or non-ASCII characters, or whose names happen to match programming keywords, this is a daily frustration. From William Test, who can't book hotels, to Christopher Null, whose surname some systems treat as a NULL value, to Joan Fread, rejected because her surname matches a PHP function name, the problem is pervasive and personal.

As modern web developers, we can do better. This guide covers why names break systems and how to handle them correctly in your Next.js applications. Proper web development practices ensure every user can use their real name online.

The Hidden Population

Names with special characters affect significant portions of the global population:

Apostrophes: O'Brien, O'Sullivan, D'Angelo (Irish, Italian, Portuguese cultures)
Hyphens: Jean-Pierre, María-José (French, Spanish naming conventions)
Unicode characters: Müller, Björk, García, 田中 (German, Icelandic, Spanish, Japanese)
Spaces in names: van der Waals, De la Cruz (Dutch, Spanish surnames)
Short names: Al, Jo, Wu (common in many cultures)
Keyword-like names: Test, Null, Select, Drop (unfortunate but real names)

These aren't edge cases--they represent millions of real users who deserve to use their actual names online.

Did You Know?

Over 95% of web content uses UTF-8 encoding, yet many systems still restrict name fields to ASCII. This disconnect creates unnecessary barriers for users worldwide.

Names That Break Systems by the Numbers

150,000+: characters in the Unicode standard
128: characters in ASCII (too limited)
95%: share of the web using UTF-8 encoding
Millions: users with "problematic" names

Understanding Character Encoding

The root of many name-related issues lies in misunderstanding character encoding.

ASCII vs Unicode vs UTF-8

ASCII (American Standard Code for Information Interchange) defines only 128 characters: basic English letters (A-Z, a-z), digits (0-9), and common symbols. This was sufficient for early computing in America but fundamentally inadequate for global users.

Unicode is a comprehensive character set that includes over 150,000 characters covering virtually every writing system on Earth--from Latin scripts with accents, to Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, Korean, emojis, and more.

UTF-8 (Unicode Transformation Format, 8-bit) is the variable-width encoding that maps Unicode code points to bytes. It's backward-compatible with ASCII (the first 128 characters are identical), supports all Unicode characters, and has become the standard encoding for the web. Ensuring your web applications use UTF-8 encoding is a fundamental web development best practice for serving global audiences.
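The variable-width property is easy to observe by encoding a few strings. The sketch below uses the standard `TextEncoder` API (available in browsers and Node.js); the helper name `utf8ByteLength` is an assumption made for illustration.

```typescript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

const samples: string[] = ["A", "ü", "中", "😀", "Müller"];
for (const sample of samples) {
  console.log(`${sample}: ${utf8ByteLength(sample)} byte(s)`);
}
// A: 1 byte(s)        (ASCII range, same byte as in ASCII)
// ü: 2 byte(s)
// 中: 3 byte(s)
// 😀: 4 byte(s)
// Müller: 7 byte(s)   (five ASCII letters plus a two-byte ü)
```

The byte counts assume the strings are stored in their usual precomposed form, where "ü" is a single code point.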

The UTF-8 Solution

Using UTF-8 throughout your application stack solves most name-related problems:

// UTF-8 handles all of these correctly:
const users = [
 { name: "José García" }, // Spanish accents
 { name: "Björk Guðmundsdóttir" }, // Icelandic characters
 { name: "田中太郎" }, // Japanese kanji
 { name: "Nguyễn Văn Minh" }, // Vietnamese diacritics
 { name: "François" }, // French cedilla
 { name: "Müller" }, // German umlaut
 { name: "Ødegaard" }, // Norwegian letter
];

As noted by the W3C Internationalization Quick Tips, UTF-8 is the standard encoding for web content globally and should be used for all content, databases, and software.

ASCII Limitations in Practice

Legacy systems designed in the 1970s-1990s often still use ASCII for performance or compatibility reasons. Some systems strip non-ASCII characters without warning, transforming "Müller" into "Mller" or rejecting names entirely. This approach is fundamentally broken for any application serving a global audience, particularly for businesses looking to expand internationally.
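To make the failure concrete, here is a minimal sketch of the kind of ASCII-only "sanitizer" described above. The function name `stripNonAscii` is hypothetical; the point is only to show what silent stripping does to real names.

```typescript
// Hypothetical legacy-style sanitizer: silently drops every non-ASCII character.
function stripNonAscii(input: string): string {
  return input.replace(/[^\x00-\x7F]/g, "");
}

console.log(stripNonAscii("Müller"));   // Mller
console.log(stripNonAscii("O'Brien"));  // O'Brien (ASCII survives untouched)

// A name written entirely in a non-Latin script disappears completely,
// which a later "required field" check then reports as missing.
console.log(stripNonAscii("田中太郎").length); // 0
```

A system built this way never tells the user what went wrong: the name is either stored in a mangled form or rejected as empty.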

Common Failure Patterns

How websites break when handling names

Overly Strict Validation

Regex patterns rejecting apostrophes, hyphens, spaces, and Unicode characters. Short names like "Al" getting rejected for being too short.

Security Filter Overreach

Content moderation catching innocent names. "Christian" or "Islam" flagged because they contain religious terms. Names containing substrings such as "ass" or "cock" (Cassandra, Hancock) blocked regardless of context.

Database Issues

SQL injection fears leading to ASCII-only restrictions. Names like "O'Brien" breaking queries when special characters aren't properly escaped.

Assumption-Based Design

Assuming first name/last name structure. Requiring family names when some cultures don't use them. Sorting by family name when cultures sort by given name.

Code Examples: What NOT to Do

// BAD: Rejects valid names with ASCII-only validation
const validateNameAsciiOnly = (name) => {
  return /^[a-zA-Z]+$/.test(name); // Rejects spaces, apostrophes, hyphens, accents, Unicode
};

// BAD: Rejects innocent names matching keywords
const badFilter = (name) => {
  return !/^(test|null|select|drop|delete)/i.test(name); // Rejects real people!
};

// BAD: String concatenation causes both security issues AND breaks on valid names
const unsafeQuery = `SELECT * FROM users WHERE name = '${name}'`;
// Fails for O'Brien AND allows SQL injection

// GOOD: Accept all valid input
const validateName = (name) => {
  return typeof name === 'string' && name.trim().length > 0;
};

// GOOD: Prepared statements handle all names safely
const query = 'SELECT * FROM users WHERE name = $1';
await db.query(query, [name]); // O'Brien works, injection prevented

Real Names That Break Systems

From CSS-Tricks' comprehensive coverage of this issue, these are actual cases documented:

Christopher Null - Surname interpreted as a NULL value
William Test - Flagged as test data by booking systems
Joan Fread - Surname matches the PHP function fread()
Luke O'Sullivan - Had to fly under a different name
O'Donnell, O'Brien, D'Aoust - Apostrophes rejected
Knud (with ø) - Converts to ? or rejected

These aren't hypothetical problems--they're real experiences that cause genuine frustration and exclusion. Building robust web applications that handle all names correctly is essential for user trust and accessibility.
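One plausible way a case like Christopher Null's arises is application code that treats the text "null" as a missing value before the data ever reaches the database. The sketch below is an assumption about how such a bug might look, not a reconstruction of any specific system; both function names are hypothetical.

```typescript
// Hypothetical naive parser: treats the *text* "null" as a missing value.
function parseFieldNaive(raw: string): string | null {
  const trimmed = raw.trim();
  if (trimmed === "" || trimmed.toLowerCase() === "null") {
    return null; // the surname "Null" is silently discarded here
  }
  return trimmed;
}

// Safer variant: only a genuinely empty field counts as missing.
function parseField(raw: string): string | null {
  const trimmed = raw.trim();
  return trimmed === "" ? null : trimmed;
}

console.log(parseFieldNaive("Null") === null); // true: the form reports "family name required"
console.log(parseField("Null") === "Null");    // true: the surname is kept as text
console.log(parseField("   ") === null);       // true: blank input is still rejected
```

The fix is to keep the distinction between "no value" and "a string that spells a keyword" all the way through the stack.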

Cultural Naming Conventions That Break Assumptions

Western naming conventions (given name first, family name last) are not universal. Designing forms based on these assumptions excludes significant portions of the global population. Internationalization in web development requires understanding these cultural differences.

Different Name Orders

Culture	Example	Order	Notes
Chinese	毛泽东 (Mao Ze Dong)	Family-Given	Mao is family name
Japanese	田中太郎 (Tanaka Taro)	Family-Given	Tanaka is family name
Korean	김철수 (Kim Cheolsu)	Family-Given	Kim is family name
Hungarian	Szabó István	Family-Given	Szabó is family name
Vietnamese	Nguyễn Văn Minh	Family-Middle-Given	Nguyễn is family name
Icelandic	Björk Guðmundsdóttir	Given-Patronymic	No family name, patronymic indicates father's name

As documented by the W3C's comprehensive guide to personal names around the world, systems must accommodate diverse naming practices from around the globe.

Names Without Family Names

Icelandic naming uses patronymics (Guðmundsdóttir = daughter of Guðmundur) rather than family names. Many Malay and Indonesian names consist only of a given name. Forcing these users to enter a family name results in garbage data like "." or "Mr."

Multiple Family Names

Spanish/Latino names include two family names (paternal and maternal):

María José Carreño Quiñones (paternal: Carreño, maternal: Quiñones)

Portuguese/Brazilian names can include three or more family names from ancestors, often with connecting words like "de" or "e".

Special Characters by Culture

Apostrophes: O'Brien (Irish), D'Angelo (Italian)
Hyphens: Jean-Pierre (French), Müller-Schmidt (German compound)
Spaces: van der Waals (Dutch), De la Cruz (Spanish)
Periods: Jr., Sr. (American suffixes)
Unicode diacritics: All European languages, plus global scripts

Implications for Form Design

Rather than asking for "first name" and "last name," use culturally neutral labels:

"Given name(s)" and "Family name"
Or better: a single "Full name" field when possible
Allow users to specify which part is their family name for sorting
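The labels above can be backed by a data model that does not assume every name splits into two parts. The following is a minimal sketch; the `PersonName` shape, the `sortAs` field, and the `sortKey` helper are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shape for storing a name without assuming "first/last".
interface PersonName {
  fullName: string;     // exactly what the user typed; always present
  givenName?: string;   // optional: not every name splits this way
  familyName?: string;  // optional: absent for mononyms and patronymic names
  sortAs?: string;      // optional: the part the user wants lists sorted by
}

// The user's explicit choice wins; otherwise fall back to the family name,
// and finally to the full name when no family name exists.
function sortKey(person: PersonName): string {
  return person.sortAs ?? person.familyName ?? person.fullName;
}

const people: PersonName[] = [
  { fullName: "田中太郎", givenName: "太郎", familyName: "田中" },
  { fullName: "Björk Guðmundsdóttir", givenName: "Björk" },
  { fullName: "Sukarno" },
  { fullName: "María José Carreño Quiñones", sortAs: "Carreño Quiñones" },
];

for (const person of people) {
  console.log(`${person.fullName} -> sorted under "${sortKey(person)}"`);
}
```

Only `fullName` is required, so nobody is forced to invent a family name, and a name without one is sorted under the full name as typed.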

Next.js Implementation: Proper Name Handling

// next.config.js - Ensure proper encoding
// Note: this rule labels every matched path as text/html; narrow `source`
// to HTML pages if the app also serves other content types.
module.exports = {
  async headers() {
    return [
      {
        source: '/:path*',
        headers: [
          { key: 'Content-Type', value: 'text/html; charset=utf-8' },
        ],
      },
    ];
  },
};

// utils/validation.ts - Accept all valid names
export const validateName = (name: string): boolean => {
  if (typeof name !== 'string') {
    return false;
  }

  const trimmed = name.trim();
  // Reasonable length check (not too restrictive)
  if (trimmed.length < 1 || trimmed.length > 200) {
    return false;
  }

  // Accept ALL characters - names can contain anything
  return true;
};

// Database schema - PostgreSQL with UTF-8
/*
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  full_name VARCHAR(200) NOT NULL,
  given_name VARCHAR(100),
  family_name VARCHAR(100),
  display_name VARCHAR(100),
  created_at TIMESTAMP DEFAULT NOW()
);

-- Use ICU collation for international sorting
ALTER TABLE users ALTER COLUMN full_name
  SET DATA TYPE VARCHAR(200)
  COLLATE "en-x-icu";
*/

// API route with proper encoding
// (getUsers() is assumed to be defined elsewhere in the application)
export async function GET() {
  const users = await getUsers();
  return Response.json(users, {
    headers: {
      'Content-Type': 'application/json; charset=utf-8',
    },
  });
}

Security Without Breaking Names

A common misconception is that names must be sanitized or restricted to prevent security issues. This is wrong--proper security practices don't require rejecting valid names.

The Injection Attack Misconception

The fear that names like "Robert'); DROP TABLE Users;--" could cause problems leads some developers to reject special characters. But the solution isn't rejecting input--it's using parameterized queries:

// WRONG: Stripping characters (mangles O'Brien into OBrien and is not a reliable injection defense)
const sanitizeName = (name) => name.replace(/['";]/g, '');

// CORRECT: Using prepared statements (actually prevents injection)
const getUser = async (name) => {
 return await db.query(
 'SELECT * FROM users WHERE name = $1',
 [name] // Name can contain anything!
 );
};

As explained in Hackaday's technical analysis, security concerns about injection are often misused to justify ASCII-only restrictions when prepared statements handle all names safely.
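The difference between the two approaches can be seen without a database by looking at what each one actually produces. This is a sketch only: the function names are hypothetical, and the `{ text, values }` shape is assumed here to represent "SQL text plus separate parameters".

```typescript
// Builds SQL by string concatenation: the name becomes part of the SQL text.
function buildUnsafeQuery(userName: string): string {
  return `SELECT * FROM users WHERE name = '${userName}'`;
}

// Keeps SQL text and values separate, as a parameterized query does.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

function buildSafeQuery(userName: string): ParameterizedQuery {
  return { text: "SELECT * FROM users WHERE name = $1", values: [userName] };
}

console.log(buildUnsafeQuery("O'Brien"));
// SELECT * FROM users WHERE name = 'O'Brien'
// (the apostrophe closes the string literal early, so the statement is malformed)

const safe = buildSafeQuery("Robert'); DROP TABLE Users;--");
console.log(safe.text);
// SELECT * FROM users WHERE name = $1
// (the SQL text is identical for every name; the name travels only in safe.values)
```

Because the name never becomes part of the SQL text in the second form, there is nothing to escape and nothing to restrict.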

Content Security for Display

For public-facing displays of user-generated content (comments, forum posts, profiles), appropriate escaping and content security policies handle any edge cases without rejecting valid names:

// Escaping for HTML display
const escapeHtml = (str) => {
 return str
 .replace(/&/g, '&amp;')
 .replace(/</g, '&lt;')
 .replace(/>/g, '&gt;')
 .replace(/"/g, '&quot;')
 .replace(/'/g, '&#039;');
};

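// When building HTML strings, escape first
displayName.innerHTML = escapeHtml(user.fullName);

// When assigning textContent, pass the raw name: the browser treats it as
// plain text, so escaping here would display "O&#039;Brien" literally
displayName.textContent = user.fullName;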

Input Validation vs Output Encoding

Input validation: Check that the input is a valid format (string, reasonable length)
Output encoding: Handle content appropriately for the context (HTML escaping for pages, parameterized queries for SQL, etc.)

Validating that a name is a non-empty string is appropriate. Rejecting names because they contain apostrophes is not. Following secure web development practices ensures your applications are both secure and inclusive.

Security Filters That Block Names Are Broken

If your security filter blocks someone's legal name like "Christian", "Islam", or "Niger", it's not protecting anyone--it's just broken software. Names are identity, not content. Use proper input handling and output encoding instead of crude keyword filtering.

Frequently Asked Questions

Should I validate names with regex?

Avoid regex validation that rejects specific characters. Check only that the input is a non-empty string of reasonable length. Names can contain any character.

What about names with SQL keywords like 'Test' or 'Null'?

These are real names that happen to match keywords. Use prepared statements--they handle all names safely without any character restrictions.

How long should name fields be?

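Allow at least 100 characters, preferably 200. UTF-8 uses up to 4 bytes per character, so byte-based limits fill up faster than character counts suggest: a 50-character Chinese name typically needs about 150 bytes (3 bytes per character) and can need up to 200.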
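A name has several different "lengths", and a limit should be clear about which one it means. The sketch below compares them; `describeLength`, `withinLimit`, and `MAX_NAME_CODE_POINTS` are hypothetical names, and whether a database column counts characters or bytes depends on the database and column type.

```typescript
// Three different "lengths" of the same string.
function describeLength(text: string) {
  return {
    utf16Units: text.length,                          // what String.length reports
    codePoints: Array.from(text).length,              // closer to "characters"
    utf8Bytes: new TextEncoder().encode(text).length, // storage size in UTF-8
  };
}

// Hypothetical limit check that counts code points rather than bytes.
const MAX_NAME_CODE_POINTS = 200;

function withinLimit(text: string): boolean {
  const count = Array.from(text.trim()).length;
  return count >= 1 && count <= MAX_NAME_CODE_POINTS;
}

console.log(describeLength("田中太郎"));
// { utf16Units: 4, codePoints: 4, utf8Bytes: 12 }

// 𠮷 lies outside the Basic Multilingual Plane, so the counts diverge.
console.log(describeLength("𠮷野"));
// { utf16Units: 3, codePoints: 2, utf8Bytes: 7 }
```

Counting code points keeps the limit independent of how many bytes each character happens to need.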

Should I split names into first/last?

Only if you need to address users by specific components. A single "Full name" field works for most cases and avoids cultural assumptions.

How do I sort international names?

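Use ICU collations (e.g., PostgreSQL's "en-x-icu") so accented and non-Latin characters sort sensibly. Collation only orders characters, though; which part of the name to sort by is a separate decision, and cultures differ--Thai and Icelandic lists sort by given name, not family name.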
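In application code, the built-in `Intl.Collator` gives locale-aware ordering comparable to a database ICU collation. The sketch below contrasts it with the default code-unit sort; the helper names are hypothetical, and the locale-specific result assumes a runtime with full locale data (the default in current Node.js and browsers).

```typescript
// Default Array.prototype.sort compares UTF-16 code units.
function sortByCodeUnit(names: string[]): string[] {
  return names.slice().sort();
}

// Locale-aware ordering via the built-in Intl.Collator.
function sortForLocale(names: string[], locale: string): string[] {
  const collator = new Intl.Collator(locale);
  return names.slice().sort(collator.compare);
}

const surnames: string[] = ["Zimmermann", "Östberg", "Olsen"];

console.log(sortByCodeUnit(surnames));
// [ 'Olsen', 'Zimmermann', 'Östberg' ]   (Ö has a higher code unit than Z)

console.log(sortForLocale(surnames, "de"));
// [ 'Olsen', 'Östberg', 'Zimmermann' ]   (German sorts ö alongside o)

// Swedish treats ö as a separate letter that comes after z,
// so the same list can legitimately order differently per locale.
console.log(sortForLocale(surnames, "sv"));
```

There is no single correct order for a list of international names; the collation should follow the locale of the person reading the list.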

Do I need to support right-to-left scripts?

Yes, if your users include Arabic, Hebrew, Persian, or Urdu speakers. Let the browser detect direction with the HTML dir="auto" attribute on name fields and displays, or set it explicitly with CSS (direction: rtl, unicode-bidi: embed).
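Where direction has to be decided in code (for example, when generating output outside the browser), a rough heuristic is to look for characters from right-to-left scripts. The sketch below is an assumption-laden simplification, not an implementation of the Unicode Bidirectional Algorithm; `RTL_CHARS` and `detectDirection` are hypothetical names.

```typescript
// Heuristic ranges: Hebrew, Arabic, and neighboring right-to-left scripts
// (U+0590-U+08FF) plus the Hebrew and Arabic presentation forms.
const RTL_CHARS = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;

type TextDirection = "rtl" | "ltr";

// Reports "rtl" if the text contains any character from the ranges above.
function detectDirection(text: string): TextDirection {
  return RTL_CHARS.test(text) ? "rtl" : "ltr";
}

console.log(detectDirection("محمد"));    // rtl (Arabic script, also used for Persian and Urdu)
console.log(detectDirection("דוד"));     // rtl (Hebrew script)
console.log(detectDirection("O'Brien")); // ltr
console.log(detectDirection("田中太郎")); // ltr
```

Mixed-script names are the weak spot of this approach: any right-to-left character flips the result, so treat the output as a hint rather than a guarantee.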

Use UTF-8 Everywhere

Database, API, frontend--UTF-8 encoding throughout the stack

Accept All Characters

No ASCII-only restrictions. Names can contain any Unicode character

Avoid Character Restrictions

Don't reject apostrophes, hyphens, spaces, or numbers in names

No Keyword Blocking

Names like 'Test' or 'Null' are valid. Use prepared statements instead

Culturally Neutral Labels

Avoid 'first/last' assumptions. Use 'given name' and 'family name' or a single 'full name' field

Reasonable Length Limits

Allow at least 100 characters, ideally 200. UTF-8 bytes ≠ character count

ICU Collations

Use international collations for proper sorting across cultures

Test with Real Names

Include diverse name examples: O'Brien, Müller, 田中, Nguyễn
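A small fixture of real-world names makes regressions visible early. The sketch below runs sample names from this guide through a permissive validator written to match the "non-empty string of reasonable length" rule; the fixture lists and the 200-character cap are assumptions to adapt to your own application.

```typescript
// Permissive rule: a non-empty string of reasonable length.
function validateName(value: unknown): boolean {
  if (typeof value !== "string") return false;
  const trimmed = value.trim();
  return trimmed.length > 0 && trimmed.length <= 200;
}

// Names that must always be accepted.
const mustAccept: string[] = [
  "O'Brien", "Jean-Pierre", "Müller", "田中", "Nguyễn Văn Minh",
  "van der Waals", "Al", "Christopher Null", "William Test",
  "Björk Guðmundsdóttir", "María José Carreño Quiñones",
];

// Inputs that should still be rejected.
const mustReject: unknown[] = ["", "   ", null, undefined, 42];

const wronglyRejected = mustAccept.filter((n) => !validateName(n));
const wronglyAccepted = mustReject.filter((v) => validateName(v));

if (wronglyRejected.length > 0 || wronglyAccepted.length > 0) {
  throw new Error(
    `Name validation regression: rejected=${JSON.stringify(wronglyRejected)}, ` +
    `accepted=${JSON.stringify(wronglyAccepted)}`
  );
}
console.log(`All ${mustAccept.length} sample names accepted`);
```

The same fixture can be reused against signup forms, API endpoints, and database round-trips, since a name that passes validation can still be mangled further down the stack.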

Build Inclusive Web Applications

Every user deserves to use their real name. Ensure your web applications properly handle international names with correct encoding, validation, and database practices.