PHP mb_detect_encoding() – Supporting Multiple Encodings

Table of Contents: Syntax for Detecting Multiple Encodings Example 1: Detect Encoding from a List Example 2: Using Strict Mode Example 3: Detect and Convert to UTF-8 Example 4: Detect Encoding from a File Example 5: Detect Using All Available Encodings Common Encoding Lists Summary

PHP mb_detect_encoding with Multiple Character Encodings

The mb_detect_encoding() function in PHP can detect character encoding from a list of multiple encodings. This is essential when dealing with multilingual text, different file encodings, and legacy data formats.

Pro Tip: Always list encodings in order of likelihood (e.g., UTF-8 first for modern web applications). This improves detection accuracy and performance.

Syntax for Detecting Multiple Encodings

mb_detect_encoding(string $string, array|string $encodings = null, bool $strict = false): string|false

$string – The text whose encoding needs to be detected.
$encodings (optional) – A list of encodings to check. Can be array or comma-separated string.
$strict (optional) – If true, performs strict encoding detection (slower but more accurate).

Returns: The detected encoding name or false if detection fails.

Example 1: Detect Encoding from a List

Basic Multiple Encoding Detection

<?php
$text = "Héllo Wörld";  // Text with special characters

$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected_encoding = mb_detect_encoding($text, $encodings);

echo $detected_encoding ? "Encoding detected: $detected_encoding" : "Encoding not detected";
?>

Output (if UTF-8 is detected): Encoding detected: UTF-8

Example 2: Using mb_detect_encoding() with Strict Mode

Precise Detection with Strict Mode

<?php
$text = file_get_contents("sample.txt");  // Read file content

$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected_encoding = mb_detect_encoding($text, $encodings, true); // Strict mode ON

echo $detected_encoding ? "Encoding: $detected_encoding" : "Encoding not detected";
?>

Strict mode: Reduces false positives but may return false if unsure. Use when accuracy is more important than speed.

Example 3: Detect Encoding and Convert to UTF-8

Convert Non-UTF-8 Text to UTF-8

<?php
$text = "Some text with unknown encoding";

$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected = mb_detect_encoding($text, $encodings);

if ($detected && $detected !== "UTF-8") {
    $text = mb_convert_encoding($text, "UTF-8", $detected);
    echo "Converted from $detected to UTF-8.";
} else {
    echo "Text is already UTF-8 or encoding detection failed.";
}

echo "Converted text: " . $text;
?>

Key: Ensures text is UTF-8 encoded for web compatibility and consistent processing.

Example 4: Detect Encoding from a File

Detect File Encoding

<?php
$file_content = file_get_contents("data.txt");

$encodings = ["UTF-8", "ISO-8859-1", "Windows-1251", "ASCII"];
$encoding = mb_detect_encoding($file_content, $encodings);

echo "File encoding: " . ($encoding ?: "Unknown");
?>

Use case: Helpful for processing user-uploaded files, CSV imports, or legacy data files where encoding is unknown.

File Size Warning: For large files, consider reading only the first few KB for detection (file_get_contents("data.txt", false, null, 0, 8192)) to avoid memory issues.

Example 5: Detect Encoding Using All Available Encodings

Detect from All Supported Encodings

<?php
$text = "Some unknown text";

$encoding = mb_detect_encoding($text, mb_list_encodings());

echo "Detected encoding: " . ($encoding ?: "Unknown");
?>

Performance Note: This scans through all available encodings (~50+). Use only when you have no idea about possible encodings, as it's slower than a targeted list.

Common Encoding Lists for mb_detect_encoding()

Recommended encoding lists based on language/region:

Western European Languages (English, Spanish, French, German, etc.)

$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252", "ASCII"];

Eastern European Languages (Russian, Bulgarian, etc.)

$encodings = ["UTF-8", "Windows-1251", "KOI8-R", "ISO-8859-5"];

Japanese Text

$encodings = ["UTF-8", "Shift-JIS", "EUC-JP", "ISO-2022-JP"];

Simplified Chinese

$encodings = ["UTF-8", "GB2312", "GBK", "GB18030"];

Universal Multilingual Detection

$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252", "Windows-1251", 
               "Shift-JIS", "EUC-JP", "GB2312", "BIG5", "ASCII"];

Summary

Feature	Description
Detects multiple encodings	Pass an array of possible encodings in order of likelihood
Supports strict mode	Use `true` for precise detection (fewer false positives)
Works with file input	Detect encoding from file content before processing
Helps convert text to UTF-8	Use with `mb_convert_encoding()` for standardization
Can check all available encodings	Use `mb_list_encodings()` as fallback
Best for multilingual apps	Essential for handling user input, files, APIs with mixed encodings

Final Recommendation: For most web applications, start with ["UTF-8", "ISO-8859-1", "Windows-1252"] for Western languages. Always convert detected text to UTF-8 early in your processing pipeline for consistency.

Now you know how to effectively detect and handle multiple character encodings in PHP!

PHP mb_detect_encoding() – Supporting Multiple Encodings

Syntax for Detecting Multiple Encodings

Example 1: Detect Encoding from a List

Basic Multiple Encoding Detection

Example 2: Using mb_detect_encoding() with Strict Mode

Precise Detection with Strict Mode

Example 3: Detect Encoding and Convert to UTF-8

Convert Non-UTF-8 Text to UTF-8

Example 4: Detect Encoding from a File

Detect File Encoding

Example 5: Detect Encoding Using All Available Encodings

Detect from All Supported Encodings

Common Encoding Lists for mb_detect_encoding()

Western European Languages (English, Spanish, French, German, etc.)

Eastern European Languages (Russian, Bulgarian, etc.)

Japanese Text

Simplified Chinese

Universal Multilingual Detection

Summary

High-Availability Cloud VDS

Hosting

Cloud Solution

For You

VDS

Website Builders

Our services

Client Area

Select Currency

PHP mb_detect_encoding() – Supporting Multiple Encodings

Syntax for Detecting Multiple Encodings

Example 1: Detect Encoding from a List

Basic Multiple Encoding Detection

Example 2: Using mb_detect_encoding() with Strict Mode

Precise Detection with Strict Mode

Example 3: Detect Encoding and Convert to UTF-8

Convert Non-UTF-8 Text to UTF-8

Example 4: Detect Encoding from a File

Detect File Encoding

Example 5: Detect Encoding Using All Available Encodings

Detect from All Supported Encodings

Common Encoding Lists for mb_detect_encoding()

Western European Languages (English, Spanish, French, German, etc.)

Eastern European Languages (Russian, Bulgarian, etc.)

Japanese Text

Simplified Chinese

Universal Multilingual Detection

Summary

High-Availability Cloud VDS

Hosting

Cloud Solution

For You

VDS

Website Builders

Our services

Client Area

Выберите валюту