The mb_detect_encoding() function in PHP guesses a string's character encoding by testing it against a list of candidate encodings. This is useful when dealing with multilingual text, files of unknown encoding, and legacy data formats.
mb_detect_encoding(string $string, array|string|null $encodings = null, bool $strict = false): string|false
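$strict: If true, performs strict encoding detection: an encoding is returned only when the string is fully valid in one of the candidates. If false, the closest match is returned.
Returns: The detected encoding name, or false if detection fails.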
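As a quick illustration of the parameters, the sketch below passes the candidate list once as an array and once as a comma-separated string, then checks the result with a strict comparison against false. The sample string and variable names are made up for this example.

```php
<?php
// "Héllo" encoded as UTF-8, written with byte escapes so the example
// does not depend on how this source file is saved.
$text = "H\xC3\xA9llo";

// Candidate list as an array, strict mode on.
$fromArray = mb_detect_encoding($text, ["ASCII", "UTF-8"], true);

// The same candidates as a comma-separated string.
$fromString = mb_detect_encoding($text, "ASCII,UTF-8", true);

// The function returns either an encoding name or false,
// so compare with === false rather than relying on truthiness.
if ($fromArray === false) {
    echo "Encoding not detected\n";
} else {
    echo "Detected: $fromArray\n"; // Detected: UTF-8
}
```

The bytes 0xC3 0xA9 are not valid ASCII, so UTF-8 is the only candidate that fits here.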
<?php
$text = "Héllo Wörld"; // Text with special characters
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected_encoding = mb_detect_encoding($text, $encodings);
echo $detected_encoding ? "Encoding detected: $detected_encoding" : "Encoding not detected";
?>
Output (if UTF-8 is detected): Encoding detected: UTF-8
<?php
$text = file_get_contents("sample.txt"); // Read file content
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected_encoding = mb_detect_encoding($text, $encodings, true); // Strict mode ON
echo $detected_encoding ? "Encoding: $detected_encoding" : "Encoding not detected";
?>
Strict mode: Reduces false positives, but returns false when the text is not valid in any of the listed encodings. Use it when a wrong guess would be worse than no guess.
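The difference is easiest to see with bytes that fit none of the candidates. The following sketch uses a hand-built ISO-8859-1 byte string, chosen here purely for illustration.

```php
<?php
// "áéóú" encoded as ISO-8859-1: four single bytes that are
// neither valid ASCII nor valid UTF-8.
$latin1 = "\xE1\xE9\xF3\xFA";

// Strict mode: no candidate fits, so detection fails.
$strictMiss = mb_detect_encoding($latin1, ["ASCII", "UTF-8"], true);
var_dump($strictMiss); // bool(false)

// Non-strict mode still returns a best guess even though nothing fits;
// which name comes back may depend on the PHP version.
$looseGuess = mb_detect_encoding($latin1, ["ASCII", "UTF-8"], false);
var_dump($looseGuess);

// Once a fitting candidate is listed, strict mode succeeds.
$strictHit = mb_detect_encoding($latin1, ["ASCII", "UTF-8", "ISO-8859-1"], true);
var_dump($strictHit); // string(10) "ISO-8859-1"
```

Note that ISO-8859-1 accepts any byte sequence, so listing it means strict detection rarely returns false; whether that is desirable depends on the data.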
<?php
$text = "Some text with unknown encoding";
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected = mb_detect_encoding($text, $encodings);
if ($detected === false) {
    // Detection failed: leave the text untouched rather than guess.
    echo "Encoding detection failed; text left unchanged.";
} elseif ($detected !== "UTF-8") {
    $text = mb_convert_encoding($text, "UTF-8", $detected);
    echo "Converted from $detected to UTF-8.";
} else {
    echo "Text is already UTF-8.";
}
echo "\nText: " . $text;
?>
Key: Ensures text is UTF-8 encoded for web compatibility and consistent processing.
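The detect-then-convert steps can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the function name to_utf8() and its default candidate list are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8, or null if no candidate fits.
 */
function to_utf8(string $text, array $candidates = ["UTF-8", "ISO-8859-1"]): ?string
{
    // Already valid UTF-8: nothing to convert.
    if (mb_check_encoding($text, "UTF-8")) {
        return $text;
    }

    // Strict detection, so a failed match is reported instead of guessed.
    $from = mb_detect_encoding($text, $candidates, true);
    if ($from === false) {
        return null;
    }

    return mb_convert_encoding($text, "UTF-8", $from);
}

$legacy = "caf\xE9";         // "café" encoded as ISO-8859-1
echo to_utf8($legacy), "\n"; // prints "café" as UTF-8 bytes
```

Returning null on failure keeps the caller from silently storing text of unknown encoding; a caller could log the input or reject the upload at that point.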
<?php
$file_content = file_get_contents("data.txt");
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1251", "ASCII"];
$encoding = mb_detect_encoding($file_content, $encodings);
echo "File encoding: " . ($encoding ?: "Unknown");
?>
Use case: Helpful for processing user-uploaded files, CSV imports, or legacy data files where encoding is unknown.
Tip: For large files, read only the first few kilobytes (e.g., file_get_contents("data.txt", false, null, 0, 8192)) to avoid memory issues.
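One caveat with sampling by byte count is that the cut can land in the middle of a multibyte character, which strict detection would then reject. A possible workaround, sketched below, is to read whole lines until roughly the desired number of bytes has been collected. The function name read_sample_lines() is hypothetical, and the approach assumes newline-delimited text in an ASCII-compatible encoding (it would not suit UTF-16, for example).

```php
<?php
/**
 * Hypothetical helper: read whole lines from $path until at least
 * $maxBytes have been collected, so no character is cut in half.
 */
function read_sample_lines(string $path, int $maxBytes = 8192): string|false
{
    if (!is_readable($path)) {
        return false;
    }
    $handle = fopen($path, "rb");
    if ($handle === false) {
        return false;
    }

    $sample = "";
    // fgets() without a length returns one full line per call.
    while (strlen($sample) < $maxBytes && ($line = fgets($handle)) !== false) {
        $sample .= $line;
    }
    fclose($handle);

    return $sample;
}

// "data.txt" is the placeholder file name used in the example above.
$sample = read_sample_lines("data.txt");
if ($sample !== false) {
    $encoding = mb_detect_encoding($sample, ["UTF-8", "ISO-8859-1"], true);
    echo "File encoding: " . ($encoding ?: "Unknown");
}
```

The sample may overshoot $maxBytes by up to one line, and a file without any newlines would be read in full, so this trades a strict memory bound for intact characters.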
<?php
$text = "Some unknown text";
$encoding = mb_detect_encoding($text, mb_list_encodings());
echo "Detected encoding: " . ($encoding ?: "Unknown");
?>
Performance Note: This tests the text against every encoding mbstring supports (dozens of them). Use it only when you have no idea which encodings are possible: it is slower than a targeted list, and with so many candidates a wrong match is more likely.
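A middle ground is to try a short, likely list first and fall back to the full scan only when that fails. The sketch below shows one way to do this; detect_with_fallback() is a hypothetical name, not part of mbstring.

```php
<?php
/**
 * Hypothetical helper: try a targeted list first, then fall back
 * to every encoding mbstring knows about.
 */
function detect_with_fallback(string $text, array $likely): string|false
{
    // Fast path: strict detection against the short list.
    $encoding = mb_detect_encoding($text, $likely, true);
    if ($encoding !== false) {
        return $encoding;
    }

    // Slow path: scan the full list. Treat this result as a rough guess.
    return mb_detect_encoding($text, mb_list_encodings(), true);
}

$encoding = detect_with_fallback("plain text", ["ASCII", "UTF-8"]);
echo "Detected encoding: " . ($encoding ?: "Unknown");
```

Most inputs never reach the slow path, so the cost of the full scan is paid only for text that the targeted list cannot explain.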
Recommended encoding lists based on language/region:
// Western European languages
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252", "ASCII"];
// Cyrillic
$encodings = ["UTF-8", "Windows-1251", "KOI8-R", "ISO-8859-5"];
// Japanese
$encodings = ["UTF-8", "Shift-JIS", "EUC-JP", "ISO-2022-JP"];
// Chinese
$encodings = ["UTF-8", "GB2312", "GBK", "GB18030"];
// Mixed or unknown sources
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252", "Windows-1251",
    "Shift-JIS", "EUC-JP", "GB2312", "BIG5", "ASCII"];
| Feature | Description |
|---|---|
| Detects multiple encodings | Pass an array of possible encodings in order of likelihood |
| Supports strict mode | Use true for precise detection (fewer false positives) |
| Works with file input | Detect encoding from file content before processing |
| Helps convert text to UTF-8 | Use with mb_convert_encoding() for standardization |
| Can check all available encodings | Use mb_list_encodings() as fallback |
| Best for multilingual apps | Essential for handling user input, files, APIs with mixed encodings |
Best practice: Start with a targeted list such as ["UTF-8", "ISO-8859-1", "Windows-1252"] for Western languages. Always convert detected text to UTF-8 early in your processing pipeline for consistency.
Now you know how to effectively detect and handle multiple character encodings in PHP!