The mb_detect_encoding() function in PHP is used to detect the character encoding of a string. This is essential when working with text that may arrive in different encodings (UTF-8, ISO-8859-1, Shift-JIS, etc.), especially when handling user input, files, or database content from different sources.
This function requires the mbstring extension. Enable it by adding extension=mbstring to your php.ini file.
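If you are unsure whether the extension is active, a quick runtime check can confirm it. This is a minimal sketch; the variable name and messages are illustrative only.

```php
<?php
// extension_loaded() reports whether a named extension is active in this PHP process.
$hasMbstring = extension_loaded('mbstring');

echo $hasMbstring
    ? "mbstring is available.\n"
    : "mbstring is not loaded; enable it in php.ini.\n";
```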
mb_detect_encoding(string $string, array|string|null $encodings = null, bool $strict = false): string|false
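$string: The string whose encoding you want to detect.
$encodings: The candidate encodings, given as a comma-separated string ('UTF-8, ISO-8859-1') or an array (['UTF-8', 'ISO-8859-1']). If null, it uses mb_detect_order().
$strict: If true, it performs stricter encoding detection (slower but more accurate): the function returns false unless the string is valid in one of the candidate encodings.
Returns: The detected encoding name (e.g., "UTF-8", "ISO-8859-1") or false if detection fails.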
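To make the effect of strict mode concrete, here is a minimal sketch. It assumes mbstring is loaded. The sample bytes are made up for illustration and written as escape sequences, so the result does not depend on how the source file is saved.

```php
<?php
// "été" encoded as UTF-8 (é is the two bytes 0xC3 0xA9).
$validUtf8 = "\xC3\xA9t\xC3\xA9";
// "été" encoded as ISO-8859-1 (é is the single byte 0xE9); not valid UTF-8.
$latin1 = "\xE9t\xE9";

// Strict mode with UTF-8 as the only candidate:
// the whole string must be valid UTF-8, otherwise false is returned.
$strictValid = mb_detect_encoding($validUtf8, ['UTF-8'], true);
$strictInvalid = mb_detect_encoding($latin1, ['UTF-8'], true);

var_dump($strictValid);   // string(5) "UTF-8"
var_dump($strictInvalid); // bool(false)
```

Compare the result with === false rather than a loose check, so that a failed detection is never mistaken for a value.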
Detect the encoding of a string with Russian text:
<?php
$string = "Привет, мир!"; // Russian text
$encoding = mb_detect_encoding($string, "UTF-8, ISO-8859-1, Windows-1251");
echo "Detected encoding: " . ($encoding ?: "Unknown");
?>
Output: Detected encoding: UTF-8
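You can specify a prioritized list of encodings to check. Before PHP 8.1 the function returned the first encoding in your list that matched; as of PHP 8.1 it uses heuristics to pick the most likely candidate from the list, so the result may not simply be the first match.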
<?php
$text = "Héllo Wörld"; // Text with accented characters
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected = mb_detect_encoding($text, $encodings, true); // Strict mode
echo $detected ? "Encoding: $detected" : "Encoding not detected";
?>
Possible Output: Encoding: UTF-8
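Note: The order of the $encodings parameter matters! PHP considers the encodings in the order you provide. Always list the most likely encoding first (usually UTF-8 for modern applications), and keep the list short, because the same bytes are often valid in more than one single-byte encoding.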
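One way to see why the candidate list matters is to check which encodings accept the same bytes. The sketch below uses mb_check_encoding() rather than mb_detect_encoding(). The name validEncodings() is a hypothetical helper, not part of mbstring, and the sample bytes are "Привет" encoded as Windows-1251.

```php
<?php
/**
 * Return every candidate encoding in which $bytes is a valid byte sequence.
 * Hypothetical helper for illustration; not part of mbstring.
 */
function validEncodings(string $bytes, array $candidates): array
{
    $valid = [];
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

// "Привет" encoded as Windows-1251.
$sample = "\xCF\xF0\xE8\xE2\xE5\xF2";
$matches = validEncodings($sample, ['UTF-8', 'ISO-8859-1', 'Windows-1251']);
print_r($matches);
```

Both ISO-8859-1 and Windows-1251 accept these bytes, while UTF-8 rejects them. Validity alone cannot tell the two single-byte encodings apart, which is why the list should reflect what you actually expect from your data source.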
Once you've detected the encoding, you can convert the string to your desired encoding (usually UTF-8) using mb_convert_encoding().
<?php
// Simulate reading content that might be in a different encoding
$string = file_get_contents("legacy_data.txt");
if ($string === false) {
    exit("Could not read legacy_data.txt.");
}
// Detect the current encoding (strict mode returns false instead of guessing)
$encoding = mb_detect_encoding($string, ["UTF-8", "ISO-8859-1", "Windows-1252"], true);
if ($encoding === false) {
    echo "Encoding detection failed; string left unchanged.";
} elseif ($encoding !== "UTF-8") {
    // Convert to UTF-8
    $string = mb_convert_encoding($string, "UTF-8", $encoding);
    echo "Converted from $encoding to UTF-8.";
} else {
    echo "String is already UTF-8.";
}
// Now $string is (hopefully) in UTF-8
echo "Processed string: " . $string;
?>
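The detect-then-convert pattern above can be wrapped in a small reusable function. This is a sketch under assumptions: toUtf8() is a hypothetical helper name, and the candidate list passed to it is a guess about your data that you would need to adjust.

```php
<?php
/**
 * Convert $bytes to UTF-8, or return null when none of the candidates fits.
 * Hypothetical helper; the candidate list is an assumption about your data.
 */
function toUtf8(string $bytes, array $candidates): ?string
{
    // Strict mode: false is returned when no candidate accepts the bytes.
    $encoding = mb_detect_encoding($bytes, $candidates, true);
    if ($encoding === false) {
        return null;
    }
    return $encoding === 'UTF-8'
        ? $bytes
        : mb_convert_encoding($bytes, 'UTF-8', $encoding);
}

$legacy = "\xCF\xF0\xE8\xE2\xE5\xF2"; // "Привет" in Windows-1251
$utf8 = toUtf8($legacy, ['UTF-8', 'Windows-1251']);
echo $utf8 === null ? "Unknown encoding" : $utf8;
```

Returning null on failure keeps the "could not detect" case separate from a successful conversion, so callers can decide whether to reject the input or fall back to a default.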
Sometimes mb_detect_encoding() returns false or an incorrect result. Here are workarounds:
1. Force conversion with "auto" detection:
$string = mb_convert_encoding($string, "UTF-8", "auto");
2. Use iconv() with IGNORE option to handle invalid characters:
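// Use a short, explicit candidate list: mb_list_encodings() also contains
// non-text encodings such as BASE64, which makes detection unreliable.
$detected = mb_detect_encoding($string, ["UTF-8", "ISO-8859-1", "Windows-1252"], true);
if ($detected) {
    // //IGNORE drops characters that cannot be converted
    $converted = iconv($detected, "UTF-8//IGNORE", $string);
    if ($converted !== false) { $string = $converted; }
}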
3. Check if string is valid UTF-8 before processing:
if (!mb_check_encoding($string, 'UTF-8')) {
    // Not UTF-8, try to detect and convert
    $string = mb_convert_encoding($string, 'UTF-8', 'auto');
}
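As far as the PHP manual describes it, "auto" expands to a predefined list of encodings, which may not include the legacy encoding you actually have. A hedged alternative is to name the fallback explicitly. In this sketch, ensureUtf8() is a hypothetical helper and Windows-1252 is only an assumed default; substitute whatever your data source is known to use.

```php
<?php
/**
 * Return $bytes unchanged when it is valid UTF-8; otherwise treat it as
 * $fallback and convert. Hypothetical helper; $fallback is an assumption.
 */
function ensureUtf8(string $bytes, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

echo ensureUtf8("caf\xE9"), "\n";     // Windows-1252 bytes for "café"
echo ensureUtf8("caf\xC3\xA9"), "\n"; // already UTF-8, returned unchanged
```

Checking for valid UTF-8 first means correct input is never re-encoded, which avoids the double-encoding problems that come from converting text that was already UTF-8.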
| Feature | Description |
|---|---|
| Detects text encoding | Returns encoding name like "UTF-8", "ISO-8859-1", or false |
| Supports multiple encodings | Specify prioritized list: ["UTF-8", "ISO-8859-1"] |
| Strict mode available | More accurate detection with true (slower) |
| Handles multibyte characters | Essential for Unicode (UTF-8) and legacy encodings |
| Works with mb_convert_encoding() | Detect then convert: mb_convert_encoding($str, "UTF-8", $detected) |
| Requires mbstring extension | Enable via extension=mbstring in php.ini |
Use mb_detect_encoding() as a tool to identify non-UTF-8 text, then convert it with mb_convert_encoding().
Now you can effectively detect and fix character encoding issues in your PHP applications!