mb_detect_encoding in PHP – Detect Character Encoding

Table of Contents:
Syntax of mb_detect_encoding
Detect Encoding of a String
Detect Encoding from Multiple Encodings
Convert Encoding After Detection
Common Use Cases
Handling Unknown Encodings
Summary of mb_detect_encoding()

PHP Character Encoding Detection with mb_detect_encoding

The mb_detect_encoding() function in PHP tries to determine the character encoding of a string. Detection is a heuristic guess, not a guarantee. It is essential when working with text that may arrive in different encodings, whether multibyte (UTF-8, Shift-JIS) or single-byte (ISO-8859-1, Windows-1252), especially when handling user input, files, or database content from different sources.

PHP Extension Required: This function is part of the mbstring (Multibyte String) extension, which is not always enabled by default. Enable it with extension=mbstring in your php.ini file.
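If you are not sure the extension is loaded, a quick runtime check fails early with a clear message. This is a minimal sketch; the exception type and message are illustrative choices, not anything mbstring requires.

```php
<?php
// extension_loaded() reports whether a named extension is active in this PHP process.
if (!extension_loaded('mbstring')) {
    // Fail early with a clear message instead of a later
    // "Call to undefined function mb_detect_encoding()" error.
    throw new RuntimeException('The mbstring extension is not enabled.');
}

echo "mbstring is available\n";
```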

Syntax of mb_detect_encoding

mb_detect_encoding(string $string, array|string|null $encodings = null, bool $strict = false): string|false

$string – The string to check for character encoding.
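$encodings (optional) – The candidate encodings to check, as an array or a comma-separated string (e.g., ['UTF-8', 'ISO-8859-1'] or 'UTF-8, ISO-8859-1'). If omitted or null, the current detection order from mb_detect_order() is used.
$strict (optional) – Controls what happens when the string is not valid in any listed encoding. If true, the function returns only an encoding in which the string is valid, and false otherwise. If false (the default), it may return the closest match even though the string is not strictly valid in it.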

Returns: The detected encoding name (e.g., "UTF-8", "ISO-8859-1") or false if detection fails.
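The parameters are easiest to understand side by side. The sketch below uses a hand-built ISO-8859-1 byte string (an assumption for illustration) to show the two forms of $encodings, the effect of $strict, and the fallback to mb_detect_order() when $encodings is null. Keep in mind that calling mb_detect_order() with a list changes the detection order for the rest of the request.

```php
<?php
// "Café crème" encoded as ISO-8859-1: é is the byte 0xE9 and è is 0xE8.
// Neither byte starts a valid UTF-8 sequence here, so the string is not valid UTF-8.
$latin1 = "Caf\xE9 cr\xE8me";

// $encodings may be an array or a comma-separated string; both lists below are equivalent.
$fromArray  = mb_detect_encoding($latin1, ['UTF-8', 'ISO-8859-1'], true);
$fromString = mb_detect_encoding($latin1, 'UTF-8, ISO-8859-1', true);
var_dump($fromArray, $fromString); // string(10) "ISO-8859-1" (twice)

// Strict mode with UTF-8 as the only candidate: the bytes are invalid, so false is returned.
var_dump(mb_detect_encoding($latin1, ['UTF-8'], true)); // bool(false)

// Non-strict mode may return the closest candidate even though the string
// is not valid in it, so treat a non-strict result as a guess.
var_dump(mb_detect_encoding($latin1, ['UTF-8'], false));

// With $encodings = null, the current mb_detect_order() list is used instead.
// This call changes the detection order for the rest of the request.
mb_detect_order(['UTF-8', 'Windows-1252']);
var_dump(mb_detect_encoding($latin1, null, true)); // string(12) "Windows-1252"
```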

Detect Encoding of a String

Basic Detection Example

Detect the encoding of a string with Russian text:

<?php
$string = "Привет, мир!"; // Russian text
$encoding = mb_detect_encoding($string, "UTF-8, ISO-8859-1, Windows-1251");

echo "Detected encoding: " . ($encoding ?: "Unknown");
?>

Output (assuming the script file is saved as UTF-8): Detected encoding: UTF-8
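The result above depends on the string literal being UTF-8, which is the case when the script file is saved as UTF-8. As a sketch of the other outcome, the same text can be re-encoded to Windows-1251 with mb_convert_encoding() and detected again in strict mode:

```php
<?php
// The literal below is UTF-8 because this file is saved as UTF-8.
$utf8 = "Привет, мир!";

// Re-encode the same text as Windows-1251 (a legacy Cyrillic code page).
$cp1251 = mb_convert_encoding($utf8, 'Windows-1251', 'UTF-8');

// In strict mode the Windows-1251 bytes are not valid UTF-8,
// so the second candidate is reported instead.
$detected = mb_detect_encoding($cp1251, ['UTF-8', 'Windows-1251'], true);
echo "Detected encoding: " . ($detected ?: "Unknown") . "\n"; // Detected encoding: Windows-1251

// Converting back with the detected name restores the original UTF-8 string.
$roundTrip = mb_convert_encoding($cp1251, 'UTF-8', $detected);
var_dump($roundTrip === $utf8); // bool(true)
```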

Detect Encoding from Multiple Encodings

You can specify a prioritized list of candidate encodings. When several candidates fit the string equally well, the one listed first wins.

Checking Against Specific Encodings

<?php
$text = "Café Crème";  // Text with accented characters (script saved as UTF-8)
$encodings = ["UTF-8", "ISO-8859-1", "Windows-1252"];
$detected = mb_detect_encoding($text, $encodings, true); // Strict mode

echo $detected ? "Encoding: $detected" : "Encoding not detected";
?>

Possible Output: Encoding: UTF-8

Important: The order of $encodings matters. When several candidates fit a string, earlier entries are preferred, and single-byte encodings such as ISO-8859-1 accept any byte sequence at all. Always list the most likely encoding first (usually UTF-8 for modern applications).
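Depending on the PHP version, mb_detect_encoding() may weigh the candidates heuristically instead of simply stopping at the first one that fits, so its exact choice is not always obvious. If you want strictly first-valid-wins behavior, you can spell it out with mb_check_encoding(). firstValidEncoding() below is a hypothetical helper, not part of mbstring; it also shows why a permissive encoding like ISO-8859-1 should not come before UTF-8.

```php
<?php
/**
 * Hypothetical helper: return the first candidate in which $bytes is valid,
 * checking strictly in list order, or null if none match.
 */
function firstValidEncoding(string $bytes, array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        if (mb_check_encoding($bytes, $candidate)) {
            return $candidate;
        }
    }
    return null;
}

$utf8Cafe = "Caf\xC3\xA9"; // "Café" as UTF-8 bytes

// UTF-8 listed first: the UTF-8 text is identified correctly.
var_dump(firstValidEncoding($utf8Cafe, ['UTF-8', 'ISO-8859-1'])); // string(5) "UTF-8"

// ISO-8859-1 listed first: every byte sequence is valid ISO-8859-1,
// so the same UTF-8 text is misidentified.
var_dump(firstValidEncoding($utf8Cafe, ['ISO-8859-1', 'UTF-8'])); // string(10) "ISO-8859-1"
```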

Convert Encoding After Detection

Once you've detected the encoding, you can convert the string to your desired encoding (usually UTF-8) using mb_convert_encoding().

Detect and Convert to UTF-8

<?php
// Read content that might be in a legacy encoding
$string = file_get_contents("legacy_data.txt");
if ($string === false) {
    exit("Could not read legacy_data.txt.");
}

// Detect the current encoding in strict mode. Windows-1252 is listed before
// ISO-8859-1 because it maps bytes 0x80-0x9F to printable characters such as € and curly quotes.
$encoding = mb_detect_encoding($string, ["UTF-8", "Windows-1252", "ISO-8859-1"], true);

if ($encoding === false) {
    echo "Encoding detection failed; the string was left unchanged.";
} elseif ($encoding !== "UTF-8") {
    // Convert to UTF-8
    $string = mb_convert_encoding($string, "UTF-8", $encoding);
    echo "Converted from $encoding to UTF-8.";
} else {
    echo "String is already UTF-8.";
}

// Now $string is (hopefully) in UTF-8
echo "Processed string: " . $string;
?>
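The same steps can be wrapped in a reusable function. toUtf8() is a hypothetical helper; it assumes that any input which is not already valid UTF-8 is in one of the listed legacy encodings, and it checks validity with mb_check_encoding() before detecting anything.

```php
<?php
/**
 * Hypothetical helper: normalize a byte string to UTF-8.
 * Assumes the input is either UTF-8 already or one of the listed legacy encodings.
 */
function toUtf8(string $bytes, array $legacyCandidates = ['Windows-1252', 'ISO-8859-1']): string
{
    // Valid UTF-8 (including plain ASCII) needs no conversion.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Strict detection only returns an encoding in which the bytes are valid.
    $detected = mb_detect_encoding($bytes, $legacyCandidates, true);
    if ($detected === false) {
        throw new UnexpectedValueException('Could not detect the input encoding.');
    }

    return mb_convert_encoding($bytes, 'UTF-8', $detected);
}

echo toUtf8("Caf\xE9 cr\xE8me"), "\n";            // Café crème
echo toUtf8("Already UTF-8: \xE2\x82\xAC"), "\n"; // Already UTF-8: €
```

Because ISO-8859-1 accepts every byte sequence, the exception is effectively unreachable with the default candidate list; it only matters if you pass a narrower list.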

Common Use Cases

Fixing character encoding issues in databases – When importing data from legacy systems.
Handling multi-language text input – From web forms or APIs that don't specify encoding.
Converting legacy encodings – e.g., Windows-1252 to UTF-8 for modern web applications.
Processing uploaded files – CSV, text files that could be in various encodings.
Validating user input – Ensuring text is in the expected encoding before processing.
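For the uploaded-file case, a common pattern is to convert the whole file to UTF-8 once and only then parse it. parseCsvAsUtf8() is a hypothetical helper that assumes the file fits in memory and uses a single encoding throughout; it parses with fgetcsv() on a php://temp stream so that quoted fields spanning several lines still work.

```php
<?php
/**
 * Hypothetical helper: parse CSV text of unknown encoding into rows of UTF-8 strings.
 * Assumes the whole file fits in memory and uses one encoding throughout.
 */
function parseCsvAsUtf8(string $raw, array $candidates = ['UTF-8', 'Windows-1252', 'ISO-8859-1']): array
{
    $encoding = mb_detect_encoding($raw, $candidates, true);
    if ($encoding === false) {
        throw new UnexpectedValueException('Unsupported CSV encoding.');
    }

    // Convert the whole file once, before parsing, so every field ends up in UTF-8.
    $utf8 = $encoding === 'UTF-8' ? $raw : mb_convert_encoding($raw, 'UTF-8', $encoding);

    // Parse from an in-memory stream; fgetcsv() handles quoted fields that span lines.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $utf8);
    rewind($stream);

    $rows = [];
    // All optional arguments are passed explicitly: no line-length limit, comma, double quote, backslash.
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);

    return $rows;
}

// "José" and "Málaga" encoded as Windows-1252 / ISO-8859-1 bytes.
$rows = parseCsvAsUtf8("name,city\nJos\xE9,M\xE1laga\n");
echo $rows[1][0], ' / ', $rows[1][1], "\n"; // José / Málaga
```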

Handling Unknown Encodings

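Sometimes mb_detect_encoding() returns false or an incorrect result: detection is heuristic, and many single-byte encodings accept any byte sequence, so several candidates can look equally plausible. Here are some workarounds: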

Fallback Strategies

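1. Let mbstring guess with "auto". The list behind "auto" depends on the mbstring.language setting; with the default ("neutral") it covers only ASCII and UTF-8, so it rarely helps with Latin-1 or Windows-125x text: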

$string = mb_convert_encoding($string, "UTF-8", "auto");

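2. Use iconv() with the //IGNORE option to drop characters that cannot be converted. Pass a short, realistic candidate list rather than mb_list_encodings(): the full list includes pseudo-encodings such as BASE64 and HTML-ENTITIES plus many encodings that accept almost any byte sequence, and some mbstring encoding names are not understood by iconv(). The handling of invalid input under //IGNORE also varies by iconv implementation (iconv() can still return false), so check the result:

$detected = mb_detect_encoding($string, ["UTF-8", "Windows-1252", "ISO-8859-1"], true);
if ($detected) {
    $converted = iconv($detected, "UTF-8//IGNORE", $string);
    if ($converted !== false) {
        $string = $converted;
    }
}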

3. Check if string is valid UTF-8 before processing:

if (!mb_check_encoding($string, 'UTF-8')) {
    // Not valid UTF-8: convert from the most likely legacy encodings
    $string = mb_convert_encoding($string, 'UTF-8', ['Windows-1252', 'ISO-8859-1']);
}
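To see what mb_check_encoding() treats as invalid UTF-8, and to illustrate one more fallback, the sketch below checks a few hand-picked byte strings and then repairs a broken one with mb_scrub() (available since PHP 7.2). mb_scrub() is a swapped-in alternative to the iconv() approach above: instead of converting from a detected encoding, it keeps the text as UTF-8 and replaces invalid sequences with the substitute character.

```php
<?php
// A few byte strings and whether they are well-formed UTF-8.
$samples = [
    'ASCII text'             => "plain text",   // valid: ASCII is a subset of UTF-8
    'Euro sign (U+20AC)'     => "\xE2\x82\xAC", // valid: complete 3-byte sequence
    'lone continuation byte' => "\x80",         // invalid: no lead byte
    'interrupted sequence'   => "\xE2\x82x",    // invalid: third byte is not a continuation byte
    'overlong encoding'      => "\xC0\xAF",     // invalid: 0xC0 never appears in UTF-8
];

foreach ($samples as $label => $bytes) {
    $status = mb_check_encoding($bytes, 'UTF-8') ? 'valid' : 'invalid';
    echo str_pad($label, 24), $status, "\n";
}

// When the original encoding cannot be recovered, mb_scrub() at least makes the string
// well-formed by replacing invalid sequences (with "?" unless mb_substitute_character() says otherwise).
$scrubbed = mb_scrub("Caf\xE9 au lait", 'UTF-8');
var_dump(mb_check_encoding($scrubbed, 'UTF-8')); // bool(true)
```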

Summary of mb_detect_encoding()

Feature	Description
Detects text encoding	Returns encoding name like "UTF-8", "ISO-8859-1", or false
Supports multiple encodings	Specify prioritized list: `["UTF-8", "ISO-8859-1"]`
Strict mode available	With `true`, returns false unless the string is valid in one of the listed encodings
Handles multibyte characters	Essential for Unicode (UTF-8) and legacy encodings
Works with mb_convert_encoding()	Detect then convert: `mb_convert_encoding($str, "UTF-8", $detected)`
Requires mbstring extension	Enable via `extension=mbstring` in php.ini

Best Practice: For web applications, aim to normalize all text to UTF-8 as early as possible in your processing pipeline. Use mb_detect_encoding() as a tool to identify and convert non-UTF-8 text.
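Normalizing early can mean converting request data once, right after it is received. normalizeToUtf8() is a hypothetical helper that walks a nested array; it assumes any non-UTF-8 string is Windows-1252 or ISO-8859-1 and checks each string individually, so values that are already UTF-8 are not converted a second time.

```php
<?php
/**
 * Hypothetical helper: normalize every string in a (possibly nested) array to UTF-8.
 * Assumes non-UTF-8 strings are Windows-1252 or ISO-8859-1.
 */
function normalizeToUtf8(array $input): array
{
    array_walk_recursive($input, function (&$value): void {
        // Leave non-strings and already-valid UTF-8 untouched.
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $value = mb_convert_encoding($value, 'UTF-8', ['Windows-1252', 'ISO-8859-1']);
        }
    });
    return $input;
}

$normalized = normalizeToUtf8([
    'name' => "Jos\xE9",                 // legacy single-byte "José"
    'tags' => ["caf\xE9", 'plain'],      // nested values are handled too
    'age'  => 42,                        // non-strings pass through unchanged
]);
var_dump($normalized['name'] === "Jos\xC3\xA9"); // bool(true)
```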

Now you can effectively detect and fix character encoding issues in your PHP applications!
