PHP UTF-8 Input


  • You should verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it consistently. There's really no way around this, as malicious clients can submit data in whatever encoding they want.

    $string = $_REQUEST['user_comment'];
    if (!mb_check_encoding($string, 'UTF-8')) {
        // the string is not UTF-8, so re-encode it.
        $actualEncoding = mb_detect_encoding($string);
        $string = mb_convert_encoding($string, 'UTF-8', $actualEncoding);
  • If you're using HTML5 then you can ignore this last point. You want all data sent to you by browsers to be in UTF-8. The only reliable way to do this is to add the accept-charset attribute to all of your <form> tags like so:

    <form action="somepage.php" accept-charset="UTF-8">