Perl Language Tutorial => The utf8 pragma: using Unicode in your sources

Example

The utf8 pragma tells Perl that the source code is encoded as UTF-8. This only works if your text editor actually saves the source file as UTF-8.

With the pragma enabled, string literals can contain arbitrary Unicode characters. Identifiers can contain Unicode too, but only word characters, i.e. those matching \w (see perldata and perlrecharclass for more information):

use utf8;
use feature 'say';          # say() needs this (or "use v5.10" or later)
my $var1 = '§я§©😄';        # works fine
my $я = 4;                  # works since я is a word (matches \w) character
# my $p§2 = 3;              # does not work since § is not a word character (compile-time error)
say "ya" if $var1 =~ /я§/;  # works fine (prints "ya")
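To see why $я compiles but $p§2 does not, a minimal sketch can ask Perl directly whether each character matches \w. It only tests the two characters from the example above and prints ASCII text only, so no output encoding layer is needed:

```perl
use strict;
use warnings;
use utf8;    # the non-ASCII literals below are UTF-8 in this source file

# Identifiers may only use word characters, so test each candidate against \w.
for my $char ('я', '§') {
    my $verdict = $char =~ /^\w\z/ ? 'is' : 'is not';
    printf "U+%04X %s a word (\\w) character\n", ord($char), $verdict;
}
```

This prints "U+044F is a word (\w) character" followed by "U+00A7 is not a word (\w) character".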

Note: when printing text to the terminal, make sure the terminal supports UTF-8.

The relationship between source encoding and output encoding can be counter-intuitive. On a UTF-8 terminal, you may find that adding the utf8 pragma seems to break things:

$ perl -e 'print "Møøse\n"'
Møøse
$ perl -Mutf8 -e 'print "Møøse\n"'
M��se
$ perl -Mutf8 -CO -e 'print "Møøse\n"'
Møøse

In the first case, Perl treats the string as raw bytes and prints them unchanged. Because those bytes happen to be valid UTF-8, the output looks correct even though Perl doesn't know which characters they represent (for example, length("Møøse") returns 7, not 5). With -Mutf8, Perl correctly decodes the UTF-8 source into characters, but STDOUT has no encoding layer by default, so each ø is written as the single Latin-1 byte 0xF8, which a UTF-8 terminal cannot display. Only after switching STDOUT to UTF-8 with -CO is the output correct.
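The bytes-versus-characters difference can be reproduced in a small self-contained sketch. It spells "Møøse" with escape sequences, so it behaves the same whether or not the file is saved as UTF-8, and uses the built-in utf8::decode to turn the bytes into characters. The final binmode call is roughly the in-script counterpart of -CO (which adds a :utf8 layer; :encoding(UTF-8) is the stricter variant):

```perl
use strict;
use warnings;

# "Møøse" as raw UTF-8 bytes: each ø is the two bytes 0xC3 0xB8.
my $bytes = "M\xC3\xB8\xC3\xB8se";

# The same text as characters: each ø becomes the single code point U+00F8.
my $chars = $bytes;
utf8::decode($chars) or die "not valid UTF-8";

print length($bytes), "\n";    # 7 (bytes)
print length($chars), "\n";    # 5 (characters)
print $chars eq "M\x{F8}\x{F8}se" ? "same text\n" : "different text\n";    # same text

# Like -CO: give STDOUT a UTF-8 encoding layer before printing characters.
binmode STDOUT, ':encoding(UTF-8)';
print $chars, "\n";            # displays Møøse on a UTF-8 terminal
```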

use utf8 affects neither the encoding of the standard I/O handles nor that of any other file handle!
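Since use utf8 leaves I/O alone, the handles need their own encoding layers. A minimal sketch, assuming UTF-8 is the encoding you want everywhere, uses the core open pragma for the standard handles and an explicit :encoding(UTF-8) layer for a file (the temporary file from File::Temp only keeps the example self-contained):

```perl
use strict;
use warnings;
use utf8;                              # the source code is UTF-8
use open qw(:std :encoding(UTF-8));    # default layers for open(); :std also covers STDIN/STDOUT/STDERR
use File::Temp qw(tempfile);

my $text = "Møøse";                    # 5 characters, thanks to use utf8

# A throwaway file, deleted automatically when the program exits.
my (undef, $file) = tempfile(UNLINK => 1);

# Encode characters to UTF-8 bytes on the way out...
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
print {$out} $text;
close $out or die "Cannot close $file: $!";

# ...and decode UTF-8 bytes back to characters on the way in.
open my $in, '<:encoding(UTF-8)', $file or die "Cannot read $file: $!";
my $read_back = <$in>;
close $in;

my $size = -s $file;                   # size on disk, in bytes
print "$read_back: ", length($read_back), " characters, $size bytes on disk\n";
# prints "Møøse: 5 characters, 7 bytes on disk"
```

Without the open pragma, the print to STDOUT would show the same kind of mojibake as the -Mutf8 example above.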
