Matching Chinese Characters with Regular Expressions


This post shows how to match Chinese characters with a regular expression in Perl.

This is the latest revision (2014-04-22), based on a suggestion from @灰灰.

Note: the code below is saved in a UTF-8 encoded text file, and the command-line terminal uses UTF-8 as well.

#!/usr/bin/env perl

use strict;
use warnings;

use utf8::all;

# For this script, "use utf8::all" has roughly the same effect as:
#
#     use utf8;
#     binmode( STDIN,  ':encoding(utf8)' );
#     binmode( STDOUT, ':encoding(utf8)' );
#     binmode( STDERR, ':encoding(utf8)' );

my $s = "床前明月光,疑是地上霜。123我abcβ";

print "原字符串: $s\n";    # label: "original string"

print "所有字符: ", join( "|", split( //, $s ) ), "\n";    # label: "all characters"

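# Find East Asian Wide characters (East_Asian_Width=W). This property is about
# display width, so it also matches wide punctuation such as "。".
# See more: http://perldoc.perl.org/perluniprops.html
my @ea = ();
while ( $s =~ /(\p{Ea=W})/g ) {
    push @ea, $1;
}
print "所有汉字: ", join( "|", @ea ), "\n";    # label: "all Chinese characters"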

# See also, for legacy (non-UTF-8) Chinese encodings:
#
#     Encode::CN - China-based Chinese Encodings
#     Encode::TW - Traditional Chinese Encodings

Output:

原字符串: 床前明月光,疑是地上霜。123我abcβ
所有字符: 床|前|明|月|光|,|疑|是|地|上|霜|。|1|2|3|我|a|b|c|β
所有汉字: 床|前|明|月|光|疑|是|地|上|霜|。|我

The three lines are the original string, all of its characters, and the characters matched by \p{Ea=W}. The full stop 。 (U+3002) appears in the last line because it is an East Asian Wide character, even though it is not a Han ideograph.

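In Go this is very simple: \p{Han} is enough (see here). Perl supports script properties too; \p{Script=Han} matches only Han ideographs.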
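
To make the script-based approach concrete in Perl, here is a minimal sketch that filters the same sample string by the Unicode Script property instead of by East Asian width. The helper name han_chars is hypothetical and not part of the original script. The sketch uses the core utf8 pragma and binmode rather than utf8::all, so it has no CPAN dependency.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use utf8;                                 # string literals below are UTF-8
binmode( STDOUT, ':encoding(UTF-8)' );    # encode output for a UTF-8 terminal

# han_chars is a hypothetical helper name used only for this sketch.
# It returns every character whose Unicode Script property is Han.
sub han_chars {
    my ($text) = @_;
    my @found = $text =~ /(\p{Script=Han})/g;
    return @found;
}

my $s = "床前明月光,疑是地上霜。123我abcβ";

# The full stop "。" has Script=Common, so it is not matched here.
print "Script=Han: ", join( "|", han_chars($s) ), "\n";
# prints: Script=Han: 床|前|明|月|光|疑|是|地|上|霜|我
```

The explicit Script=Han form is used on purpose: on recent Perl versions the short form \p{Han} is interpreted via Script_Extensions, which can also cover punctuation shared between CJK scripts.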