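This post shows how to match Chinese characters with regular expressions in Perl.
This is the latest revision (2014-04-22), based on suggestions from @灰灰.
Note: the code below is saved in a UTF-8 encoded text file, and the command-line terminal also uses UTF-8.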
#!/usr/bin/env perl
use strict;
use warnings;
use utf8::all;
# Among other things, use utf8::all does the equivalent of:
#
# use utf8;
# binmode( STDIN, ':encoding(utf8)' );
# binmode( STDOUT, ':encoding(utf8)' );
# binmode( STDERR, ':encoding(utf8)' );
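my $s = "床前明月光,疑是地上霜。123我abcβ";
print "Original string: $s\n";
print "All characters: ", join( "|", split( //, $s ) ), "\n";
# Find East Asian Wide characters.
# Note: \p{Ea=W} selects by East Asian Width, so wide CJK punctuation such as
# "。" also matches, while the fullwidth comma "," (Ea=F) does not.
# \p{Script=Han} is an alternative that matches Han-script characters only.
# See more: http://perldoc.perl.org/perluniprops.html
my @ea = ();
while ( $s =~ /(\p{Ea=W})/g ) {
    push @ea, $1;
}
print "Chinese characters: ", join( "|", @ea ), "\n";
# See also:
#
# Encode::CN - China-based Chinese Encodings
# Encode::TW - Traditional Chinese Encodings
Output:
Original string: 床前明月光,疑是地上霜。123我abcβ
All characters: 床|前|明|月|光|,|疑|是|地|上|霜|。|1|2|3|我|a|b|c|β
Chinese characters: 床|前|明|月|光|疑|是|地|上|霜|。|我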
In Go this is very simple: \p{Han} is enough (see here).