Onigmo (Oniguruma-mod) Regular Expressions Version 6.1.0    2016/12/25

syntax: ONIG_SYNTAX_RUBY (default)


1. Syntax elements

  \       escape (enable or disable meta character)
  |       alternation
  (...)   group
  [...]   character class


2. Characters

  \t           horizontal tab         (0x09)
  \v           vertical tab           (0x0B)
  \n           newline (line feed)    (0x0A)
  \r           carriage return        (0x0D)
  \b           backspace              (0x08)
  \f           form feed              (0x0C)
  \a           bell                   (0x07)
  \e           escape                 (0x1B)
  \nnn         octal char                    (encoded byte value)
  \xHH         hexadecimal char              (encoded byte value)
  \x{7HHHHHHH} wide hexadecimal char         (character code point value)
  \uHHHH       wide hexadecimal char         (character code point value)
  \cx          control char                  (character code point value)
  \C-x         control char                  (character code point value)
  \M-x         meta  (x|0x80)                (character code point value)
  \M-\C-x      meta control char             (character code point value)

  (* \b as backspace is effective in a character class only)

  ONIG_SYNTAX_PERL:
  \o{nnn}      octal char; can also be used


3. Character types

  .        any character (except newline)

  \w       word character

           Not Unicode:
             alphanumeric and "_".
             \w matches letters, digits and the underscore; it does not
             match Chinese or other non-ASCII characters.

           Unicode:
             General_Category -- (Letter|Mark|Number|Connector_Punctuation)
             \w matches letters, marks, numbers and connector punctuation.

           Whether non-ASCII characters are included depends on the
           ONIG_OPTION_ASCII_RANGE option.

  \W       non-word char

  \s       whitespace char

           Not Unicode:
             \t, \n, \v, \f, \r, \x20
             (horizontal tab, newline, vertical tab, form feed,
             carriage return, space)

           Unicode:
             0009, 000A, 000B, 000C, 000D, 0085(NEL),
             General_Category -- Line_Separator
                              -- Paragraph_Separator
                              -- Space_Separator

           Whether non-ASCII characters are included depends on the
           ONIG_OPTION_ASCII_RANGE option.

  \S       non-whitespace char

  \d       decimal digit char

           Unicode: General_Category -- Decimal_Number

           Whether non-ASCII characters are included depends on the
           ONIG_OPTION_ASCII_RANGE option.

  \D       non-decimal-digit char

  \h       hexadecimal-digit char   [0-9a-fA-F]

  \H       non-hexadecimal-digit char


  Character Property

    Character properties are similar to POSIX bracket classes:
    \p{digit} is equivalent to [[:digit:]].

    * \p{property-name}     matches characters that have the property;
                            the available names are listed below.
                            \P is the negation of \p.
    * \p{^property-name}    (negative) lowercase \p{^name} is equivalent
                            to uppercase \P{name}
    * \P{property-name}     (negative)

    property-name:

     + works on all encodings
       Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower,
       Print, Punct, Space, Upper, XDigit, Word, ASCII

     + works on EUC_JP, Shift_JIS, CP932
       Hiragana, Katakana, Han, Latin, Greek, Cyrillic

     + works on UTF-8, UTF-16, UTF-32
       see UnicodeProps.txt

    What the "all encodings" names match in the ASCII range:

      Lower    lowercase letters          [a-z]
      Upper    uppercase letters          [A-Z]
      ASCII    all ASCII characters       [\x00-\x7F]
      Alpha    alphabetic characters      [\p{Lower}\p{Upper}]
      Digit    decimal digits             [0-9]
      Alnum    alphanumeric characters    [\p{Alpha}\p{Digit}]
      Punct    punctuation                !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
      Graph    visible characters         [\p{Alnum}\p{Punct}]
      Print    printable characters       [\p{Graph}\x20]
      Blank    space or tab               [ \t]
      Cntrl    control characters         [\x00-\x1F\x7F]
      XDigit   hexadecimal digits         [0-9a-fA-F]
      Space    whitespace characters      [ \t\n\x0B\f\r]

    Other properties can select, for example:

      * all Greek characters, such as α, β, γ, ...
      * uppercase letters (with case-sensitive matching enabled)
      * currency symbols, such as $, ¢, ¥, ...
      * all characters except Greek ones (note that the P is uppercase)

  Unicode Property

    A Unicode property classifies a character as punctuation, space,
    letter, and so on. Each Unicode character belongs to exactly one
    Unicode property. Unicode properties are supported by .NET, Java,
    PHP, Ruby and other languages. The specific categories are:

      Character
      \p{L}
      \p{Ll} or \p{Lowercas...