This article is compiled from: http://code.google.com/p/re2/wiki/Syntax
Single characters
. | any character, possibly including newline (s=true) |
[xyz] | character class |
[^xyz] | negated character class |
\d | Perl character class |
\D | negated Perl character class |
[[:alpha:]] | ASCII character class |
[[:^alpha:]] | negated ASCII character class |
\pN | Unicode character class (one-letter name) |
\p{Greek} | Unicode character class |
\PN | negated Unicode character class (one-letter name) |
\P{Greek} | negated Unicode character class |
Composites
xy | x followed by y |
x|y | x or y (prefer x) |
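The "prefer x" rule for alternation can be checked with a small sketch. It assumes Go's standard `regexp` package as a stand-in for RE2, since that package accepts the same syntax; the helper name `firstMatch` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text ("" if none).
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// Both alternatives can match at offset 0; the left one is preferred.
	fmt.Printf("%q\n", firstMatch(`a|ab`, "ab"))
	// Reordering the alternatives changes which match is reported.
	fmt.Printf("%q\n", firstMatch(`ab|a`, "ab"))
}
```

With the alternatives in the first order the reported match is "a", not the longer "ab", so the order of alternatives matters.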
Repetitions
x* | zero or more x, prefer more |
x+ | one or more x, prefer more |
x? | zero or one x, prefer one |
x{n,m} | n or n+1 or ... or m x, prefer more |
x{n,} | n or more x, prefer more |
x{n} | exactly n x |
x*? | zero or more x, prefer fewer |
x+? | one or more x, prefer fewer |
x?? | zero or one x, prefer zero |
x{n,m}? | n or n+1 or ... or m x, prefer fewer |
x{n,}? | n or more x, prefer fewer |
x{n}? | exactly n x |
x{} | (≡ x* ) (NOT SUPPORTED) VIM |
x{-} | (≡ x*? ) (NOT SUPPORTED) VIM |
x{-n} | (≡ x{n}? ) (NOT SUPPORTED) VIM |
x= | (≡ x? ) (NOT SUPPORTED) VIM |
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
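The difference between "prefer more" and "prefer fewer", and the repetition limit of 1000, can be illustrated as follows. This is a sketch that assumes Go's standard `regexp` package, which follows the same syntax and applies the same limit; `firstMatch` and `compiles` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text ("" if none).
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

// compiles reports whether the parser accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	text := "<b>bold</b>"
	fmt.Printf("%q\n", firstMatch(`<.+>`, text))  // greedy: prefer more
	fmt.Printf("%q\n", firstMatch(`<.+?>`, text)) // non-greedy: prefer fewer

	fmt.Printf("%q\n", firstMatch(`a{2,3}`, "aaaa"))  // takes three
	fmt.Printf("%q\n", firstMatch(`a{2,3}?`, "aaaa")) // takes two

	// A count of 1000 is accepted; 1001 is rejected at compile time.
	fmt.Println(compiles(`a{1000}`), compiles(`a{1001}`))
}
```

The greedy form consumes the whole tag pair, while the non-greedy form stops at the first `>`.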
Possessive repetitions
x*+ | zero or more x, possessive (NOT SUPPORTED) |
x++ | one or more x, possessive (NOT SUPPORTED) |
x?+ | zero or one x, possessive (NOT SUPPORTED) |
x{n,m}+ | n or ... or m x, possessive (NOT SUPPORTED) |
x{n,}+ | n or more x, possessive (NOT SUPPORTED) |
x{n}+ | exactly n x, possessive (NOT SUPPORTED) |
Grouping
(re) | numbered capturing group (submatch) |
(?P<name>re) | named & numbered capturing group (submatch) |
(?<name>re) | named & numbered capturing group (submatch) (NOT SUPPORTED) |
(?'name're) | named & numbered capturing group (submatch) (NOT SUPPORTED) |
(?:re) | non-capturing group |
(?flags) | set flags within current group; non-capturing |
(?flags:re) | set flags during re; non-capturing |
(?#text) | comment (NOT SUPPORTED) |
(?|x|y|z) | branch numbering reset (NOT SUPPORTED) |
(?>re) | possessive match of re (NOT SUPPORTED) |
re@> | possessive match of re (NOT SUPPORTED) VIM |
%(re) | non-capturing group (NOT SUPPORTED) VIM |
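As an illustration of named, numbered, and non-capturing groups, here is a sketch that assumes Go's standard `regexp` package, which accepts the `(?P<name>re)` form. The pattern `dateRE` and the helper `namedGroups` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRE has two named capturing groups and one non-capturing group.
var dateRE = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})(?:-\d{2})?`)

// namedGroups maps each group name to its submatch, or returns nil on no match.
func namedGroups(re *regexp.Regexp, text string) map[string]string {
	m := re.FindStringSubmatch(text)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 (whole match) and unnamed groups have no name
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	g := namedGroups(dateRE, "released 2010-03-11")
	fmt.Println(g["year"], g["month"])
	// The (?:...) group is not counted as a submatch.
	fmt.Println(dateRE.NumSubexp())
}
```

Named groups remain numbered, so `year` is also submatch 1 and `month` is submatch 2.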
Flags
i | case-insensitive (default false) |
m | multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false) |
s | let . match \n (default false) |
U | ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false) |
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z).
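The flag forms above can be tried out with a short sketch. It assumes Go's standard `regexp` package, which supports the same `i`, `m`, `s`, and `U` flags; `matches` and `firstMatch` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in text.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

// firstMatch returns the leftmost match of pattern in text ("" if none).
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// i: case-insensitive matching.
	fmt.Println(matches(`hello`, "HELLO"), matches(`(?i)hello`, "HELLO"))
	// s: let . match a newline.
	fmt.Println(matches(`a.b`, "a\nb"), matches(`(?s)a.b`, "a\nb"))
	// U: swap the meaning of x+ and x+?.
	fmt.Printf("%q %q\n", firstMatch(`(?U)a+`, "aaa"), firstMatch(`(?U)a+?`, "aaa"))
	// Set a flag, then clear it again with -i.
	fmt.Println(matches(`(?i)a(?-i)b`, "Ab"), matches(`(?i)a(?-i)b`, "AB"))
	// (?flags:re) limits the flag to one group.
	fmt.Println(matches(`(?i:re)2`, "RE2"))
}
```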
Empty strings
^ | at beginning of text or line (m=true) |
$ | at end of text (like \z not \Z) or line (m=true) |
\A | at beginning of text |
\b | at ASCII word boundary (\w on one side and \W, \A, or \z on the other) |
\B | not at ASCII word boundary |
\G | at beginning of subtext being searched (NOT SUPPORTED) PCRE |
\G | at end of last match (NOT SUPPORTED) PERL |
\Z | at end of text, or before newline at end of text (NOT SUPPORTED) |
\z | at end of text |
(?=re) | before text matching re (NOT SUPPORTED) |
(?!re) | before text not matching re (NOT SUPPORTED) |
(?<=re) | after text matching re (NOT SUPPORTED) |
(?<!re) | after text not matching re (NOT SUPPORTED) |
re& | before text matching re (NOT SUPPORTED) VIM |
re@= | before text matching re (NOT SUPPORTED) VIM |
re@! | before text not matching re (NOT SUPPORTED) VIM |
re@<= | after text matching re (NOT SUPPORTED) VIM |
re@<! | after text not matching re (NOT SUPPORTED) VIM |
\zs | sets start of match (= \K ) (NOT SUPPORTED) VIM |
\ze | sets end of match (NOT SUPPORTED) VIM |
\%^ | beginning of file (NOT SUPPORTED) VIM |
\%$ | end of file (NOT SUPPORTED) VIM |
\%V | on screen (NOT SUPPORTED) VIM |
\%# | cursor position (NOT SUPPORTED) VIM |
\%'m | mark m position (NOT SUPPORTED) VIM |
\%23l | in line 23 (NOT SUPPORTED) VIM |
\%23c | in column 23 (NOT SUPPORTED) VIM |
\%23v | in virtual column 23 (NOT SUPPORTED) VIM |
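The supported anchors can be exercised with the following sketch, which assumes Go's standard `regexp` package as an RE2-compatible engine; `matches` and `findAll` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in text.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

// findAll returns every non-overlapping match of pattern in text.
func findAll(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

func main() {
	// Without the m flag, $ behaves like \z: a trailing newline blocks the match.
	fmt.Println(matches(`foo$`, "foo\n"), matches(`foo\z`, "foo"))

	// With m=true, ^ and $ also match at line boundaries.
	fmt.Println(findAll(`(?m)^\w+$`, "foo\nbar"))
	fmt.Println(len(findAll(`^\w+$`, "foo\nbar")))

	// \b requires a word character on exactly one side.
	fmt.Println(findAll(`\bcat\b`, "cat concat cat."))
	// \B matches where there is no word boundary, as inside "concat".
	fmt.Println(matches(`\Bcat`, "concat"))
}
```

The `cat` inside `concat` is skipped by `\bcat\b` because the preceding `n` is a word character.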
Escape sequences
\a | bell (≡ \007 ) |
\f | form feed (≡ \014 ) |
\t | horizontal tab (≡ \011 ) |
\n | newline (≡ \012 ) |
\r | carriage return (≡ \015 ) |
\v | vertical tab character (≡ \013 ) |
\* | literal *, for any punctuation character * |
\123 | octal character code (up to three digits) |
\x7F | hex character code (exactly two digits) |
\x{10FFFF} | hex character code |
\C | match a single byte even in UTF-8 mode |
\Q...\E | literal text ... even if ... has punctuation |
\1 | backreference (NOT SUPPORTED) |
\b | backspace (NOT SUPPORTED) (use \010 ) |
\cK | control char ^K (NOT SUPPORTED) (use \001 etc) |
\e | escape (NOT SUPPORTED) (use \033) |
\g1 | backreference (NOT SUPPORTED) |
\g{1} | backreference (NOT SUPPORTED) |
\g{+1} | backreference (NOT SUPPORTED) |
\g{-1} | backreference (NOT SUPPORTED) |
\g{name} | named backreference (NOT SUPPORTED) |
\g<name> | subroutine call (NOT SUPPORTED) |
\g'name' | subroutine call (NOT SUPPORTED) |
\k<name> | named backreference (NOT SUPPORTED) |
\k'name' | named backreference (NOT SUPPORTED) |
\lX | lowercase X (NOT SUPPORTED) |
\ux | uppercase x (NOT SUPPORTED) |
\L...\E | lowercase text ... (NOT SUPPORTED) |
\K | reset beginning of $0 (NOT SUPPORTED) |
\N{name} | named Unicode character (NOT SUPPORTED) |
\R | line break (NOT SUPPORTED) |
\U...\E | upper case text ... (NOT SUPPORTED) |
\X | extended Unicode sequence (NOT SUPPORTED) |
\%d123 | decimal character 123 (NOT SUPPORTED) VIM |
\%xFF | hex character FF (NOT SUPPORTED) VIM |
\%o123 | octal character 123 (NOT SUPPORTED) VIM |
\%u1234 | Unicode character 0x1234 (NOT SUPPORTED) VIM |
\%U12345678 | Unicode character 0x12345678 (NOT SUPPORTED) VIM |
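A few of the supported escapes, and one unsupported one, are shown in the sketch below. It assumes Go's standard `regexp` package, which documents the same syntax with a small number of exceptions (for example, it does not accept `\C`); `matches` and `compiles` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in text.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

// compiles reports whether the parser accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// \Q...\E treats + as a literal character instead of a repetition operator.
	fmt.Println(matches(`\Q1+1=2\E`, "1+1=2"), matches(`1+1=2`, "1+1=2"))

	// Two-digit hex, braced hex, and octal character codes.
	fmt.Println(matches(`\x41\x{263A}\101`, "A\u263aA"))

	// Control-character escapes match the corresponding bytes.
	fmt.Println(matches(`a\tb`, "a\tb"))

	// Backreferences are rejected when the pattern is compiled.
	fmt.Println(compiles(`(a)\1`))
}
```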
Character class elements
x | single character |
A-Z | character range (inclusive) |
\d | Perl character class |
[:foo:] | ASCII character class foo |
\p{Foo} | Unicode character class Foo |
\pF | Unicode character class F (one-letter name) |
Named character classes as character class elements
[\d] | digits (≡ \d ) |
[^\d] | not digits (≡ \D ) |
[\D] | not digits (≡ \D ) |
[^\D] | not not digits (≡ \d ) |
[[:name:]] | named ASCII class inside character class (≡ [:name:] ) |
[^[:name:]] | named ASCII class inside negated character class (≡ [:^name:] ) |
[\p{Name}] | named Unicode property inside character class (≡ \p{Name} ) |
[^\p{Name}] | named Unicode property inside negated character class (≡ \P{Name} ) |
Perl character classes (all ASCII-only)
\d | digits (≡ [0-9] ) |
\D | not digits (≡ [^0-9] ) |
\s | whitespace (≡ [\t\n\f\r ] ) |
\S | not whitespace (≡ [^\t\n\f\r ] ) |
\w | word characters (≡ [0-9A-Za-z_] ) |
\W | not word characters (≡ [^0-9A-Za-z_] ) |
\h | horizontal space (NOT SUPPORTED) |
\H | not horizontal space (NOT SUPPORTED) |
\v | vertical space (NOT SUPPORTED) |
\V | not vertical space (NOT SUPPORTED) |
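Because the Perl classes are ASCII-only, non-ASCII digits and letters need a Unicode class instead. The sketch below assumes Go's standard `regexp` package, which has the same ASCII-only definitions; `matches` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in text.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	arabicThree := "\u0663" // ARABIC-INDIC DIGIT THREE, general category Nd

	// \d covers only 0-9; \p{Nd} covers decimal digits in any script.
	fmt.Println(matches(`^\d$`, "7"), matches(`^\d$`, arabicThree))
	fmt.Println(matches(`^\p{Nd}$`, arabicThree))

	// \w excludes accented letters; adding \p{L} to the class admits them.
	word := "na\u00efve"
	fmt.Println(matches(`^\w+$`, word), matches(`^[\w\p{L}]+$`, word))

	// [^\D] is a double negative and is equivalent to \d.
	fmt.Println(matches(`^[^\D]$`, "5"))
}
```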
ASCII character classes
[[:alnum:]] | alphanumeric (≡ [0-9A-Za-z] ) |
[[:alpha:]] | alphabetic (≡ [A-Za-z] ) |
[[:ascii:]] | ASCII (≡ [\x00-\x7F] ) |
[[:blank:]] | blank (≡ [\t ] ) |
[[:cntrl:]] | control (≡ [\x00-\x1F\x7F] ) |
[[:digit:]] | digits (≡ [0-9] ) |
[[:graph:]] | graphical (≡ [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~] ) |
[[:lower:]] | lower case (≡ [a-z] ) |
[[:print:]] | printable (≡ [ -~] == [ [:graph:]] ) |
[[:punct:]] | punctuation (≡ [!-/:-@[-`{-~] ) |
[[:space:]] | whitespace (≡ [\t\n\v\f\r ] ) |
[[:upper:]] | upper case (≡ [A-Z] ) |
[[:word:]] | word characters (≡ [0-9A-Za-z_] ) |
[[:xdigit:]] | hex digit (≡ [0-9A-Fa-f] ) |
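The bracketed ASCII classes can be used as in this sketch, which assumes Go's standard `regexp` package; `matches` and `firstMatch` are hypothetical helpers. It also shows one difference visible in the tables: `[[:space:]]` includes the vertical tab, while `\s` does not.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in text.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

// firstMatch returns the leftmost match of pattern in text ("" if none).
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// A run of hex digits after a literal '#'.
	fmt.Printf("%q\n", firstMatch(`#[[:xdigit:]]+`, "tint #1aF9zz"))

	// The negated form [[:^alpha:]] matches anything that is not an ASCII letter.
	fmt.Printf("%q\n", firstMatch(`[[:^alpha:]]+`, "abc123def"))

	// A run of ASCII punctuation.
	fmt.Printf("%q\n", firstMatch(`[[:punct:]]+`, "wait...what?"))

	// Vertical tab: in [[:space:]] but not in \s.
	fmt.Println(matches(`^\s$`, "\v"), matches(`^[[:space:]]$`, "\v"))
}
```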
Unicode character class names--general category
C | other |
Cc | control |
Cf | format |
Cn | unassigned code points (NOT SUPPORTED) |
Co | private use |
Cs | surrogate |
L | letter |
LC | cased letter (NOT SUPPORTED) |
L& | cased letter (NOT SUPPORTED) |
Ll | lowercase letter |
Lm | modifier letter |
Lo | other letter |
Lt | titlecase letter |
Lu | uppercase letter |
M | mark |
Mc | spacing mark |
Me | enclosing mark |
Mn | non-spacing mark |
N | number |
Nd | decimal number |
Nl | letter number |
No | other number |
P | punctuation |
Pc | connector punctuation |
Pd | dash punctuation |
Pe | close punctuation |
Pf | final punctuation |
Pi | initial punctuation |
Po | other punctuation |
Ps | open punctuation |
S | symbol |
Sc | currency symbol |
Sk | modifier symbol |
Sm | math symbol |
So | other symbol |
Z | separator |
Zl | line separator |
Zp | paragraph separator |
Zs | space separator |
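General category names plug into `\p{...}`, `\pX`, and their negated forms. A small sketch, assuming Go's standard `regexp` package, which accepts these category names; `findAll` and `firstMatch` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in text.
func findAll(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

// firstMatch returns the leftmost match of pattern in text ("" if none).
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// Lu: uppercase letters, including non-ASCII ones.
	fmt.Println(findAll(`\p{Lu}`, "Hello W\u00f6rld \u00dcn\u00efcode"))

	// Sc: currency symbols (here U+20AC EURO SIGN).
	fmt.Printf("%q\n", firstMatch(`\p{Sc}`, "price: \u20ac20"))

	// One-letter form \pN covers every number category (Nd, Nl, No).
	fmt.Printf("%q\n", firstMatch(`\pN+`, "room \u2463\u2461"))

	// Negated one-letter form \PL: a run of non-letters.
	fmt.Printf("%q\n", firstMatch(`\PL+`, "abc-123-def"))
}
```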
Unicode character class names--scripts
Arabic | Arabic |
Armenian | Armenian |
Balinese | Balinese |
Bamum | Bamum |
Batak | Batak |
Bengali | Bengali |
Bopomofo | Bopomofo |
Brahmi | Brahmi |
Braille | Braille |
Buginese | Buginese |
Buhid | Buhid |
Canadian_Aboriginal | Canadian Aboriginal |
Carian | Carian |
Chakma | Chakma |
Cham | Cham |
Cherokee | Cherokee |
Common | characters not specific to one script |
Coptic | Coptic |
Cuneiform | Cuneiform |
Cypriot | Cypriot |
Cyrillic | Cyrillic |
Deseret | Deseret |
Devanagari | Devanagari |
Egyptian_Hieroglyphs | Egyptian Hieroglyphs |
Ethiopic | Ethiopic |
Georgian | Georgian |
Glagolitic | Glagolitic |
Gothic | Gothic |
Greek | Greek |
Gujarati | Gujarati |
Gurmukhi | Gurmukhi |
Han | Han |
Hangul | Hangul |
Hanunoo | Hanunoo |
Hebrew | Hebrew |
Hiragana | Hiragana |
Imperial_Aramaic | Imperial Aramaic |
Inherited | inherit script from previous character |
Inscriptional_Pahlavi | Inscriptional Pahlavi |
Inscriptional_Parthian | Inscriptional Parthian |
Javanese | Javanese |
Kaithi | Kaithi |
Kannada | Kannada |
Katakana | Katakana |
Kayah_Li | Kayah Li |
Kharoshthi | Kharoshthi |
Khmer | Khmer |
Lao | Lao |
Latin | Latin |
Lepcha | Lepcha |
Limbu | Limbu |
Linear_B | Linear B |
Lycian | Lycian |
Lydian | Lydian |
Malayalam | Malayalam |
Mandaic | Mandaic |
Meetei_Mayek | Meetei Mayek |
Meroitic_Cursive | Meroitic Cursive |
Meroitic_Hieroglyphs | Meroitic Hieroglyphs |
Miao | Miao |
Mongolian | Mongolian |
Myanmar | Myanmar |
New_Tai_Lue | New Tai Lue (aka Simplified Tai Lue) |
Nko | Nko |
Ogham | Ogham |
Ol_Chiki | Ol Chiki |
Old_Italic | Old Italic |
Old_Persian | Old Persian |
Old_South_Arabian | Old South Arabian |
Old_Turkic | Old Turkic |
Oriya | Oriya |
Osmanya | Osmanya |
Phags_Pa | Phags Pa |
Phoenician | Phoenician |
Rejang | Rejang |
Runic | Runic |
Saurashtra | Saurashtra |
Sharada | Sharada |
Shavian | Shavian |
Sinhala | Sinhala |
Sora_Sompeng | Sora Sompeng |
Sundanese | Sundanese |
Syloti_Nagri | Syloti Nagri |
Syriac | Syriac |
Tagalog | Tagalog |
Tagbanwa | Tagbanwa |
Tai_Le | Tai Le |
Tai_Tham | Tai Tham |
Tai_Viet | Tai Viet |
Takri | Takri |
Tamil | Tamil |
Telugu | Telugu |
Thaana | Thaana |
Thai | Thai |
Tibetan | Tibetan |
Tifinagh | Tifinagh |
Ugaritic | Ugaritic |
Vai | Vai |
Yi | Yi |
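Script names are used the same way as general categories. The sketch below assumes Go's standard `regexp` package, which recognizes these script names; `firstMatch` and `compiles` are hypothetical helpers, and `Klingon` is a deliberately invalid name used only to show the failure case.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text ("" if none).
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

// compiles reports whether the parser accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// A run of Greek letters inside Latin text.
	fmt.Printf("%q\n", firstMatch(`\p{Greek}+`, "abc αβγ def"))

	// A run of Han characters.
	fmt.Printf("%q\n", firstMatch(`\p{Han}+`, "RE2 正则表达式"))

	// Scripts can be combined inside a bracketed class.
	fmt.Printf("%q\n", firstMatch(`[\p{Hiragana}\p{Katakana}]+`, "漢字とカタカナ"))

	// An unknown script name is rejected at compile time.
	fmt.Println(compiles(`\p{Klingon}`))
}
```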
Vim character classes
\i | identifier character (NOT SUPPORTED) VIM |
\I | \i except digits (NOT SUPPORTED) VIM |
\k | keyword character (NOT SUPPORTED) VIM |
\K | \k except digits (NOT SUPPORTED) VIM |
\f | file name character (NOT SUPPORTED) VIM |
\F | \f except digits (NOT SUPPORTED) VIM |
\p | printable character (NOT SUPPORTED) VIM |
\P | \p except digits (NOT SUPPORTED) VIM |
\s | whitespace character ( ≡ [ \t] ) (NOT SUPPORTED) VIM |
\S | non-white space character ( ≡ [^ \t] ) (NOT SUPPORTED) VIM |
\d | digits ( ≡ [0-9] ) VIM |
\D | not \d VIM |
\x | hex digits ( ≡ [0-9A-Fa-f] ) (NOT SUPPORTED) VIM |
\X | not \x (NOT SUPPORTED) VIM |
\o | octal digits ( ≡ [0-7] ) (NOT SUPPORTED) VIM |
\O | not \o (NOT SUPPORTED) VIM |
\w | word character VIM |
\W | not \w VIM |
\h | head of word character (NOT SUPPORTED) VIM |
\H | not \h (NOT SUPPORTED) VIM |
\a | alphabetic (NOT SUPPORTED) VIM |
\A | not \a (NOT SUPPORTED) VIM |
\l | lowercase (NOT SUPPORTED) VIM |
\L | not lowercase (NOT SUPPORTED) VIM |
\u | uppercase (NOT SUPPORTED) VIM |
\U | not uppercase (NOT SUPPORTED) VIM |
\_x | \x plus newline, for any x (NOT SUPPORTED) VIM |
Vim flags
\c | ignore case (NOT SUPPORTED) VIM |
\C | match case (NOT SUPPORTED) VIM |
\m | magic (NOT SUPPORTED) VIM |
\M | nomagic (NOT SUPPORTED) VIM |
\v | verymagic (NOT SUPPORTED) VIM |
\V | verynomagic (NOT SUPPORTED) VIM |
\Z | ignore differences in Unicode combining characters (NOT SUPPORTED) VIM |
Magic
(?{code}) | arbitrary Perl code (NOT SUPPORTED) PERL |
(??{code}) | postponed arbitrary Perl code (NOT SUPPORTED) PERL |
(?n) | recursive call to regexp capturing group n (NOT SUPPORTED) |
(?+n) | recursive call to relative group +n (NOT SUPPORTED) |
(?-n) | recursive call to relative group -n (NOT SUPPORTED) |
(?C) | PCRE callout (NOT SUPPORTED) PCRE |
(?R) | recursive call to entire regexp (≡ (?0) ) (NOT SUPPORTED) |
(?&name) | recursive call to named group (NOT SUPPORTED) |
(?P=name) | named backreference (NOT SUPPORTED) |
(?P>name) | recursive call to named group (NOT SUPPORTED) |
(?(cond)true|false) | conditional branch (NOT SUPPORTED) |
(?(cond)true) | conditional branch (NOT SUPPORTED) |
(*ACCEPT) | make regexps more like Prolog (NOT SUPPORTED) |
(*COMMIT) | (NOT SUPPORTED) |
(*F) | (NOT SUPPORTED) |
(*FAIL) | (NOT SUPPORTED) |
(*MARK) | (NOT SUPPORTED) |
(*PRUNE) | (NOT SUPPORTED) |
(*SKIP) | (NOT SUPPORTED) |
(*THEN) | (NOT SUPPORTED) |
(*ANY) | set newline convention (NOT SUPPORTED) |
(*ANYCRLF) | (NOT SUPPORTED) |
(*CR) | (NOT SUPPORTED) |
(*CRLF) | (NOT SUPPORTED) |
(*LF) | (NOT SUPPORTED) |
(*BSR_ANYCRLF) | set \R convention (NOT SUPPORTED) PCRE |
(*BSR_UNICODE) | (NOT SUPPORTED) PCRE |