[go] Regular expression syntax supported by the regexp package

Adapted from: http://code.google.com/p/re2/wiki/Syntax

Single characters

`.`	any character, possibly including newline (s=true)
`[xyz]`	character class
`[^xyz]`	negated character class
`\d`	Perl character class
`\D`	negated Perl character class
`[[:alpha:]]`	ASCII character class
`[[:^alpha:]]`	negated ASCII character class
`\pN`	Unicode character class (one-letter name)
`\p{Greek}`	Unicode character class
`\PN`	negated Unicode character class (one-letter name)
`\P{Greek}`	negated Unicode character class
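
A minimal sketch of a few of the single-character forms above, using Go's standard `regexp` package. The helper name `matches` is an illustrative assumption, not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
// (Hypothetical helper used only for this example.)
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// By default `.` does not match a newline; the s flag changes that.
	fmt.Println(matches(`a.b`, "a\nb"))     // false
	fmt.Println(matches(`(?s)a.b`, "a\nb")) // true

	// Negated character class: only characters other than x, y, z.
	fmt.Println(matches(`^[^xyz]+$`, "abc")) // true
	fmt.Println(matches(`^[^xyz]+$`, "abx")) // false

	// Unicode classes: a named script and a negated one-letter category.
	fmt.Println(matches(`^\p{Greek}+$`, "αβγ")) // true
	fmt.Println(matches(`^\PL+$`, "123"))       // true
}
```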

Composites

`xy`	x followed by y
`x|y`	x or y (prefer x)
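
The "prefer x" rule can be observed directly: with the default leftmost-first semantics, the first alternative that matches wins even when a later one would match more text. The sketch below also shows `(*Regexp).Longest`, which switches a compiled expression to leftmost-longest matching. The helper names `first` and `longest` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match using the default (leftmost-first) rules.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// longest returns the leftmost match after switching to leftmost-longest rules.
func longest(pattern, s string) string {
	re := regexp.MustCompile(pattern)
	re.Longest()
	return re.FindString(s)
}

func main() {
	fmt.Printf("%q\n", first(`a|ab`, "ab"))   // "a": the first alternative is preferred
	fmt.Printf("%q\n", first(`ab|a`, "ab"))   // "ab"
	fmt.Printf("%q\n", longest(`a|ab`, "ab")) // "ab": longest match wins
}
```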

Repetitions

`x*`	zero or more x, prefer more
`x+`	one or more x, prefer more
`x?`	zero or one x, prefer one
`x{n,m}`	n or n+1 or ... or m x, prefer more
`x{n,}`	n or more x, prefer more
`x{n}`	exactly n x
`x*?`	zero or more x, prefer fewer
`x+?`	one or more x, prefer fewer
`x??`	zero or one x, prefer zero
`x{n,m}?`	n or n+1 or ... or m x, prefer fewer
`x{n,}?`	n or more x, prefer fewer
`x{n}?`	exactly n x
`x{}`	(≡ `x*` ) (NOT SUPPORTED) VIM
`x{-}`	(≡ `x*?` ) (NOT SUPPORTED) VIM
`x{-n}`	(≡ `x{n}?` ) (NOT SUPPORTED) VIM
`x=`	(≡ `x?` ) (NOT SUPPORTED) VIM
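
As a small illustration of "prefer more" versus "prefer fewer" in the table above, the following sketch compares greedy and non-greedy forms on the same input. The helper `find` is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"regexp"
)

// find returns the text of the leftmost match of pattern in s.
func find(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Printf("%q\n", find(`a+`, "aaa"))       // "aaa": prefer more
	fmt.Printf("%q\n", find(`a+?`, "aaa"))      // "a": prefer fewer
	fmt.Printf("%q\n", find(`a*?`, "aaa"))      // "": zero repetitions is enough
	fmt.Printf("%q\n", find(`a{2,3}`, "aaaa"))  // "aaa"
	fmt.Printf("%q\n", find(`a{2,3}?`, "aaaa")) // "aa"

	// A common practical case: stop at the first closing bracket.
	fmt.Printf("%q\n", find(`<.+>`, "<b>x</b>"))  // "<b>x</b>"
	fmt.Printf("%q\n", find(`<.+?>`, "<b>x</b>")) // "<b>"
}
```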

Implementation restriction: The counting forms `x{n,m}`, `x{n,}`, and `x{n}` reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
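
The repetition limit can be checked by looking at the error returned from `regexp.Compile`. This is only a sketch; `compiles` is a hypothetical helper, and the exact error text is deliberately not relied on.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether pattern is accepted by regexp.Compile.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`a{1000}`))   // true: at the limit
	fmt.Println(compiles(`a{1001}`))   // false: count above 1000
	fmt.Println(compiles(`a{2,1001}`)) // false: maximum above 1000
	fmt.Println(compiles(`a{1001,}`))  // false: minimum above 1000
	fmt.Println(compiles(`a*`))        // true: unlimited repetition is fine
}
```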

Possessive repetitions

`x*+`	zero or more x, possessive (NOT SUPPORTED)
`x++`	one or more x, possessive (NOT SUPPORTED)
`x?+`	zero or one x, possessive (NOT SUPPORTED)
`x{n,m}+`	n or ... or m x, possessive (NOT SUPPORTED)
`x{n,}+`	n or more x, possessive (NOT SUPPORTED)
`x{n}+`	exactly n x, possessive (NOT SUPPORTED)

Grouping

`(re)`	numbered capturing group (submatch)
`(?P<name>re)`	named & numbered capturing group (submatch)
`(?<name>re)`	named & numbered capturing group (submatch) (NOT SUPPORTED)
`(?'name're)`	named & numbered capturing group (submatch) (NOT SUPPORTED)
`(?:re)`	non-capturing group
`(?flags)`	set flags within current group; non-capturing
`(?flags:re)`	set flags during re; non-capturing
`(?#text)`	comment (NOT SUPPORTED)
`(?|x|y|z)`	branch numbering reset (NOT SUPPORTED)
`(?>re)`	possessive match of re (NOT SUPPORTED)
`re@>`	possessive match of re (NOT SUPPORTED) VIM
`%(re)`	non-capturing group (NOT SUPPORTED) VIM
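
To show how the supported grouping forms interact, here is a hedged sketch that collects named submatches into a map. `namedGroups` and the date-like pattern are illustrative assumptions; `SubexpNames`, `NumSubexp`, and `FindStringSubmatch` are the standard `regexp` methods.

```go
package main

import (
	"fmt"
	"regexp"
)

// namedGroups returns the named submatches of the first match of pattern in s,
// or nil when there is no match.
func namedGroups(pattern, s string) map[string]string {
	re := regexp.MustCompile(pattern)
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // unnamed groups and the whole match have an empty name
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	// Two named capturing groups and one non-capturing group.
	pattern := `(?P<year>\d{4})-(?P<month>\d{2})(?:-\d{2})?`
	g := namedGroups(pattern, "released 2024-03-15")
	fmt.Println(g["year"], g["month"]) // 2024 03

	// The non-capturing group does not count as a submatch.
	fmt.Println(regexp.MustCompile(pattern).NumSubexp()) // 2
}
```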

Flags

`i`	case-insensitive (default false)
`m`	multi-line mode: `^` and `$` match begin/end line in addition to begin/end text (default false)
`s`	let `.` match `\n` (default false)
`U`	ungreedy: swap meaning of `x*` and `x*?`, `x+` and `x+?`, etc (default false)

Flag syntax is `xyz` (set) or `-xyz` (clear) or `xy-z` (set `xy`, clear `z`).
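
The flags above can be tried out with a short sketch like the following. The helpers `matches` and `find` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// find returns the text of the leftmost match of pattern in s.
func find(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Println(matches(`(?i)hello`, "HELLO")) // true: case-insensitive

	fmt.Println(matches(`^b$`, "a\nb\nc"))     // false: ^ and $ anchor the whole text
	fmt.Println(matches(`(?m)^b$`, "a\nb\nc")) // true: multi-line mode

	fmt.Printf("%q\n", find(`(?U)a+`, "aaa"))  // "a": ungreedy by default
	fmt.Printf("%q\n", find(`(?U)a+?`, "aaa")) // "aaa": meaning is swapped

	// Flags scoped to a group with (?flags:re).
	fmt.Println(matches(`^a(?i:b)c$`, "aBc")) // true
	fmt.Println(matches(`^a(?i:b)c$`, "ABc")) // false

	// Set i and clear s in one step.
	fmt.Println(matches(`(?i-s)a.b`, "AxB"))  // true
	fmt.Println(matches(`(?i-s)a.b`, "A\nB")) // false
}
```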

Empty strings

`^`	at beginning of text or line (m=true)
`$`	at end of text (like `\z` not `\Z`) or line (m=true)
`\A`	at beginning of text
`\b`	at ASCII word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other)
`\B`	not at ASCII word boundary
`\G`	at beginning of subtext being searched (NOT SUPPORTED) PCRE
`\G`	at end of last match (NOT SUPPORTED) PERL
`\Z`	at end of text, or before newline at end of text (NOT SUPPORTED)
`\z`	at end of text
`(?=re)`	before text matching re (NOT SUPPORTED)
`(?!re)`	before text not matching re (NOT SUPPORTED)
`(?<=re)`	after text matching re (NOT SUPPORTED)
`(?<!re)`	after text not matching re (NOT SUPPORTED)
`re&`	before text matching re (NOT SUPPORTED) VIM
`re@=`	before text matching re (NOT SUPPORTED) VIM
`re@!`	before text not matching re (NOT SUPPORTED) VIM
`re@<=`	after text matching re (NOT SUPPORTED) VIM
`re@<!`	after text not matching re (NOT SUPPORTED) VIM
`\zs`	sets start of match (= `\K` ) (NOT SUPPORTED) VIM
`\ze`	sets end of match (NOT SUPPORTED) VIM
`\%^`	beginning of file (NOT SUPPORTED) VIM
`\%$`	end of file (NOT SUPPORTED) VIM
`\%V`	on screen (NOT SUPPORTED) VIM
`\%#`	cursor position (NOT SUPPORTED) VIM
`\%'m`	mark m position (NOT SUPPORTED) VIM
`\%23l`	in line 23 (NOT SUPPORTED) VIM
`\%23c`	in column 23 (NOT SUPPORTED) VIM
`\%23v`	in virtual column 23 (NOT SUPPORTED) VIM
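
A brief sketch of the supported empty-width assertions, with attention to two details from the table: `\b` is an ASCII word boundary, and `$` without the `m` flag behaves like `\z`. The helper names are assumptions for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	// Whole-word matches only: the "cat" inside "concat" is skipped.
	fmt.Println(findAll(`\bcat\b`, "cat concat cat.")) // [cat cat]

	// é is not an ASCII word character, so a boundary follows the f.
	fmt.Println(matches(`caf\b`, "café")) // true

	// Without m, $ matches only at the very end of the text.
	fmt.Println(matches(`a$`, "a\n"))      // false
	fmt.Println(matches(`(?m)a$`, "a\nb")) // true

	// \A stays anchored to the start of the text even in multi-line mode.
	fmt.Println(matches(`(?m)^b`, "a\nb"))  // true
	fmt.Println(matches(`(?m)\Ab`, "a\nb")) // false
}
```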

Escape sequences

`\a`	bell (≡ `\007` )
`\f`	form feed (≡ `\014` )
`\t`	horizontal tab (≡ `\011` )
`\n`	newline (≡ `\012` )
`\r`	carriage return (≡ `\015` )
`\v`	vertical tab character (≡ `\013` )
`\*`	literal `*`, for any punctuation character `*`
`\123`	octal character code (up to three digits)
`\x7F`	hex character code (exactly two digits)
`\x{10FFFF}`	hex character code
`\C`	match a single byte even in UTF-8 mode
`\Q...\E`	literal text ... even if ... has punctuation
`\1`	backreference (NOT SUPPORTED)
`\b`	backspace (NOT SUPPORTED) (use `\010` )
`\cK`	control char `^K` (NOT SUPPORTED) (use `\001` etc)
`\e`	escape (NOT SUPPORTED) (use `\033`)
`\g1`	backreference (NOT SUPPORTED)
`\g{1}`	backreference (NOT SUPPORTED)
`\g{+1}`	backreference (NOT SUPPORTED)
`\g{-1}`	backreference (NOT SUPPORTED)
`\g{name}`	named backreference (NOT SUPPORTED)
`\g<name>`	subroutine call (NOT SUPPORTED)
`\g'name'`	subroutine call (NOT SUPPORTED)
`\k<name>`	named backreference (NOT SUPPORTED)
`\k'name'`	named backreference (NOT SUPPORTED)
`\lX`	lowercase X (NOT SUPPORTED)
`\ux`	uppercase x (NOT SUPPORTED)
`\L...\E`	lowercase text ... (NOT SUPPORTED)
`\K`	reset beginning of `$0` (NOT SUPPORTED)
`\N{name}`	named Unicode character (NOT SUPPORTED)
`\R`	line break (NOT SUPPORTED)
`\U...\E`	upper case text ... (NOT SUPPORTED)
`\X`	extended Unicode sequence (NOT SUPPORTED)
`\%d123`	decimal character 123 (NOT SUPPORTED) VIM
`\%xFF`	hex character FF (NOT SUPPORTED) VIM
`\%o123`	octal character 123 (NOT SUPPORTED) VIM
`\%u1234`	Unicode character `0x1234` (NOT SUPPORTED) VIM
`\%U12345678`	Unicode character `0x12345678` (NOT SUPPORTED) VIM
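
The following sketch exercises a few supported escapes (`\Q...\E`, hex and octal codes) and confirms that a backreference is rejected at compile time. `regexp.QuoteMeta` is the standard function for escaping metacharacters in a literal string; `matches` and `compiles` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// compiles reports whether pattern is accepted by regexp.Compile.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// \Q...\E treats the enclosed text literally, so the dot is not a wildcard.
	fmt.Println(matches(`^\Qa.b\E$`, "a.b")) // true
	fmt.Println(matches(`^\Qa.b\E$`, "axb")) // false

	// Two-digit hex, octal, and braced hex codes: A, A, and the euro sign.
	fmt.Println(matches(`^\x41\101\x{20AC}$`, "AA€")) // true

	// QuoteMeta escapes metacharacters when building a pattern from user input.
	fmt.Println(regexp.QuoteMeta("1+1=2?")) // 1\+1=2\?

	// Backreferences are not supported.
	fmt.Println(compiles(`(a)\1`)) // false
}
```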

Character class elements

`x`	single character
`A-Z`	character range (inclusive)
`\d`	Perl character class
`[:foo:]`	ASCII character class foo
`\p{Foo}`	Unicode character class Foo
`\pF`	Unicode character class F (one-letter name)

Named character classes as character class elements

`[\d]`	digits (≡ `\d` )
`[^\d]`	not digits (≡ `\D` )
`[\D]`	not digits (≡ `\D` )
`[^\D]`	not not digits (≡ `\d` )
`[[:name:]]`	named ASCII class inside character class (≡ `[:name:]` )
`[^[:name:]]`	named ASCII class inside negated character class (≡ `[:^name:]` )
`[\p{Name}]`	named Unicode property inside character class (≡ `\p{Name}` )
`[^\p{Name}]`	named Unicode property inside negated character class (≡ `\P{Name}` )
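
As an illustration of mixing named classes with other elements inside one bracket expression, consider this sketch. The inputs and the helper `findAll` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	// [\d] and [^\D] are both equivalent to \d.
	fmt.Println(findAll(`[\d]+`, "a1b22"))  // [1 22]
	fmt.Println(findAll(`[^\D]+`, "a1b22")) // [1 22]

	// An ASCII class, a Perl class, and a literal in one class.
	fmt.Println(findAll(`[[:alpha:]\d_]+`, "ab12-cd_3")) // [ab12 cd_3]

	// A named ASCII class inside a negated class.
	fmt.Println(findAll(`[^[:alpha:]]+`, "ab12-cd")) // [12-]

	// A Unicode property combined with a Perl class.
	fmt.Println(findAll(`[\p{Greek}\d]+`, "α1 b2")) // [α1 2]
}
```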

Perl character classes (all ASCII-only)

`\d`	digits (≡ `[0-9]` )
`\D`	not digits (≡ `[^0-9]` )
`\s`	whitespace (≡ `[\t\n\f\r ]` )
`\S`	not whitespace (≡ `[^\t\n\f\r ]` )
`\w`	word characters (≡ `[0-9A-Za-z_]` )
`\W`	not word characters (≡ `[^0-9A-Za-z_]` )
`\h`	horizontal space (NOT SUPPORTED)
`\H`	not horizontal space (NOT SUPPORTED)
`\v`	vertical space (NOT SUPPORTED)
`\V`	not vertical space (NOT SUPPORTED)
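
Because the Perl classes are ASCII-only, non-ASCII digits and letters need a Unicode class instead. The sketch below contrasts the two; the fullwidth digit and the accented word are arbitrary sample inputs, and the helpers are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// compiles reports whether pattern is accepted by regexp.Compile.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// A fullwidth digit is a Unicode decimal number but not an ASCII digit.
	fmt.Println(matches(`^\d$`, "３"))      // false
	fmt.Println(matches(`^\p{Nd}$`, "３")) // true

	// \w does not cover accented letters; add \p{L} when they are needed.
	fmt.Println(matches(`^\w+$`, "café"))        // false
	fmt.Println(matches(`^[\w\p{L}]+$`, "café")) // true

	// \s does not include vertical tab.
	fmt.Println(matches(`\s`, "\v")) // false

	// \h is not supported.
	fmt.Println(compiles(`\h`)) // false
}
```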

ASCII character classes

`[[:alnum:]]`	alphanumeric (≡ `[0-9A-Za-z]` )
`[[:alpha:]]`	alphabetic (≡ `[A-Za-z]` )
`[[:ascii:]]`	ASCII (≡ `[\x00-\x7F]` )
`[[:blank:]]`	blank (≡ `[\t ]` )
`[[:cntrl:]]`	control (≡ `[\x00-\x1F\x7F]` )
`[[:digit:]]`	digits (≡ `[0-9]` )
`[[:graph:]]`	graphical (≡ [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~] )
`[[:lower:]]`	lower case (≡ `[a-z]` )
`[[:print:]]`	printable (≡ `[ -~]` == `[ [:graph:]]` )
`[[:punct:]]`	punctuation (≡ [!-/:-@[-`{-~] )
`[[:space:]]`	whitespace (≡ `[\t\n\v\f\r ]` )
`[[:upper:]]`	upper case (≡ `[A-Z]` )
`[[:word:]]`	word characters (≡ `[0-9A-Za-z_]` )
`[[:xdigit:]]`	hex digit (≡ `[0-9A-Fa-f]` )
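
A short, hedged example of the ASCII classes in use, including the one visible difference between `[[:space:]]` and `\s` (vertical tab). `matches` and `strip` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// strip removes every match of pattern from s.
func strip(pattern, s string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(s, "")
}

func main() {
	fmt.Println(matches(`^[[:xdigit:]]+$`, "DEADbeef01")) // true
	fmt.Println(matches(`^[[:xdigit:]]+$`, "xyz"))        // false

	fmt.Println(strip(`[[:punct:]]`, "a,b.c!")) // abc

	// [[:space:]] includes vertical tab, unlike \s.
	fmt.Println(matches(`[[:space:]]`, "\v")) // true

	// Negated form, and a reminder that these classes are ASCII-only.
	fmt.Println(matches(`^[[:^alpha:]]+$`, "123-")) // true
	fmt.Println(matches(`^[[:alpha:]]+$`, "é"))     // false
}
```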

Unicode character class names--general category

`C`	other
`Cc`	control
`Cf`	format
`Cn`	unassigned code points (NOT SUPPORTED)
`Co`	private use
`Cs`	surrogate
`L`	letter
`LC`	cased letter (NOT SUPPORTED)
`L&`	cased letter (NOT SUPPORTED)
`Ll`	lowercase letter
`Lm`	modifier letter
`Lo`	other letter
`Lt`	titlecase letter
`Lu`	uppercase letter
`M`	mark
`Mc`	spacing mark
`Me`	enclosing mark
`Mn`	non-spacing mark
`N`	number
`Nd`	decimal number
`Nl`	letter number
`No`	other number
`P`	punctuation
`Pc`	connector punctuation
`Pd`	dash punctuation
`Pe`	close punctuation
`Pf`	final punctuation
`Pi`	initial punctuation
`Po`	other punctuation
`Ps`	open punctuation
`S`	symbol
`Sc`	currency symbol
`Sk`	modifier symbol
`Sm`	math symbol
`So`	other symbol
`Z`	separator
`Zl`	line separator
`Zp`	paragraph separator
`Zs`	space separator
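
The general-category names above are used with `\p{...}` (or `\pX` for one-letter names). A minimal sketch with arbitrary sample strings follows; `findAll` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	// Lu: uppercase letters, including non-ASCII ones.
	fmt.Println(findAll(`\p{Lu}`, "Hello Wörld Ünïcode")) // [H W Ü]

	// Nd: decimal numbers, here fullwidth and ASCII digits.
	fmt.Println(findAll(`\p{Nd}+`, "１２ and 34")) // [１２ 34]

	// Sc: currency symbols.
	fmt.Println(findAll(`\p{Sc}`, "$5 €6 ¥7")) // [$ € ¥]

	// P: any punctuation, written with the one-letter form.
	fmt.Println(findAll(`\pP`, "a-b, c!")) // [- , !]
}
```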

Unicode character class names--scripts

`Arabic`	Arabic
`Armenian`	Armenian
`Balinese`	Balinese
`Bamum`	Bamum
`Batak`	Batak
`Bengali`	Bengali
`Bopomofo`	Bopomofo
`Brahmi`	Brahmi
`Braille`	Braille
`Buginese`	Buginese
`Buhid`	Buhid
`Canadian_Aboriginal`	Canadian Aboriginal
`Carian`	Carian
`Chakma`	Chakma
`Cham`	Cham
`Cherokee`	Cherokee
`Common`	characters not specific to one script
`Coptic`	Coptic
`Cuneiform`	Cuneiform
`Cypriot`	Cypriot
`Cyrillic`	Cyrillic
`Deseret`	Deseret
`Devanagari`	Devanagari
`Egyptian_Hieroglyphs`	Egyptian Hieroglyphs
`Ethiopic`	Ethiopic
`Georgian`	Georgian
`Glagolitic`	Glagolitic
`Gothic`	Gothic
`Greek`	Greek
`Gujarati`	Gujarati
`Gurmukhi`	Gurmukhi
`Han`	Han
`Hangul`	Hangul
`Hanunoo`	Hanunoo
`Hebrew`	Hebrew
`Hiragana`	Hiragana
`Imperial_Aramaic`	Imperial Aramaic
`Inherited`	inherit script from previous character
`Inscriptional_Pahlavi`	Inscriptional Pahlavi
`Inscriptional_Parthian`	Inscriptional Parthian
`Javanese`	Javanese
`Kaithi`	Kaithi
`Kannada`	Kannada
`Katakana`	Katakana
`Kayah_Li`	Kayah Li
`Kharoshthi`	Kharoshthi
`Khmer`	Khmer
`Lao`	Lao
`Latin`	Latin
`Lepcha`	Lepcha
`Limbu`	Limbu
`Linear_B`	Linear B
`Lycian`	Lycian
`Lydian`	Lydian
`Malayalam`	Malayalam
`Mandaic`	Mandaic
`Meetei_Mayek`	Meetei Mayek
`Meroitic_Cursive`	Meroitic Cursive
`Meroitic_Hieroglyphs`	Meroitic Hieroglyphs
`Miao`	Miao
`Mongolian`	Mongolian
`Myanmar`	Myanmar
`New_Tai_Lue`	New Tai Lue (aka Simplified Tai Lue)
`Nko`	Nko
`Ogham`	Ogham
`Ol_Chiki`	Ol Chiki
`Old_Italic`	Old Italic
`Old_Persian`	Old Persian
`Old_South_Arabian`	Old South Arabian
`Old_Turkic`	Old Turkic
`Oriya`	Oriya
`Osmanya`	Osmanya
`Phags_Pa`	Phags Pa
`Phoenician`	Phoenician
`Rejang`	Rejang
`Runic`	Runic
`Saurashtra`	Saurashtra
`Sharada`	Sharada
`Shavian`	Shavian
`Sinhala`	Sinhala
`Sora_Sompeng`	Sora Sompeng
`Sundanese`	Sundanese
`Syloti_Nagri`	Syloti Nagri
`Syriac`	Syriac
`Tagalog`	Tagalog
`Tagbanwa`	Tagbanwa
`Tai_Le`	Tai Le
`Tai_Tham`	Tai Tham
`Tai_Viet`	Tai Viet
`Takri`	Takri
`Tamil`	Tamil
`Telugu`	Telugu
`Thaana`	Thaana
`Thai`	Thai
`Tibetan`	Tibetan
`Tifinagh`	Tifinagh
`Ugaritic`	Ugaritic
`Vai`	Vai
`Yi`	Yi
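
Script names work the same way as general categories. The sketch below pulls runs of a given script out of mixed text; the sample strings and the helpers `findAll` and `matches` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(findAll(`\p{Han}+`, "Go语言 regexp 正则"))         // [语言 正则]
	fmt.Println(findAll(`\p{Cyrillic}+`, "hello привет мир")) // [привет мир]

	// Several scripts can be combined in one character class.
	fmt.Println(findAll(`[\p{Hiragana}\p{Katakana}]+`, "ひらがな and カタカナ")) // [ひらがな カタカナ]

	// Negated script class: digits are not Latin-script characters.
	fmt.Println(matches(`^\P{Latin}+$`, "123")) // true
}
```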

Vim character classes

`\i`	identifier character (NOT SUPPORTED) VIM
`\I`	`\i` except digits (NOT SUPPORTED) VIM
`\k`	keyword character (NOT SUPPORTED) VIM
`\K`	`\k` except digits (NOT SUPPORTED) VIM
`\f`	file name character (NOT SUPPORTED) VIM
`\F`	`\f` except digits (NOT SUPPORTED) VIM
`\p`	printable character (NOT SUPPORTED) VIM
`\P`	`\p` except digits (NOT SUPPORTED) VIM
`\s`	whitespace character (≡ `[ \t]` ) (NOT SUPPORTED) VIM
`\S`	non-white space character (≡ `[^ \t]` ) (NOT SUPPORTED) VIM
`\d`	digits (≡ `[0-9]` ) VIM
`\D`	not `\d` VIM
`\x`	hex digits (≡ `[0-9A-Fa-f]` ) (NOT SUPPORTED) VIM
`\X`	not `\x` (NOT SUPPORTED) VIM
`\o`	octal digits (≡ `[0-7]` ) (NOT SUPPORTED) VIM
`\O`	not `\o` (NOT SUPPORTED) VIM
`\w`	word character VIM
`\W`	not `\w` VIM
`\h`	head of word character (NOT SUPPORTED) VIM
`\H`	not `\h` (NOT SUPPORTED) VIM
`\a`	alphabetic (NOT SUPPORTED) VIM
`\A`	not `\a` (NOT SUPPORTED) VIM
`\l`	lowercase (NOT SUPPORTED) VIM
`\L`	not lowercase (NOT SUPPORTED) VIM
`\u`	uppercase (NOT SUPPORTED) VIM
`\U`	not uppercase (NOT SUPPORTED) VIM
`\_x`	`\x` plus newline, for any `x` (NOT SUPPORTED) VIM

Vim flags

`\c`	ignore case (NOT SUPPORTED) VIM
`\C`	match case (NOT SUPPORTED) VIM
`\m`	magic (NOT SUPPORTED) VIM
`\M`	nomagic (NOT SUPPORTED) VIM
`\v`	verymagic (NOT SUPPORTED) VIM
`\V`	verynomagic (NOT SUPPORTED) VIM
`\Z`	ignore differences in Unicode combining characters (NOT SUPPORTED) VIM

Magic

`(?{code})`	arbitrary Perl code (NOT SUPPORTED) PERL
`(??{code})`	postponed arbitrary Perl code (NOT SUPPORTED) PERL
`(?n)`	recursive call to regexp capturing group n (NOT SUPPORTED)
`(?+n)`	recursive call to relative group +n (NOT SUPPORTED)
`(?-n)`	recursive call to relative group -n (NOT SUPPORTED)
`(?C)`	PCRE callout (NOT SUPPORTED) PCRE
`(?R)`	recursive call to entire regexp (≡ `(?0)` ) (NOT SUPPORTED)
`(?&name)`	recursive call to named group (NOT SUPPORTED)
`(?P=name)`	named backreference (NOT SUPPORTED)
`(?P>name)`	recursive call to named group (NOT SUPPORTED)
`(?(cond)true|false)`	conditional branch (NOT SUPPORTED)
`(?(cond)true)`	conditional branch (NOT SUPPORTED)
`(*ACCEPT)`	make regexps more like Prolog (NOT SUPPORTED)
`(*COMMIT)`	(NOT SUPPORTED)
`(*F)`	(NOT SUPPORTED)
`(*FAIL)`	(NOT SUPPORTED)
`(*MARK)`	(NOT SUPPORTED)
`(*PRUNE)`	(NOT SUPPORTED)
`(*SKIP)`	(NOT SUPPORTED)
`(*THEN)`	(NOT SUPPORTED)
`(*ANY)`	set newline convention (NOT SUPPORTED)
`(*ANYCRLF)`	(NOT SUPPORTED)
`(*CR)`	(NOT SUPPORTED)
`(*CRLF)`	(NOT SUPPORTED)
`(*LF)`	(NOT SUPPORTED)
`(*BSR_ANYCRLF)`	set `\R` convention (NOT SUPPORTED) PCRE
`(*BSR_UNICODE)`	(NOT SUPPORTED) PCRE