Regular Expressions/Perl-Compatible Regular Expressions

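Perl has a richer and more predictable syntax than even the POSIX Extended Regular Expressions syntax. An example of its predictability is that \ always quotes a non-alphanumeric character. An example of something that can be specified with Perl but not with POSIX is whether part of the match should be greedy. For instance, in the pattern /a.*b/, the .* matches as many characters as possible, while in the pattern /a.*?b/, the .*? matches as few characters as possible. Thus, given the string "a bad dab", the first pattern matches the whole string, and the second matches only "a b".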
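The greedy and non-greedy behaviour described above can be checked with a minimal sketch like the following. The variable names are chosen here for illustration only, and the capturing parentheses are added so the matched text can be printed.

```perl
use strict;
use warnings;

my $text = "a bad dab";

# Greedy: .* takes as many characters as possible before the final "b".
my ($greedy) = $text =~ /(a.*b)/;

# Non-greedy: .*? stops at the first "b" that lets the match succeed.
my ($lazy) = $text =~ /(a.*?b)/;

print "greedy: '$greedy'\n";   # greedy: 'a bad dab'
print "lazy:   '$lazy'\n";     # lazy:   'a b'
```

In list context the match operator returns the captured groups, which is why each result is assigned to a one-element list.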

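For these reasons, many other utilities and applications have adopted syntaxes that look a lot like Perl's. For example, Java, Ruby, Python, PHP, exim, BBEdit, and even Microsoft's .NET Framework all use regular expression syntax similar to that used in Perl. Not all "Perl-compatible" regular expression implementations are identical, and many implement only a subset of Perl's features.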

Examples

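Conventions used in the examples: the character 'm' is not always required to specify a Perl match operation. For example, m/[^abc]/ could also be written as /[^abc]/. The 'm' is only necessary if the user wishes to specify a match operation without using a forward slash as the regex delimiter. Sometimes it is useful to specify an alternate regex delimiter in order to avoid "delimiter collision". See 'perldoc perlre' for more details.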
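As a rough illustration of delimiter collision, the sketch below matches a path that itself contains forward slashes. The sample path and variable names are hypothetical; the three patterns are intended to be equivalent.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";   # hypothetical sample value

# With the default delimiter, every slash in the pattern must be escaped.
my $with_slashes = $path =~ /^\/usr\/local\// ? 1 : 0;

# With m and an alternate delimiter, the slashes can be written literally.
my $with_braces = $path =~ m{^/usr/local/} ? 1 : 0;
my $with_hashes = $path =~ m#^/usr/local/# ? 1 : 0;

print "$with_slashes $with_braces $with_hashes\n";   # 1 1 1
```

Paired delimiters such as braces tend to be the more readable choice when a pattern contains many slashes.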

   metacharacter(s) ;; the metacharacters column specifies the regex syntax being demonstrated
   =~ m//           ;; indicates a regex match operation in perl    
   =~ s///          ;; indicates a regex substitution operation in perl
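To make the difference between the two operations in the legend concrete, here is a small sketch. The sample string and variable names are assumptions made for this example.

```perl
use strict;
use warnings;

my $greeting = "Hello World";

# =~ m// only tests whether the pattern matches; the string is unchanged.
my $matched = ($greeting =~ m/World/) ? "yes" : "no";

# =~ s/// replaces the first match in place and returns the number
# of substitutions it made.
my $count = ($greeting =~ s/World/Perl/);

print "$matched $count $greeting\n";   # yes 1 Hello Perl
```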

In the table heading below, "M-c" stands for "metacharacter".

M-c	Description	Example (all the if statements return a TRUE value)
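.	Normally matches any character except a newline. Within square brackets the dot is literal.	if ("Hello World\n" =~ m/...../) { print "Yep"; } # Has length >= 5
( )	Groups a series of pattern elements into a single element. When you match a pattern within parentheses, you can later use $1, $2, and so on to refer to the text that was matched.	if ("Hello World\n" =~ m/(H..).(o..)/) { print "We matched '$1' and '$2'\n"; } # Output: We matched 'Hel' and 'o W'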
+	Matches the preceding pattern element one or more times.	if ("Hello World\n" =~ m/l+/) { print "One or more \"l\"'s in the string\n"; }
?	Matches the preceding pattern element zero or one times.	if ("Hello World\n" =~ m/H.?e/) { print "There is an 'H' and a 'e' separated by "; print "0-1 characters (Ex: He Hoe)\n"; }
?	Modifies the preceding *, +, or {M,N} quantifier so that it matches as few times as possible.	if ("Hello World\n" =~ m/(l.+?o)/) { print "Yep"; } # The non-greedy match with 'l' followed by one or more characters is 'llo' rather than 'llo Wo'.
*	Matches the preceding pattern element zero or more times.	if ("Hello World\n" =~ m/el*o/) { print "There is an 'e' followed by zero to many "; print "'l' followed by 'o' (eo, elo, ello, elllo)\n"; }
{M,N}	Denotes a minimum of M and a maximum of N matches.	if ("Hello World\n" =~ m/l{1,2}/) { print "There is a substring with at least 1 "; print "and at most 2 l's in the string\n"; }
[...]	Denotes a set of possible character matches.	if ("Hello World\n" =~ m/[aeiou]+/) { print "Yep"; } # Contains one or more vowels
|	Separates alternative possibilities.	if ("Hello World\n" =~ m/(Hello|Hi|Pogo)/) { print "At least one of Hello, Hi, or Pogo is "; print "contained in the string.\n"; }
\b	Matches a word boundary.	if ("Hello World\n" =~ m/llo\b/) { print "There is a word that ends with 'llo'\n"; }
\w	Matches an alphanumeric character, including "_".	if ("Hello World\n" =~ m/\w/) { print "There is at least one alphanumeric "; print "character in the string (A-Z, a-z, 0-9, _)\n"; }
\W	Matches a non-alphanumeric character, excluding "_".	if ("Hello World\n" =~ m/\W/) { print "The space between Hello and "; print "World is not alphanumeric\n"; }
\s	Matches a whitespace character (space, tab, newline, form feed).	if ("Hello World\n" =~ m/\s.*\s/) { print "There are TWO whitespace characters, which may"; print " be separated by other characters, in the string."; }
\S	Matches anything but whitespace.	if ("Hello World\n" =~ m/\S.*\S/) { print "Contains two non-whitespace characters " . "separated by zero or more characters."; }
\d	Matches a digit; equivalent to [0-9] for ASCII input.	if ("99 bottles of beer on the wall." =~ m/(\d+)/) { print "$1 is the first number in the string\n"; }
\D	Matches a non-digit.	if ("Hello World\n" =~ m/\D/) { print "There is at least one character in the string"; print " that is not a digit.\n"; }
^	Matches the beginning of a line or string.	if ("Hello World\n" =~ m/^He/) { print "Starts with the characters 'He'\n"; }
$	Matches the end of a line or string.	if ("Hello World\n" =~ m/rld$/) { print "Is a line or string "; print "that ends with 'rld'\n"; }
\A	Matches the beginning of a string (but not an internal line).	if ("Hello\nWorld\n" =~ m/\AH/) { print "Yep"; } # The string starts with 'H'.
\Z	Matches the end of a string (but not an internal line).	if ("Hello\nWorld\n" =~ m/d\n\Z/) { print "Yep"; } # Ends with 'd\n'
[^...]	Matches every character except the ones inside the brackets.	if ("Hello World\n" =~ m/[^abc]/) { print "Yep"; } # Contains a character other than a, b, and c.
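The table describes ^ and $ as matching at a line or string boundary, and \A and \Z as matching only at the string boundary. The sketch below tries to make that distinction visible on a two-line string. It relies on Perl's /m modifier, which is not covered in the table and is introduced here only for illustration; the variable names are likewise hypothetical.

```perl
use strict;
use warnings;

my $text = "Hello\nWorld\n";

# Without /m, ^ matches only at the very start of the string.
my $caret_plain  = ($text =~ /^World/)   ? 1 : 0;   # 0
# With /m, ^ also matches just after each embedded newline.
my $caret_multi  = ($text =~ /^World/m)  ? 1 : 0;   # 1
# \A is unaffected by /m and still matches only at the start.
my $a_multi      = ($text =~ /\AWorld/m) ? 1 : 0;   # 0
# With /m, $ matches before each newline.
my $dollar_multi = ($text =~ /Hello$/m)  ? 1 : 0;   # 1
# \Z matches only at the end of the string (or before a final newline).
my $z_multi      = ($text =~ /Hello\Z/m) ? 1 : 0;   # 0

print "$caret_plain $caret_multi $a_multi $dollar_multi $z_multi\n";   # 0 1 0 1 0
```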

Use in Tools

Tools and languages that use Perl regular expression syntax include

See also

Perl Programming/Regular Expressions Reference

Links