可計算性和複雜性/形式語言/喬姆斯基層次結構/正則語言
正則語言是喬姆斯基層次結構中最受限制且最簡單的語言。此類語言通常像數學集合一樣描述,用大括號描述。任何匹配該描述的單詞都是該語言的一部分,而任何不匹配該描述的單詞都不是該語言的一部分。
非正式地,正則語言類是指可以用其字母表的字元以及連線、括號、∪(或)運算子和*運算子描述的語言。例如,,包含任意數量的a和b的語言,其中每個a後面都跟著一個b。
需要注意的是,任何有限語言,即包含一些有限的單詞序列的語言,可以寫成。因此,每種有限語言都是正則的。然而,反過來卻不是真的。前面給出的示例語言,,包含無限個單詞,並且是正則的。
正則語言的另一種等效定義是那些可以透過正則文法生成的語言。正則文法是一組規則,從初始符號開始,透過替換規則將“非終結符”(不在語言字母表中的符號)替換為“終結符”(在語言字母表中的符號),這些規則將每個非終結符替換為一個終結符,一個終結符後跟一個非終結符,或ε(空字串)。例如,語言可以透過以下文法描述(令 S 為起始符號)
- S -> ε
- S -> aB
- S -> b
- S -> bA
- A -> aB
- A -> b
- A -> bA
- B -> bA
- B -> b
注意,透過對要應用的規則進行不同的選擇,該文法可以從“S”的起始字串生成空字串或包含a和b的字串,其中每個a後面都跟著一個b。這與匹配上面集合符號的單詞完全相同,使得兩種方法生成的語言相同。
有限自動機是機器,包含有限個狀態,並且僅根據當前狀態和字元輸入在這些狀態之間轉換。當輸入都被消耗後,機器會根據機器的最終狀態“接受”或“拒絕”輸入字串。這些機器恰好對應於正則語言,任何有限自動機接受的字串集都是正則語言,並且每個正則語言都有一個機器接受該語言中所有且僅有的字串。因此,我們說正則語言類等效於有限自動機識別的語言類。
有限自動機有兩種型別,確定性和非確定性。確定性有限自動機 (DFA) 針對輸入中的每個字元執行一次轉換,並且從每個狀態到字母表中的每個字元都包含一個轉換。
非確定性有限自動機 (NFA) 每次轉換可以消耗一個字元或不消耗字元,不需要從每個狀態對每個字元執行轉換,並且對於特定輸入和特定狀態,可以有多個可能的轉換。這些屬性允許 NFA 包含計算分支。如果由字串生成的任何計算分支接受,則機器接受該字串,並且僅當沒有分支接受時才會拒絕。由於 DFA 比 NFA 更嚴格,因此每個 DFA 也是 NFA,並且任何 NFA 都可以轉換為 DFA,因此所有 NFA 和 DFA 識別的語言集是相同的,並且這兩個機器是等效的。
對於兩種型別,自動機都可以透過其狀態集、字母表、轉換集、起始狀態和接受狀態來完全指定。
並非所有語言都是正則的。以語言 為例,其單詞包含一定數量的 a 接著相同數量的 b。有限自動機唯一的記憶就是其當前狀態,因此它只能計數不超過其狀態數的 a 數量。由於該語言包含所有此類單詞,因此單詞可以擁有任意數量的 a,所以對於任何機器而言,都必須存在一些字串,其 a 的數量無法儲存。如果無法儲存 a 的數量,則無法比較 a 和 b 的數量,因此無法驗證任何字串是否屬於該語言。因此,不存在識別該語言的有限自動機,因此它不能是正則語言。
以下程式碼是 Perl 中的示例 DFA 模擬器。給定機器的描述和輸入字串,它模擬機器處理輸入字串,並顯示機器是否接受。
語法為:progname.pl DFAFile inputFile,其中 DFAFile 是包含 DFA 指令的文字檔案,inputFile 是包含輸入字串的文字檔案。一些示例輸入(包括一組 DFA 指令)可以在 示例 DFA 輸入 中找到。
#!usr/bin/perl
use Text::ParseWords;
use strict;
use warnings;
our (%states, %accepts, %alphabet, @rules, @string);
# Grabs the filenames for the machine and the word to be run on it.
my $dfaFile = $ARGV[0];
my $input = $ARGV[1];
# Uses subroutines to parse and verify the data in the input files.
# The data is stored in the global hashes and arrays, with the exception of the start state, which is provided here.
my $state = readDFA($dfaFile);
readInput($input);
# The newstate is a temporary storage point for when a new state is calculated.
my $newstate;
# For each symbol in the input word, the next state is calculated.
for (0..@string-1)
{
# The input word is printed, with the next symbol highlighted.
print "@string[0..$_-1] <".$string[$_]."> @string[$_+1..@string-1]\n";
# A new state is calculated.
$newstate = $rules[$states{$state}][$alphabet{$string[$_]}];
# The state transition is printed.
print "State: ".$state." -> ".$newstate."\n\n";
# The state changes to the new state.
$state = $newstate;
}
# When the input is exhausted, the machine is in its final state.
print "Final state is ".$state."\n";
# If that state is in the accept states list, the machine accepts the input.
if (defined($accepts{$state})) { print "The machine accepts the string.\n"; }
# If not, the machine rejects.
else { print "The machine does not accept the string.\n" }
###################################################
sub readDFA
# This subroutine reads the machine data from the specified file into variables (mostly hashes).
{
open(INFILE, shift) or die "Can't open ".$dfaFile.": $!";
# This block reads the list of states from the machine file.
# Discards the section header,
<INFILE>;
my $line = <INFILE>;
chomp($line);
my @words = &parse_line('\s+', 0, $line);
my $counter = 0;
for (@words)
{
# assigns each state name a number by creating a hash,
$states{$_} = $counter;
$counter++;
}
# and records the number of states.
my $stateCount = $counter;
# This block reads the start state from the machine file.
# Discards the header,
<INFILE>;
my $startState = <INFILE>;
# takes the whole line as the start state,
chomp($startState);
# and makes sure that the start state is defined in the list of states.
defined($states{$startState}) or die "The start state $startState isn't a state!";
# This block reads the list of accepting states from the machine file.
# Discards the header,
<INFILE>;
$line = <INFILE>;
chomp($line);
# breaks up the line into state names,
@words = &parse_line('\s+', 0, $line);
for (@words)
{
# checks to make sure that the accept states are defined states,
defined($states{$_}) or die "The start state $_ isn't a state!";
# and defines those names in a new hash. The use of a hash makes it easier to determine later if a specific state name accepts or not.
$accepts{$_} = 1;
}
# This block reads the list of symbols in the alphabet from the machine file.
# Discards the header,
<INFILE>;
$line = <INFILE>;
chomp($line);
# breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
@words = &parse_line('\s+', 0, $line);
$counter = 0;
for (@words)
{
# assigns each symbol a number, and creates a hash to reference them,
$alphabet{$_} = $counter;
$counter++;
}
# and then records the number of symbols in the alphabet.
my $alphabetCount = $counter;
# This block reads the state transition rules from the machine file.
# Discards the header,
<INFILE>;
while(<INFILE>)
{
# breaks each rule into start state, input symbol, and end state,
@words = &parse_line('\s+', 0, $_);
# checks that all three pieces are defined in the state and alphabet hashes,
defined($alphabet{$words[1]}) or die "$words[1] isn't defined in the alphabet!";
defined($states{$words[0]}) or die "$words[0] isn't a defined state!";
defined($states{$words[2]}) or die "$words[2] isn't a defined state!";
# then uses the numbers assigned to the start state and the input symbol as indexes in a 2-dimensional array whose value is the name of the end state.
$rules[$states{$words[0]}][$alphabet{$words[1]}] = $words[2];
}
# This last block makes sure the set of rules is comprehensive.
for my $i (0..$stateCount-1)
{
defined($rules[$i]) or die "Rules are incomplete.";
for my $j (0..$alphabetCount-1)
{
defined($rules[$i][$j]) or die "Rules are incomplete.";
}
}
# Reading complete, the subroutine closes the file and returns the name of the start state.
close INFILE;
return $startState;
}
sub readInput
# This subroutine reads the input string from the specified file into an array of symbols.
{
open(INFILE, shift) or die "Can't open ".$input.": $!";
# The first line of the file is read as the input, with symbols delimited by spaces.
my $line = <INFILE>."";
chomp($line);
@string = &parse_line('\s+', 0, $line);
# This makes sure every symbol in the input string was defined in the machine's alphabet.
for (@string)
{ defined($alphabet{$_}) or die "$_ in $input isn't in this machine's alphabet!"; }
close INFILE;
}