跳轉到內容

可計算性與複雜性/形式語言/喬姆斯基層次結構/上下文無關語言

來自華夏公益教科書,開放的書籍,開放的世界

上下文無關語言

[編輯 | 編輯原始碼]

上下文無關語言是喬姆斯基層次結構中第二類最受限制的語言。該類語言可以透過使用“非終結符”和“終結符”的一組生成規則來描述,其中終結符是語言的字母表。這些規則將非終結符替換為終結符或非終結符的字串,或替換為空字串。慣例是用大寫字母表示非終結符,用小寫字母表示字母表中的符號(以及終結符)。這組替換規則稱為上下文無關文法。如果存在一個上下文無關文法,它從起始非終結符生成語言中的每個字串(而不生成其他字串),那麼語言就是上下文無關的。

例如,考慮以下文法

  • S -> ε(ε 代表空字串)
  • S -> A
  • A -> aAb
  • A -> ab

從起始符號 S 開始,該文法可以生成空字串或一個單詞,該單詞由任意數量的 a 後跟相同數量的 b 組成。您可能還記得,關於 正則語言 的部分,該語言也可以寫成 ,它不屬於正則語言類別,但我們現在看到它是上下文無關的。

請注意,在示例中,文法提供了在 S 和 A 非終結符上應用什麼規則的選擇。正是這些選擇使文法能夠生成語言中的所有字串,而不僅僅是一個字串。

下推自動機

[編輯 | 編輯原始碼]

上下文無關語言類別與稱為下推自動機的機器所識別的語言類別相同。下推自動機 (PDA) 是一種非確定性機器,它由有限個狀態組成,這些狀態之間存在著轉換,就像 NFA(見 正則語言)一樣,但它還增加了一個大小無限的棧。作為轉換的一部分,機器可以彈出棧頂元素,並將其內容用作轉換的一部分,也可以將新元素壓入棧中。它的轉換可以寫成 {A,x,y} -> {B,z},其中 A 和 B 是起始狀態和結束狀態,x 是下一個輸入字元,y 是從棧中彈出的內容,z 是壓入棧的內容。x、y 或 z 中的任何一個都可以是 ε,這意味著在該轉換中沒有放置或消耗任何內容。

棧的新增為機器提供了有限但任意大的記憶體量,使它能夠識別比有限自動機更復雜的語言。如上所述,語言 是一種上下文無關語言,因此存在一個識別它的 PDA。它可以透過為字串中的每個 a 新增一個專案到棧中來實現,從而儲存它們的計數,併為每個 b 從棧中刪除一個專案。如果 ab 的數量相同,則當沒有更多 b 時,棧將清空,因此可以有效地比較這兩個數字。有關識別此語言的機器的更詳細示例,請參閱下面的示例機器。

如果 PDA 根本不使用棧,它就等效於 NFA,因此等效於 DFA,因此有限自動機識別的任何語言都可以由 PDA 識別。因此,由於存在非正則的上下文無關語言,正則語言類別是上下文無關語言類別的真子集。

侷限性

[編輯 | 編輯原始碼]

與正則語言一樣,還有許多語言不是上下文無關的。PDA 上的棧雖然提供了無限的儲存容量,但仍然是一個棧,因此在任何給定時間只能訪問最後一個放置在它上面的元素。訪問更早的元素需要刪除並因此丟失後面的元素,因為沒有其他棧可以放置它們。PDA 還有另一個限制,它必須按接收到的順序消耗輸入字元,並且不能再次訪問它們,除非將它們放到棧中。

這些限制使得 PDA 無法識別語言 。透過使用棧,PDA 可以計算 a 的數量,並將它們與 b 的數量進行比較,但在這樣做時,它必須消耗對 a 的記錄,因此無法以相同的方式比較 c。機器無法在不遮蔽 a 的計數的情況下建立 b 的記錄來與 c 進行比較,因此該方法同樣不成功。簡而言之,PDA 記憶體缺乏任何型別的隨機或直接訪問阻止了它識別此語言和許多其他語言,由於沒有 PDA 能夠識別它們,因此它們不可能是上下文無關語言。

以下程式碼是 Perl 中的 PDA 模擬器示例。給定機器的描述和輸入字串,它模擬機器處理輸入字串,並顯示機器是否接受。

語法為:progname.pl PDAFile inputFile,其中 PDAFile 是包含 TM 指令的文字檔案,inputFile 是包含輸入字串的文字檔案。一些示例輸入,包括一組用於識別 的機器的 PDA 指令集,可以在 示例 PDA 輸入 中找到。

#!usr/bin/perl
use Text::ParseWords;
use strict;
use warnings;

my (@branches, @stack, %alphabet);

# Grabs the filenames for the machine and the word to be run on it.
my $pdaFile = $ARGV[0];
my $input = $ARGV[1];

# We use subroutines to parse and verify the data in the input files.
# The machine data is stored in the $machine structure as the keys rules, accepts, alphabet, and startState.
my $machine = readPDA($pdaFile);
# Rules and accepts are extracted from the $machine structure for ease of access.
my @rules = @{$machine->{rules}};
my %accepts = %{$machine->{accepts}};
# This reads the input file and parses it into an array of strings, with each element being one input symbol.
# It checks to make sure the elements are all in the machine's alphabet.
my @string = readInput($input, $machine->{alphabet});
# The newstate is a temporary storage point for when a new state is calculated.
my $newstate;
# The usedRule is a temporary storage point for when a new state is calculated.
my $usedRule;
# The changed variable represents whether or the current branch is unfinished.
my $changed = 1;
push(@stack, "");
# The top level of the branches array corresponds to each branch of possibilities created by the non-determinism of the PDA.
# Each element contains the conditions of the machine for that branched possibility.
# The first element of each collection is the state of the branch.
$branches[0][0] = $machine->{startState};
# The second element is how much of the input string the branch has read.
$branches[0][1] = 0;
# The third element is an array containing the stack for that branch.
$branches[0][2][0] = "";
# Now that the first branch is initialized, the processing can begin

for my $i (0..$#branches)
{
   # When we start a branch, print the branch number
    print "\nBeginning branch ".$i.".\n";
   # As long as things keep changing, keep cycling through the rules.
    while($changed)
    {
       # Unless it changes while going through the rules, this branch will quit.
        $changed = 0;
       # The input word is printed, with the next symbol highlighted.
        print "Input: @string[0..$branches[$i][1]-1] <".$string[$branches[$i][1]]."> @string[$branches[$i][1]+1..@string-1]\n";
       # The current state of the stack is printed.
        print "Stack: @{$branches[$i][2]}\n";
       # A new state is calculated by checking conditions against the list of rules
        for my $rNum (0..$#rules)
        {
#            print "::$rules[$rNum][0]??$branches[$i][0]";
#            print "::$rules[$rNum][1]??$string[$branches[$i][1]]";
#            print "::$rules[$rNum][2]??".$branches[$i][2]->[-1]."::\n";
           # Checks the current state, input, and top stack item against the rule
            if ($rules[$rNum][0] eq $branches[$i][0] &&
                ($rules[$rNum][1] eq "e" || $rules[$rNum][1] eq $string[$branches[$i][1]]) &&
                ($rules[$rNum][2] eq "e" || $rules[$rNum][2] eq $branches[$i][2]->[-1]))
            {
                if ($changed == 0)
                {
                   # Set the new state.
                    $newstate = $rules[$rNum][3];
                   # The state transition is printed.
                    print "State: ".$branches[$i][0]." -> ".$newstate."\n\n";
                    $changed = 1;
                   # Because possible branched depend on this state, we can't update it yet.
                   # When we can update this state, $usedRule will help us remember which rule to base those updates on.
                    $usedRule = $rNum;
                }
                else
                {
                   # Set the new state.
                    my $branchState = $rules[$rNum][3];
                   # The state transition is printed.
                    print "(branching) State: ".$branches[$i][0]." -> ".$branchState."\n\n";
                    my $newBranch = @branches;
                   # The state in the new branch is set.
                    $branches[$newBranch][0] = $branchState;
                   # The new branch starts with the same string position as the old branch,
                    $branches[$newBranch][1] = $branches[$i][1];
                   # and the same stack, so the stack has to be replicated.
                    @{$branches[$newBranch][2]} = @{$branches[$i][2]};
                   # If we read a symbol from the input to make the transition,
                    unless ($rules[$rNum][1] eq "e")
                    {
                       # then we should move to the next symbol.
                        $branches[$newBranch][1]++;
                    }
                   # If we used an element from the stack to make the transition,
                    unless ($rules[$rNum][2] eq "e")
                    {
                       # then it's used up and should be removed.
                        pop(@{$branches[$newBranch][2]});
                    }
                   # If the rule adds something to the stack,
                    unless ($rules[$rNum][4] eq "e")
                    {
                       # then it gets added.
                        push(@{$branches[$newBranch][2]}, $rules[$rNum][4]);
                    }
                }
            }
        }
       # Now that any branching has been finished, we can update the original branch.
        if ($changed)
        {
           # If we read a symbol from the input to make the transition,
            unless ($rules[$usedRule][1] eq "e")
            {
               # then we should move to the next symbol.
                $branches[$i][1]++;
            }
           # If we used an element from the stack to make the transition,
            unless ($rules[$usedRule][2] eq "e")
            {
               # then it's used up and should be removed.
                pop(@{$branches[$i][2]});
            }
           # If the rule adds something to the stack,
            unless ($rules[$usedRule][4] eq "e")
            {
               # then it gets added.
                push(@{$branches[$i][2]}, $rules[$usedRule][4]);
            }
           # The state changes to the new state.
            $branches[$i][0] = $newstate;
        }
    }
   # When the input is exhausted, the branch is in its final state.
    print "Final state of branch ".$i." is ".$branches[$i][0]."\n";
   # If that state is in the accept states list, the machine accepts the input and halts.
    if (defined($accepts{$branches[$i][0]}) && $branches[$i][1] == $#string)
    {
        print "The machine accepts the string.\n";
        exit;
    }
   # If that state doesn't, point it out.
    else { print "The branch does not accept the string.\n"; }
   # And move on.
    $changed = 1;
}
print "The machine does not accept the string.\n";

###################################################

sub readPDA
# This subroutine reads the machine data from the specified file into variables (mostly hashes).
{
    my (%states, %stackAlphabet, %accepts, %alphabet, @rules);
    open(INFILE, shift) or die "Can't open machine file: $!";
    
# This block reads the list of states from the machine file.
   # Discards the section header,
    <INFILE>;
    my $line = <INFILE>;
    chomp($line);
    my @words = &parse_line('\s+', 0, $line);
    for (@words)
    {
       # records the state names for checking the rules,
        $states{$_} = 0;
    }

# This block reads the start state from the machine file.
   # Discards the header,
    <INFILE>;
    my $startState = <INFILE>;
   # takes the whole line as the start state,
    chomp($startState);
   # and makes sure that the start state is defined in the list of states.
    defined($states{$startState}) or die "The start state $startState isn't a state!";
    
# This block reads the list of accepting states from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into state names,
    @words = parse_line('\s+', 0, $line);
    for (@words)
    {
   # checks to make sure that the accept states are defined states,
        defined($states{$_}) or die "$_ isn't a state!";
   # and defines those names in a new hash.  The use of a hash makes it easier to determine later if a specific state name accepts or not.
        $accepts{$_} = 1;
    }
    
# This block reads the list of symbols in the alphabet from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = parse_line('\s+', 0, $line);
   # e is used as the empty symbol in the rules.
    $alphabet{e} = 1;
    for (@words)
    {
       # This records which symbols are in the alphabet for checking the rules.
        $alphabet{$_} = 0;
    }
    
# This block reads the list of symbols in the stack alphabet from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = parse_line('\s+', 0, $line);
   # e is used as the empty symbol in the rules.
    $stackAlphabet{e} = 1;
    for (@words)
    {
       # This records which symbols are in the alphabet for checking the rules.
        $stackAlphabet{$_} = 0;
    }
    
# This block reads the state transition rules from the machine file.
   # Discards the header,
    <INFILE>;
   # This variable synchronizes the position of each rule in the rules array.
    my $rulesCounter=0;
    while(<INFILE>)
    {
   # breaks each rule into start state, input symbol, stack symbol, end state, and new stack symbol.
        chomp;
        @words = parse_line('\s+', 0, $_);
   # checks that all five pieces are defined in the state and alphabet hashes,
        defined($states{$words[0]}) or die "$words[0] isn't a defined state!";
        defined($alphabet{$words[1]}) or die "$words[1] isn't defined in the alphabet!";
        defined($stackAlphabet{$words[2]}) or die "$words[2] isn't defined in the stack alphabet!";
        defined($states{$words[3]}) or die "$words[3] isn't a defined state!";
        defined($stackAlphabet{$words[4]}) or die "$words[4] isn't defined in the stack alphabet!";
   # then creates an array of each rule.
        for (0..4)
        {
            $rules[$rulesCounter][$_] = $words[$_];
        }
       # The synchronization variable has to be updated.
        $rulesCounter++;
    }

   # Reading complete, the subroutine closes the file and returns the name of the start state.
    close INFILE;
   # The relevant data is stored in the $machine structure and returned to the main routine.
    my $machine = 
    {
        rules      => \@rules,
        accepts    => \%accepts,
        alphabet   => \%alphabet,
        startState => $startState
    };
    return $machine;
}

sub readInput
# This subroutine reads the input string from the specified file into an array of symbols.
{
    open(INFILE, shift) or die "Can't open ".$input.": $!";
    my $alphaRef = shift;
   # The first line of the file is read as the input, with symbols delimited by spaces.
    my $line = <INFILE>."";
    chomp($line);
    my @string = parse_line('\s+', 0, $line);
   # This makes sure every symbol in the input string was defined in the machine's alphabet.
    for (@string)
    { exists($alphaRef->{$_}) or die "$_ in $input isn't in this machine's alphabet!"; }
    close INFILE;
   # Since the machine can continue to make transitions after the string is exhausted, this adds a blank element to keep the string from overrunning.
    push(@string, "");
    return @string;
}

上一步 | 下一步

華夏公益教科書