跳至內容

可計算性和複雜性/形式語言/喬姆斯基層級/上下文敏感語言

來自華夏公益教科書,開放的書籍,開放的世界

上下文敏感語言

[編輯 | 編輯原始碼]

上下文敏感語言是指其字串可以透過上下文敏感文法生成的語言。這種文法形式類似於上下文無關文法(見上下文無關語言),但它不使用形式為 A -> X 的規則,其中 A 是一個非終結符,X 是一個終結符和非終結符的字串,而上下文敏感文法包含形式為 X -> Y 的規則,其中 X 和 Y 是終結符和非終結符的字串,並受制於 Y 的長度至少要與 X 的長度相同。有一個例外:如果 S 是起始符號,並且 S 從未出現在任何轉換的右側,則允許 S -> ε 的轉換。這使得文法可以生成空字串。

這種文法的一個例子是

  • S -> ε|aAbc|abc
  • A -> aAbC|abC
  • Cb -> bC
  • Cc -> cc

這種文法生成語言,這是一種上下文無關的語言。

上下文敏感文法的另一種定義是,其中每個轉換都具有 YAZ -> YXZ 的形式,其中 A 是一個非終結符,而 X、Y 和 Z 都是終結符和非終結符的字串(Y 和 Z 可以為空)。由於 Y 和 Z 保持不變,這與 A -> X 轉換的作用相同,但可以使用 A 的上下文作為轉換的要求。在1中,諾姆·喬姆斯基證明了這兩個定義是等價的,並且生成相同的語言。

根據僅使用空上下文的第二種定義,任何上下文無關文法也是上下文敏感文法,因此上下文無關語言集是上下文敏感語言集的子集。由於在上面的示例中,上下文敏感文法可以生成語言,而 PDA 無法生成,因此上下文無關語言必須是上下文敏感語言的真子集。

線性有界自動機

[編輯 | 編輯原始碼]

上下文敏感語言類等價於可以被線性有界自動機識別的語言類。線性有界自動機(LBA)是一種基於狀態的確定性機器,它有一個包含輸入字串的“磁帶”以及一個沿磁帶左右移動的讀寫頭。機器根據其當前狀態和頭位置磁帶上的符號,以及有限數量的規則來確定其下一個狀態,要寫入磁帶上的符號以及要移動磁帶頭的方向(左/右)。與有限自動機和下推自動機類似,線性有界自動機根據它們是否停止在接受狀態來接受或拒絕輸入。LBA 在沒有與其當前狀態和讀取字元的組合匹配的規則時停止。

對 LBA 的唯一其他限制是,磁帶必須是有限的,其大小是根據輸入字串的大小線性推匯出來的。例如,如果一臺特定機器使用長度為 2*s+5 的磁帶,其中 s 是輸入的大小,那麼在處理大小為 10 的字串時,它的磁帶長度將為 25。

這種設計賦予 LBA 很大的計算自由度,因此它可以識別任何字串識別空間需求線性增長的語言。一個例子是語言。這種語言可以在 s+1 空間內識別,這意味著它只需要它所寫入的磁帶部分,以及末尾的一個空字元。這是因為與 PDA 或 DFA 不同,LBA 無需按順序處理輸入 - 它可以在(並且在下面的樣本機器中確實會)按組標記字元,首先標記一個“a”,然後標記一個“b”,然後標記一個“c”,然後標記一個“d”,並重復。如果它沒有在同一輪次中耗盡所有字元型別,它將不會接受。

這種語言無法被 DFA 和 PDA 識別,因為它們必須在經過時計算並存儲“a”的數量。DFA 只能記住等於或少於其狀態數的“a”的數量,因此無法檢查任意長字串中“b”的數量。PDA 可以將“a”的數量儲存在堆疊中,但它在將它們與“b”進行比較時必然會破壞這些資訊,因此無法將它們與“c”的數量進行比較。

請注意,儘管 LBA 可能進入無限迴圈,但由於只有一個有限磁帶,有限數量的符號可以放在上面,以及有限數量的狀態它可以處於,因此給定機器和輸入的條件總數是有限的。這意味著任何無限迴圈都必須在有限步數內返回到它已經遇到的條件集。此外,由於 LBA 是確定性的,因此重複其條件集的機器將無限期地繼續重複它剛剛遍歷的迴圈。這兩個陳述表明,LBA 僅當且僅當它重複其全部條件(包括狀態、頭位置和磁帶內容)時,才會陷入無限迴圈(因此永遠不會接受或拒絕)。由於是所有可能條件的總數,其中 S 是狀態數,T 是磁帶的大小,A 是字母表的大小,任何執行超過 S*T*A^T 步的機器都必須重複一組條件,因此處於無限迴圈中。雖然這個數字可能非常大,但它提供了一種明確的、有限時間的方法來確定給定的 LBA 是否會在給定輸入上迴圈。類似但更高階的圖靈機(見無限制語言)將沒有這種方法。

下面的程式碼是 Perl 中的樣本 LBA 模擬器。給定機器的描述和一個輸入字串,它模擬機器處理輸入字串,並顯示機器是否接受。

語法為:progname.pl LBAFile inputFile,其中 LBAFile 是包含 LBA 指令的文字檔案,inputFile 是包含輸入字串的文字檔案。一些示例輸入,包括一組用於機器識別 的 LBA 指令集,可在 LBA 示例輸入 中找到。

#!usr/bin/perl
use Text::ParseWords;
use strict;
use warnings;

my (@tape, $tapeIndex, $tapeMax, %accepts, %alphabet, @rules);

# Grabs the filenames for the machine and the word to be run on it.
my $lbaFile = $ARGV[0];
my $input = $ARGV[1];

# We use subroutines to parse and verify the data in the input files.
# The machine data is stored in the $machine structure as the keys rules, accepts, alphabet, and startState.
my $machine = readLBA($lbaFile);
# Rules and accepts are extracted from the $machine structure for ease of access.
@rules = @{$machine->{rules}};
%accepts = %{$machine->{accepts}};
# This reads the input file and parses it into an array of strings, with each element being one input symbol.
# It checks to make sure the elements are all in the machine's alphabet.
($tapeMax, @tape) = readInput($input, $machine->{alphabet}, $machine->{tapeBound});
# $changed records whether or not a rule has been used when running through the rules list to make transitions.
my $changed = 1;
# $state is the state the Turing Machine is in, and is initialized to the start state from the machine file.
my $state = $machine->{startState};
# $tapeIndex is the position of the machine's head on the tape.
$tapeIndex = 0;
# Now that the machine is initialized, we can begin making transitions

# As long as things keep changing, keep cycling through the rules.
while($changed)
{
   # Unless it changes while going through the rules, the machine will terminate.
    $changed = 0;
   # The current tape is printed, with the symbol under the head highlighted.
    print "@tape[0..$tapeIndex-1]<".$tape[$tapeIndex].">@tape[$tapeIndex+1..@tape-1]\n";
   # The current state of the machine is printed.
    print "$state\n";
   # A new state is calculated by checking conditions against the list of rules
    for my $rNum (0..@rules-1)
    {
#            print "::$rules[$rNum][0]??$branches[$i][0]";
#            print "::$rules[$rNum][1]??$string[$branches[$i][1]]";
#            print "::$rules[$rNum][2]??".${$branches[$i][2]}[@{$branches[$i][2]}-1]."::\n";
       # Checks the current state and tape symbol against the rule being examined
        if (($rules[$rNum][0] eq $state) and
            ($rules[$rNum][1] eq $tape[$tapeIndex]))
        {
           # The state transition is printed.
#            print "State: ".$state." -> ".$rules[$rNum][2]."\n\n";
           # Set the new state,
            $state = $rules[$rNum][2];
           # Write the new symbol to the tape,
            $tape[$tapeIndex] = $rules[$rNum][3];
           # Shift the tape to the new index,
            $tapeIndex += $rules[$rNum][4];
           # and make sure it hasn't run past the left edge of the tape.
            if ($tapeIndex < 0) { $tapeIndex = 0; }
           # If the machine nears the end of the allocated tape, and more tape is allowed, expand the tape.
            if (($tapeIndex >= @tape-2) and (@tape < $tapeMax))
            {
                push(@tape, "_");
            }
           # If the head runs past the right end of the tape, keep it on the right end.
            if ($tapeIndex >= $tapeMax-1) { $tapeIndex = $tapeMax-1; }
            $changed = 1;
           # Once we've made a transition, we can stop and begin looking for the next one.
            last;
        }
    }
}
# When there are no more possible transitions, if the machine is in an accepting state,
if (defined($accepts{$state}))
{
   # Print that it accepts and quit.
    print "The machine accepts the string.\n";
    exit;
}
# Otherwise, print that it does not accept, and quit.
print "The machine does not accept the string.\n";
exit;

###################################################


sub readLBA
# This subroutine reads the machine data from the specified file into variables (mostly hashes).
{
    my (%states, %accepts, %alphabet, @rules, @tapeBound);
    open(INFILE, shift) or die "Can't open machine file: $!";
    
# This block reads the list of states from the machine file.
   # Discards the section header,
    <INFILE>;
    my $line = <INFILE>;
    chomp($line);
    my @words = &parse_line('\s+', 0, $line);
    for (@words)
    {
       # records the state names for checking the rules,
        $states{$_} = 0;
    }

# This block reads the start state from the machine file.
   # Discards the header,
    <INFILE>;
    my $startState = <INFILE>;
   # takes the whole line as the start state,
    chomp($startState);
   # and makes sure that the start state is defined in the list of states.
    defined($states{$startState}) or die "The start state $startState isn't a state!";
    
# This block reads the list of accepting states from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into state names,
    @words = &parse_line('\s+', 0, $line);
    for (@words)
    {
   # checks to make sure that the accept states are defined states,
        defined($states{$_}) or die "$_ isn't a state!";
   # and defines those names in a new hash.  The use of a hash makes it easier to determine later if a specific state name accepts or not.
        $accepts{$_} = 1;
    }
    
# This block reads the list of symbols in the alphabet from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = &parse_line('\s+', 0, $line);
   # e is used as the empty symbol in the rules.
    $alphabet{e} = 1;
    for (@words)
    {
       # This records which symbols are in the alphabet for checking the rules.
        $alphabet{$_} = 0;
    }

# This block reads the linear bound on the tape size from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into two parts - the "slope" and the "intercept".
    @words = &parse_line('\s+', 0, $line);
   # and stores them in the @tapeBound array
    $tapeBound[0] = $words[0];
    $tapeBound[1] = $words[1];

# This block reads the state transition rules from the machine file.
   # Discards the header,
    <INFILE>;
   # This variable synchronizes the position of each rule in the rules array.
    my $rulesCounter=0;
    while(<INFILE>)
    {
   # breaks each rule into start state, input symbol, stack symbol, end state, and new stack symbol.
        chomp($_);
        @words = &parse_line('\s+', 0, $_);
   # checks that the first four pieces are defined in the state and alphabet hashes,
        defined($states{$words[0]}) or die "$words[0] isn't a defined state!";
        defined($alphabet{$words[1]}) or die "$words[1] isn't defined in the alphabet!";
        defined($states{$words[2]}) or die "$words[2] isn't a defined state!";
        defined($alphabet{$words[3]}) or die "$words[3] isn't defined in the alphabet!";
   # and converts the left/right instruction into a number to be added to the position counter.
        if (($words[4] eq "left") or ($words[4] eq "-1"))
        {
            $words[4]=-1;
        }
        elsif (($words[4] eq "right") or ($words[4] eq "+1"))
        {
            $words[4]=1;
        }
        else
        {
            (die "$words[4] isn't left, right, -1, or +1!");
        }
   # then creates an array of each rule.
        for (0..4)
        {
            $rules[$rulesCounter][$_] = $words[$_];
        }
       # The synchronization variable has to be updated.
        $rulesCounter++;
    }

   # Reading complete, the subroutine closes the file and returns the name of the start state.
    close INFILE;
   # The relevant data is stored in the $machine structure and returned to the main routine.
    my $machine = 
    {
        rules      => \@rules,
        accepts    => \%accepts,
        alphabet   => \%alphabet,
        startState => $startState,
        tapeBound  => \@tapeBound
    };
    return $machine;
}


sub readInput
# This subroutine reads the starting tape from the specified file into an array of symbols.
{
    my (@tape, %alphabet, $alphaRef, @tapeBound, $boundRef);
    open(INFILE, shift) or die "Can't open ".$input.": $!";
    $alphaRef = shift;
    %alphabet = %{$alphaRef};
    $boundRef = shift;
    @tapeBound = @{$boundRef};
   # The first line of the file is read as the initial state of the tape, with symbols delimited by spaces.
    my $line = <INFILE>."";
    chomp($line);
    @tape = &parse_line('\s+', 0, $line);
   # This makes sure every symbol in the input string was defined in the machine's alphabet.
    for (@tape)
    { defined($alphabet{$_}) or die "$_ in $input isn't in this machine's alphabet!"; }
    close INFILE;
    my $tapeMax = @tape*$tapeBound[0]+$tapeBound[1];
    return ($tapeMax, @tape);
}

上一頁 | 下一頁

華夏公益教科書