跳至內容

系統發育學/比對準備/全域性比對

來自華夏公益教科書,開放世界的開放書籍

全域性比對方法旨在比對兩個序列的整個長度。如果沒有序列演化的模型,這些方法是非常通用的,並且可以應用於任何序列(不僅僅是生物學的)相似性。

要討論的第一個演算法是 Needleman-Wunsch。

requires: seq1 = first sequence; seq2 = second sequence
//scoring matrices values
match = 1
mismatch = -1
gap = -1
sc_matrix //initialize scoring matrix the size of which is the largest sequence length
p_matrix //initialize pointing matrix the size of which is the largest sequence length
sc_matrix[0][0] = 0
p_matrix[0][0] = 0 // 0 = no where; 1 = left; 2 = right; 3 = down; 4 = up; diagonal = 5;
for i=0 to length of seq1
  sc_matrix[0][i] = gap * i
  p_matrix[0][i] = 1
end for
for i=0 to length of seq2
  sc_matrix[i][0] = gap * i
  p_matrix[i][0] = 1
end for
for i=0 to length of seq2
  for j=0 to length of seq1
    subseq1 <= seq1[j]
    subseq2 <= seq2[i]
    if subseq1 = subseq2 then
      diag <= sc_matrix[i][j] + match
    else
      diag <= sc_matrix[i][j] + mismatch
    end if
    up <= sc_matrix[i][j] + gap
    left <= sc_matrix[i][j] + gap
    if diag >= up then
      if diag>= left then
        sc_matrix[i][j] <= diag
        p_matrix[i][j] <= 5
      else
        sc_matrix[i][j] <= left
        p_matrix[i][j] <= 1
      end if
    else
      if diag>= left then
        sc_matrix[i][j] <= up
        p_matrix[i][j] <= 4
      else
        sc_matrix[i][j] <= left
        p_matrix[i][j] <= 1
    end if
  end for
end for
al_seq1 <= ""
al_seq2 <= ""
keep_going <= true
i <= seq1.length
j <= seq2.length
while keep_going = true
  if p_matrix = 0
    keep_going = false
  end if
  if p_matrix[i][j] = 5 then
    al_seq1.add(seq1[i])
    al_seq2.add(seq1[j])
    i <= i - 1
    j <= j - 1
  else if p_matrix[i][j] = 1 then
    al_seq1.add(seq1[i])
    al_seq2.add("-")
    i <= i - 1
  else if p_matrix[i][j] = 4 then
    al_seq1.add("-")
    al_seq2.add(seq1[j])
    j <= j - 1   
  end if
end while
al_seq1 <= al_seq1.reverse
al_seq2 <= al_seq2.reverse
華夏公益教科書