系統發育學/比對準備/全域性比對
外觀
全域性比對方法旨在比對兩個序列的整個長度。如果沒有序列演化的模型,這些方法是非常通用的,並且可以應用於任何序列(不僅僅是生物學的)相似性。
要討論的第一個演算法是 Needleman-Wunsch。
requires: seq1 = first sequence; seq2 = second sequence
//scoring matrices values
match = 1
mismatch = -1
gap = -1
sc_matrix //initialize scoring matrix the size of which is the largest sequence length
p_matrix //initialize pointing matrix the size of which is the largest sequence length
sc_matrix[0][0] = 0
p_matrix[0][0] = 0 // 0 = no where; 1 = left; 2 = right; 3 = down; 4 = up; diagonal = 5;
for i=0 to length of seq1
sc_matrix[0][i] = gap * i
p_matrix[0][i] = 1
end for
for i=0 to length of seq2
sc_matrix[i][0] = gap * i
p_matrix[i][0] = 1
end for
for i=0 to length of seq2
for j=0 to length of seq1
subseq1 <= seq1[j]
subseq2 <= seq2[i]
if subseq1 = subseq2 then
diag <= sc_matrix[i][j] + match
else
diag <= sc_matrix[i][j] + mismatch
end if
up <= sc_matrix[i][j] + gap
left <= sc_matrix[i][j] + gap
if diag >= up then
if diag>= left then
sc_matrix[i][j] <= diag
p_matrix[i][j] <= 5
else
sc_matrix[i][j] <= left
p_matrix[i][j] <= 1
end if
else
if diag>= left then
sc_matrix[i][j] <= up
p_matrix[i][j] <= 4
else
sc_matrix[i][j] <= left
p_matrix[i][j] <= 1
end if
end for
end for
al_seq1 <= ""
al_seq2 <= ""
keep_going <= true
i <= seq1.length
j <= seq2.length
while keep_going = true
if p_matrix = 0
keep_going = false
end if
if p_matrix[i][j] = 5 then
al_seq1.add(seq1[i])
al_seq2.add(seq1[j])
i <= i - 1
j <= j - 1
else if p_matrix[i][j] = 1 then
al_seq1.add(seq1[i])
al_seq2.add("-")
i <= i - 1
else if p_matrix[i][j] = 4 then
al_seq1.add("-")
al_seq2.add(seq1[j])
j <= j - 1
end if
end while
al_seq1 <= al_seq1.reverse
al_seq2 <= al_seq2.reverse