應用程式/偽隨機重複子字串的 Visual Basic
外觀
此頁面描述了一些適用於 VBA 的 Rnd() 函式的問題。特別是它說明了當 Randomize() 函式錯誤地放置在同一個迴圈 *內部* 而不是 *之前* 時,可能會出現重複的子字串。
- Rnd() 函式是偽隨機的,而不是真正隨機的。真正的隨機性很少見,一個著名的例子是可以透過白噪聲獲得的資料序列。白噪聲,就像來自太陽的無線電噪聲,或者可能是無線電或其他電子裝置中的無意噪聲,具有相當均勻的頻率分佈,並且可以用來產生隨機分佈的資料;也稱為 *線性機率分佈*,因為它們的頻率分佈是平行於水平軸的直線。
- 偽隨機性 可以透過反饋演算法獲得,其中函式的一系列輸出值被反饋並有助於生成輸出流的下一部分。這些被稱為偽隨機數生成器 (PRNG)。這樣的過程雖然複雜,但仍然是確定性的,完全基於它的起始值。這樣的生成器,根據其設計,可以產生長序列的值,所有這些值都是唯一的,然後整個流最終會重複自身。
- 如果生成足夠長的值集,PRNG 輸出流總是會重複自身。VBA 中的 *Rnd* 函式可以在任何一個數字重複之前生成最多 16,777,216 個數字的序列,此時整個序列本身會重複。這在大多數情況下足夠了。微軟將 Rnd() 函式描述為屬於稱為線性同餘生成器 (LCG) 的 PRNG 集,儘管不清楚該演算法是否已被修改。
- Rnd 函式不適合大型表格或加密使用,而 VBA 本身在很大程度上是不安全的。對於給定的起始值,生成器總是會產生相同的序列。顯然,如果流的任何一部分已知,這將允許預測序列中的其他值,這種情況對於加密使用是不安全的。也許令人驚訝的是,大量使用隨機值的建模方法需要比 *Rnd()* 生成的更長的唯一序列。
- 微軟 Rnd() 函式的精確編碼不可用,他們對此的描述資料非常簡略。我最近嘗試在 VBA 程式碼中實現假設的演算法失敗了,因為發生了溢位,因此那些打算在 VBA 中研究此類生成器的人需要使用其他演算法。也許研究 Wichmann-Hill (1982) CLCG 演算法是一個更好的選擇,該演算法可以在 VBA 中實現。另一個頁面提供了 Wichmann-Hill (1982) 演算法的 VBA 實現(由其他人完成),以及一些更簡單的生成器示例。
- 一個設計良好的 PRNG 流由唯一數字組成,但這隻適用於設計者從零到一 [0,1] 範圍內的未過濾數字集。一旦我們開始從流中獲取一些值,而忽略其他值,例如為了生成自定義輸出,新流將呈現出不同的特徵。從自然序列中挑選元素和將大型集合對映到非常小的集合的組合會造成損害。在觀察新集合時,迴圈重複點的字元計數會縮短,並且整個集合中重複子字串的數量會增加。
- 下面的程式碼列表允許使用預設過濾器設定檢查 Rnd() 流中的子字串,例如;大寫字母、小寫字母、整數等,此外,還包括一個基於雜湊的類似生成器,供那些希望比較它的人使用。
- 重複子字串過程相當慢,因為它依賴於重複的位置。最壞的情況是*沒有找到重複*,其中迴圈的數量在*(0.5*n)^2* 處達到最大值,即測試字串中字元數的一半的平方。當然,最小的迴圈數只有 *1*,當一個簡單的字串重複時,例如;abcabc。顯然,字串長度增加十倍可能會使執行時間增加一百倍。(大約 2 秒內 1000 個字元,4 秒內 2000 個字元,200 秒內 10000 個字元,到目前為止,是最好的時間!)。
- 編碼佈局也會影響重複子字串的長度。讀者可以比較將 *Randomize* 函式放置在隨機數迴圈 *外部*,然後放置在迴圈 *內部* 的效果,同時只輸出大寫字母。(見 *MakeLongRndStr* 中的程式碼)。在過去,當放置在內部時,重複字串會大大惡化。此處列出的用於測試 Rnd() 的程式碼(使用 1000 個大寫字母樣本,不使用 DoEvents,並且 Randomize 錯誤地放置在迴圈 *內部*),對於本文作者來說,將返回長達 400 個字元的重複子字串。迴圈中的程式碼行增加,影響每個迴圈迭代的執行時間(?)也會影響任何子字串的長度。
Option Explicit
Sub TestRndForRepeats()
'run this to make a pseudo random string
'and to test it for longest repeated substring
Dim strRnd As String, sOut As String
Dim nOut As Long, nLen As Long
strRnd = MakeLongRndStr(1000)
MsgBox strRnd,, "Long string..."
sOut = LongestRepeatSubstring(strRnd, nOut)
MsgBox "Repeated substring found is : " & _
vbCrLf & sOut & vbCrLf & _
"Number of these found : " & nOut & vbCrLf & _
"Length of each : " & Len(sOut),, "Repeat substring..."
End Sub
Sub TestHashForRepeats()
'run this to make a long hash-based output
'and to test it for longest repeated substring
Dim sOut As String, sHash As String, nOut As Long
sHash = LongHash("String to hash", 1000)
MsgBox "The following sha256-based hash has " & _
Len(sHash) & " characters." & _
vbCrLf & vbCrLf & sHash,, "Long hash..."
sOut = LongestRepeatSubstring(sHash, nOut)
MsgBox "Repeated substring found is : " & _
vbCrLf & sOut & vbCrLf & _
"Number of these found : " & nOut & vbCrLf & _
"Length of each : " & Len(sOut),, "Repeat substring..."
End Sub
Function MakeLongRndStr(nNumChr As Long) As String
'Makes a long capital letters string using rnd VBA function
Dim n As Long, sChr As String, nAsc As Long
Dim nSamp As Long, sRec As String
'========================================================================
' Notes and Conclusions:
' The VBA function rnd is UNSUITED to generation of long random strings.
' Both length and number of repeats increases rapidly near 256 charas.
' Reasonable results can be obtained by keeping below 128 characters.
' For longer strings, consider hash-based methods of generation.
'========================================================================
'Randomize 'right place
Do Until n >= nNumChr
'DoEvents
Randomize 'wrong place
nSamp = Int((122 - 48 + 1) * Rnd + 48) 'range includes all charas
sChr = Chr(nSamp)
'cherry-picks 10, 26, 36, 52, or 62 from a set of 75
Select Case nSamp 'chara filter
Case 65 To 90 'upper case letters
sRec = sRec & sChr
Case 48 To 57 'integers
'sRec = sRec & sChr
Case 97 To 122 'lower case letters
'sRec = sRec & sChr
Case Else
'disregard
End Select
n = Len(sRec)
Loop
'MsgBox sAccum
MakeLongRndStr = Left$(sRec, nNumChr)
End Function
Function LongHash(sIn As String, nReq As Long, Optional sSeed As String = "") As String
'makes a long sha256 hash - length specified by user
'Parameters: sIn; the string to hash
'nReq; the length of output needed
'sSeed; optional added string modifier
Dim n As Long, m As Long, c As Long, nAsc As Integer, sChr As String
Dim sF As String, sHash As String, sRec As String, sAccum As String
Do Until m >= nReq
DoEvents
n = n + 1 'increment
'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
'you set your own cycle increment here
sF = sIn & sSeed & sAccum & (7 * n * m / 3)
'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
'get a single hash of sF
sHash = HashSHA256(sF)
'filter output for chara type
For c = 1 To Len(sHash)
sChr = Mid$(sHash, c, 1)
nAsc = Asc(sChr)
'cherry-picks 10, 26, 36 ,52, or 62 from a set of 64
Select Case nAsc 'chara filter
Case 65 To 90 'upper case letters
sRec = sRec & sChr
Case 48 To 57 'integers
'sRec = sRec & sChr
Case 97 To 122 'lower case letters
'sRec = sRec & sChr
Case Else
'disregard
End Select
Next c
'accumulate
sAccum = sAccum & sRec
m = Len(sAccum)
sRec = "" 'delete line at your peril!
Loop
LongHash = Left$(sAccum, nReq)
End Function
Function HashSHA256(sIn As String) As String
'Set a reference to mscorlib 4.0 64-bit
'HASHES sIn string using SHA2-256 algorithm
'NOTE
'total 88 output text charas of base 64
'Standard empty string input gives : 47DEQpj8HBSa+/...etc,
Dim oT As Object, oSHA256 As Object
Dim TextToHash() As Byte, bytes() As Byte
Set oT = CreateObject("System.Text.UTF8Encoding")
Set oSHA256 = CreateObject("System.Security.Cryptography.SHA256Managed")
TextToHash = oT.GetBytes_4(sIn)
bytes = oSHA256.ComputeHash_2((TextToHash))
HashSHA256 = ConvB64(bytes)
Set oT = Nothing
Set oSHA256 = Nothing
End Function
Function ConvB64(vIn As Variant) As Variant
'used to produce a base-64 output
'Set a reference to mscorlib 4.0 64-bit
Dim oD As Object
Set oD = CreateObject("MSXML2.DOMDocument")
With oD
.LoadXML "<root />"
.DocumentElement.DataType = "bin.base64"
.DocumentElement.nodeTypedValue = vIn
End With
ConvB64 = Replace(oD.DocumentElement.Text, vbLf, "")
Set oD = Nothing
End Function
Function LongestRepeatSubstring(sIn As String, Optional nSS As Long) As String
'finds longest repeated non-overlapping substring (in function name) and number of repeats (in nSS)
'greatest number cycles = (0.5*n)^2 for when "none found", eg; abcdef (9)
'shortest number cycles = 1 for one simple duplicated string; eg abcabc
Dim s1 As String, s2 As String, X As Long
Dim sPrev As String, nPrev As Long, nLPrev As Long
Dim nL As Long, nTrial As Long, nPos As Long, vAr As Variant
nL = Len(sIn)
For nTrial = Int(nL / 2) To 1 Step -1
DoEvents
For nPos = 1 To (nL - (2 * nTrial) + 1)
X = 0
s1 = Mid(sIn, nPos, nTrial)
s2 = Right(sIn, (nL - nPos - nTrial + 1))
vAr = Split(s2, s1)
X = UBound(vAr) - LBound(vAr)
If X > 0 Then
If nPrev < X Then
sPrev = s1
nPrev = X
End If
End If
Next nPos
If nPrev <> 0 Then
LongestRepeatSubstring = sPrev
nSS = nPrev
Exit Function
End If
Next nTrial
End Function