
Clojure Programming/Examples/Norvig Spelling Corrector


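Peter Norvig wrote an excellent example of a spelling corrector in Python. See How to Write a Spelling Corrector for a more detailed explanation.

The code below was translated to Clojure by Rich Hickey, the creator of the language, and it shows that Clojure can be as concise as Python.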

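;; Split text into a sequence of lowercase alphabetic words.
(defn words [text] (re-seq #"[a-z]+" (.toLowerCase text)))

;; Build a map from each word to the number of times it occurs.
(defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 0)))) {} features))

;; Word counts from the training corpus; big.txt is read from the working directory.
;; Clojure 1.3 and later warn that *nwords* is not declared ^:dynamic; the code still runs.
(def *nwords* (train (words (slurp "big.txt"))))

;; All strings one edit away from word: deletions, adjacent transpositions,
;; replacements, and insertions, in that order.
(defn edits1 [word]
  (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
    (distinct (concat
      (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
      (for [i (range (dec n))]
        (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
      (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
      (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))

;; The set of words that appear in the model nwords, or nil if none do.
(defn known [words nwords] (not-empty (set (for [w words :when (nwords w)]  w))))

;; The set of known words two edits away from word, or nil if there are none.
(defn known-edits2 [word nwords]
  (not-empty (set (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)]  e2))))

;; Prefer the word itself if known, then known words one edit away, then two
;; edits away, else the word unchanged; return the most frequent candidate.
(defn correct [word nwords]
  (let [candidates (or (known [word] nwords) (known (edits1 word) nwords)
                       (known-edits2 word nwords) [word])]
    (apply max-key #(get nwords % 1) candidates)))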
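
As a small illustration of what `words` and `train` produce, the sketch below runs them on a short made-up string instead of big.txt. The names `sample-text` and `sample-model` are invented for this example, and the comparison with the core function `frequencies` is only a sanity check, not part of the original program.

```clojure
;; Hypothetical sample text standing in for big.txt.
(def sample-text "The quick brown fox. The lazy dog; the END")

;; Copied from the listing above so this snippet runs on its own.
(defn words [text] (re-seq #"[a-z]+" (.toLowerCase text)))

(defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 0)))) {} features))

;; The model is a plain map from word to occurrence count.
(def sample-model (train (words sample-text)))

(println (words sample-text))        ; lowercase words, punctuation dropped
(println (get sample-model "the"))   ; prints 3
;; train builds the same map that clojure.core/frequencies would.
(println (= sample-model (frequencies (words sample-text))))  ; prints true
```

Because the model is an ordinary map, it can also be called as a function, which is what `known` relies on when it tests `(nwords w)`.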

Usage

(correct "misstake" *nwords*)
(correct "speling" *nwords*)
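
To try the corrector without downloading big.txt, one option is to pass a small hand-written model in place of `*nwords*`. The following is a minimal sketch under that assumption: `toy-model` and its counts are invented for illustration, and the function definitions are repeated from the listing above only so that the snippet runs on its own.

```clojure
;; Definitions repeated from the listing above.
(defn edits1 [word]
  (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
    (distinct (concat
      (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
      (for [i (range (dec n))]
        (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
      (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
      (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))

(defn known [words nwords] (not-empty (set (for [w words :when (nwords w)] w))))

(defn known-edits2 [word nwords]
  (not-empty (set (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)] e2))))

(defn correct [word nwords]
  (let [candidates (or (known [word] nwords) (known (edits1 word) nwords)
                       (known-edits2 word nwords) [word])]
    (apply max-key #(get nwords % 1) candidates)))

;; Hypothetical word counts; a real model would come from (train (words ...)).
(def toy-model {"spelling" 3, "mistake" 2, "cat" 5, "car" 9})

(println (correct "speling" toy-model))   ; one insertion away from "spelling"
(println (correct "spling" toy-model))    ; two insertions away from "spelling"
(println (correct "cax" toy-model))       ; "cat" and "car" both match; "car" has the higher count
(println (correct "zzzzzz" toy-model))    ; nothing within two edits, so the input comes back
```

The third call shows the role of `max-key`: when several known words are equally close, the one with the larger count in the model is chosen.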

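Note: when training with big.txt, you may need to start the JVM with a larger heap than the default. You can increase the heap size by starting Java with the "-server" option or by setting the maximum heap size explicitly: "java -Xmx128m ...".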
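
To check whether a heap setting took effect, the running JVM can be asked for its limits from the REPL. This is a rough sketch using the standard `java.lang.Runtime` methods; the names `bytes->mb`, `max-heap-mb`, and `used-heap-mb` are made up for this example.

```clojure
;; Convert a byte count to whole megabytes.
(defn bytes->mb [n] (quot n (* 1024 1024)))

;; Upper bound on the heap the JVM will try to use (reflects -Xmx when it is set).
(def max-heap-mb (bytes->mb (.maxMemory (Runtime/getRuntime))))

;; Approximate heap currently in use: allocated heap minus its free portion.
(defn used-heap-mb []
  (let [rt (Runtime/getRuntime)]
    (bytes->mb (- (.totalMemory rt) (.freeMemory rt)))))

(println "max heap (MB):" max-heap-mb)
(println "used heap (MB):" (used-heap-mb))
```

Comparing the used figure before and after building the model gives a rough idea of how much memory the model needs.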
