Stata/Natural Language Processing

Reading text files

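If the lines are short (fewer than 244 characters, the maximum length of a string variable in older Stata releases), you can use insheet. This command reads the text file into Stata's memory. In Stata 13 and later, insheet has been superseded by import delimited, although the old command still works.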

. insheet using toto.txt, clear
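
After reading the file, it is worth checking what actually arrived in memory. The lines below are a minimal sketch. They assume that toto.txt has no header row (hence the nonames option) and contains at least one line. Note that insheet treats tabs or commas as column separators, so a line of text containing commas may be split across several variables (v1, v2, ...).

. * read the file without treating the first line as variable names
. insheet using toto.txt, nonames clear
. * show the variables that were created and their storage types
. describe
. * number of lines read
. count
. * display the first line
. list in 1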

Start by looking at the list of string functions built into Stata.

. h string functions
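
As an illustration, a few of these functions can be combined to normalise text and count words. This is only a sketch: it assumes the text sits in a string variable named v1 (the default name insheet gives the first column of a file without a header), and the new variable names are arbitrary.

. * lower-case the text and strip leading and trailing blanks
. generate str244 clean = lower(trim(v1))
. * number of space-separated words in each line
. generate nwords = wordcount(clean)
. * first word of each line
. generate str244 first = word(clean, 1)
. * 1 if the line contains the substring "the", 0 otherwise
. generate byte has_the = strpos(clean, "the") > 0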

Stata includes the functions regexm(), regexr() and regexs() for working with regular expressions.
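
The three functions are typically used together: regexm() tests for a match, regexs() returns what the last regexm() call matched, and regexr() replaces the first match. The sketch below assumes a string variable named v1 and uses a four-digit number as an example pattern. Both the variable name and the pattern are illustrative choices.

. * 1 if the line contains four consecutive digits, 0 otherwise
. generate byte has_year = regexm(v1, "[0-9][0-9][0-9][0-9]")
. * extract the matched digits; regexs(0) is the whole match
. generate str4 year = regexs(0) if regexm(v1, "[0-9][0-9][0-9][0-9]")
. * replace the first such match with a placeholder
. generate masked = regexr(v1, "[0-9][0-9][0-9][0-9]", "YEAR")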

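Ken Benoit, Michael Laver and Will Lowe developed wordscores, a set of Stata commands that reads text files, computes the frequency of each word, and computes some measures of similarity between texts.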