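XQuery/UK shipping forecast
The UK shipping forecast is issued by the UK Met Office four times a day and published on the radio, on the Met Office website and [no longer] on the BBC website. However, it is not available in a computer-readable form.
Tim Duckett recently blogged about creating a Twitter stream. He uses Ruby to parse the text forecast. The textual form of the forecast is included on both the Met Office and BBC websites. However, as Tim points out, this format is designed for speech: it combines similar areas to shorten the broadcast slot, and it is hard to parse. The approach taken here is to scrape a JavaScript file containing the raw area forecast data.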
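The scripts below use these eXist modules:
- request - to get HTTP request parameters
- httpclient - to GET and POST
- scheduler - to schedule the scraping tasks
- datetime - to format dateTimes
- util - base64 conversion
- xmldb - for database access
- the UK Met Office website (the data source rather than an eXist module)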
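[This approach is no longer viable because the JavaScript file it retrieves is no longer updated]
The Met Office web page shows an area-by-area forecast, but this part of the page is generated by JavaScript from data in a generated JavaScript file. In that file, the data is assigned to a number of arrays. A typical section looks like this: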
// Bailey
gale_in_force[28] = "0";
gale[28] = "0";
galeIssueTime[28] = "";
shipIssueTime[28] = "1725 Sun 06 Jul";
wind[28] = "Northeast 5 to 7.";
weather[28] = "Showers.";
visibility[28] = "Moderate or good.";
seastate[28] = "Moderate or rough.";
area[28] = "Bailey";
area_presentation[28] = "Bailey";
key[28] = "Bailey";
// Faeroes
...
The first function uses the eXist httpclient module to fetch the current JavaScript data and converts the base64 data to a string.
xquery version "3.0";
declare namespace httpclient = "http://exist-db.org/xquery/httpclient";
declare namespace met = "http://kitwallace.co.uk/wiki/met";
declare variable $met:javascript-file := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js";
declare function met:get-forecast() as xs:string {
(: fetch the javascript source and locate the text of the body of the response :)
let $base64:= httpclient:get(xs:anyURI($met:javascript-file),true(),())/httpclient:body/text()
(: this is base64 encoded , so decode it back to text :)
return util:binary-to-string($base64)
};
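The second function picks out one area forecast from the JavaScript and parses the code to build an XML structure, using the JavaScript array names as element names.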
declare function met:extract-forecast($js as xs:string, $area as xs:string) as element(forecast)? {
(: isolate the section for the required area, prefixed with a comment :)
let $areajs := normalize-space(substring-before( substring-after($js,concat("// ",$area)),"//"))
return
if($areajs ="") (: area not found :)
then ()
else
(: build an XML element containing elements for each of the data items, using the array names as the element names :)
<forecast>
{
for $d in tokenize($areajs,";")[position() < last()] (: JavaScript statements terminated by ";" - ignore the last empty :)
let $ds := tokenize(normalize-space($d),' *= *"') (: separate the LHS and RHS of the assignment statement :)
let $name := replace(substring-before($ds[1],"["),"_","") (: element name is the array name, converted to a legal name :)
let $val := replace($ds[2],'"','') (: element text is the RHS minus quotes :)
let $val := replace ($val,"<.*>","") (: remove embedded annotation - in shipissuetime :)
return
element {$name} {$val}
}
</forecast>
};
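To see how the statement parsing behaves, the following minimal sketch applies the same tokenize and replace steps to a single assignment taken from the Bailey sample above. It needs neither eXist nor network access; the variable names are illustrative only.

```xquery
xquery version "3.0";

(: One statement copied from the Bailey section of the sample data :)
let $statement := 'gale_in_force[28] = "0"'
(: Split at the = sign and the opening quote: left side is the array reference, right side the value :)
let $parts := tokenize(normalize-space($statement), ' *= *"')
(: Drop the [index] and the underscores to get the element name :)
let $name := replace(substring-before($parts[1], "["), "_", "")
(: Drop the closing quote from the value :)
let $value := replace($parts[2], '"', '')
return element { $name } { $value }
(: evaluates to <galeinforce>0</galeinforce> :)
```

Because the separator pattern includes the opening quote, an equals sign inside a quoted value would not split the statement a second time unless it were itself followed by a quote.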
To get an area forecast:
let $js := met:get-forecast()
return
met:extract-forecast($js,"Fastnet")
For example, the output for a selected area is
<forecast>
<galeinforce>0</galeinforce>
<gale>0</gale>
<galeIssueTime/>
<shipIssueTime>1030 Tue 28 Oct</shipIssueTime>
<wind>Southwest veering northeast, 5 or 6.</wind>
<weather>Rain at times.</weather>
<visibility>Good, occasionally poor.</visibility>
<seastate>Moderate or rough.</seastate>
<area>Fastnet</area>
<areapresentation>Fastnet</areapresentation>
<key>Fastnet</key>
</forecast>
The forecast data needs to be formatted as a string:
declare function met:forecast-as-text($forecast as element(forecast)) as xs:string {
concat( $forecast/weather,
" Wind ", $forecast/wind,
" Visibility ", $forecast/visibility,
" Sea ", $forecast/seastate
)
};
let $js := met:get-forecast()
let $forecast := met:extract-forecast($js,"Fastnet")
return
<report>{met:forecast-as-text($forecast)}</report>
which returns
<report>Rain at times. Wind Southwest veering northeast, 5 or 6. Visibility Good, occasionally poor. Sea Moderate or rough.</report>
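Finally, these functions can be used in a script that accepts a shipping area name and returns an XML message:
let $area := request:get-parameter("area","Lundy")
let $js := met:get-forecast()
let $forecast := met:extract-forecast($js,$area)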
return
<message area="{$area}" dateTime="{$forecast/shipIssueTime}">
{met:forecast-as-text($forecast)}
</message>
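To create a message suitable for an SMS text message (160 characters) or a tweet (140-character limit), the message can be compressed by abbreviating common words.
A dictionary of words and their abbreviations is created and stored locally. It was developed from some of the abbreviations in Tim Duckett's Ruby implementation.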
<dictionary>
<entry full="west" abbrev="W"/>
<entry full="westerly" abbrev="Wly"/>
..
<entry full="variable" abbrev="vbl"/>
<entry full="visibility" abbrev="viz"/>
<entry full="occasionally" abbrev="occ"/>
<entry full="showers" abbrev="shwrs"/>
</dictionary>
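The abbreviation function breaks the text into words, replaces each word with its abbreviation where one exists, and rebuilds the text.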
declare function met:abbreviate($forecast as xs:string) as xs:string {
string-join(
(: lowercase the string, append a space (to ensure a final . is matched) and tokenise :)
for $word in tokenize(concat(lower-case($forecast)," "),"\.? +")
return
(: if there is an entry for the word , use its abbreviation, otherwise use the unabbreviated word :)
( /dictionary/entry[@full=$word]/@abbrev,$word) [1]
,
" ") (: join the words back up with space separator :)
};
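The effect of the abbreviation step can be tried in isolation. The sketch below is a self-contained variant that assumes the dictionary is held in a local variable rather than in the database; the names local:abbreviate and $local:dictionary are illustrative and not part of the met module.

```xquery
xquery version "3.0";

(: A three-entry extract of the dictionary, held in memory for the demonstration :)
declare variable $local:dictionary :=
    <dictionary>
        <entry full="visibility" abbrev="viz"/>
        <entry full="occasionally" abbrev="occ"/>
        <entry full="showers" abbrev="shwrs"/>
    </dictionary>;

declare function local:abbreviate($text as xs:string) as xs:string {
    string-join(
        (: lower-case, append a space so a final full stop is matched, then split into words :)
        for $word in tokenize(concat(lower-case($text), " "), "\.? +")
        return
            (: the abbreviation if the word is in the dictionary, otherwise the word itself :)
            ($local:dictionary/entry[@full = $word]/@abbrev/string(), $word)[1],
        " ")
};

local:abbreviate("Showers. Visibility Good, occasionally poor.")
```

Here "showers", "visibility" and "occasionally" are replaced by their abbreviations. A word with a comma attached, such as "good,", is compared with the comma still in place, so it would not match a dictionary entry for "good".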
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := request:get-parameter("area","Lundy")
let $forecast := met:extract-forecast(met:get-forecast(), $area)
return
<message area="{$area}" dateTime="{$forecast/shipIssueTime}">
{met:abbreviate(met:forecast-as-text($forecast))}
</message>
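This function extends the area forecast extraction to the whole forecast. The parse uses the comment delimiter to split the script, ignoring the first and last sections as well as the area name in the comment.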
declare function met:get-forecast() as element(forecast)* {
let $jsuri := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js"
let $base64:= httpclient:get(xs:anyURI($jsuri),true(),())/httpclient:body/text()
let $js := util:binary-to-string($base64)
for $js in tokenize($js,"// ")[position() > 1] [position()< last()]
let $areajs := concat("gale",substring-after($js,"gale"))
return
<forecast>
{
for $d in tokenize($areajs,";")[position() < last()]
let $ds := tokenize(normalize-space($d)," *= *")
return
element {replace(substring-before($ds[1],"["),"_","")}
{replace($ds[2],'"','')}
}
</forecast>
};
This script returns an XML version of the complete shipping forecast.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
<ShippingForecast>
{met:get-forecast()}
</ShippingForecast>
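XSLT would be suitable for transforming this XML to RSS...
One possible use of this data is to provide an on-demand SMS service: the user texts the name of an area and the system returns a compressed forecast. The full set of forecasts is generated, the forecast for the area supplied by the user is selected, and it is returned as an abbreviated message.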
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := lower-case(request:get-parameter("text",()))
let $forecast := met:get-forecast()[lower-case(area) = $area]
return
if (exists($forecast))
then
concat("Reply: ", met:abbreviate(met:forecast-as-text($forecast)))
else
concat("Reply: Area ",$area," not recognised")
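The calling protocol for the SMS service is determined by the SMS service installed at UWE.
Fetching the JavaScript on demand is neither efficient nor good network behaviour. Since the forecast times are known, it is better to fetch the data on a fixed schedule, convert it to XML and store it in the eXist database, and then use the cached XML for subsequent requests.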
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
declare variable $col := "/db/Wiki/Met/Forecast";
if (xmldb:login($col, "user", "password")) (: a user who has write access to the Forecast collection :)
then
let $forecast := met:get-forecast()
let $forecastDateTime := met:timestamp-to-xs-date(($forecast/shipIssueTime)[1]) (: convert to xs:dateTime :)
let $store := xmldb:store(
$col, (: collection to store forecast in :)
"shippingForecast.xml", (: file name - overwrite is OK here as we only want the latest :)
(: then the constructed XML to be stored :)
<ShippingForecast at="{$forecastDateTime}" >
{$forecast}
</ShippingForecast>
)
return
<result>
Shipping forecast for {string($forecastDateTime)} stored in {$store}
</result>
else ()
The timestamps used in the source data are converted to xs:dateTime for ease of later processing.
declare function met:timestamp-to-xs-date($dt as xs:string) as xs:dateTime {
(: convert timestamps in the form 0505 Tue 08 Jul to xs:dateTime :)
let $year := year-from-date(current-date()) (: assume the current year since none provided :)
let $dtp := tokenize($dt," ")
let $mon := index-of(("Jan","Feb", "Mar","Apr","May", "Jun","Jul","Aug","Sep","Oct","Nov","Dec"),$dtp[4])
let $monno := if($mon < 10) then concat("0",$mon) else $mon
return xs:dateTime(concat($year,"-",$monno,"-",$dtp[3],"T",substring($dtp[1],1,2),":",substring($dtp[1],3,2),":00")) (: hours are characters 1-2, minutes characters 3-4 :)
};
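A worked example makes the conversion easier to follow. This sketch is a standalone variant that takes the year as a parameter instead of reading the current date, so the result is reproducible; the function name and the year 2008 are assumptions made for the illustration.

```xquery
xquery version "3.0";

declare function local:timestamp-to-dateTime($stamp as xs:string, $year as xs:integer) as xs:dateTime {
    (: "0505 Tue 08 Jul" splits into time, weekday, day of month, month name :)
    let $parts := tokenize(normalize-space($stamp), " ")
    let $months := ("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
    let $month := index-of($months, $parts[4])
    (: pad the month number to two digits :)
    let $mm := if ($month lt 10) then concat("0", $month) else string($month)
    return xs:dateTime(concat($year, "-", $mm, "-", $parts[3], "T",
        substring($parts[1], 1, 2), ":", substring($parts[1], 3, 2), ":00"))
};

local:timestamp-to-dateTime("0505 Tue 08 Jul", 2008)
(: evaluates to the xs:dateTime 2008-07-08T05:05:00 :)
```

Assuming the current year, as the original function does, would give the wrong year for a forecast issued late on 31 December and fetched after midnight.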
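The raw data contains redundant elements (several versions of the area name) and elements that are usually empty (all the gale-related elements when there is no gale warning), but lacks a normalised area name to use as a key. The following function performs this restructuring.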
declare function met:reduce($forecast as element(forecast)) as element(forecast) {
<forecast>
{ attribute area {lower-case($forecast/area)}}
{ $forecast/*
[not(name(.) = ("shipIssueTime","area","key"))]
[ if (../galeinforce = "0" )
then not(name(.) = ("galeinforce","gale","galeIssueTime"))
else true()
]
}
</forecast>
};
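As an illustration of the restructuring, the sketch below applies the same filtering to a shortened copy of the Fastnet forecast shown earlier. The function is renamed local:reduce so that the example runs on its own, and the input is abridged for brevity.

```xquery
xquery version "3.0";

declare function local:reduce($forecast as element(forecast)) as element(forecast) {
    <forecast area="{lower-case($forecast/area)}">
    {
        $forecast/*
            (: drop the issue time and the redundant name elements :)
            [not(name(.) = ("shipIssueTime", "area", "key"))]
            (: drop the gale elements when no gale warning is in force :)
            [ if (../galeinforce = "0")
              then not(name(.) = ("galeinforce", "gale", "galeIssueTime"))
              else true() ]
    }
    </forecast>
};

local:reduce(
    <forecast>
        <galeinforce>0</galeinforce>
        <gale>0</gale>
        <galeIssueTime/>
        <shipIssueTime>1030 Tue 28 Oct</shipIssueTime>
        <wind>Southwest veering northeast, 5 or 6.</wind>
        <weather>Rain at times.</weather>
        <area>Fastnet</area>
        <areapresentation>Fastnet</areapresentation>
        <key>Fastnet</key>
    </forecast>
)
```

The result is a forecast element with area="fastnet" that keeps only the wind, weather and areapresentation children.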
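XSLT could also be used for this transformation. The caching script applies it before storing the forecast.
The revised SMS script can now access the cache. First, a function to get a stored forecast: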
declare function met:get-stored-forecast($area as xs:string) as element(forecast)? { (: empty when the area is not found :)
doc("/db/Wiki/Met/Forecast/shippingForecast.xml")/ShippingForecast/forecast[@area = $area]
};
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := lower-case(normalize-space(request:get-parameter("text",())))
let $forecast := met:get-stored-forecast($area)
return
if (exists($forecast))
then
concat("Reply: ", datetime:format-dateTime($forecast/../@at,"HH:mm")," ",met:abbreviate(met:forecast-as-text($forecast)))
else
concat("Reply: Area ",$area," not recognised")
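In this script, the forecast selected for the input area by the met function call is a reference to the database element, not a copy. It is therefore still possible to navigate back to the parent element, which holds the timestamp.
eXist's datetime functions are wrappers for the Java class java.text.SimpleDateFormat, which defines the date formatting syntax.
eXist includes a scheduler module, which is a wrapper for the Quartz scheduler. Jobs can only be created by a DBA user.
For example, to set up a job that fetches the shipping forecast every hour: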
let $login := xmldb:login( "/db", "admin", "admin password" )
let $job := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq" , "0 0 * * * ?")
return $job
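where "0 0 * * * ?" means run at second 0 of minute 0 of every hour, on every day of every month, ignoring the day of the week.
To check the set of scheduled jobs (including system-scheduled jobs):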
let $login := xmldb:login( "/db", "admin", "admin password" )
return scheduler:get-scheduled-jobs()
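It would be better to schedule the jobs to match the forecast update timetable. The update times are 0015, 0505, 1130 and 1725. These times cannot be fitted into a single cron pattern, so multiple jobs are needed. Because a job is identified by its path, the instances cannot all use the same URL, so a dummy parameter is added.
Discussion: The scheduled times are one minute later than the publication times. This may not be enough to allow for clock differences between the two systems. Clearly a push from the UK Met Office would be better than scraping by pull. The scheduler clock runs in local time (BST), as do the publication times.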
let $login := xmldb:login( "/db", "admin", "admin password" )
let $job1 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=1" , "0 16 0 * * ?")
let $job2 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=2" , "0 6 5 * * ?")
let $job3 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=3" , "0 31 11 * * ?")
let $job4 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=4" , "0 26 17 * * ?")
return ($job1, $job2, $job3, $job4)
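The four cron expressions follow mechanically from the publication times. As a minimal sketch, assuming the one-minute offset described in the discussion, they can be derived like this (the variable names are illustrative):

```xquery
xquery version "3.0";

(: Publication times as HHMM strings, taken from the text above :)
for $time in ("0015", "0505", "1130", "1725")
let $hour := xs:integer(substring($time, 1, 2))
(: run one minute after publication; none of these times rolls over into the next hour :)
let $minute := xs:integer(substring($time, 3, 2)) + 1
(: cron fields: second minute hour day-of-month month day-of-week :)
return concat("0 ", $minute, " ", $hour, " * * ?")
(: evaluates to "0 16 0 * * ?", "0 6 5 * * ?", "0 31 11 * * ?", "0 26 17 * * ?" :)
```

A larger offset could be substituted in the same way, provided the minute value stays below 60.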
The Met Office provides a clickable map, but a KML map would be better. The coordinates of the sea areas can be captured and manually converted to XML.
<?xml version="1.0" encoding="UTF-8"?>
<boundaries>
<boundary area="viking">
<point latitude="61" longitude="0"/>
<point latitude="61" longitude="4"/>
<point latitude="58.5" longitude="4"/>
<point latitude="58.5" longitude="0"/>
</boundary>
...
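The boundary of an area is accessed through two functions. In this idiom, one function hides the document location and returns the root element of the document. Other functions use this base function to get the document and then apply further predicates to filter it.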
declare function met:area-boundaries() as element(boundaries) {
doc("/db/Wiki/Met/shippingareas.xml")/boundaries
};
declare function met:area-boundary($area as xs:string) as element(boundary) {
met:area-boundaries()/boundary[@area=$area]
};
The centre of an area can be roughly computed by averaging the latitudes and longitudes of its boundary points.
declare function met:area-centre($boundary as element(boundary)) as element(point) {
<point
latitude="{round(sum($boundary/point/@latitude) div count($boundary/point) * 100) div 100}"
longitude="{round(sum($boundary/point/@longitude) div count($boundary/point) * 100) div 100}"
/>
};
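Applying the same calculation to the Viking boundary shown earlier gives a concrete result. The sketch below is a standalone copy of the function under a local name, with the boundary passed in as a literal element.

```xquery
xquery version "3.0";

declare function local:centre($boundary as element(boundary)) as element(point) {
    let $n := count($boundary/point)
    return
        (: mean of each coordinate, rounded to two decimal places :)
        <point
            latitude="{round(sum($boundary/point/@latitude) div $n * 100) div 100}"
            longitude="{round(sum($boundary/point/@longitude) div $n * 100) div 100}"/>
};

local:centre(
    <boundary area="viking">
        <point latitude="61" longitude="0"/>
        <point latitude="61" longitude="4"/>
        <point latitude="58.5" longitude="4"/>
        <point latitude="58.5" longitude="0"/>
    </boundary>
)
(: evaluates to <point latitude="59.75" longitude="2"/> :)
```

Averaging the corner points gives the true centre only for simple shapes such as this rectangle. For an irregular boundary it is a rough position for a label rather than a geometric centroid.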
We can generate a KML Placemark from a forecast.
declare function met:forecast-to-kml($forecast as element(forecast)) as element(Placemark) {
let $area := $forecast/@area
let $boundary := met:area-boundary($area)
let $centre := met:area-centre($boundary)
return
<Placemark >
<name>{string($forecast/areapresentation)}</name>
<description>
{met:forecast-as-text($forecast)}
</description>
<Point>
<coordinates>
{string-join(($centre/@longitude,$centre/@latitude),",")}
</coordinates>
</Point>
</Placemark>
};
Since we have the area coordinates, we can also generate the boundary as a line in KML.
declare function met:sea-area-to-kml(
$area as xs:string,
$showname as xs:boolean
) as element(Placemark)
{
let $boundary := met:area-boundary($area)
return
<Placemark >
{if($showname) then <name>{$area}</name> else()}
<LineString>
<coordinates>
{string-join(
for $point in $boundary/point
return
string-join(($point/@longitude,$point/@latitude,"0"),",")
, " "
)
}
</coordinates>
</LineString>
</Placemark>
};
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
(: set the media type for a kml file :)
declare option exist:serialize "method=xml indent=yes
media-type=application/vnd.google-earth.kml+xml";
(: set the file name and extension when saved to allow GoogleEarth to be invoked :)
let $dummy := response:set-header('Content-Disposition','inline;filename=shipping.kml;')
(: get the latest forecast :)
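(: read the whole stored document here: met:get-stored-forecast($area) returns a single area only :)
let $shippingForecast := doc("/db/Wiki/Met/Forecast/shippingForecast.xml")/ShippingForecast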
return
<kml >
<Folder>
<name>{datetime:format-dateTime($shippingForecast/@at,"EEEE HH:mm")} UK Met Office Shipping forecast</name>
{for $forecast in $shippingForecast/forecast
return
(met:forecast-to-kml($forecast),
met:sea-area-to-kml($forecast/@area,false())
)
}
</Folder>
</kml>
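Another use of this data is to provide a channel through which a forecast is pushed as soon as it is received. The channel could be an SMS alert sent to subscribers, or a dedicated Twitter stream that users can follow.
This service should allow a user to request alerts for a specific area or areas. The application requires:
- a data structure to record subscribers and their areas
- a web service to register a user, their mobile number and initial area [to do]
- an SMS service to change the required area and to turn messaging on or off
- a scheduled task to push the SMS messages when a new forecast is obtained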
<subscriptions>
<subscription>
<username>Fred Bloggs</username>
<password>hafjahfjafa</password>
<mobilenumber>447777777</mobilenumber>
<area>lundy</area>
<status>off</status>
</subscription>
...
</subscriptions>
(to be completed)
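Access to this document needs to be controlled.
The first level of access control is to place the file in a collection that is not accessible from the web. On the UWE server, the root (via mod-rewrite) is the collection /db/Wiki, so resources in this directory and its subdirectories are accessible, subject to the access settings on each file, but files in parent or sibling directories are not. This document is therefore stored in the directory /db/Wiki2. The URL of this file relative to the external root is http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml, but access fails.
The second level of control is to set the owner and permissions on the file. This is needed because users on clients behind the firewall, using the internal server address, would otherwise be able to access this file. By default, world permissions are set to read and update. Once this access is removed, a script must log in as the owner or as a member of the group in order to read the file.
Ownership and permissions can be set through the web client or with functions in the eXist xmldb module.
This function takes a subscription, formulates the text message and calls the general sms:send function to send it. That function interfaces with our SMS service provider.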
declare function met:push-sms($subscription as element(subscription)) as element(result) {
let $area := $subscription/area
let $forecast := met:get-stored-forecast($area)
let $time := datetime:format-dateTime($forecast/../@at,"EE HH:mm")
let $text := encode-for-uri(concat($area, " ",$time," ",met:abbreviate(met:forecast-as-text($forecast))))
let $number := $subscription/mobilenumber
let $sent := sms:send($number,$text)
return
<result number="{$number}" area="{$area}" sent="{$sent}"/>
};
First we need to get the active subscriptions. These functions follow the same idiom used for the boundaries:
declare function met:subscriptions() {
doc("/db/Wiki2/shippingsubscriptions.xml")/subscriptions
};
declare function met:active-subscriptions() as element(subscription) * {
met:subscriptions()/subscription[status="on"]
};
Then iterate over the active subscriptions and report the results.
declare function met:push-subscriptions() as element(results) {
<results>
{
let $dummy := xmldb:login("/db","webuser","password")
for $subscription in met:active-subscriptions()
return
met:push-sms($subscription)
}
</results>
};
This script iterates over the currently active subscriptions and calls the push SMS function for each one.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
met:push-subscriptions()
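This task could be scheduled to run after the caching task, or the caching script could be modified to invoke the subscription task once caching is complete. However, eXist also supports triggers, so the task could instead be triggered by the database event raised when the store of the forecast file completes.
A message format is needed to edit the subscription status and to change the subscription area.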
metsub [ on |off |<area> ]
If the area is changed, the status is set to on.
The area is validated against a list of area codes. These codes are extracted from the boundary data.
declare function met:area-names() as xs:string* {
met:area-boundaries()/boundary/string(@area)
};
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $login:= xmldb:login("/db","user","password")
let $text := normalize-space(request:get-parameter("text",()))
let $number := request:get-parameter("from",())
let $subscription := met:get-subscription($number)
return
if (exists($subscription))
then
let $update :=
if ( $text= "on")
then update replace $subscription/status with <status>on</status>
else if( $text = "off")
then update replace $subscription/status with <status>off</status>
else if ( lower-case($text) = met:area-names())
then ( update replace $subscription/area with <area>{lower-case($text)}</area>, (: store the lower-case key used elsewhere :)
update replace $subscription/status with <status>on</status>
)
else ()
return
let $subscription := met:get-subscription($number)(: get the subscription post update :)
return
concat("Reply: forecast is ",$subscription/status," for area ",$subscription/area)
else ()
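The script above calls met:get-subscription, which is not listed in the text. The following is only a guess at its shape, assuming a subscription is looked up by mobile number; the name local:get-subscription and the choice to pass the subscriptions element as a parameter are hypothetical, made so that the sketch runs without a database.

```xquery
xquery version "3.0";

(: Hypothetical helper: finds the first subscription whose mobilenumber matches :)
declare function local:get-subscription(
    $subscriptions as element(subscriptions),
    $number as xs:string
) as element(subscription)? {
    ($subscriptions/subscription[mobilenumber = $number])[1]
};

let $subscriptions :=
    <subscriptions>
        <subscription>
            <username>Fred Bloggs</username>
            <mobilenumber>447777777</mobilenumber>
            <area>lundy</area>
            <status>off</status>
        </subscription>
    </subscriptions>
return local:get-subscription($subscriptions, "447777777")/area/string()
(: evaluates to "lundy" :)
```

In the real module the function would presumably start from met:subscriptions(), so that the element it returns is a stored database node that the update statements can modify.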
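Twitter has a simple REST API for updating a status. We can use it to push forecasts to a Twitter account. Twitter uses Basic Access Authentication, and a suitable XQuery function to send a message using a username and password can be written with the eXist httpclient module.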
declare function met:send-tweet ($username as xs:string,$password as xs:string,$tweet as xs:string ) as xs:boolean {
let $uri := xs:anyURI("http://twitter.com/statuses/update.xml")
let $content :=concat("status=", encode-for-uri($tweet))
let $headers :=
<headers>
<header name="Authorization"
value="Basic {util:string-to-binary(concat($username,":",$password))}"/>
<header name="Content-Type"
value="application/x-www-form-urlencoded"/>
</headers>
let $response := httpclient:post( $uri, $content, false(), $headers )
return
$response/@statusCode='200'
};
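A script is needed to access the stored forecast and push the forecast for an area to Twitter. A different Twitter account could be set up for each shipping area. The script needs to be scheduled to run after the full forecast has been obtained.
In this example, the forecast for a specific area is pushed to a hard-coded Twitter user.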
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
declare variable $username := "kitwallace";
declare variable $password := "mypassword";
declare variable $area := request:get-parameter("area","lundy");
let $forecast := met:get-stored-forecast($area)
let $time := datetime:format-dateTime($forecast/../@at,"HH:mm")
let $message := concat($area," at ",$time,":",met:abbreviate(met:forecast-as-text($forecast)))
return
<result>{met:send-tweet($username,$password,$message)}</result>
This task would be well suited to XForms.
Use a trigger to push the SMS messages once the update has completed.