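XQuery/DBpedia with SPARQL - Football teams
DBpedia is a project that converts the contents of Wikipedia into RDF so that it can be linked to other data sets and enrich the Semantic Web. It provides a SPARQL endpoint for querying this database.
This application uses DBpedia to create a kml file showing the birthplaces of the members of a selected British football team. The quality of the data is limited by several factors:
- the age of the Wikipedia extract on which DBpedia is based
- whether individual pages exist for the players in Wikipedia
- the consistency of property tags in the Wikipedia infoboxes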
declare variable $query := "
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>.
  OPTIONAL {?player p:cityofbirth ?city}.
  OPTIONAL {?player p:dateOfBirth ?dob}.
  OPTIONAL {?player p:clubnumber ?no}.
  OPTIONAL {?player p:position ?position}.
  OPTIONAL {?player p:image ?image}.
  OPTIONAL {
    { ?city geo:long ?long. }
    UNION
    { ?city p:redirect ?city2.
      ?city2 geo:long ?long.
    }.
  }.
  OPTIONAL {
    { ?city geo:lat ?lat. }
    UNION
    { ?city p:redirect ?city3.
      ?city3 geo:lat ?lat.
    }.
  }.
}
";
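This query is complicated by the need to handle possible redirects of city names. (Can this be improved? Is it a general problem?) To obtain more complete data, the query should also handle the several synonyms used for place of birth and date of birth.
Changes to DBpedia mean that queries written against its data model and vocabulary have a short life. As of January 2011, the query is being updated. Currently, the following query appears to work for retrieving the birthplaces and birth dates of current Arsenal players.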
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  <http://dbpedia.org/resource/Arsenal_F.C.> p:name ?player.
  ?player dbpedia-owl:birthPlace ?city;
          dbpedia-owl:birthDate ?dob.
  ?city geo:long ?long;
        geo:lat ?lat.
}
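However, this produces multiple geocoded locations per player. It may be reasonable to assume that the first location is the most specific (but can this not be filtered in SPARQL?).
The prototype SPARQL query is written for Arsenal F.C. The team name needs to be replaced with the supplied team name, and the query is then URI-encoded and passed to the DBpedia SPARQL endpoint.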
let $club := request:get-parameter ("club","Arsenal_F.C.")
let $queryx := replace($query,"Arsenal_F.C.",$club)
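Aside: originally the query was written with a generic placeholder ($team) rather than a prototype value (Arsenal_F.C.). The prototype approach has the benefit that the SPARQL query is executable without editing, and it is more expressive and less error-prone: the $ in $team would need to be escaped in the replace expression because the second argument is a regular expression. The dots in Arsenal_F.C. are regular-expression metacharacters too, but since a dot matches any character, the pattern still matches the prototype name.
This query uses the SPARQL endpoint provided by the Virtuoso engine. The result format is specified as XML, that is, the SPARQL Query Results XML Format. A function tidies up the interface: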
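The escaping issue mentioned in the aside can be sketched as follows. This is only an illustration and not part of the original script: the $template string and the club value are made up for the example. Because the second argument of replace is a regular expression, a literal $ is written as \$ and a literal dot as \.

```xquery
xquery version "1.0";

(: Hypothetical template that uses a $team placeholder instead of a prototype value :)
let $template := "?player p:currentclub <http://dbpedia.org/resource/$team>."
(: Illustrative club identifier; any DBpedia resource name could be supplied :)
let $club := "Chelsea_F.C."
return (
  (: "\$team" matches the literal text $team; an unescaped $ would be read as an end-of-input anchor :)
  replace($template, "\$team", $club),
  (: With the prototype approach, escaping the dots makes the pattern match only the literal name :)
  replace("<http://dbpedia.org/resource/Arsenal_F.C.>", "Arsenal_F\.C\.", $club)
)
```

Both calls return a string in which the club name has been substituted. A supplied club name that itself contains $ or \ would also need escaping, since those characters are special in the replacement string.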
declare function local:execute-sparql($query as xs:string) {
  (: a literal ampersand inside an XQuery string literal is written as &amp; :)
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
                        encode-for-uri($query)
                 )
  return doc($sparql)
};
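To see what the function sends to the endpoint, the URL construction can be tried on its own. The following is a minimal sketch that assumes a short illustrative query (not one used in this application); it uses only the standard concat and encode-for-uri functions and makes no request.

```xquery
xquery version "1.0";

(: Illustrative query only; the application uses the longer query declared earlier :)
let $q := "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
(: &amp; is how a literal ampersand is written inside an XQuery string literal :)
let $url := concat("http://dbpedia.org/sparql?format=xml&amp;query=", encode-for-uri($q))
return $url
```

encode-for-uri percent-encodes every character except letters, digits, hyphen, underscore, dot and tilde, so the spaces in the query become %20 and the braces become %7B and %7D.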
The result is in SPARQL Query Results XML Format. It is more convenient for later processing to convert it into tuples with named elements.
declare namespace r = "http://www.w3.org/2005/sparql-results#";
declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
      { for $binding in $result/r:binding
        return
          if ($binding/r:uri)
          then element {$binding/@name} {
                 attribute type {"uri"},
                 string($binding/r:uri)
               }
          else element {$binding/@name} {
                 attribute type {$binding/r:literal/@datatype},
                 string($binding/r:literal)
               }
      }
    </tuple>
};
let $result := local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)
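The shape of the tuples may be easier to see on a small input. The sketch below repeats the conversion function so that it is self-contained and applies it to a hand-written sample result; the sample player URI and coordinate are invented for illustration and are not real DBpedia output.

```xquery
xquery version "1.0";
declare namespace r = "http://www.w3.org/2005/sparql-results#";

(: Same conversion as in the script, repeated here so the sketch runs on its own :)
declare function local:sparql-to-tuples($rdfxml as node()*) as element(tuple)* {
  for $result in $rdfxml//r:result
  return
    <tuple>{
      for $binding in $result/r:binding
      return
        if ($binding/r:uri)
        then element {string($binding/@name)} {
               attribute type {"uri"},
               string($binding/r:uri)
             }
        else element {string($binding/@name)} {
               attribute type {string($binding/r:literal/@datatype)},
               string($binding/r:literal)
             }
    }</tuple>
};

(: Hand-written sample in the SPARQL results namespace; the values are made up :)
let $sample :=
  <r:sparql>
    <r:results>
      <r:result>
        <r:binding name="player"><r:uri>http://dbpedia.org/resource/Example_Player</r:uri></r:binding>
        <r:binding name="lat"><r:literal datatype="http://www.w3.org/2001/XMLSchema#float">51.5</r:literal></r:binding>
      </r:result>
    </r:results>
  </r:sparql>
return local:sparql-to-tuples($sample)
```

The result is a single tuple element with a player child (type "uri") and a lat child whose type attribute carries the literal's datatype. Later steps can then address values by name, for example $tuples[lat]/player.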
Since we are generating kml, we need to set the media type and file name and create a Document node at the appropriate place in the script.
declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";
(: the header value must not repeat the header name; $club is the request parameter read earlier :)
let $x := response:set-header('Content-Disposition', concat('inline; filename=', $club, '.kml'))
return
  <Document>
    <name>Birthplaces of players in the {$club} squad</name>
    <Style id="player">
      <IconStyle>
        <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href></Icon>
      </IconStyle>
    </Style>
    .....
  </Document>
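The icon is a stock football player icon from Google Earth.
Because some properties have multiple values (for example, cityofbirth is often expressed as an address path), there are several tuples for each player. These need to be grouped and condensed. Here we use an XQuery idiom that takes the set of player names with distinct-values and then uses each name as a key to access its group of rows. This script takes the simple approach of using only the first tuple that contains a latitude value, pending a better solution for multiple cityofbirth values.
We are only interested in players whose birthplace has been geocoded, so we filter for tuples that contain a lat element.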
{
for $playername in distinct-values($tuples[lat]/player)
let $player := $tuples[player=$playername][lat][1]
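The grouping idiom can be tried on a few hand-made tuples. This is a sketch with invented player and city names; the chosen element is hypothetical and exists only to display which tuple was selected.

```xquery
xquery version "1.0";

(: Invented tuples: player A has three rows, two of them geocoded; player B has none :)
let $tuples := (
  <tuple><player>A</player><city>X</city></tuple>,
  <tuple><player>A</player><city>Y</city><lat>51.5</lat></tuple>,
  <tuple><player>A</player><city>Z</city><lat>52.0</lat></tuple>,
  <tuple><player>B</player><city>W</city></tuple>
)
(: distinct player names, taken only from tuples that have a lat element :)
for $playername in distinct-values($tuples[lat]/player)
(: first geocoded tuple for that player :)
let $player := $tuples[player = $playername][lat][1]
return <chosen player="{$playername}" city="{$player/city}"/>
```

Only player A is returned, with city Y: the row without a latitude is skipped, the later geocoded row is ignored, and player B drops out because none of its tuples is geocoded.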
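The Wikipedia data needs some cleaning before it can be used in the kml. A general-purpose cleaning function decodes URI-encoded characters, removes some irrelevant text and replaces underscores with spaces. (This hack needs improvement.)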
declare function local:clean($text) {
  let $text := util:unescape-uri($text,"UTF-8")
  let $text := replace($text,"http://dbpedia.org/resource/","")
  let $text := replace($text,"\(.*\)","")
  let $text := replace($text,"Football__positions#","")
  let $text := replace($text,"#",",")
  let $text := replace($text,"_"," ")
  return $text
};
let $name := local:clean($player/player)
let $city := local:clean($player/city)
let $position := local:clean($player/position)
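One possible direction for improving the cleaning step is sketched below. It is a hypothetical variant, not the script's function: it leaves out the eXist-specific util:unescape-uri call so that it runs on any XQuery processor, uses a reluctant match for the parenthesised suffix, and trims the whitespace that the removal leaves behind. The resource name is invented.

```xquery
xquery version "1.0";

(: Hypothetical variant of local:clean, without the eXist-specific decoding step :)
declare function local:clean-sketch($text as xs:string) as xs:string {
  let $text := replace($text, "http://dbpedia\.org/resource/", "")
  (: the reluctant .*? stops at the first closing parenthesis :)
  let $text := replace($text, "\(.*?\)", "")
  let $text := replace($text, "_", " ")
  (: trim the space left behind when a parenthesised suffix is removed :)
  return normalize-space($text)
};

(: Invented resource URI for illustration :)
local:clean-sketch("http://dbpedia.org/resource/Example_Town_(Example_County)")
```

For this input the function returns "Example Town", whereas applying the same three replace steps without normalize-space would leave a trailing space.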
The date of birth is in xs:date format but is optional. If the value is a valid date, it is converted to a more readable format using an eXist function.
let $dob := if ($player/dob castable as xs:date) then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy" ) else ""
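On processors without the eXist datetime module, a similar result may be obtainable with the standard format-date function from XQuery 3.0. The following is a sketch under that assumption; the date value is illustrative and the picture string is one possible rendering of the "dd MMM, yyyy" pattern.

```xquery
xquery version "3.0";

(: Illustrative value; in the script this comes from $player/dob :)
let $raw := "1987-05-04"
return
  if ($raw castable as xs:date)
  (: [D01] two-digit day, [MNn,*-3] month name cut to three letters, [Y0001] four-digit year :)
  then format-date(xs:date($raw), "[D01] [MNn,*-3], [Y0001]")
  else ""
```

The language used for the month name is the processor's default, so the exact text can vary between installations.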
The same applies to the squad number, which should be an xs:integer.
let $no := if ($player/no castable as xs:integer) then concat(" [# ", xs:integer($player/no) ,"] ") else ""
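Latitude and longitude should be xs:decimal. Because several players in a team sometimes come from the same place, the mapped position is jittered slightly, by up to 0.005 degrees in either direction.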
let $lat :=xs:decimal($player/lat) + (math:random() - 0.5)* 0.01
let $long :=xs:decimal($player/long) + (math:random() - 0.5)* 0.01
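The body of the placemark description contains XHTML markup to display the image, when there is one, and links to the DBpedia page. The XML needs to be serialized to a string so that Google Maps renders the description in the pop-up window.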
let $description :=
  <div>
    {concat($position, $no, " born ", $dob, " in ", $city)}
    <div>
      <a href="{$player/player}">DBpedia</a>
      <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
    </div>
    {if ($player/image != "")
     then <div><img src="{$player/image}" height="200"/> </div>
     else ()
    }
  </div>
order by $name
return
  <Placemark>
    <name>{$name}</name>
    <description>
      {util:serialize($description,"method=xhtml")}
    </description>
    <Point>
      <coordinates>{concat($long, ",", $lat, ",0")}</coordinates>
    </Point>
    <styleUrl>#player</styleUrl>
  </Placemark>
}
Arsenal player map
Note that the q parameter is URI-encoded.
(: generate a sparql query on the dbpedia server
This takes a team name and generates a kml file showing the birth place of the players
:)
declare namespace r = "http://www.w3.org/2005/sparql-results#";
declare variable $query := "
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>.
  OPTIONAL {?player p:cityofbirth ?city}.
  OPTIONAL {?player p:birth ?dob}.
  OPTIONAL {?player p:clubnumber ?no}.
  OPTIONAL {?player p:position ?position}.
  OPTIONAL {?player p:image ?image}.
  OPTIONAL {
    { ?city geo:long ?long. }
    UNION
    { ?city p:redirect ?city2.
      ?city2 geo:long ?long.
    }.
  }.
  OPTIONAL {
    { ?city geo:lat ?lat. }
    UNION
    { ?city p:redirect ?city3.
      ?city3 geo:lat ?lat.
    }.
  }.
}
";
declare function local:execute-sparql($query as xs:string) {
  (: a literal ampersand inside an XQuery string literal is written as &amp; :)
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
                        encode-for-uri($query)
                 )
  return doc($sparql)
};
declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
      { for $binding in $result/r:binding
        return
          if ($binding/r:uri)
          then element {$binding/@name} {
                 attribute type {"uri"},
                 string($binding/r:uri)
               }
          else element {$binding/@name} {
                 (: the datatype attribute is on the literal element, not on the binding :)
                 attribute type {$binding/r:literal/@datatype},
                 string($binding/r:literal)
               }
      }
    </tuple>
};
declare function local:clean($text) {
  let $text := util:unescape-uri($text,"UTF-8")
  let $text := replace($text,"http://dbpedia.org/resource/","")
  let $text := replace($text,"\(.*\)","")
  let $text := replace($text,"Football__positions#","")
  let $text := replace($text,"#",",")
  let $text := replace($text,"_"," ")
  return $text
};
declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";
let $club := request:get-parameter ("club","Arsenal_F.C.")
let $queryx := replace($query,"Arsenal_F.C.",$club)
let $result:= local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)
(: the header value must not repeat the header name :)
let $x := response:set-header('Content-Disposition', concat('inline; filename=', $club, '.kml'))
return
  <Document>
    <name>Birthplaces of {local:clean($club)} players</name>
    <Style id="player">
      <IconStyle>
        <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href></Icon>
      </IconStyle>
    </Style>
    {
      for $playername in distinct-values($tuples[lat]/player)
      let $player := $tuples[player = $playername][lat][1]
      let $name := local:clean($player/player)
      let $city := local:clean($player/city)
      let $position := local:clean($player/position)
      let $dob := if ($player/dob castable as xs:date) then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy") else ""
      let $no := if ($player/no castable as xs:integer) then concat(" [# ", xs:integer($player/no), "] ") else ""
      let $lat := if ($player/lat castable as xs:decimal) then xs:decimal($player/lat) + (math:random() - 0.5) * 0.01 else ""
      let $long := if ($player/long castable as xs:decimal) then xs:decimal($player/long) + (math:random() - 0.5) * 0.01 else ""
      let $description :=
        <div>
          {concat($position, $no, " born ", $dob, " in ", $city)}
          <div>
            <a href="{$player/player}">DBpedia</a>
            <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
          </div>
          {if ($player/image != "") then <div><img src="{$player/image}" height="200"/> </div> else ()}
        </div>
      order by $name
      return
        <Placemark>
          <name>{$name}</name>
          <description>
            {util:serialize($description,"method=xhtml")}
          </description>
          <Point>
            <coordinates>{concat($long, ",", $lat, ",0")}</coordinates>
          </Point>
          <styleUrl>#player</styleUrl>
        </Placemark>
    }
  </Document>
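We also need an index page that selects all the clubs in the main English and Scottish leagues. This script follows the same approach as the more complex script above, except that, because the data is simpler, the raw SPARQL result is used directly without conversion.
The index is sorted alphabetically by club name and provides links to the player map and to the underlying DBpedia data.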
declare option exist:serialize "method=xhtml media-type=text/html";
declare namespace r = "http://www.w3.org/2005/sparql-results#";
declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?club p:league ?league.
  { ?club p:league :Premier_League. }
  UNION
  { ?club p:league :Football_League_One. }
  UNION
  { ?club p:league :Football_League_Two. }
  UNION
  { ?club p:league :Scottish_Premier_League. }
  UNION
  { ?club p:league :Football_League_Championship. }
}
";
declare function local:execute-sparql($query as xs:string) {
  (: encode-for-uri is the standard replacement for the older escape-uri($query, true()) :)
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=", encode-for-uri($query))
  return doc($sparql)
};
declare function local:clean($string as xs:string) as xs:string {
  let $string := util:unescape-uri($string,"UTF-8")
  let $string := replace($string,"\(.*\)","")
  let $string := replace($string,"_"," ")
  return $string
};
<html>
  <body>
    <h1>English and Scottish Football Clubs</h1>
    <table border="1">
      { for $tuple in local:execute-sparql($query)//r:result
        let $club := $tuple/r:binding[@name="club"]/r:uri
        let $club := substring-after($club,"/resource/")
        let $clubx := local:clean($club)
        let $league := $tuple/r:binding[@name="league"]/r:uri
        let $league := local:clean(substring-after($league,"/resource/"))
        (: the whole kml script URL is URI-encoded before being passed as the q parameter :)
        let $mapurl := concat("http://maps.google.co.uk/maps?q=", encode-for-uri(concat("http://www.cems.uwe.ac.uk/xmlwiki/RDF/club2kml.xq?club=",$club)))
        order by $club
        return
          <tr>
            <td>{$clubx}</td>
            <td>{$league}</td>
            <td><a href="{$mapurl}">Player Map</a></td>
            <td><a href="http://dbpedia.org/resource/{$club}">DBpedia</a></td>
          </tr>
      }
    </table>
  </body>
</html>