ROSE 編譯器框架/autoPar
這是一個工具,可以自動將 OpenMP 指令插入輸入的序列 C/C++ 程式碼中。
該工具是使用 OpenMP 進行自動並行化的實現。它可以自動將 OpenMP 指令插入輸入的序列 C/C++ 程式碼中。對於具有現有 OpenMP 指令的輸入程式,當開啟正確選項時,該工具將雙重檢查正確性。
- 原始檔目前位於 rose/projects/autoParallelization。
- 一個獨立的可執行程式(名為 autoPar )被生成並安裝到 ROSE 的安裝樹中(位於 ROSE INS/bin 下)。
- 測試輸入檔案位於 rose/projects/autoParallelization/tests
- 您可以在 rose_build_tree/projects/autoParallelization 中透過鍵入 "make check" 來測試該工具。
- ROSE 手冊中有一節:12.7 自動並行化 pdf
- 它用於探索語義感知的自動並行化,如我們的論文中所述。
與 ROSE 類似,autoPar 是在 BSD 許可下發布的。
autoPar 作為 ROSE 編譯器的一部分發布。請按照以下步驟安裝 ROSE:
- 安裝
- 鍵入 make install-core -j8 來安裝核心 ROSE 庫和預構建工具。
在構建 ROSE 的過程中會建立一個可執行檔案 autoPar。使用者可以使用它來並行化他們的程式碼。autoPar 將安裝到由 –prefix=ROSE 安裝路徑 指定的路徑中。以下是部分說明。
在配置期間,您可能想要指定一個字首路徑,例如 –prefix=/home/youraccount/opt/roserev-8345。然後,按照 ROSE 安裝指南構建並安裝 ROSE,通常是鍵入 make 和 make install。
下一步是設定 ROSE 可執行檔案(包括 autoPar )和共享庫的搜尋路徑環境變數。假設您使用的是 bash,將以下幾行放入一個檔案(例如 set.rose)中並原始碼它(鍵入 source set.rose)
ROSE_INS=/home/youraccount/opt/rose-rev-8345 export ROSE_INS PATH=$ROSE_INS/bin:$PATH export PATH LD_LIBRARY_PATH=$ROSE_INS/lib:$LD_LIBRARY_PATH export LD_LIBRARY_PATH
現在您可以使用 autoPar 來並行化您的程式碼。假設您有一個順序檔案:file.c,只需鍵入 autoPar -c file.c 即可。將生成一個輸出檔案 rose file.c。如果 autoPar 可以找到任何可並行化的內容,輸出檔案應該包含生成的 OpenMP 指令。
如果您不想從頭開始安裝所有內容,可以下載 ROSE 的 vmware 映象,並直接使用其中安裝的 ROSE 副本。
有關此方面的更多資訊,請參見 ROSE_Compiler_Framework/虛擬機器映象。
在虛擬機器中,您可以 cd 到
/home/demo/buildrose/projects/autoParallelization
並鍵入
make check
您將看到 autoPar 被用來處理輸入檔案。
[~build64/projects/autoParallelization/tests]../autoPar—help
AUTOPAR(1) ROSE Command-line Tools AUTOPAR(1)
Name
autoPar - This tool automatically inserts OpenMP directives into sequential codes.
Synopsis
autoPar switches files...
Description
This tool is an implementation of automatic parallelization using OpenMP. It can automatically insert OpenMP directives into input serial C/C++
codes. For input programs with existing OpenMP directives, the tool will double check the correctness when requested.
Switches
General switches
--assert how
Determines how a failed assertion behaves. The choices are "abort", "exit" with a non-zero value, or "throw" a
Rose::Diagnostics::FailedAssertion exception. The default behavior depends on how ROSE was configured.
--help; -h
Show this documentation.
--log config
Configures diagnostics. Use "--log=help" and "--log=list" to get started.
--threads n
Number of threads to use for algorithms that support multi-threading. The default is 0. A value of zero means use the same number of
threads as there is hardware concurrency (or one thread if the hardware concurrency can't be determined).
--version; -V
Shows the dotted quad ROSE version and then exits. See also --version-long, which prints much more information.
--version-long
Shows version information for ROSE and various dependencies and then exits. The shorter --version switch shows only the dotted quad of the
ROSE library itself.
autoPar's switches
These switches control the autoPar tool.
--[rose:autopar:]annot string
Specify annotation file for semantics of abstractions
--[rose:autopar:]dumpannot
Dump annotation file content for debugging purposes.
--[rose:autopar:]enable_debug
Enable the debugging mode.
--[rose:autopar:]enable_diff
Compare user defined OpenMP pragmas to auto parallelization generated ones.
--[rose:autopar:]enable_distance
Report the absolute dependence distance of a dependence relation preventing parallelization.
--[rose:autopar:]enable_patch
Enable generating patch files for auto parallelization
--[rose:autopar:]failure_report string
Specify the report file for logging files autoPar cannot process
--[rose:autopar:]keep_going
Allow auto parallelization to keep going if errors happen
--[rose:autopar:]no_aliasing
Assuming no pointer aliasing exists.
--[rose:autopar:]success_report string
Specify the report file for logging files autoPar can process
--[rose:autopar:]unique_indirect_index
Assuming all arrays used as indirect indices have unique elements (no overlapping)
ROSE builtin switches
--rose:help
Show the old-style ROSE help.
0.9.9.123 Friday October 13 10:16:56 2017
其他有用的 rose 標誌
- -rose:skipfinalCompileStep // 跳過呼叫後端編譯器來編譯轉換後的程式碼,這有助於解決一些錯誤
- --edg:no_warnings // 抑制 EDG C++ 前端的警告
測試輸入檔案可以在 https://github.com/rose-compiler/rose-develop/tree/master/projects/autoParallelization/tests 找到。
相應的生成測試輸出檔案可以在:https://github.com/chunhualiao/autoPar-demo 找到。
我們在下面提供兩個示例。
輸入
/* Only the inner loop can be parallelized
*/
void foo()
{
int n=100, m=100;
double b[n][m];
int i,j;
for (i=0;i<n;i++)
for (j=0;j<m;j++)
b[i][j]=b[i-1][j-1];
}
命令列和螢幕輸出
../autoPar -rose:C99—edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -rose:autopar:enable_patch -I../../../../sourcetree/src/frontend/SageIII -c ../../../../sourcetree/projects/autoParallelization/tests/inner_only.c
Enabling generating patch files for auto parallelization ... Assuming all arrays used as indirect indices have unique elements (no overlapping) ... Unparallelizable loop at line:8 due to the following dependencies: 2*2 TRUE_DEP; commonlevel = 2 CarryLevel = (0,0) Is precise SgPntrArrRefExp:b[i][j]@10:14->SgPntrArrRefExp:b[i - 1][j - 1]@10:21 == -1;* 0;||* 0;== -1;||:: Automatically parallelized a loop at line:9
輸出
/* Only the inner loop can be parallelized
*/
#include "omp.h"
void foo()
{
int n = 100;
int m = 100;
double b[n][m];
int i;
int j;
for (i = 0; i <= n - 1; i += 1) {
#pragma omp parallel for private (j) firstprivate (n,m,i)
for (j = 0; j <= m - 1; j += 1)
b[i][j] = b[i - 1][j - 1];
}
}
另一個輸入
int i,j;
int a[100][100];
void foo()
{
for (i=0;i<100;i++)
for (j=0;j<100;j++)
a[i][j]=a[i][j]+1;
}
輸出
#include "omp.h"
int i;
int j;
int a[100UL][100UL];
void foo()
{
#pragma omp parallel for private (i,j)
for (i = 0; i <= 100 - 1; i += 1) {
#pragma omp parallel for private (j) firstprivate (i)
for (j = 0; j <= 100 - 1; j += 1)
a[i][j] = (a[i][j] + 1);
}
}
註解檔案提供了額外的程式資訊,這些資訊通常難以被編譯器提取,例如別名、副作用資訊。
多個註解檔案可以在一個命令列中載入。
../autoPar --edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -I/home/liao6/rose/freshmaster/sourcetree/src/frontend/SageIII -c -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/floatArray.annot -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/funcs.annot -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/Index.annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/interp1_elem2.C
假設所有用作間接索引的陣列具有唯一元素(沒有重疊)...
自動並行化了第 13 行的迴圈。
輸入
//#include <A++.h>
#include "simpleA++.h"
void interpolate1D(class floatArray &fineGrid,class floatArray &coarseGrid)
{
int _var_0,i;
int _var_1;
int fineGridSize = fineGrid.length(0);
int coarseGridSize = coarseGrid.length(0);
// Interior fine points
class Range If(2,_var_1 = (fineGridSize - 2),2);
class Range Ic(1,(coarseGridSize - 1),1);
for (i = 1; i < _var_0; i += 1) {
fineGrid.elem(i) =fineGrid.elem(i)+1;
}
}
註解檔案
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/floatArray.annot
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/funcs.annot
輸出檔案
//#include <A++.h>
#include "simpleA++.h"
#include "omp.h"
void interpolate1D(class floatArray &fineGrid,class floatArray &coarseGrid)
{
int _var_0;
int i;
int _var_1;
int fineGridSize = fineGrid . length (0);
int coarseGridSize = coarseGrid . length (0);
// Interior fine points
class Range If(2,_var_1 = fineGridSize - 2,2);
class Range Ic(1,coarseGridSize - 1,1);
#pragma omp parallel for private (i) firstprivate (_var_0)
for (i = 1; i <= _var_0 - 1; i += 1) {
fineGrid . elem (i) = fineGrid . elem (i) + 1;
}
}
autoPar 接受一個選項 -rose:autopar:enable_patch 來生成一個補丁檔案。因此,使用者首先需要檢查要插入的指令。然後,使用者可以將檢查過的補丁應用到他們的應用程式中。
命令列
- autoPar -rose:C99 --edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -rose:autopar:enable_patch -c jacobi_seq.c
螢幕輸出
Enabling generating patch files for auto parallelization ... Assuming all arrays used as indirect indices have unique elements (no overlapping) ... Automatically parallelized a loop at line:83 Unparallelizable loop at line:84 due to the following dependencies: 1*1 OUTPUT_DEP; commonlevel = 1 CarryLevel = (0,0) Is precise SgPntrArrRefExp:f[i][0]@89:6->SgPntrArrRefExp:f[i][0]@89:6 <= -1;||:: 1*1 OUTPUT_DEP; commonlevel = 1 CarryLevel = (0,0) Is precise SgPntrArrRefExp:f[i][0]@89:6->SgPntrArrRefExp:f[i][0]@89:6 <= -1;||:: Automatically parallelized a loop at line:143 Automatically parallelized a loop at line:144 Automatically parallelized a loop at line:147 Automatically parallelized a loop at line:148 Automatically parallelized a loop at line:185 Automatically parallelized a loop at line:186
生成的補丁檔案:cat jacobi_seq.c.patch
diff -ar /home/liao6/workspace/autoPar/sourcetree/projects/autoParallelization/tests/jacobi_seq.c rose_jacobi_seq.c 0a1 > #include <omp.h> 82a83 > #pragma omp parallel for private (xx,yy,i,j) firstprivate (n,m) 142a143 > #pragma omp parallel for private (i,j) 143a144 > #pragma omp parallel for private (j) 146a147 > #pragma omp parallel for private (resid,i,j) reduction (+:error) 147a148 > #pragma omp parallel for private (resid,j) reduction (+:error) firstprivate (omega,ax,ay,b) 184a185 > #pragma omp parallel for private (xx,yy,temp,i,j) reduction (+:error) 185a186 > #pragma omp parallel for private (xx,yy,temp,j) reduction (+:error) firstprivate (dx,dy)
檢查完補丁後,您可以將補丁應用到原始原始檔
- patch jacobi_seq.c -i jacobi_seq.c.patch -o jacobi_patched.c
AutoPar 檢查應用程式中預先存在的 OpenMP 指令,並驗證它們是否正確地考慮了私有、歸約和其他 OMP 資料共享屬性。
示例輸入
#include <stdio.h>
#include <omp.h>
int main(int argc, char *argv[]) {
int N = 20;
int total ;
int i,j;
#pragma omp parallel for
for (j=0;j<N;j++) {
for (i=0;i<N;i++) {
total += i+j ;
}
}
printf("%d\n", total) ;
return 0;
}
上面的程式碼包含一個真正的 OpenMP 錯誤,有人在嘗試向他們的程式碼新增 OMP 註解時遇到了困難,並在以下地址提交:http://openmp.org/forum/viewtopic.php?f=3&t=821
執行 autoPar OpenMP 指令檢查器會產生以下結果。
% autoPar -rose:unparse_tokens -rose:autopar:unique_indirect_index -rose:autopar:enable_diff -fopenmp -c OMPcheck.cc <<<<<<<<< user defined : #pragma omp parallel for -------- OMPcheck.cc:11 compiler generated:#pragma omp parallel for private (i) reduction (+:total) >>>>>>>>
來自 autoPar 的上述輸出表明,OMP 指令缺少變數 'i' 的 OpenMP 'private' 宣告,以及變數 'total' 的歸約註解。
autoPar 旨在處理對原始陣列進行操作的傳統迴圈以及使用高階抽象的現代應用程式。並行化器使用以下演算法:迴圈可能包含原始資料型別或 STL 容器型別的變數,或兩者兼而有之。
1. 準備和預處理
- (可選) 讀取已知抽象和語義的規範檔案。
- (可選) 基於輸入程式碼語義應用可選的自定義轉換,例如將樹遍歷轉換為記憶體池上的迴圈迭代。
- (可選) 如果假定唯一索引索引語義,則開啟間接陣列訪問的規範化(void uniformIndirectIndexedArrayRefs() )
- indirect_loop_index = array_Y [current_loop_index] ;array_X[indirect_loop_index] ..; 變為 array_X [array_Y [current_loop_index]] ..
- 規範化迴圈,包括使用迭代器的迴圈。
- 對規範化的程式碼應用常量摺疊以簡化程式碼。
- 查詢具有規範形式(用於 omp for)的候選陣列計算迴圈或對單個元素進行操作的迴圈和函式(用於 omp task)。
2. 對於每個候選者
- 如果存在沒有已知語義或副作用的函式呼叫,則跳過目標。
- 呼叫依賴關係分析和生存期分析。
- 對 OpenMP 變數進行分類(自動作用域)以使其成為私有變數、firstprivate 變數、lastprivate 變數、reduction 變數。識別對當前元素的引用,並找到與順序無關的寫入訪問。
- 消除與自動作用域變數相關的依賴關係,那些僅涉及當前元素的依賴關係,以及由與順序無關的寫入訪問引起的輸出依賴關係。
- 如果不存在依賴關係,則插入相應的 OpenMP 結構。
演算法的關鍵思想是捕獲目標內的依賴關係,並根據各種規則儘可能地消除它們。如果不存在剩餘的依賴關係,則並行化是安全的。
在第 51 頁或(59/320):第 2.6 節,OpenMP 4.0(http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)中定義為
for (init-expr; test-expr; incr-expr) structured-block
init-expr 以下之一
var = lb integer-type var = lb random-access-iterator-type var = lb pointer-type var = lb
test-expr 以下之一
var relational-op b b relational-op var
relational-op 以下之一
< <= > >=
incr-expr 以下之一
++var var++ --var var-- var += incr var -= incr var = var + incr var = incr + var var = var - incr
var 以下之一
A variable of a signed or unsigned integer type. For C++, a variable of a random access iterator type. For C, a variable of a pointer type.
If this variable would otherwise be shared, it is implicitly made private in the loop construct. This variable must not be modified during the execution of the for-loop other than in incr-expr. Unless the variable is specified lastprivate on the loop construct, its value after the loop is unspecified.
lb 和 b:與 var 型別相容的迴圈不變表示式。
incr:迴圈不變整數表示式。
依賴關係分析是並行化器決定迴圈是否可並行化的基礎。autoPar 從迴圈最佳化器呼叫依賴關係分析,該最佳化器實現 [35, 36] 中提出的演算法以有效地轉換完全巢狀迴圈和非完全巢狀迴圈。擴充套件方向矩陣 (EDM) 依賴關係表示用於覆蓋僅圍繞兩個語句之一的非通用迴圈巢狀,以處理非完全巢狀迴圈。對於迴圈內的陣列訪問,使用高斯消元演算法來求解一組迴圈歸納變數的線性整數方程。
此步驟依賴於在以下位置實現的依賴關係分析
- src/midend/programTransformation/loopProcessing/depGraph
在以下位置提供了簡化的介面函式
//Compute dependence graph for a loop, using ArrayInterface and ArrayAnnoation LoopTreeDepGraph* ComputeDependenceGraph(SgNode* loop, ArrayInterface*, ArrayAnnotation* annot);
依賴關係分析依賴於從副作用分析收集的變數引用資訊。
分析的原始碼標頭檔案為 midend/programTransformation/loopProcessing/depInfo/DepInfoAnal.h
- bool AnalyzeStmtRefs( AstInterface& fa, const AstNodePtr& n, CollectObject<AstNodePtr> &wRefs, CollectObject<AstNodePtr> &rRefs);
提供了一些 SageInterface 函式來使開發人員更容易地使用此分析。
// Collect all read and write references within stmt, which can be a function, a scope statement, or a single statement. Note that a reference can be both read and written, like i++. bool SageInterface::collectReadWriteRefs (SgStatement *stmt, std::vector< SgNode * > &readRefs, std::vector< SgNode * > &writeRefs) //Collect unique variables which are read or written within a statement. Note that a variable can be both read and written. The statement can be either of a function, a scope, or a single line statement. bool SageInterface::collectReadWriteVariables (SgStatement *stmt, std::set< SgInitializedName * > &readVars, std::set< SgInitializedName * > &writeVars) // Collect read only variables within a statement. The statement can be either of a function, a scope, or a single line statement. void SageInterface::collectReadOnlyVariables (SgStatement *stmt, std::set< SgInitializedName * > &readOnlyVars) //Collect read only variable symbols within a statement. The statement can be either of a function, a scope, or a single line statement. void SageInterface::collectReadOnlySymbols (SgStatement *stmt, std::set< SgVariableSymbol * > &readOnlySymbols)
此步驟根據變數的 live-in(在執行迴圈之前)和 live-out(在執行迴圈之後)分析結果,對變數的資料共享屬性進行分類。例如,迴圈內的私有變數既不是迴圈的 live-in,也不是 live-out,這意味著該變數在迴圈內立即被殺死(重新定義),然後在迴圈內以某種方式使用,但在迴圈之後在任何地方都不會被使用。所有迴圈索引變數也被分類為 OpenMP 私有變數以避免可能的競態條件。另一方面,共享變數在迴圈開始和結束時都是活動的。Firstprivate 和 lastprivate 變數分別僅在迴圈開始或僅在迴圈結束時處於活動狀態。
reduction 變數的處理方式有所不同,以最大限度地提高並行化的機會。迴圈內的典型 reduction 操作,例如 sum = sum + a[i],會導致迴圈攜帶的輸出依賴關係、迴圈攜帶的反依賴關係和迴圈獨立的反依賴關係。我們使用一個習語識別分析來捕獲此類典型操作,並在決定迴圈是否可並行化時排除相關的迴圈攜帶依賴關係。
此步驟依賴於在以下位置實現的生存期分析
- src/midend/programAnalysis/defUseAnaysis/LivenessAnalysis.cpp
- 存在一個 SageInterface 函式來呼叫此分析
//!Get liveIn and liveOut variables for a for loop void SageInterface::getLiveVariables(LivenessAnalysis * liv, SgForStatement* loop, std::set<SgInitializedName*>& liveIns, std::set<SgIn itializedName*> & liveOuts)
演算法:私有變數和 reduction 變數會導致依賴關係(被寫入)。firstprivate 和 lastprivate 變數從未在迴圈中被寫入(沒有依賴關係)
live-in live-out shared Y Y no written, no dependences: no special handling, shared by default private N N written (depVars), need privatization: depVars- liveIns - liveOuts firstprivate Y N liveIns - LiveOus - writtenVariables lastprivate N Y liveOuts - LiveIns reduction Y Y depVars Intersection (liveIns Intersection liveOuts)
要識別 reduction 變數:RecognizeReduction()
- 從 https://github.com/rose-compiler/rose-develop/blob/master/src/frontend/SageIII/sageInterface/sageInterface.C 中實現為 void SageInterface::ReductionRecognition(SgForStatement* loop, std::set< std::pair <SgInitializedName*, VariantT> > & results)
/* * Algorithms: * for each scalar candidate which are both live-in and live-out for the loop body * and which is not the loop invariant variable. * Consider those with only 1 or 2 references * 1 reference * the operation is one of x++, ++x, x--, --x, x binop= expr * 2 references belonging to the same operation * operations: one of x= x op expr, x = expr op x (except for subtraction) * Also according to the specification. * x is not referenced in exp * expr has scalar type (no array, objects etc) * x: scalar only, aggregate types (including arrays), pointer types and reference types may not appear in a reduction clause. * op is not an overloaded operator, but +, *, -, &, ^ ,|, &&, || * binop is not an overloaded operator but: +, *, -, &, ^ ,| * */
我們使用一個單獨的檔案來儲存輸入程式碼的註釋。該檔案本質上指定了輸入程式碼的語義,以便工具可以利用它來並行化程式碼。
一些註釋檔案 (*.annot) 可以在專案目錄中的 https://github.com/rose-compiler/rose-develop/tree/master/projects/autoParallelization/tests 中找到。
- https://github.com/rose-compiler/rose/blob/master/projects/autoParallelization/annot/Cmath.annot 156 個 C 標準庫函式
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/floatArray.annot 陣列語義
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/std_vector.annot std::vector 語義
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/funcs.annot 使用者函式
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/Index.annot
示例註釋為
operator atoll(const char * val)
{
modify none; read{val}; alias none;
}
class InternalIndex { has_value { stride = this.stride; base = this.base; length = this.length; } }
class Index { has_value { stride = this.stride; base = this.base; length = this.length; } }
class Range { has_value { stride = this.stride; base = this.base; length = this.length; } }
operator Index::Index( int lb, int l, int step)
{
modify none; read {lb, l, step}; alias none;
restrict_value { result = { base = lb; length = l; stride = step; } };
}
operator Range::Range( int lb, int ub, int step)
{
modify none; read {lb, ub, step}; alias none;
restrict_value { result = { base = lb; length = (ub-lb+1)/step; stride = step; } };
}
operator InternalIndex::InternalIndex( int lb, int l, int step)
{
modify none; read {lb, l, step}; alias none;
restrict_value { result = { base = lb; length = l; stride = step; } };
}
operator operator+ ( const InternalIndex & lhs, int x )
{
modify none; read {lhs, x}; alias none;
restrict_value { result = {stride = lhs.stride; base = lhs.base + x;
length = lhs.length}; };
}
operator operator- ( const InternalIndex & lhs, int x )
{
modify none; read {lhs, x}; alias none;
restrict_value { result = {stride = lhs.stride; base = lhs.base - x;
length = lhs.length}; };
}
陣列抽象
class floatArray {
array {
dimension = 6;
length(i) = this.Array_Descriptor.Array_Domain.Size[i];
elem(i:dim:1:dimension) = this(i$dim);
reshape(i:dim:1:dimension) = this.resize(i$dim);
};
array_optimize {
define {
float* _pointer = this.getDataPointer();
int _size:dim:1:dimension = this.Array_Descriptor.Array_Domain.Size[dim-1];
int _stride:dim:1:dimension = this.Array_Descriptor.Array_Domain.Stride[dim-1];
int _length:dim:1:dimension = this.Array_Descriptor.Array_Domain.getLength(dim-1)
}
length(i) = _length$(i+1);
elem(i:dim:1:dimension) =
_pointer[i$1 + repeat(x,2,dimension, i$x * _stride$(x-1) * _size$(x-1))];
};
has_value { dimension = 6; length_0 = this.length(0);
length_1 = this.length(1); length_2 = this.length(2); };
}
operator floatArray::operator() (int index)
{
modify none; read {index};
alias none;
restrict_value { this = { dimension = 1; } };
inline { this.elem(index) };
}
operator floatArray::operator() ( const InternalIndex& index)
{
modify none; read {this, index};
alias { (result, this) };
restrict_value
{ this = { dimension = 1; };
result = {dimension = 1; length_0 = index.length;}; };
construct_array (this) {
dimension = 1;
length(0) = index.length;
elem(i) = this.elem(i * index.stride + index.base);
};
}
operator floatArray::operator()(const InternalIndex& index1,
const InternalIndex& index2)
{
modify none; read {this, index1, index2};
alias { (result, this) };
restrict_value
{ this = { dimension = 2; };
result = {dimension = 2; length_0 = index1.length;
length_1 = index2.length};
};
construct_array (this) {
dimension = 2;
length(0) = index1.length; length(1) = index2.length;
elem(i,j) = this.elem(i * index1.stride + index1.base,
j * index2.stride + index2.base);
};
}
operator floatArray::operator()(const InternalIndex& index1,
const InternalIndex& index2,
const InternalIndex& index3)
{
modify none; read {this, index1, index2, index3};
alias { (result, this) };
restrict_value
{ this = { dimension = 3; };
result = {dimension = 3; length_0 = index1.length;
length_1 = index2.length; length_2 = index3.length; };
};
construct_array (this) {
dimension = 3;
length(0) = index1.length; length(1) = index2.length;
length(2) = index3.length;
elem(i,j,k) = this.elem(i * index1.stride + index1.base,
j * index2.stride + index2.base,
k * index3.stride + index3.base);
};
}
operator floatArray::operator= (const floatArray& that)
{
modify {this}; read {that}; alias none;
restrict_value {
result = { dimension = that.dimension; length_0 = that.length_0;};
};
modify_array (this) {
dimension = that.dimension;
length(i) = that.length(i);
elem(i:dim:1:dimension) = that.elem(i$dim);
};
}
operator floatArray::operator= (float val)
{
modify {this}; read {val}; alias none;
modify_array (this) {
elem(i:dim:1:dimension) = val;
};
}
operator floatArray::getLength( int dim)
{
modify none; read {dim}; alias none;
inline { this.length(dim) };
}
並非所有依賴關係都會阻止我們並行化迴圈。在我們的考慮中,我們排除了某些無害的依賴關係。
void DependenceElimination(SgNode* sg_node, LoopTreeDepGraph* depgraph, std::vector<DepInfo>& remainings, OmpSupport::OmpAttribute* att, std::map<SgNode*, bool> & indirect_table, ArrayInterface* array_interface/*=0*/, ArrayAnnotation* annot/*=0*/)
- 來自檔案 https://github.com/rose-compiler/rose/blob/master/projects/autoParallelization/autoParSupport.C
演算法,消除以下依賴關係
- 由區域性宣告的變數引起:已經是每個迭代的私有變數
- 源變數或匯變數是執行緒區域性變數
- 忽略具有空源/匯節點的依賴關係
- commonlevel ==0,沒有共同的封閉迴圈
- carry level !=0,迴圈獨立,
- 依賴關係中至少有一個數組引用,但依賴關係型別為 SCALAR_DEP 或 SCALAR_BACK_DEP
- 由自動作用域變數引起的依賴關係(private、firstprivate、lastprivate、reduction 等)
- 由一對間接索引陣列引用引起的依賴關係,如果我們知道間接索引陣列具有唯一的元素。
請注意,為什麼 carry level 必須為 0 才能防止我們的分析中的並行化
- 這是有效的,因為我們在迴圈的每一級上執行 dep 分析。
- 為了考慮當前迴圈,僅由當前迴圈攜帶的依賴關係很重要。
- 當前迴圈始終是最外層迴圈,carry level id 為 0。
autoPar -rose:autopar:enable_debug sourcetree/projects/autoParallelization/tests/jacobi_seq.c
搜尋“此 dep 關係無法消除。儲存到剩餘的依賴關係集中。”以查詢具有無法消除的迴圈攜帶依賴關係的迴圈。
// Input code from Jacobi, expected to generate
// #pragma omp parallel for private(i,j,xx,yy,temp) reduction(+:error)
185 for (i = 0; i < n; i++)
186 for (j = 0; j < m; j++)
187 {
188 xx = -1.0 + dx * (i - 1);
189 yy = -1.0 + dy * (j - 1);
190 temp = u[i][j] - (1.0 - xx * xx) * (1.0 - yy * yy);
191 error = error + temp * temp;
192 }
Debug: ComputeDependenceGraph() dumps the dependence graph for the loop at line :185
dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:xx = - 1.0 + dx *(i - 1); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
SgVarRefExp:xx@188:2->
SgVarRefExp:xx@188:2
== 0;* 0;||* 0;== 0;||::
dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
SgVarRefExp:xx@188:2->
SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::
dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
SgVarRefExp:xx@188:2->
SgVarRefExp:xx@190:26
== 0;* 0;||* 0;== 0;||::
dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
SgVarRefExp:xx@188:2->
SgVarRefExp:xx@190:31
== 0;* 0;||* 0;== 0;||::
..
dep SgExprStatement:error = error + temp * temp; SgExprStatement:error = error + temp * temp; 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
SgVarRefExp:error@191:2->
SgVarRefExp:error@191:10
== 0;* 0;||* 0;<= -1;||::
...
dep SgExprStatement:error = error + temp * temp; SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
SgVarRefExp:temp@191:25->
SgVarRefExp:temp@190:2
== 0;* 0;||* 0;<= -1;||::
dep SgExprStatement:error = error + temp * temp; SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
SgVarRefExp:error@191:2->
SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::
--------------------------------------------------------
Live-in variables for loop:186
error
::m
::n
Live-out variables for loop before line:193
error
::m
::n
Final Live-in variables for loop:
error
::m
::n
Final Live-out variables for loop:
error
::m
::n
Debug after CollectVisibleVaribles ():
0x7f1640ef0678 ::n
0x7f1640ef07c0 ::m
0x7f1640ef1200 ::dx
0x7f1640ef1348 ::dy
0x7f1640ef2680 j
0x7f1640ef27c8 xx
0x7f1640ef2910 yy
0x7f1640ef2a58 temp
0x7f1640ef2ba0 error
Debug after CollectVariablesWithDependence():
0x7f1640ef27c8 xx
0x7f1640ef2910 yy
0x7f1640ef2a58 temp
0x7f1640ef2ba0 error
Debug dump private:
0x7f1640ef27c8 xx
0x7f1640ef2910 yy
0x7f1640ef2a58 temp
0x7f1640ef2538 i
0x7f1640ef2680 j
Debug dump lastprivate:
Debug: A candidate used twice:error
Debug dump firstprivate:
Entering DependenceElimination ()
----------------------------------
Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
SgVarRefExp:xx@188:2->
SgVarRefExp:xx@188:2
== 0;* 0;||* 0;== 0;||::
Eliminating a dep relation due to scalar dep type for at least one array variable
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
SgVarRefExp:xx@188:2->
SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::
Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
SgVarRefExp:xx@188:2->
SgVarRefExp:xx@190:26
== 0;* 0;||* 0;== 0;||::
Eliminating a dep relation due to scalar dep type for at least one array variable
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
SgVarRefExp:yy@189:2->
SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::
..
Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
SgVarRefExp:yy@190:44->
SgVarRefExp:yy@189:2
== 0;* 0;||* 0;<= -1;||::
...
Eliminating a dep relation due to scalar dep type for at least one array variable
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
SgPntrArrRefExp:u[i][j]@190:13->
SgVarRefExp:error@191:2
* 0;* 0;||* 0;* 0;||::
...
Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
SgVarRefExp:error@191:2->
SgVarRefExp:error@191:10
== 0;* 0;||* 0;<= -1;||::
Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
SgVarRefExp:temp@191:18->
SgVarRefExp:temp@190:2
== 0;* 0;||* 0;<= -1;||::
Exiting DependenceElimination ()
----------------------------------
Automatically parallelized a loop at line:185
debug patch generation: attaching OMP att to sg_node SgForStatement at line 185
用於編碼型別(類)及其操作(運算子)的語義資訊
原始檔:https://github.com/rose-compiler/rose-develop/tree/master/src/midend/astUtil/annotation
// annotations are separated by ;
<annot> ::= <annot1>
| <annot1>;<annot>
// two main categories: class and operator
<annot1> ::= class <cls annot>
| operator <op annot>
//---------class------------
// If this is an array-like class?
// e.g: class MyArray:
<cls annot> ::= <clsname>:<cls annot1>;
<cls annot1>::= <cls annot2>
| <cls annot2> <cls annot1>
//from bool TypeDescriptor:: read(istream& in) in AnnotDescriptors.C
// type name can have const modifier, with qualifiers, and of reference or pointer type
<clasname>::= [const] name1[::name2[...]][&|*]
// array information, inheritable or not
<cls annot2>::= <arr annot>
| inheritable <arr annot>
| has-value { <val def> }
//Array information: dimension, length, element, reshape
<arr annot>::= is-array{ <arr def>}
| is-array{define{<stmts>}<arr def>} // define loop-invariant operations that should be executed before enumerating array elements
<arr def> ::= <arr attr def>
| <arr attr def> <arr def>
<arr attr def> ::= <arr attr>=<expression>;
<arr attr> ::= dim // at most 'dim' dimensions
| len (<param>) // length of each dimension: parameter is used to specify which dimension
| elem(<paramlist>) //i$x:0:dim-1 ==> $x is from 0 to dim-1 ==> list of parameters i_0, i_1, i_dim-1
| reshape(<paramlist>)
//Data members for types/classes
//Could have expressions
<val def> ::= <name>;
| <name>;<val def>
| <name> = <expression> ;
| <name> = <expression> ; <val def>
//---------operator------------
// side effects: mod, read, alias
<op annot> ::= <opdecl> : <op annot1> ;
<op annot1> ::=<op annot2>
| <op annot2> <op annot1>
<op annot2> ::= modify <namelist> // side effects on scalar:e.g. modify none
| new-array (<aliaslist>){<arr def>} // side effects on arrays: creational
| modify-array (<name>) {<arr def>} // side effects on arrays: modifying only
| restrict-value {<val def list>} // How to get member values from this operation's semantics( may be from parameters values)
| read <namelist> // e.g.: read {a,b,c}
| alias <nameGrouplist> // e.g.: alias none
| allow-alias <nameGrouplist>
| inline <expression> // this operator is semantically equal to another operator/expression, used to substitute an operation with another
// from OperatorDescriptors.h NameGropu, OperatorAliasDescriptor
<nameGrouplist> e.g: {(x,y,z), (a,b), (m,n)}
陣列和陣列最佳化(重寫為原始陣列)
class floatArray {
array { // high level array abstraction with range from 1 to dimension
dimension = 6; // max dimension
length(i) = this.Array_Descriptor.Array_Domain.Size[i]; // length of each dimension
//i:dim:1:dimension/ i$dim means a list of parameters i_1, i_2, .., i_dimension?
// i:dim:1:dimension also defines a range variable dim, which is reused later in i$dim
elem(i:dim:1:dimension) = this(i$dim);
reshape(i:dim:1:dimension) = this.resize(i$dim);
};
array_optimize { // low level array with range from 0 to dimension-1
define {
float* _pointer = this.getDataPointer();
int _size:dim:1:dimension = this.Array_Descriptor.Array_Domain.Size[dim-1];
int _stride:dim:1:dimension = this.Array_Descriptor.Array_Domain.Stride[dim-1];
int _length:dim:1:dimension = this.Array_Descriptor.Array_Domain.getLength(dim-1)
}
length(i) = _length$(i+1);
elem(i:dim:1:dimension) =
_pointer[i$1 + repeat(x,2,dimension, i$x * _stride$(x-1) * _size$(x-1))];
};
has_value { dimension = 6; length_0 = this.length(0);
length_1 = this.length(1); length_2 = this.length(2); };
}
dump ()--------------
arrays:
floatArray : {dimension=6; length=((i:::):[](.(.(.(this,Array_Descriptor),Array_Domain),Size),i)); elem=((i:dim:1:dimension):this($(i,dim))) } reshape = ((i:dim:1:dimension):FunctionPtrCall(.(this,resize),$(i,dim)))
array optimizations:
floatArray :
{float* _pointer:::=FunctionPtrCall(.(this,getDataPointer));
int _size:dim:1:dimension=[](.(.(.(this,Array_Descriptor),Array_Domain),Size),+(-1dim));
int _stride:dim:1:dimension=[](.(.(.(this,Array_Descriptor),Array_Domain),Stride),+(-1dim));
int _length:dim:1:dimension=FunctionPtrCall(.(.(.(this,Array_Descriptor),Array_Domain),getLength),+(-1dim))
}
{dimension=; length=((i:::):$(_length,+(1i))); elem=((i:dim:1:dimension):[](_pointer,+($(i,1)repeat(x,2,dimension,*($(i,x)$(_stride,+(-1x))$(_size,+(-1x)))))))
}
函式副作用註釋:讀、修改和別名的變數集
operator VectorXY::VectorXY(double xx, double yy)
{
modify none; read {xx,yy};
}
// side effects annotation
operator floatArray::operator() ( const InternalIndex& index)
{
modify none; read {this, index};
}
// Dump(): mangled_function_name: (all variables): {modified variable }
floatArray::operator()_constInternalIndex& : (this,index):{}
// another annotation, side effects, alias, value
operator Index::Index( int lb, int l, int step)
{
modify none; read {lb, l, step}; alias none;
restrict_value { result = { base = lb; length = l; stride = step; } };
}
// Dump()
read:
Index::Index_int_int_int : (lb,l,step):{l,lb,step}
// allow_alias: used only for a function operating on two or more parameters? tell if parameters can be alias to each other?
operator interpolate2D(floatArray & fineGrid,floatArray & coarseGrid)
{
allow_alias none;
}
has_value 和 restrict _value 註釋:“this” 和 “result” 是什麼?
// has_value: Only used for within class annotation,
class floatArray {
..
has_value { dimension = 6; length_0 = this.length(0);
length_1 = this.length(1); length_2 = this.length(2); };
class Range {
has_value { stride = this.stride; base = this.base; length = this.length; }
}
//Dump:
floatArray : {dimension:6;length_0:FunctionPtrCall(.(this,length),0);length_1:FunctionPtrCall(.(this,length),1);length_2:FunctionPtrCall(.(this,length),2)}
}
Range : {base:.(this,base);length:.(this,length);stride:.(this,stride)}
// restrict_value: infer the value of member variable from the semantics of current operations: used for operation annotations. could have inferred information for one "this" and multiple "result" depending on function parameters
operator floatArray::operator() (int index)
{ ..
restrict_value { this = { dimension = 1; } }; // this element access operator using one integer parameter can tell us that the current object (this) has a dimension value equaling to 1
}
operator floatArray::operator()(const InternalIndex& index1,
const InternalIndex& index2,
const InternalIndex& index3)
{
restrict_value
{ this = { dimension = 3; };
result = {dimension = 3; length_0 = index1.length;
length_1 = index2.length; length_2 = index3.length; };
};
}
operator sqrt( double val)
{
modify none; read{val}; alias none;
}
operator sqrt( float val)
{
modify none; read{val}; alias none;
}
operator abs(int val)
{
modify none; read{val}; alias none;
}
- 舊技術報告:廖春華、丹尼爾·J·昆蘭、傑里米·J·威爾科克和托馬斯·帕納斯。基於 STL 語義的 OpenMP 自動並行化。技術報告,勞倫斯利弗莫爾國家實驗室 (LLNL),利弗莫爾,加利福尼亞州(美國),2008 年。
- 一篇研討會論文:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,擴充套件自動並行化以最佳化多核的高階抽象,2009 年 5 月國際 OpenMP 研討會論文集:在極端並行時代發展 OpenMP(德國德累斯頓,2009 年 6 月 3 日至 5 日)。pdf
- 論文的期刊版本:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,使用高階抽象進行現代應用程式的語義感知自動並行化,並行程式設計雜誌,2010 年 1 月接受 pdf