ROSE Compiler Framework/Outliner

Overview

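Basic concept: outlining replaces a block of consecutive statements with a call to a new function that contains those statements. Conceptually, outlining is the inverse of inlining.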
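To make the idea concrete, here is a minimal hand-written sketch of the same loop before and after outlining. It is not output from the ROSE outliner, and the function name OUT_sum_loop is a hypothetical choice made for illustration. Variables are passed by address, which mirrors the default style of the generated code shown later on this page.

```cpp
#include <cassert>

// Before outlining: the summation loop is written inline.
int sumInline(const int* data, int n) {
  assert(n == 0 || data != nullptr);
  int total = 0;
  for (int i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}

// Hand-written outlined function (hypothetical name). Every variable the
// loop uses is passed by address, so writes to total reach the caller.
static void OUT_sum_loop(const int** datap, int* np, int* totalp) {
  for (int i = 0; i < *np; ++i) {
    *totalp += (*datap)[i];
  }
}

// After outlining: the loop is replaced by a call to the new function.
int sumOutlined(const int* data, int n) {
  assert(n == 0 || data != nullptr);
  int total = 0;
  OUT_sum_loop(&data, &n, &total);
  return total;
}
```

Both versions compute the same result; only the location of the loop body changes.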

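Uses: outlining is widely used to generate kernel functions to be executed on CPUs and/or GPUs. It also helps to:

- implement programming models such as OpenMP
- support empirical tuning of a code portion, by first generating a function from that portion.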

ROSE provides a built-in translator, the AST outliner, which can outline a specified portion of code and generate a function from it.

The official documentation for the AST outliner is Chapter 37, "Using the AST Outliner", of the ROSE Tutorial (PDF).

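There are two main ways to use the outliner.

Command-line method: run the outline command with options that specify the outlining targets. There are two ways to specify which portion of code to outline:
- mark the outlining targets with a special pragma in the input program, then call a high-level driver routine to process these pragmas;
- pass abstract handle strings (described in detail in Chapter 46 of the ROSE Tutorial) on the command line.
Function-call method: call the "low-level" outlining routines, which operate directly on the AST nodes to be outlined.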

Installation

Follow the instructions at https://github.com/rose-compiler/rose/wiki/How-to-Set-Up-ROSE

If you install from source, see https://github.com/rose-compiler/rose/wiki/Install-Rose-From-Source

To install only this tool, type

make install -C tests/nonsmoke/functional/roseTests/astOutliningTests

The outliner tool will be installed as

ROSE_INST/bin/outline

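Command line

The tool rose/bin/outline finds the code portions to outline through either 1) pragmas in the input code or 2) abstract handles given as command-line options.

Pragma: put #pragma rose_outline immediately before the code portion to be outlined in the input code
Abstract handle: -rose:outline:abstract_handle your_handle_string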

Options

./outline --help | more

Outliner-specific options
Usage: outline [OPTION]... FILENAME...
Main operation mode:
        -rose:outline:preproc-only                     preprocessing only, no actual outlining
        -rose:outline:abstract_handle handle_string    using an abstract handle to specify an outlining target
        -rose:outline:parameter_wrapper                use an array of pointers to pack the variables to be passed
        -rose:outline:structure_wrapper                use a data structure to pack the variables to be passed
        -rose:outline:enable_classic                   use parameters directly in the outlined function body without transferring statement, C only
        -rose:outline:temp_variable                    use temp variables to reduce pointer dereferencing for the variables to be passed
        -rose:outline:enable_liveness                  use liveness analysis to reduce restoring statements if temp_variable is turned on
        -rose:outline:new_file                         use a new source file for the generated outlined function
        -rose:outline:output_path                      the path to store newly generated files for outlined functions, if requested by new_file. The original source file's path is used by default.
        -rose:outline:exclude_headers                  do not include any headers in the new file for outlined functions
        -rose:outline:use_dlopen                       use dlopen() to find the outlined functions saved in new files.It will turn on new_file and parameter_wrapper flags internally
        -rose:outline:copy_orig_file                   used with dlopen(): single lib source file copied from the entire original input file. All generated outlined functions are appended to the lib source file
        -rose:outline:enable_debug                     run outliner in a debugging mode
        -rose:outline:select_omp_loop                  select OpenMP for loops for outlining, used for testing purpose

Usage examples

outline test.cpp // outline the code portions in test.cpp that are marked with the special rose_outline pragma
outline -rose:skipfinalCompileStep -rose:outline:new_file test.cpp // skip compiling the generated rose_? file and put the generated function into a new file

With abstract handles on the command line, there is no need to insert pragmas into the input code:

outline -rose:outline:abstract_handle "ForStatement<position,12>" test3.cpp // outline the for loop at line 12 of test3.cpp
outline -rose:outline:abstract_handle "FunctionDeclaration<name,initialize>::ForStatement<numbering,2>" test2.cpp // outline the 2nd for loop inside the function named "initialize" in test2.cpp

/home/liao6/workspace/masterDevClean/buildtree/tests/roseTests/astOutliningTests/outline -rose:outline:new_file -rose:outline:temp_variable -rose:outline:exclude_headers -rose:outline:abstract_handle 'ForStatement<numbering,1>' -c /home/liao6/workspace/masterDevClean/sourcetree/tests/roseTests/astOutliningTests/complexStruct.c
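The handle strings in the examples above share a visible shape: elements of the form Type<kind,value> joined by "::". The sketch below splits such a string into its parts. It only illustrates that shape and is not the ROSE abstract handle parser; HandlePart and parseHandle are hypothetical names, and the sketch assumes that values never contain "::".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One "Type<kind,value>" element of a handle string (hypothetical type).
struct HandlePart {
  std::string construct;  // e.g. "ForStatement"
  std::string kind;       // e.g. "position", "numbering", "name"
  std::string value;      // e.g. "12", "2", "initialize"
};

// Splits a handle such as "A<name,f>::B<numbering,2>" into its parts.
// Returns an empty vector if any element is malformed.
std::vector<HandlePart> parseHandle(const std::string& handle) {
  std::vector<HandlePart> parts;
  std::size_t begin = 0;
  while (begin <= handle.size()) {
    std::size_t end = handle.find("::", begin);
    if (end == std::string::npos) end = handle.size();
    const std::string item = handle.substr(begin, end - begin);

    const std::size_t lt = item.find('<');
    if (lt == std::string::npos || lt == 0 || item.back() != '>') return {};
    const std::size_t comma = item.find(',', lt);
    if (comma == std::string::npos) return {};
    assert(comma > lt && comma + 1 < item.size());

    parts.push_back({item.substr(0, lt),
                     item.substr(lt + 1, comma - lt - 1),
                     item.substr(comma + 1, item.size() - comma - 2)});
    begin = end + 2;  // skip past the "::" separator
  }
  return parts;
}
```

For the second example above, the result has two parts: a FunctionDeclaration selected by name and, inside it, a ForStatement selected by numbering.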

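Programming API

You can build your own translator that uses the outlining support in ROSE. The programming API is defined in

Header files: src/midend/programTransformation/astOutlining/
Namespace: Outliner

It provides a few functions and options

Functions: Outliner::outline(), Outliner::isOutlineable()
Options: the internal control variables listed below

Internal control variables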

Outliner.cc

namespace Outliner {
  //! A set of flags to control the internal behavior of the outliner
  bool enable_classic=false;
  // use a wrapper for all variables or one parameter for a variable or a wrapper for all variables
  bool useParameterWrapper=false;  // use an array of pointers wrapper for parameters of the outlined function
  bool useStructureWrapper=false;  // use a structure wrapper for parameters of the outlined function
  bool preproc_only_=false;  // preprocessing only
  bool useNewFile=false; // generate the outlined function into a new source file
  bool copy_origFile=false; // when generating the new file to store outlined function, copy entire original file to it.
  bool temp_variable=false; // use temporary variables to reduce pointer dereferencing
  bool enable_liveness =false;
  bool enable_debug=false; // 
  bool exclude_headers=false;
  bool use_dlopen=false; // Outlining the target to a separated file and calling it using a dlopen() scheme. It turns on useNewFile.
  std::string output_path=""; // default output path is the original file's directory
  std::vector<std::string> handles; //  abstract handles of outlining targets, given by command line option -rose:outline:abstract_handle for each

// DQ (3/19/2019): Suppress the output of the #include "autotuning_lib.h" since some tools will want to define their own supporting libraries and header files.
  bool suppress_autotuning_header = false; // when generating the new file to store outlined function, suppress output of #include "autotuning_lib.h".
};

Algorithm

Top-level driver

The outliner finds the code portions to outline in three ways:

collectPragmas() for C/C++
collectFortranTarget() for Fortran
collectAbstractHandles() for abstract handles

Top-level driver of the outliner: PragmaInterface.cc

Outliner::outlineAll (SgProject* project)
- collectPragmas() for C/C++, collectFortranTarget() for Fortran, or collectAbstractHandles() for abstract handles
- outline(SgPragmaDeclaration)
  - outline(SgStatement, func_name)
    - preprocess(s)
    - outlineBlock (s_post, func_name) // Transform.cc, the main function
- deleteAST(SgPragmaDeclaration)

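Eligibility check

Checks whether an SgNode is eligible for outlining.

Outliner::isOutlineable() src/Check.cc:251
- checkType() // only certain SgNode types can be outlined; a list is maintained here
- exclude SgVariableDeclaration
- must be enclosed in a function declaration
  - exclude template-instantiation (member) function declarations
- must not refer to hidden types...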
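The checks listed above can be read as a chain of early rejections. The sketch below models that chain on a simplified record. Node, kOutlineableKinds, and whyNotOutlineable are hypothetical names and none of them is a ROSE type or function; the allow-list contents are made up for illustration, since the real list is maintained in Check.cc. The variable-declaration check is placed first here only so that it produces its own message.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for an AST node. It is not a ROSE type.
struct Node {
  std::string kind;                          // e.g. "ForStatement"
  bool insideFunction = true;                // enclosed by a function declaration?
  bool insideTemplateInstantiation = false;  // enclosing function is a template instantiation?
  bool refersToHiddenType = false;           // refers to a hidden type?
};

// Illustrative allow-list only; the real list lives in Check.cc.
static const std::set<std::string> kOutlineableKinds = {
    "BasicBlock", "ForStatement", "WhileStmt", "IfStmt", "ExprStatement"};

// Returns the first failed check as text, or an empty string when the
// node passes every check and may be outlined.
std::string whyNotOutlineable(const Node& n) {
  if (n.kind == "VariableDeclaration") return "variable declarations are excluded";
  if (kOutlineableKinds.count(n.kind) == 0) return "node type is not supported";
  if (!n.insideFunction) return "not enclosed in a function declaration";
  if (n.insideTemplateInstantiation) return "inside a template instantiation";
  if (n.refersToHiddenType) return "refers to a hidden type";
  return "";
}

bool isOutlineableSketch(const Node& n) {
  const bool ok = whyNotOutlineable(n).empty();
  assert(!ok || kOutlineableKinds.count(n.kind) == 1);
  return ok;
}
```

Returning a reason string rather than a bare boolean makes it easier to report why a target was skipped.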

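Preprocessing

There are two phases: preprocessing and the actual transformation.

SgBasicBlock* s_post = preprocess (s);
- SgStatement * processPragma (SgPragmaDeclaration* decl) // checks whether this is an outlining pragma (#pragma rose_outline); if so, returns the next statement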
- Outliner::preprocess(SgStatement);
  - SgBasicBlock * Outliner::Preprocess::preprocessOutlineTarget (SgStatement* s)
    - normalizeVarDecl()
    - createBlock()
    - Outliner::Preprocess::transformPreprocIfs
    - Outliner::Preprocess::transformThisExprs
    - Outliner::Preprocess::transformNonLocalControlFlow
    - Outliner::Preprocess::gatherNonLocalDecls(); // function declarations are copied here, e.g. test2005_179.C

Actual transformation

Outliner::outline(stmt) --> generateFuncName(s) creates a unique function name --> Outliner::outline (stmt, func_name)

Outliner::Transform::outlineBlock (s_post, func_name); // Transform.cc
- Outliner::Transform::collectVars (s, syms); // collect the variables to be passed
- Outliner::generateFunction() // generate the outlined function, src/midend/programTransformation/astOutlining/GenerateFunc.cc
  - createFuncSkeleton()
  - moveStatementsBetweenBlocks (s, func_body); // move the statements of the source basic block into the function body
  - variableHandling (syms, func, vsym_remap); // add unpacking statements
    - createParam() // create the parameters
    - createUnpackDecl() // create an unpacking statement: int local = parameter, from src/midend/programTransformation/astOutlining/GenerateFunc.cc
    - createPackStmt() // copy the local variables back to the parameters after all local computation
  - remapVarSyms (vsym_remap, func_body); // variable substitution
- insert() from Insert.cc // insert the outlined function and its prototype
  - insertFriendDecls()
  - insertGlobalPrototype()
    - GlobalProtoInserter::insertManually ()
      - generatePrototype()
- generateCall() // generate a call to the outlined function
- ASTtools::replaceStatement () // replace the original portion with the call

Call stack

#0  Outliner::generateFunction (s=0x7fffe849a990, func_name_str="OUT__1__11770__", syms=..) at ../../../sourcetree/src/midend/programTransformation/astOutlining/GenerateFunc.cc:1283
#1  0x00007ffff65bd93e in Outliner::outlineBlock (s=0x7fffe849a990, func_name_str="OUT__1__11770__") at ../../../sourcetree/src/midend/programTransformation/astOutlining/Transform.cc:310
#2  0x00007ffff6589b09 in Outliner::outline (s=0x7fffe849a990, func_name="OUT__1__11770__") at ../../../sourcetree/src/midend/programTransformation/astOutlining/Outliner.cc:166
#3  0x00007ffff65907f9 in Outliner::outline (decl=0x7fffe87a2310) at ../../../sourcetree/src/midend/programTransformation/astOutlining/PragmaInterface.cc:141
#4  0x00007ffff65911b8 in Outliner::outlineAll (project=0x7fffebc38010) at ../../../sourcetree/src/midend/programTransformation/astOutlining/PragmaInterface.cc:355
#5  0x000000000040c84f in main (argc=12, argv=0x7fffffffae38) at ../../../../../../sourcetree/tests/nonsmoke/functional/roseTests/astOutliningTests/outline.cc:51

For a C++ code block to be outlined, accesses to private members must be checked and the necessary friend function declarations added.

Call chain, all in Insert.cc:

Outliner::insert (SgFunctionDeclaration* func, SgGlobal* scope, SgBasicBlock* target_outlined_code )
- insertFriendDecls (SgFunctionDeclaration* func, SgGlobal* scope, FuncDeclList_t& friends) // what is func here?
  - insertFriendDecl (const SgFunctionDeclaration* func, SgGlobal* scope, SgClassDefinition* cls_def)
    - generateFriendPrototype (const SgFunctionDeclaration* full_decl, SgScopeStatement* scope, SgScopeStatement* class_scope) Insert.cc

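Algorithm of insertFriendDecls (SgFunctionDeclaration* func, SgGlobal* scope, FuncDeclList_t& friends)

For the outlined function:

find references to private class variables, using isProtPrivMember (func)
find references to private class member functions, using isProtPrivMember (f_ref)
save the relevant class definitions into a list

If the outlined function is created in a new source file, the outliner also copies the relevant declarations into that file. The function used for this is SageInterface::appendStatementWithDependentDeclaration(func,glob_scope,func_orig,exclude_headers);

The code that calls this function is at line 636 of the source file: https://github.com/rose-compiler/rose/blob/weekly/src/midend/programTransformation/astOutlining/Transform.cc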

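Variable handling

Variable handling finds the variables used in a code block and decides how to pass them into and out of the outlined function. It relies on several program analyses to obtain the best results.

Scope analysis (in CollectVars.cc): decides which variables should be passed as function parameters, based on whether each variable's declaration is visible at the location of the outlined function. If the original declaration is visible to the outlined function, the variable does not need to be passed as a parameter.
collectPointerDereferencingVar: finds the variables that should be accessed through pointer dereferencing in the outlined function (in VarSym.cc): ASTtools::collectPointerDereferencingVarSyms(s,pdSyms);
Side-effect analysis: SageInterface::collectReadOnlyVariables(s,readOnlyVars);
Liveness analysis: SageInterface::getLiveVariables(liv, isSgForStatement(firstStmt), liveIns, liveOuts);

Scope analysis: notation for the variable sets and set operations that determine which variables should be passed as function parameters (implemented in CollectVars.cc):

U: the set of variables used in the code block to be outlined (s)
L: local variables declared within s
U-L: variables that should be passed into or out of the outlined function as function parameters
Q: variables defined in the function enclosing s and visible at s, but not declared globally outside the enclosing function. If the outlined function is placed in the same file, global variables should not be passed as parameters.
(U-L) Intersect Q: the variables to be passed into the outlined function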
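The set expression (U-L) Intersect Q can be evaluated directly with standard set algorithms. The sketch below uses variable names as strings, which is a simplification made for this example (ROSE works on symbols, not names), and variablesToPass is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using VarSet = std::set<std::string>;

// Computes (U - L) Intersect Q.
VarSet variablesToPass(const VarSet& U, const VarSet& L, const VarSet& Q) {
  VarSet usedNotLocal;  // U - L
  std::set_difference(U.begin(), U.end(), L.begin(), L.end(),
                      std::inserter(usedNotLocal, usedNotLocal.end()));
  VarSet toPass;
  std::set_intersection(usedNotLocal.begin(), usedNotLocal.end(),
                        Q.begin(), Q.end(),
                        std::inserter(toPass, toPass.end()));
  // Every passed variable must be one that the block actually uses.
  assert(std::includes(U.begin(), U.end(), toPass.begin(), toPass.end()));
  return toPass;
}
```

For the lucky-number example later on this page, the outlined loop uses n, number, temp, and its own loop counter j. With U = {n, number, temp, j}, L = {j}, and Q holding the local variables of main, the result is {n, number, temp}, which matches the three parameters of the generated function OUT__1__2222__.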

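ASTtools::collectPointerDereferencingVarSyms (): collects the variables to be replaced by pointer dereferences (pdSyms) in the outlined function

pdSyms = useByAddressVars + Non-assignableVars + Struct/ClassVars
Use-by-address analysis: collectVarRefsUsingAddress(s, varSetB); e.g. &a
Non-assignable variable analysis: collectVarRefsOfTypeWithoutAssignmentSupport(s,varSetB); variables whose types are not assignable
Class/struct variables: passing them by reference is more efficient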
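The pdSyms formula above is a union of three groups. As a rough sketch, the same classification can be written as one pass over per-variable facts. VarFacts and pointerDereferenceVars are hypothetical names; in ROSE the facts come from the AST queries named above, not from a hand-filled record.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-variable facts, filled in by hand for this sketch.
struct VarFacts {
  std::string name;
  bool addressTaken;     // appears as &name inside the block
  bool assignable;       // false for array types, for example
  bool isClassOrStruct;  // class or struct type
};

// pdSyms = useByAddressVars + Non-assignableVars + Struct/ClassVars
std::set<std::string> pointerDereferenceVars(const std::vector<VarFacts>& vars) {
  std::set<std::string> pdSyms;
  for (const VarFacts& v : vars) {
    if (v.addressTaken || !v.assignable || v.isClassOrStruct) {
      pdSyms.insert(v.name);
    }
  }
  assert(pdSyms.size() <= vars.size());
  return pdSyms;
}
```

A plain scalar whose address is never taken falls outside all three groups, so it stays a candidate for variable cloning.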

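calculateVariableRestorationSet(): determines whether some variables need to be restored from their clones at the end of the outlined function; used only when variable cloning is enabled

Check each function parameter:
A parameter should be restored if isWritten && isLiveOut: it is modified in the outlined function and used after the outlined function returns.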
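The isWritten && isLiveOut rule can be sketched with plain sets. Two assumptions are made here that the text above does not confirm: "written" is taken to mean "not in the read-only set", and variables are identified by name. restorationSet is a hypothetical name; the parameter names follow the arguments of calculateVariableRestorationSet().

```cpp
#include <cassert>
#include <set>
#include <string>

using VarSet = std::set<std::string>;

// A parameter is restored when it is written in the outlined function
// (assumed here to mean "not read-only") and live after the call.
VarSet restorationSet(const VarSet& params, const VarSet& readOnlyVars,
                      const VarSet& liveOuts) {
  VarSet restoreVars;
  for (const std::string& p : params) {
    const bool isWritten = readOnlyVars.count(p) == 0;
    const bool isLiveOut = liveOuts.count(p) != 0;
    if (isWritten && isLiveOut) restoreVars.insert(p);
  }
  assert(restoreVars.size() <= params.size());
  return restoreVars;
}
```

A variable that is written but dead after the call is skipped, which removes one copy-back statement from the outlined function.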

Transform.cc

/**
 * Major work of outlining is done here
 *  Preparations: variable collection
 *  Generate outlined function
 *  Replace outlining target with a function call
 *  Append dependent declarations,headers to new file if needed
 */
Outliner::Result
Outliner::outlineBlock (SgBasicBlock* s, const string& func_name_str)
{
...

  SgClassDeclaration* struct_decl = NULL;
  if (Outliner::useStructureWrapper)
  {
    struct_decl = generateParameterStructureDeclaration (s, func_name_str, syms, pdSyms, glob_scope);
    ROSE_ASSERT (struct_decl != NULL);
  }

  std::set<SgInitializedName*> restoreVars;
  calculateVariableRestorationSet (syms, readOnlyVars,liveOuts,restoreVars);

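Advanced features

Some details of outlining can be specified with command-line options or with the internal flags of the programming API.

List

Wrap all variables into one data structure: Outliner::useStructureWrapper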
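As a rough picture of what wrapping variables into one data structure means, the hand-written sketch below passes three variables through a single struct of pointers. It is not ROSE output: the struct layout, the names OUT_1_data, OUT_1, and chooseSoup, and the field naming are all assumptions made for illustration.

```cpp
#include <cassert>

static const char* kSoups[3] = {"lentil", "pho", "white bean"};

// Hypothetical wrapper type: one pointer field per variable to be passed.
struct OUT_1_data {
  unsigned* order_p;
  unsigned* value_p;
  const char** soupName_p;
};

// Outlined function: unpacks the structure, then runs the original block.
static void OUT_1(void* out_argv) {
  assert(out_argv != nullptr);
  OUT_1_data* d = static_cast<OUT_1_data*>(out_argv);
  *d->value_p = *d->order_p + 1;
  *d->soupName_p = kSoups[*d->value_p % 3];
}

// Caller: packs the addresses of the variables, then calls OUT_1.
unsigned chooseSoup(unsigned order, const char*& soupName) {
  unsigned value = 0;
  OUT_1_data d = {&order, &value, &soupName};
  OUT_1(&d);
  return value;
}
```

Compared with an array of void pointers, a struct keeps the type of every field, so the outlined function needs only one cast.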

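Variable cloning

Option that enables this feature

-rose:outline:temp_variable use temp variables to reduce pointer dereferencing for the variables to be passed

The purpose of this feature is to reduce pointer dereferencing in the code block, so that the block is easier to optimize. The transformation copies the values into local variables and uses those local variables in the computation. Afterwards, the values of the local variables are written back through the pointers.

Example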

// input code
#include <stdio.h>
#include <stdlib.h>

const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};

int main (void)
{
// split variable declarations with their initializations, as a better demo for the outliner
  const char *soupName;
  int value; 
#pragma rose_outline
  {
    value = rand();
    soupName = abc_soups[value  % 10];
  }

  printf ("Here are your %d,  %s soup\n", value, soupName);
  return 0;
}



// without variable cloning

#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__12274__(void **__out_argv);

int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
  const char *soupName;
  int value;
  void *__out_argv1__12274__[2];
  __out_argv1__12274__[0] = ((void *)(&value));
  __out_argv1__12274__[1] = ((void *)(&soupName));
  OUT__1__12274__(__out_argv1__12274__);
  printf("Here are your %d,  %s soup\n",value,soupName);
  return 0;
}

static void OUT__1__12274__(void **__out_argv)
{
  const char **soupName = (const char **)__out_argv[1];
  int *value = (int *)__out_argv[0];
   *value = rand();                            // pointer dereferencing is used in the computation
   *soupName = abc_soups[ *value % 10];
}

// With variable cloning

#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__12274__(void **__out_argv);

int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
  const char *soupName;
  int value;
  void *__out_argv1__12274__[2];
  __out_argv1__12274__[0] = ((void *)(&value));
  __out_argv1__12274__[1] = ((void *)(&soupName));
  OUT__1__12274__(__out_argv1__12274__);
  printf("Here are your %d,  %s soup\n",value,soupName);
  return 0;
}


static void OUT__1__12274__(void **__out_argv)
{
  const char *soupName =  *((const char **)__out_argv[1]);
  int value =  *((int *)__out_argv[0]);   // local variable, original type, (not pointer type)
  value = rand();                         // local variable in computation.
  soupName = abc_soups[value % 10];
   *((const char **)__out_argv[1]) = soupName;
   *((int *)__out_argv[0]) = value;
}

Type of the local variable

    SgType* local_type = NULL;
    if( SageInterface::is_Fortran_language( ) )
        local_type= orig_var_type;
    else if( Outliner::temp_variable || Outliner::useStructureWrapper )
    // unique processing for C/C++ if temp variables are used
    {
        if( isPointerDeref || ( !isPointerDeref && is_array_parameter ) )
        {
          // Liao 3/11/2015. For a parameter of a reference type, we have to specially tweak the unpacking statement
          // It is not allowed to create a pointer to a reference type. So we use a pointer to its raw type (stripped reference type) instead.
            // use pointer dereferencing for some
            if (SgReferenceType* rtype = isSgReferenceType(orig_var_type))
               local_type = buildPointerType(rtype->get_base_type());
            else
               local_type = buildPointerType(orig_var_type);
        }
        else                    // use variable clone instead for others
            local_type = orig_var_type;
    }
    else // all other cases: non-fortran, not using variable clones
    {
        if( is_C_language( ) )
        {
            // we use pointer types for all variables to be passed
            // the classic outlining will not use unpacking statement, but use the parameters directly.
            // So we can safely always use pointer dereferences here
            local_type = buildPointerType( orig_var_type );
        }
        else // C++ language
            // Rich's idea was to leverage C++'s reference type: two cases:
            //  a) for variables of reference type: no additional work
            //  b) for others: make a reference type to them
            //   all variable accesses in the outlined function will have
            //   access the address of the by default, not variable substitution is needed
        {
             local_type = isSgReferenceType( orig_var_type ) ? orig_var_type
                                                            : SgReferenceType::createType( orig_var_type );
        }
    }

Transform.cc: collecting variables

 std::set<SgInitializedName*> restoreVars;

 calculateVariableRestorationSet (syms, readOnlyVars,liveOuts,restoreVars);

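dlopen

The use_dlopen option tells the outliner to use dlopen() to find and call outlined functions stored in a dynamically loadable library.

This option turns on several other options (in Outliner::validateSettings() in Outliner.cc)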

-rose:outline:exclude_headers
useNewFile= true;
useParameterWrapper = true;
temp_variable = true;

Compilation and linking instructions, assuming the input file is ft.c

outline -rose:outline:use_dlopen -I/home/liao6/workspace/outliner/build/../sourcetree/projects/autoTuning -c /path/to/ft.c
- this step generates two files
- rose_ft.c: the original ft.c file is transformed into this file
- rose_ft_lib.c: the outlined function, which goes into the shared library
Build a .so file from rose_ft_lib.c
- gcc -I. -g -fPIC -c rose_ft_lib.c
- gcc -g -shared rose_ft_lib.o -o rose_ft_lib.so
- cp rose_ft_lib.so /tmp/.
Link everything together
- the object file should be linked with libautoTuning.a, which is built from projects/autoTuning/autotuning_lib.c; autotuning_lib.c in turn defines findFunctionUsingDlopen().
- gcc -o a.out rose_ft.o /roseInstallPath/lib/libautoTuning.a -Wl,--export-dynamic -g -ldl -lm

A complete example that uses dlopen can be found at

https://github.com/chunhualiao/outliner-demo

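Testing

The ROSE AST outliner has a dedicated test directory: rose/tests/nonsmoke/functional/roseTests/astOutliningTests

A number of C, C++, and Fortran test input files are provided.
Example command-line options are given in the Makefile.am file in that test directory.

A complete command-line example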

/home/liao6/workspace/rose/buildtree/tests/nonsmoke/functional/roseTests/astOutliningTests/outline -rose:outline:use_dlopen -rose:outline:temp_variable -I/home/liao6/workspace/rose/buildtree/../sourcetree/projects/autoTuning -rose:outline:exclude_headers -rose:outline:output_path . -c /home/liao6/workspace/rose/sourcetree/tests/nonsmoke/functional/roseTests/astOutliningTests/array1.c

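To trigger a single test, assuming the input file is named inputFile.c

make classic_inputFile.c.passed // classic behavior
make dlopen_inputFile.c.passed // dlopen feature

As you can see, the prefix indicates which outliner options are used.

Example inputs and outputs

As a standalone tool

Input file, with a pragma marking the code portion to be outlined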

int main()
{
    double n, start=1, total;
    double unlucky=0, lucky;
    double *number;
	                 
    scanf("%lf",&n);                    
    total = 9;                      
    for(int j =1; j < n; j++)
    {
      total = total * 10;
      start = start *10;
    }

    number = (double*)malloc(n * sizeof(double));                           
    for(double i = start; i < start*10; i++)
    {
      double temp = i;
#pragma rose_outline
      for(int j = 1; j<= n; j++)
      {
	number[j]=(int)temp%10;
	temp = temp/10;
      }
      for(int k = n; k>=1; k--)
      {
	if(number[k] == 1 && number[k-1] == 3){
	  unlucky++;
	  break;
	}
      }
    }                                   
    lucky = total - unlucky;
    printf("there are %f lucky integers in %f digits integers", lucky, n);
    return 0;
}


//------------output file is

static void OUT__1__2222__(double *np__,double **numberp__,double *tempp__);

int main()
{
  double n;
  double start = 1;
  double total;
  double unlucky = 0;
  double lucky;
  double *number;
  scanf("%lf",&n);
  total = 9;
  for (int j = 1; j < n; j++) {
    total = total * 10;
    start = start * 10;
  }
  number = ((double *)(malloc(n * (sizeof(double )))));
  for (double i = start; i < start * 10; i++) {
    double temp = i;
    OUT__1__2222__(&n,&number,&temp);
    for (int k = n; k >= 1; k--) {
      if (number[k] == 1 && number[k - 1] == 3) {
        unlucky++;
        break; 
      }
    }
  }
  lucky = total - unlucky;
  printf("there are %f lucky integers in %f digits integers",lucky,n);
  return 0;
}

static void OUT__1__2222__(double *np__,double **numberp__,double *tempp__)
{
  double *n = (double *)np__;
  double **number = (double **)numberp__;
  double *temp = (double *)tempp__;
  for (int j = 1; j <=  *n; j++) {
    ( *number)[j] = (((int )( *temp)) % 10);
     *temp =  *temp / 10;
  }
}

char* type

Input

#include <stdio.h>
#include <stdlib.h>

const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};

int main (void)
{
// split variable declarations with their initializations, as a better demo for the outliner
  int abc_numBowls;
  const char *abc_soupName;
  int numBowls;
  const char *soupName;
#pragma rose_outline
  {
    abc_numBowls = rand () % 10;
    abc_soupName = abc_soups[rand () % 10];
    numBowls = abc_numBowls;
    soupName = abc_soupName;
  }

  printf ("Here are your %d bowls of %s soup\n", numBowls, soupName);

  printf ("-----------------------------------------------------\n");
  return 0;
}

outline --edg:no_warnings -rose:verbose 0 -rose:outline:parameter_wrapper -rose:detect_dangling_pointers 1 -c input.cpp

Output file

#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__11770__(void **__out_argv);

int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
  int abc_numBowls;
  const char *abc_soupName;
  int numBowls;
  const char *soupName;
  void *__out_argv1__11770__[4];
  __out_argv1__11770__[0] = ((void *)(&soupName));
  __out_argv1__11770__[1] = ((void *)(&numBowls));
  __out_argv1__11770__[2] = ((void *)(&abc_soupName));
  __out_argv1__11770__[3] = ((void *)(&abc_numBowls));
  OUT__1__11770__(__out_argv1__11770__);
  printf("Here are your %d bowls of %s soup\n",numBowls,soupName);
  printf("-----------------------------------------------------\n");
  return 0;
}

static void OUT__1__11770__(void **__out_argv)
{
  int &abc_numBowls =  *((int *)__out_argv[3]);
  const char *&abc_soupName =  *((const char **)__out_argv[2]);
  int &numBowls =  *((int *)__out_argv[1]);
  const char *&soupName =  *((const char **)__out_argv[0]);
  abc_numBowls = rand() % 10;
  abc_soupName = abc_soups[rand() % 10];
  numBowls = abc_numBowls;
  soupName = abc_soupName;
}

With C++ member functions

Input code

int a;

class B 
{
  private: 

  int b;
 inline void foo(int c)
 {
#pragma rose_outline
   b = a+c;
 }
};

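Output code

A friend declaration for the outlined function is added so that it can access private class members
The this pointer is passed as a function parameter so that the outlined function can reach the class object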

int a;
static void OUT__1__2386__(int *cp__,void *this__ptr__p__);

class B 
{
  public: friend void ::OUT__1__2386__(int *cp__,void *this__ptr__p__);
  private: int b;
  

  inline void foo(int c)
{
// //A declaration for this pointer
    class B *this__ptr__ = this;
    OUT__1__2386__(&c,&this__ptr__);
  }
}
;

static void OUT__1__2386__(int *cp__,void *this__ptr__p__)
{
  int &c =  *((int *)cp__);
  class B *&this__ptr__ =  *((class B **)this__ptr__p__);
  this__ptr__ -> b = a + c;
}

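With -rose:outline:parameter_wrapper, the result is different

In the caller, all parameters are wrapped into an array of pointers
The array is unpacked in the outlined function to retrieve the parameters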

int a;
static void OUT__1__2391__(void **__out_argv);

class B 
{
  public: friend void ::OUT__1__2391__(void **__out_argv);
  private: int b;
  

  inline void foo(int c)
{
// //A declaration for this pointer
    class B *this__ptr__ = this;
    void *__out_argv1__1527__[2];
    __out_argv1__1527__[0] = ((void *)(&this__ptr__));
    __out_argv1__1527__[1] = ((void *)(&c));
    OUT__1__2391__(__out_argv1__1527__);
  }
}
;

static void OUT__1__2391__(void **__out_argv)
{
  int &c =  *((int *)__out_argv[1]);
  class B *&this__ptr__ =  *((class B **)__out_argv[0]);
  this__ptr__ -> b = a + c;
}

Used for OpenMP implementation

See more in ROSE_Compiler_Framework/OpenMP_Support.

Here is a translation example

/*a test C program. You can replace this content with yours, within 20,000 character limit (about 500 lines) . */
#include<stdio.h>
#include<stdlib.h>

int main(int argc, char* argv[])
{
    int nthreads, tid;
    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();
	printf("Hello World from thread = %d ", tid);
	if(tid == 0)
	{
	    nthreads = omp_get_num_threads();
	    printf("Number of threads = %d", nthreads);
	}
    }
    return 0;
}


//------------- output code --------------
/*a test C program. You can replace this content with yours, within 20,000 character limit (about 500 lines) . */
#include<stdio.h>
#include<stdlib.h>
#include "libxomp.h" 
static void OUT__1__2231__(void *__out_argv);

int main(int argc,char *argv[])
{
  int status = 0;
  XOMP_init(argc,argv);
  int nthreads;
  int tid;
  XOMP_parallel_start(OUT__1__2231__,0,1,0,"/tmp/test-20191219_224253-113680.c",8);
  XOMP_parallel_end("/tmp/test-20191219_224253-113680.c",17);
  XOMP_terminate(status);
  return 0;
}

static void OUT__1__2231__(void *__out_argv)
{
  int _p_nthreads;
  int _p_tid;
  _p_tid = omp_get_thread_num();
  printf("Hello World from thread = %d ",_p_tid);
  if (_p_tid == 0) {
    _p_nthreads = omp_get_num_threads();
    printf("Number of threads = %d",_p_nthreads);
  }
}

Used to generate CUDA kernels for OpenMP 4.x

Example input and output code for the OpenMP 4.0 version of the classic Jacobi solver

//--------------input--------------

void jacobi( )
{
  REAL omega;
  int i,j,k;
  REAL error,resid,ax,ay,b;
  //      double  error_local;

  //      float ta,tb,tc,td,te,ta1,ta2,tb1,tb2,tc1,tc2,td1,td2;
  //      float te1,te2;
  //      float second;

  omega=relax;
  /*
   * Initialize coefficients */

  ax = 1.0/(dx*dx); /* X-direction coef */
  ay = 1.0/(dy*dy); /* Y-direction coef */
  b  = -2.0/(dx*dx)-2.0/(dy*dy) - alpha; /* Central coeff */

  error = 10.0 * tol;
  k = 1;

  // An optimization on top of naive coding: promoting data handling outside the while loop
  // data properties may change since the scope is bigger:
#pragma omp target data map(to:n, m, omega, ax, ay, b, f[0:n][0:m]) map(tofrom:u[0:n][0:m]) map(alloc:uold[0:n][0:m])
  while ((k<=mits)&&(error>tol))
  {
    error = 0.0;

    /* Copy new solution into old */
#pragma omp target map(to:n, m, u[0:n][0:m]) map(from:uold[0:n][0:m])
#pragma omp parallel for private(j,i) collapse(2)
    for(i=0;i<n;i++)
      for(j=0;j<m;j++)
        uold[i][j] = u[i][j];

#pragma omp target map(to:n, m, omega, ax, ay, b, f[0:n][0:m], uold[0:n][0:m]) map(from:u[0:n][0:m])
#pragma omp parallel for private(resid,j,i) reduction(+:error) collapse(2) // nowait
    for (i=1;i<(n-1);i++)
      for (j=1;j<(m-1);j++)
      { 
        resid = (ax*(uold[i-1][j] + uold[i+1][j])\
            + ay*(uold[i][j-1] + uold[i][j+1])+ b * uold[i][j] - f[i][j])/b;

        u[i][j] = uold[i][j] - omega * resid;
        error = error + resid*resid ;
      }
...

    /* Error check */

    if (k%500==0)
      printf("Finished %d iteration with error =%f\n",k, error);
    error = sqrt(error)/(n*m);

    k = k + 1;
  }          /*  End iteration loop */
  printf("Total Number of Iterations:%d\n",k);
  printf("Residual:%E\n", error);
  printf("Residual_ref :%E\n", resid_ref);
  printf ("Diff ref=%E\n", fabs(error-resid_ref));
  assert (fabs(error-resid_ref) < 1E-13);
}



//----------------output-----------------

#include "libxomp.h" 
#include "xomp_cuda_lib_inlined.cu" 
...



__global__ void OUT__1__8714__(float omega,float ax,float ay,float b,int __final_total_iters__2__,int __i_interval__3__,float *_dev_per_block_error,float *_dev_u,float *_dev_f,float *_dev_uold)
{
  int _p_i;
  int _p_j;
  float _p_error;
  _p_error = 0;
  float _p_resid;
  int _p___collapsed_index__5__;
  int _dev_lower;
  int _dev_upper;
  int _dev_loop_chunk_size;
  int _dev_loop_sched_index;
  int _dev_loop_stride;
  int _dev_thread_num = getCUDABlockThreadCount(1);
  int _dev_thread_id = getLoopIndexFromCUDAVariables(1);
  XOMP_static_sched_init(0,__final_total_iters__2__ - 1,1,1,_dev_thread_num,_dev_thread_id,&_dev_loop_chunk_size,&_dev_loop_sched_index,&_dev_loop_stride);
  while(XOMP_static_sched_next(&_dev_loop_sched_index,__final_total_iters__2__ - 1,1,_dev_loop_stride,_dev_loop_chunk_size,_dev_thread_num,_dev_thread_id,&_dev_lower,&_dev_upper))
    for (_p___collapsed_index__5__ = _dev_lower; _p___collapsed_index__5__ <= _dev_upper; _p___collapsed_index__5__ += 1) {
      _p_i = _p___collapsed_index__5__ / __i_interval__3__ * 1 + 1;
      _p_j = _p___collapsed_index__5__ % __i_interval__3__ * 1 + 1;
      _p_resid = (ax * (_dev_uold[(_p_i - 1) * 512 + _p_j] + _dev_uold[(_p_i + 1) * 512 + _p_j]) + ay * (_dev_uold[_p_i * 512 + (_p_j - 1)] + _dev_uold[_p_i * 512 + (_p_j + 1)]) + b * _dev_uold[_p_i * 512 + _p_j] - _dev_f[_p_i * 512 + _p_j]) / b;
      _dev_u[_p_i * 512 + _p_j] = _dev_uold[_p_i * 512 + _p_j] - omega * _p_resid;
      _p_error = _p_error + _p_resid * _p_resid;
    }
  xomp_inner_block_reduction_float(_p_error,_dev_per_block_error,6);
}

...


void jacobi()
{
  float omega;
  int i;
  int j;
  int k;
  float error;
  float resid;
  float ax;
  float ay;
  float b;
//      double  error_local;
//      float ta,tb,tc,td,te,ta1,ta2,tb1,tb2,tc1,tc2,td1,td2;
//      float te1,te2;
//      float second;
  omega = relax;
/*
     * Initialize coefficients */
/* X-direction coef */
  ax = (1.0 / (dx * dx));
/* Y-direction coef */
  ay = (1.0 / (dy * dy));
/* Central coeff */
  b = (- 2.0 / (dx * dx) - 2.0 / (dy * dy) - alpha);
  error = (10.0 * tol);
  k = 1;
/* Translated from #pragma omp target data ... */
{
    xomp_deviceDataEnvironmentEnter();
    float *_dev_u;
    int _dev_u_size = sizeof(float ) * n * m;
    _dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,1,1)));
    float *_dev_f;
    int _dev_f_size = sizeof(float ) * n * m;
    _dev_f = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)f),_dev_f_size,1,0)));
    float *_dev_uold;
    int _dev_uold_size = sizeof(float ) * n * m;
    _dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,0,0)));
    while(k <= mits && error > tol){
      int __i_total_iters__0__ = (n - 1 - 1 - 1 + 1) % 1 == 0?(n - 1 - 1 - 1 + 1) / 1 : (n - 1 - 1 - 1 + 1) / 1 + 1;
      int __j_total_iters__1__ = (m - 1 - 1 - 1 + 1) % 1 == 0?(m - 1 - 1 - 1 + 1) / 1 : (m - 1 - 1 - 1 + 1) / 1 + 1;
      int __final_total_iters__2__ = 1 * __i_total_iters__0__ * __j_total_iters__1__;
      int __i_interval__3__ = __j_total_iters__1__ * 1;
      int __j_interval__4__ = 1;
      int __collapsed_index__5__;
      int __i_total_iters__6__ = (n - 1 - 0 + 1) % 1 == 0?(n - 1 - 0 + 1) / 1 : (n - 1 - 0 + 1) / 1 + 1;
      int __j_total_iters__7__ = (m - 1 - 0 + 1) % 1 == 0?(m - 1 - 0 + 1) / 1 : (m - 1 - 0 + 1) / 1 + 1;
      int __final_total_iters__8__ = 1 * __i_total_iters__6__ * __j_total_iters__7__;
      int __i_interval__9__ = __j_total_iters__7__ * 1;
      int __j_interval__10__ = 1;
      int __collapsed_index__11__;
      error = 0.0;
/* Copy new solution into old */
{
        xomp_deviceDataEnvironmentEnter();
        float *_dev_u;
        int _dev_u_size = sizeof(float ) * n * m;
        _dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,1,0)));
        float *_dev_uold;
        int _dev_uold_size = sizeof(float ) * n * m;
        _dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,0,1)));
/* Launch CUDA kernel ... */
        int _threads_per_block_ = xomp_get_maxThreadsPerBlock();
        int _num_blocks_ = xomp_get_max1DBlock(__final_total_iters__8__ - 1 - 0 + 1);
        OUT__2__8714__<<<_num_blocks_,_threads_per_block_>>>(__final_total_iters__8__,__i_interval__9__,_dev_u,_dev_uold);
        xomp_deviceDataEnvironmentExit();
      }
{
        xomp_deviceDataEnvironmentEnter();
        float *_dev_u;
        int _dev_u_size = sizeof(float ) * n * m;
        _dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,0,1)));
        float *_dev_f;
        int _dev_f_size = sizeof(float ) * n * m;
        _dev_f = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)f),_dev_f_size,1,0)));
        float *_dev_uold;
        int _dev_uold_size = sizeof(float ) * n * m;
        _dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,1,0)));
/* Launch CUDA kernel ... */
        int _threads_per_block_ = xomp_get_maxThreadsPerBlock();
        int _num_blocks_ = xomp_get_max1DBlock(__final_total_iters__2__ - 1 - 0 + 1);
        float *_dev_per_block_error = (float *)(xomp_deviceMalloc(_num_blocks_ * sizeof(float )));
        OUT__1__8714__<<<_num_blocks_,_threads_per_block_,(_threads_per_block_ * sizeof(float ))>>>(omega,ax,ay,b,__final_total_iters__2__,__i_interval__3__,_dev_per_block_error,_dev_u,_dev_f,_dev_uold);
        error = xomp_beyond_block_reduction_float(_dev_per_block_error,_num_blocks_,6);
        xomp_freeDevice(_dev_per_block_error);
        xomp_deviceDataEnvironmentExit();
      }
//    }
/*  omp end parallel */
/* Error check */
      if (k % 500 == 0) {
        printf("Finished %d iteration with error =%f\n",k,error);
      }
      error = (sqrt(error) / (n * m));
      k = k + 1;
/*  End iteration loop */
    }
    xomp_deviceDataEnvironmentExit();
  }
  printf("Total Number of Iterations:%d\n",k);
  printf("Residual:%E\n",error);
  printf("Residual_ref :%E\n",resid_ref);
  printf("Diff ref=%E\n",(fabs((error - resid_ref))));
  fabs((error - resid_ref)) < 1E-14?((void )0) : __assert_fail("fabs(error-resid_ref) < 1E-14","jacobi-ompacc-opt2.c",236,__PRETTY_FUNCTION__);
}

See details in ROSE_Compiler_Framework/OpenMP_Acclerator_Model_Implementation
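In the generated code above, collapse(2) turns the two nested loops into one loop over a collapsed index, and the kernel recovers i and j from that index by division and remainder with __i_interval__3__. The sketch below reproduces that arithmetic on the host. LoopRange, tripCount, Index2D, and recoverIndices are hypothetical names, and the guard for empty loops is an addition of this sketch that the generated code does not contain.

```cpp
#include <cassert>

// Inclusive bounds with a positive stride:
// for (x = lower; x <= upper; x += stride)
struct LoopRange {
  int lower;
  int upper;
  int stride;
};

// Number of iterations, rounded up when the span is not a multiple of
// the stride (same formula as __i_total_iters__0__ in the output above).
int tripCount(const LoopRange& r) {
  assert(r.stride > 0);
  const int span = r.upper - r.lower + 1;
  if (span <= 0) return 0;  // empty loop; guard added by this sketch
  return span % r.stride == 0 ? span / r.stride : span / r.stride + 1;
}

struct Index2D {
  int i;
  int j;
};

// Recovers (i, j) from a collapsed index in
// [0, tripCount(outer) * tripCount(inner)).
Index2D recoverIndices(int collapsed, const LoopRange& outer,
                       const LoopRange& inner) {
  const int interval = tripCount(inner);  // role of __i_interval__3__
  assert(interval > 0 && collapsed >= 0);
  return {collapsed / interval * outer.stride + outer.lower,
          collapsed % interval * inner.stride + inner.lower};
}
```

With n = 5 and m = 6, the second Jacobi loop nest runs i over 1..3 and j over 1..4, so the collapsed loop has 12 iterations and index 5 maps back to (i, j) = (2, 2).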

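Known issues

List

When Outliner::useStructureWrapper is set to true, a "side effect analysis error!" message appears. This also happens with the outlineIfs example in the tutorial directory.
- You can ignore this warning message if your translator still works. When Outliner::useStructureWrapper is enabled, the outliner uses several analyses internally. Some of these analyses cannot handle every case, so they simply give up and notify the outliner. The outliner is designed to make conservative decisions in such cases and to generate less optimal translated code.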

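Publications

A paper describing the internals of the AST outliner. If you use the AST outliner in your research, you are encouraged to cite it.

Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas. 2009. Effective source-to-source outlining to support whole program empirical optimization. In Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing (LCPC'09).

Support for generating multithreaded kernels for CPUs and GPUs

Chunhua Liao, Daniel J. Quinlan, Thomas Panas, Bronis R. de Supinski, A ROSE-based OpenMP 3.0 research compiler supporting multiple runtime libraries, Proceedings of the 6th International Conference on Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, June 14-16, 2010, Tsukuba, Japan
C. Liao, Y. Yan, B. R. de Supinski, D. J. Quinlan, and B. Chapman, "Early experiences with the OpenMP accelerator model," in OpenMP in the Era of Low Power Devices and Accelerators, Springer, 2013, pp. 84-98.

Used to support empirical tuning or autotuning

Shirley Moore, Refactoring and automated performance tuning of computational chemistry application codes, Proceedings of the Winter Simulation Conference, December 09-12, 2012, Berlin, Germany
Nicholas Chaimov, Scott Biersdorff, Allen D Malony, Tools for machine-learning-based empirical autotuning and specialization, International Journal of High Performance Computing Applications, v.27 n.4, pp. 403-411, November 2013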