Cg Programming/Unity/Computing the Brightest Pixel

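This tutorial shows how to use a compute shader in Unity to compute the position of the brightest pixel in an image. In particular, it shows how the threads of a thread group can share data through “groupshared” variables and how to synchronize the execution of these threads. If you are not familiar with compute shaders in Unity, you should first read Section “Computing Image Effects” and Section “Computing Color Histograms”. Note that compute shaders are not available on every platform and graphics API; on macOS, for example, they are not supported under OpenGL.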

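Why Do This?

Finding the brightest pixel in a captured image is useful for some applications of optical motion capture. Another application is template matching, where a matching algorithm is applied at the position of every pixel of an image and the matching likelihood is stored in the corresponding pixel of an intermediate image. In this case, the “brightest” pixel of the intermediate image marks the best match with the template. Finding this best match is useful for template-based feature detection and tracking.

Furthermore, the problem of finding the brightest pixel is closely related to many other problems, for example, finding the darkest pixel, the two (or more) brightest pixels, the two (or more) brightest pixels that are a certain distance apart, or the sum or average of all pixels of an image. In fact, by solving the problem of finding the brightest pixel, one comes very close to solving several related problems.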

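Finding the Brightest Pixel with a Compute Shader

To find the brightest pixel in an image, every pixel of the image has to be examined; thus, the problem benefits greatly from parallelization.

In this tutorial, we implement a compute shader that first finds the brightest pixel in one row of the image by looping over all pixels of the row and keeping track of the brightest pixel encountered so far. This compute shader runs in parallel for all rows of the image. The result is an array with the brightest pixel of each row, which can be relatively large (depending on the height of the image). Therefore, we reduce the size of this array by computing the brightest pixel of each thread group at the end of the shader. Since we use thread groups of 64 threads, this reduces the size of the result array by a factor of 64; the new result is an array with the brightest pixel of each thread group. One could try to reduce this array further in parallel, but since it is already relatively small, we simply transfer the data to the CPU and find the brightest pixel of the whole image with a linear search on the CPU. Note: for any tool, it is important to know not only when to use it but also when not to use it.

Here is a first version of the compute shader: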

#pragma kernel MaximumMain

Texture2D<float4> InputTexture;
int InputTextureWidth;

struct maxStruct 
{
   uint xMax; // column of maximum
   uint yMax; // row of maximum
   uint lMax; // luminance of maximum (0, ..., 1023)
};

RWStructuredBuffer<maxStruct> GroupMaxBuffer;

groupshared maxStruct rowMaxData[64];

[numthreads(64,1,1)]
void MaximumMain (uint3 groupID : SV_GroupID, 
      // 3D ID of thread group; range depends on Dispatch call
   uint3 groupThreadID : SV_GroupThreadID, 
      // 3D ID of thread in a thread group; range depends on numthreads
   uint groupIndex : SV_GroupIndex, 
      // flattened/linearized SV_GroupThreadID. 
      // groupIndex specifies the index within the group (0 to 63)
   uint3 id : SV_DispatchThreadID) 
      // = SV_GroupID * numthreads + SV_GroupThreadID
      // id.x specifies the row in the input texture image
{
   int column;

   // find the maximum of this row 
   // and store its data in rowMaxData[groupIndex]
   rowMaxData[groupIndex].xMax = 0; 
   rowMaxData[groupIndex].yMax = id.x; 
   rowMaxData[groupIndex].lMax = 0;
   for (column = 0; column < InputTextureWidth; column++) 
   {
      float4 color = InputTexture[uint2(column, id.x)];
      uint luminance = (uint)(1023.0 * 
         (0.21 * color.r + 0.72 * color.g + 0.07 * color.b));
      if (luminance > rowMaxData[groupIndex].lMax) 
      {
         rowMaxData[groupIndex].xMax = column;
         rowMaxData[groupIndex].lMax = luminance;
      }
   }

   // find the maximum of this group 
   // and store its data in GroupMaxBuffer[groupID.x]
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == groupIndex) 
   {
      int row; 
      int rowMax = 0;
      for (row = 1; row < 64; row++) 
      { 
         if (rowMaxData[row].lMax > rowMaxData[rowMax].lMax) 
         {
            rowMax = row;
         }
      }
      GroupMaxBuffer[groupID.x] = rowMaxData[rowMax];
   }
}

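The first (Unity-specific) line, #pragma kernel MaximumMain, specifies that the function MaximumMain() is a compute shader function that can be called from scripts.

Texture2D<float4> InputTexture is a uniform variable for accessing the RGBA input texture, and int InputTextureWidth is a uniform variable that holds its width, i.e., the length of one row of pixels.

The next lines define a struct that stores the data of a candidate for the brightest pixel. xMax and yMax are its coordinates, and lMax is its relative luminance scaled to the range from 0 to 1023: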

struct maxStruct 
{
   uint xMax; // column of maximum
   uint yMax; // row of maximum
   uint lMax; // luminance of maximum (0, ..., 1023)
};

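The definition RWStructuredBuffer<maxStruct> GroupMaxBuffer uses this struct to define an RWStructuredBuffer (which corresponds to a compute buffer in Unity) that stores the data of the brightest pixel of each thread group.

The definition groupshared maxStruct rowMaxData[64] uses the same struct to define a groupshared array that stores the data of the brightest pixel found by each thread (i.e., in each row) of the current thread group. Note that the total size of groupshared data is limited to 32 KB in Direct3D 11. Since a uint in HLSL is a 32-bit value that occupies 4 bytes, the array rowMaxData requires 64 × 3 × 4 = 768 bytes, which is well below the limit of 32 KB.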
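The size estimate can be checked with a few lines of C#. This is only a sketch of the arithmetic: the class GroupSharedBudget and its members are hypothetical helpers that are not part of the tutorial, and the code assumes that each struct member is a 32-bit unsigned int, as it is in HLSL.

```csharp
using System;

// Hypothetical helper (not part of the tutorial) for estimating groupshared usage.
public static class GroupSharedBudget
{
    // Direct3D 11 limit on groupshared memory per thread group, in bytes.
    public const int LimitBytes = 32 * 1024;

    // Bytes used by an array of elementCount structs, each holding
    // uintsPerElement 32-bit unsigned integers (assumption: no padding).
    public static int ArrayBytes(int elementCount, int uintsPerElement)
    {
        return elementCount * uintsPerElement * sizeof(uint);
    }

    // True if the given number of bytes stays within the Direct3D 11 limit.
    public static bool Fits(int bytes)
    {
        return bytes <= LimitBytes;
    }
}
```

For the array in this tutorial, GroupSharedBudget.ArrayBytes(64, 3) evaluates to 768, which leaves most of the 32 KB budget unused.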

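We use [numthreads(64, 1, 1)] instead of [numthreads(1, 64, 1)] to define the dimensions of the thread group because each thread group is supposed to process a “one-dimensional array” of 64 rows, and it is usually simpler to use the x dimension for one-dimensional groups.

The compute shader function MaximumMain() requests all available thread-related indices (although it does not use groupThreadID). The thread group index groupID.x is used to index GroupMaxBuffer, the index groupIndex of the thread within its thread group is used to index rowMaxData, and the overall dispatch index id.x specifies the image row that the thread processes.

The function MaximumMain() then loops over all pixels of the thread's row by counting the variable column from 0 to InputTextureWidth - 1. For each pixel, it computes the relative luminance (scaled by 1023 so that it can be stored as an unsigned int) and compares it with the maximum luminance so far. If the new luminance is greater, it updates the data in rowMaxData[groupIndex], which therefore holds the data of the brightest pixel of the row at the end of the loop.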
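The per-row search can be mirrored on the CPU, for example to cross-check the shader's output. The following is a minimal sketch under stated assumptions: the class RowMaximum, its method names, and the layout of the row as consecutive RGB triples are all hypothetical, and the color components are assumed to lie between 0 and 1.

```csharp
using System;

// CPU-side sketch of the per-row search; names and data layout are hypothetical.
public static class RowMaximum
{
    // Same weights and scale as the shader: relative luminance mapped to 0..1023.
    // Assumes r, g, b are in the range 0..1.
    public static uint QuantizedLuminance(float r, float g, float b)
    {
        return (uint)(1023.0f * (0.21f * r + 0.72f * g + 0.07f * b));
    }

    // Returns the column of the brightest pixel in one row and writes its
    // quantized luminance to maxLuminance. rgbRow holds one RGB triple per
    // pixel: rgbRow[3 * column + 0], [3 * column + 1], [3 * column + 2].
    public static int FindBrightestColumn(float[] rgbRow, out uint maxLuminance)
    {
        int bestColumn = 0;
        maxLuminance = 0;
        int width = rgbRow.Length / 3;
        for (int column = 0; column < width; column++)
        {
            uint luminance = QuantizedLuminance(
                rgbRow[3 * column], rgbRow[3 * column + 1], rgbRow[3 * column + 2]);
            // Strict comparison as in the shader: ties keep the leftmost pixel.
            if (luminance > maxLuminance)
            {
                bestColumn = column;
                maxLuminance = luminance;
            }
        }
        return bestColumn;
    }
}
```

Because the comparison is strict, a row of equally bright pixels reports column 0, which matches the behavior of the loop in the shader.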

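After computing the brightest pixel of a row, the function computes the brightest pixel of the thread group. Since this requires comparing data of different threads, we first have to make sure that all threads have determined the brightest pixel of their row. This is achieved with GroupMemoryBarrierWithGroupSync(), which not only ensures that all accesses to groupshared memory by the thread group before this line have completed but also waits until all threads of the group have reached this line. The code then checks whether groupIndex is 0, i.e., whether this is the zeroth thread of the thread group. Only this thread determines the brightest of the pixels in rowMaxData and writes it to GroupMaxBuffer[groupID.x]. While this solution works (and is easy to implement), it is somewhat wasteful because the other 63 threads of the group have nothing to do while the zeroth thread works through this loop.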

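A more efficient alternative is given below as the second version of the compute shader. It implements a reduction operation (or fold) that resembles a knockout tournament: in the first step, every even-numbered thread compares its brightest pixel with the brightest pixel of the next thread. In the second step, every thread whose index is divisible by 4 compares its best candidate with that of the thread two positions ahead (groupIndex + 2), and so on. In the sixth (and last) step, the zeroth thread compares its best candidate with the best candidate of the 32nd thread. The “winner” of this last comparison is the brightest pixel of the group. There are still many idle threads in this version, but it requires only 6 steps instead of a loop of 63 iterations, which is a worthwhile improvement. Avoiding idle threads altogether would require multiple dispatch calls, which incur some overhead and therefore might not save any time.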
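The tournament can be simulated sequentially to see why six rounds are enough for 64 candidates. This is only a CPU-side sketch: the class TournamentReduction and its method are hypothetical, the inner loop stands in for threads that run in parallel on the GPU, and moving from one round to the next stands in for the barrier between steps.

```csharp
using System;

// CPU-side sketch of the tournament reduction; the class and method names
// are hypothetical and do not appear in the tutorial.
public static class TournamentReduction
{
    // Returns the index of the largest value; steps receives the number of
    // comparison rounds. The length must be a power of two (64 in the shader).
    public static int IndexOfMaximum(uint[] luminance, out int steps)
    {
        int n = luminance.Length;
        if (n == 0 || (n & (n - 1)) != 0)
        {
            throw new ArgumentException("Length must be a power of two.");
        }

        uint[] best = (uint[])luminance.Clone(); // stands in for rowMaxData[].lMax
        int[] bestIndex = new int[n];            // remembers where each candidate came from
        for (int i = 0; i < n; i++)
        {
            bestIndex[i] = i;
        }

        steps = 0;
        for (int stride = 1; stride < n; stride *= 2)
        {
            // On the GPU, the iterations of this inner loop run as parallel
            // threads, and a barrier separates consecutive rounds.
            for (int thread = 0; thread < n; thread += 2 * stride)
            {
                if (best[thread + stride] > best[thread])
                {
                    best[thread] = best[thread + stride];
                    bestIndex[thread] = bestIndex[thread + stride];
                }
            }
            steps++;
        }
        return bestIndex[0];
    }
}
```

The stride doubles every round (1, 2, 4, 8, 16, 32), so an array of 64 entries is reduced in 6 rounds, and the winner ends up in entry 0 just as it does in rowMaxData[0] on the GPU.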

Here is the improved shader:

#pragma kernel MaximumMain

Texture2D<float4> InputTexture;
int InputTextureWidth;

struct maxStruct 
{
   uint xMax; // column of maximum
   uint yMax; // row of maximum
   uint lMax; // luminance of maximum (0, ..., 1023)
};

RWStructuredBuffer<maxStruct> GroupMaxBuffer;

groupshared maxStruct rowMaxData[64];

[numthreads(64,1,1)]
void MaximumMain (uint3 groupID : SV_GroupID, 
      // 3D ID of thread group; range depends on Dispatch call
   uint3 groupThreadID : SV_GroupThreadID, 
      // 3D ID of thread in a thread group; range depends on numthreads
   uint groupIndex : SV_GroupIndex, 
      // flattened/linearized SV_GroupThreadID. 
      // groupIndex specifies the index within the group (0 to 63)
   uint3 id : SV_DispatchThreadID) 
      // = SV_GroupID * numthreads + SV_GroupThreadID
      // id.x specifies the row in the input texture image
{
   int column;

   // find the maximum of this row 
   // and store its data in rowMaxData[groupIndex]
   rowMaxData[groupIndex].xMax = 0; 
   rowMaxData[groupIndex].yMax = id.x; 
   rowMaxData[groupIndex].lMax = 0;
   for (column = 0; column < InputTextureWidth; column++) 
   {
      float4 color = InputTexture[uint2(column, id.x)];
      uint luminance = (uint)(1023.0 * 
         (0.21 * color.r + 0.72 * color.g + 0.07 * color.b));
      if (luminance > rowMaxData[groupIndex].lMax) 
      {
         rowMaxData[groupIndex].xMax = column;
         rowMaxData[groupIndex].lMax = luminance;
      }
   }

   // find the maximum of this group
   // and store its data in GroupMaxBuffer[groupID.x]
   GroupMemoryBarrierWithGroupSync(); 
      // we have to wait for all writes to rowMaxData by the group's threads
   if (0 == (groupIndex & 1)) { // is groupIndex even?
      if (rowMaxData[groupIndex + 1].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 1];
      }
   }
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == (groupIndex & 3)) { // is groupIndex divisible by 4?
      if (rowMaxData[groupIndex + 2].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 2];
      }
   }
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == (groupIndex & 7)) { // is groupIndex divisible by 8?
      if (rowMaxData[groupIndex + 4].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 4];
      }
   }
   GroupMemoryBarrierWithGroupSync();
   if (0 == (groupIndex & 15)) { // is groupIndex divisible by 16?
      if (rowMaxData[groupIndex + 8].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 8];
      }
   }
   GroupMemoryBarrierWithGroupSync(); 
   if (0 == (groupIndex & 31)) { // is groupIndex divisible by 32?
      if (rowMaxData[groupIndex + 16].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 16];
      }
   }
   GroupMemoryBarrierWithGroupSync();
   if (0 == (groupIndex & 63)) { // is groupIndex divisible by 64?
      if (rowMaxData[groupIndex + 32].lMax > rowMaxData[groupIndex].lMax) {
         rowMaxData[groupIndex] = rowMaxData[groupIndex + 32];
      }
      GroupMaxBuffer[groupID.x] = rowMaxData[groupIndex];
         // copy maximum of group to buffer
   }
}

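Note that the code uses the bitwise AND operator & with a power of two minus 1 to test whether groupIndex is divisible by that power of two. We could also use the modulo operator % with the power of two instead.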
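That the two tests agree can be confirmed by brute force over every thread index of a group. The sketch below is written in C# rather than HLSL, and the class DivisibilityCheck with its methods is a hypothetical helper introduced only for this comparison.

```csharp
using System;

// Hypothetical helper that compares the bit-mask test with the modulo test.
public static class DivisibilityCheck
{
    // Mask test as in the shader: index & (powerOfTwo - 1) is zero exactly
    // when the low bits of index are all zero.
    public static bool ByMask(uint index, uint powerOfTwo)
    {
        return (index & (powerOfTwo - 1u)) == 0u;
    }

    // Equivalent test with the modulo operator.
    public static bool ByModulo(uint index, uint powerOfTwo)
    {
        return index % powerOfTwo == 0u;
    }

    // Checks every index below groupSize against every power of two
    // from 2 up to groupSize.
    public static bool TestsAgree(uint groupSize)
    {
        for (uint power = 2u; power <= groupSize; power *= 2u)
        {
            for (uint index = 0u; index < groupSize; index++)
            {
                if (ByMask(index, power) != ByModulo(index, power))
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The equivalence holds only for powers of two; for other divisors, the mask test gives different answers, so the modulo operator is the one to use there.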

Calling the Compute Shader

The C# script that calls the compute shader is relatively simple:

using UnityEngine;

public class maximumScript : MonoBehaviour 
{
   public ComputeShader shader;
   public Texture2D inputTexture;

   public uint[] groupMaxData;
   public int groupMax;

   private ComputeBuffer groupMaxBuffer;
   
   private int handleMaximumMain;

   void Start () 
   {
      if (null == shader || null == inputTexture) 
      {
         Debug.Log("Shader or input texture missing.");
         enabled = false; // skip Update() because there is nothing to dispatch
         return;
      }

      handleMaximumMain = shader.FindKernel("MaximumMain");
      groupMaxBuffer = new ComputeBuffer((inputTexture.height + 63) / 64, sizeof(uint) * 3);
      groupMaxData = new uint[((inputTexture.height + 63) / 64) * 3];

      if (handleMaximumMain < 0 || null == groupMaxBuffer || null == groupMaxData) 
      {
         Debug.Log("Initialization failed.");
         enabled = false; // skip Update() because the kernel or buffer is unusable
         return;
      }
      
      shader.SetTexture(handleMaximumMain, "InputTexture", inputTexture);
      shader.SetInt("InputTextureWidth", inputTexture.width);
      shader.SetBuffer(handleMaximumMain, "GroupMaxBuffer", groupMaxBuffer);
   }

   void OnDestroy() 
   {
      if (null != groupMaxBuffer) 
      {
         groupMaxBuffer.Release();
      }
   }

   void Update()
   {
      shader.Dispatch(handleMaximumMain, (inputTexture.height + 63) / 64, 1, 1);
         // divided by 64 in x because of [numthreads(64,1,1)] in the compute shader code
         // added 63 to make sure that there is a group for all rows
      
      // get maxima of groups
      groupMaxBuffer.GetData(groupMaxData);
      
      // find maximum of all groups
      groupMax = 0;
      for (int group = 1; group < (inputTexture.height + 63) / 64; group++) 
      {
         if (groupMaxData[3 * group + 2] > groupMaxData[3 * groupMax + 2]) 
         {
            groupMax = group;
         }
      }
   }
}

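The script has public variables for the compute shader and the input texture image, which you have to set. It returns its result in the array uint[] groupMaxData at the position determined by groupMax.

The RWStructuredBuffer of the compute shader corresponds to the compute buffer groupMaxBuffer. Note that it is an array whose elements each consist of 3 unsigned ints. The array groupMaxData has the same memory layout but consists of individual unsigned ints; thus, it has three times as many elements as groupMaxBuffer.

The Start() function performs some error checking, finds the handle of the compute shader function, creates groupMaxBuffer and groupMaxData, and sets the uniform variables of the compute shader.

The OnDestroy() function releases the compute buffer because it is not released by the garbage collector.

The Update() function calls the compute shader function, where the number of thread groups is the number of image rows (i.e., the height) divided by the number of threads per thread group (64 in this case). We add 63 to the number of rows before the division to make sure that there are enough thread groups for image heights that are not divisible by 64.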
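Adding 63 before the integer division is the usual way to round the quotient up. A small sketch of the calculation follows; the class DispatchSize and its method are hypothetical names, and the code assumes a positive number of rows.

```csharp
using System;

// Hypothetical helper that computes the number of thread groups for a dispatch.
public static class DispatchSize
{
    // Rounds rows / threadsPerGroup up to the next integer, so that a partly
    // filled last group still covers the remaining rows.
    public static int GroupCount(int rows, int threadsPerGroup)
    {
        return (rows + threadsPerGroup - 1) / threadsPerGroup;
    }
}
```

For example, DispatchSize.GroupCount(1080, 64) evaluates to 17: sixteen full groups cover 1024 rows, and the 17th group covers the remaining 56 rows while its other 8 threads process rows beyond the image.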

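groupMaxBuffer.GetData(groupMaxData) copies the data from the compute buffer to the array groupMaxData. The code then finds the brightest pixel in this array by looping over all groups. Note that the relative luminance of the group with index group is at groupMaxData[3 * group + 2] because groupMaxData is a “flattened” array of unsigned ints rather than an array of structs of 3 unsigned ints.

In the end, the relative luminance of the brightest pixel is at groupMaxData[3 * groupMax + 2]. Its x coordinate is at groupMaxData[3 * groupMax + 0], and its y coordinate is at groupMaxData[3 * groupMax + 1].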
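The index arithmetic on the flattened array can be wrapped in a small helper that returns the result as one value. This is a sketch in plain C# without any Unity types; the struct BrightestPixel and the class FlattenedResult are hypothetical names, and the input is assumed to use the same layout as groupMaxData (x, y, luminance for each group).

```csharp
using System;

// Hypothetical result type: position and quantized luminance of one pixel.
public struct BrightestPixel
{
    public uint X;
    public uint Y;
    public uint Luminance;
}

// Hypothetical helper that searches a flattened array of (x, y, luminance) triples.
public static class FlattenedResult
{
    public static BrightestPixel FindBrightest(uint[] groupMaxData)
    {
        if (groupMaxData.Length == 0 || groupMaxData.Length % 3 != 0)
        {
            throw new ArgumentException("Expected a non-empty array of triples.");
        }

        int groupCount = groupMaxData.Length / 3;
        int groupMax = 0;
        for (int group = 1; group < groupCount; group++)
        {
            // The luminance is the third value of each triple.
            if (groupMaxData[3 * group + 2] > groupMaxData[3 * groupMax + 2])
            {
                groupMax = group;
            }
        }

        return new BrightestPixel
        {
            X = groupMaxData[3 * groupMax + 0],
            Y = groupMaxData[3 * groupMax + 1],
            Luminance = groupMaxData[3 * groupMax + 2]
        };
    }
}
```

In the script above, such a helper could be called right after groupMaxBuffer.GetData(groupMaxData), which keeps the offsets 0, 1, and 2 in a single place.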

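Summary

Congratulations, you have reached the end of this tutorial! Some of the things you have learned are:

How to search all pixels of an image in parallel.
How to synchronize the execution of the threads in a thread group.
How to communicate data between the threads of a thread group.
How to use a reduction operation to speed up a search in a “groupshared” array.

Further Reading

If you still want to know more

about compute shaders in Unity, you should read Section “Computing Image Effects”.
about compute buffers in Unity, you should read the description in Unity's documentation.
about groupshared variables in HLSL, you should read the description of variable syntax in the Microsoft Developer Network.
about GroupMemoryBarrierWithGroupSync() and other intrinsic functions in HLSL, you should read the description of intrinsic functions in the Microsoft Developer Network.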


Unless stated otherwise, all example source code on this page is granted to the public domain.