並行程式設計簡介

OpenMP 和 MPI 的概述

為了快速解決大型計算問題，有必要利用 CPU（中央處理器）上的多個核心和多個 CPU。迄今為止編寫的大多數程式都是順序的，編譯器通常不會自動生成並行可執行檔案，因此程式設計師需要修改原始的序列計算機程式碼以利用額外的處理能力。兩個指定允許並行程式設計的庫應該做什麼的標準是 OpenMP 和 MPI（訊息傳遞介面）。在本節中，我們將介紹理解、執行和修改本教程中的程式所需的最低限度資訊。可以在 https://computing.llnl.gov/tutorials/ 和 http://www.citutor.org 中找到更詳細的教程。

OpenMP 用於共享記憶體架構上的並行程式設計——每個計算程序對記憶體都有全域性檢視。它允許透過在原始程式碼中新增指令來逐步並行化現有的 Fortran、C 或 C++ 程式碼。因此它易於使用。但是，在使用 OpenMP 時需要謹慎才能獲得良好的效能。在序列程式碼中新增指令很容易，但需要仔細考慮才能建立一個程式，該程式在並行執行時會顯示效能改進並給出正確的結果。對於在規則網格上對多維偏微分方程進行數值求解，很容易執行有效且有效的基於迴圈的並行化，因此不需要完全理解 OpenMP 的所有功能。OpenMP 通常允許使用 10 個計算核心，特別是允許利用多核筆記型電腦、桌上型電腦和工作站。

MPI 用於分散式記憶體架構上的並行程式設計——當單獨的計算程序可以訪問自己的本地記憶體時，並且程序必須顯式地接收屬於已傳送資料的其他程序的記憶體中的資料。MPI 是一個庫，它允許透過新增顯式地將資料從一個程序移動到另一個程序的函式呼叫來並行化 Fortran、C 和 C++ 程式。將序列程式轉換為並行 MPI 程式需要仔細考慮，因為資料需要分解到不同的程序上，因此通常很難逐步並行化使用 MPI 的程式。並行化將使用 MPI 的程式的最佳方法取決於問題。在解決大型問題時，通常每個程序都沒有足夠的記憶體來簡單地複製所有資料。因此，人們希望以儘可能減少執行正確計算所需的郵件傳遞量的方式將資料拆分（稱為域分解）。程式設計這可能相當複雜且耗時。幸運的是，透過使用 2DECOMP&FFT 庫^[1]（它是建立在 MPI 之上的），我們可以避免在編寫傅立葉譜程式碼時不得不編寫許多資料傳遞操作，同時仍然可以從能夠在最多 O( $10^{5}$ ) 個處理器核心上求解偏微分方程中受益。

OpenMP

請閱讀 https://computing.llnl.gov/tutorials/openMP/ 中的教程，然後回答以下問題

OpenMP 練習

什麼是 OpenMP？
從 www.openmp.org 下載最新 OpenMP 規範的副本。最新規範的版本號是多少？
解釋以下每個 OpenMP 指令的作用
1. !$OMP PARALLEL
2. !$OMP END PARALLEL
3. !$OMP PARALLEL DO
4. !$OMP END PARALLEL DO
5. !$OMP BARRIER
6. !$OMP MASTER
7. !$OMP END MASTER

嘗試理解並執行列表 A 中的“Hello World”程式，使用 1、2、6 和 12 個執行緒。將每次執行的輸出放在您的解決方案中，輸出將位於一個以以下格式命名的檔案中
helloworld.o**********
其中上面的最後幾個條目是與執行次數相對應的數字。在列表 B 中，有一個在 Flux 上編譯此程式的示例 Makefile。在列表 C 中有一個示例提交指令碼。要將程式將執行的 OpenMP 程序數量從 2 改為 6，請更改
ppn=2
為
ppn=6
並還將 OMP_NUM_THREADS 變數的值從
{OMP_NUM_THREADS=2}
為
{OMP_NUM_THREADS=6}
在 Flux 上，每個節點最多有 12 個核心，因此對於大多數應用程式來說，最有效的使用執行緒數量是 12。

( A)

來自 http://en.wikipedia.org/wiki/OpenMP 的一個 Fortran 程式，它演示了使用 OpenMP 的並行性程式碼下載

!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program uses OpenMP to print hello world from all available
! threads
!
! .. Parameters ..
!
! .. Scalars ..
! id = thread id
! nthreads = total number of threads
!
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http:// en.wikipedia.org/wiki/OpenMP
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! address in the references. This internet address was last checked
! on 30 December 2011
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! OpenMP library
PROGRAM hello90
USE omp_lib
IMPLICIT NONE
INTEGER:: id, nthreads
!$OMP PARALLEL PRIVATE(id)
id = omp_get_thread_num()
nthreads = omp_get_num_threads()
PRINT *, 'Hello World from thread', id
!$OMP BARRIER
IF ( id == 0 ) THEN
PRINT*, 'There are', nthreads, 'threads'
END IF
!$OMP END PARALLEL
END PROGRAM

( B)

一個用於編譯列表 A 中的 helloworld 程式的示例 Makefile 程式碼下載

#define the complier
COMPILER = ifort
# compilation settings, optimization, precision, parallelization
FLAGS = -O0 -openmp

# libraries
LIBS =
# source list for main program
SOURCES = helloworld.f90

test: $(SOURCES)
${COMPILER} -o helloworld $(FLAGS) $(SOURCES)

clean:
rm *.o

clobber:
rm helloworld

( C)

在 Flux 上使用的示例提交指令碼程式碼下載

#!/bin/bash
#PBS -N helloworld
#PBS -l nodes=1:ppn=2,walltime=00:02:00
#PBS -q flux
#PBS -l qos=math471f11_flux
#PBS -A math471f11_flux
#PBS -M your_uniqname@umich.edu
#PBS -m abe
#PBS -V
#
# Create a local directory to run and copy your files to local.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID}
cp ${HOME}/ParallelMethods/helloworldOMP/helloworld /tmp/${PBS_JOBID}/helloworld
cd /tmp/${PBS_JOBID}

export OMP_NUM_THREADS=2
./helloworld

#Clean up your files
cd ${HOME}/ParallelMethods/helloworldOMP
/bin/rm -rf /tmp/${PBS_JOBID}

在二維熱方程求解器的迴圈中新增 OpenMP 指令。使用 1、3、6 和 12 個執行緒執行生成的程式，並記錄程式完成所需的時間。繪製最後一次迭代的圖。

MPI

可以在 http://www.mpi-forum.org/ 找到當前 MPI 標準的副本。它允許並行化 Fortran、C 和 C++ 程式。有更新的並行程式語言，例如 Co-Array Fortran (CAF) 和 Unified Parallel C (UPC)，它們允許程式設計師將記憶體視為單個可定址空間，即使是在分散式記憶體機器上也是如此。但是，計算機硬體限制意味著編寫 MPI 程式時使用的許多程式設計概念將需要編寫 CAF 和 UPC 程式。這些語言的編譯器技術還沒有像 Fortran 和 C 等舊語言的編譯器技術那樣發達，因此目前 Fortran 和 C 在高效能計算領域佔據主導地位。可以在 http://www.shodor.org/refdesk/Resources/Tutorials/ 中找到編寫和使用 MPI 程式所需的基本概念的介紹。可以在 Gropp、Lusk 和 Skjellum^[2]、Gropp、Lusk 和 Thakur^[3] 以及 https://computing.llnl.gov/tutorials/mpi/ 中找到有關 MPI 的更多資訊。在線上有很多資源可用，但是一旦掌握了基本概念，最有用的是 MPI 命令索引，通常搜尋引擎會提供列表來源，但我們發現以下網站很有用

MPI 練習

MPI 代表什麼？
請閱讀以下網址的教程：http://www.shodor.org/refdesk/Resources/Tutorials/BasicMPI/ 和 https://computing.llnl.gov/tutorials/mpi/，然後解釋以下命令的作用：
- USE mpi或INCLUDE 'mpif.h'
- MPI_INIT
- MPI_COMM_SIZE
- MPI_COMM_RANK
- MPI_FINALIZE
當前 MPI 標準的版本號是多少？
嘗試理解列表 D 中的 Hello World 程式。解釋它與 A 的區別。在 1、2、6、12 和 24 個 MPI 程序上執行列表 D 中的程式^[4]。將每次執行的輸出放在你的解決方案中，輸出將儲存在一個名為
helloworld.o**********
的檔案中，其中上面的最後幾個條目是對應於執行次數的數字。在 Flux 上編譯它的示例 makefile 在列表 E 中。示例提交指令碼在列表 F 中。要更改程式執行的 MPI 程序數量，例如從 2 更改為 6，請更改
ppn=2
為
ppn=6
並更改提交指令碼：
mpirun -np 2 ./helloworld
為
mpirun -np 6 ./helloworld.

在 Flux 上，每個節點最多有 12 個核心，因此如果需要超過 12 個 MPI 程序，則還需要更改節點數量。所需的總核心數量等於節點數量乘以每個節點上的程序數量。因此，要使用 24 個程序，請更改
nodes=1:ppn=2
為
nodes=2:ppn=12
並更改提交指令碼：
mpirun -np 2 ./helloworld
為
mpirun -np 24 ./helloworld.

( D)

一個使用 MPI 展示並行的 Fortran 程式程式碼下載

!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program uses MPI to print hello world from all available
! processes
!
! .. Parameters ..
!
! .. Scalars ..
! myid = process id
! numprocs = total number of MPI processes
! ierr = error code
!
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http:// en.wikipedia.org/wiki/OpenMP
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! address in the references. This internet address was last checked
! on 30 December 2011
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! MPI library
PROGRAM hello90
USE MPI
IMPLICIT NONE
INTEGER(kind=4) :: myid, numprocs, ierr

CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

PRINT*, 'Hello World from process', myid
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
IF ( myid == 0 ) THEN
PRINT*, 'There are ', numprocs, ' MPI processes'
END IF
CALL MPI_FINALIZE(ierr)	
END PROGRAM

( E)

一個用於編譯列表 {lst:ForMpiHw} 中的 helloworld 程式的示例 makefile 程式碼下載

#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = -O0

# libraries
LIBS =
# source list for main program
SOURCES = helloworld.f90

test: $(SOURCES)
${COMPILER} -o helloworld $(FLAGS) $(SOURCES)

clean:
rm *.o

clobber:
rm helloworld

( F)

一個在 Flux 上使用的示例提交指令碼程式碼下載

#!/bin/bash
#PBS -N helloworld
#PBS -l nodes=1:ppn=2,walltime=00:02:00
#PBS -q flux
#PBS -l qos=math471f11_flux
#PBS -A math471f11_flux
#PBS -M your_uniqname@umich.edu
#PBS -m abe
#PBS -V
#
# Create a local directory to run and copy your files to local.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID}
cp ${HOME}/ParallelMethods/helloworldMPI/helloworld /tmp/${PBS_JOBID}/helloworld
cd /tmp/${PBS_JOBID}

mpirun -np 2 ./helloworld

#Clean up your files
cd ${HOME}/ParallelMethods/helloworldMPI
/bin/rm -rf /tmp/${PBS_JOBID}

第一個並行程式：蒙特卡羅積分

為了在一個比 {Hello World} 稍微複雜一點的環境中介紹並行程式設計的基礎知識，我們將考慮蒙特卡羅積分。我們將回顧機率和黎曼積分中的重要概念，然後給出示例演算法，並解釋為什麼並行化可能會有所幫助。

機率

$f:U\subset \mathbb {R} ^{2}\rightarrow \mathbb {R} _{+}$ 是一個機率密度函式，如果 $\int \int _{U}f\mathrm {d} A=1$

如果 $f$ 是一個機率密度函式，它取集合 $U\subset \mathbb {R} ^{2}$ ，那麼集合 $W\subset U$ 中事件發生的機率是 $P(W)=\int \int _{W}f\mathrm {d} A.$

明天下雪 $x$ 英寸，凱利明天在彩票中贏得 $y$ 美元的聯合密度由 $f={\frac {c}{(1+x)(100+y)}}$ 給出，其中 $x,y\in [0,100]\times [0,100]$ ，否則 $f=0$ 。求 $c$ 。

假設 $X$ 是一個具有機率密度函式 $f_{1}(x)$ 的隨機變數，而 $Y$ 是一個具有機率密度函式 $f_{2}(y)$ 的隨機變數。那麼 $X$ 和 $Y$ 是 **獨立隨機變數**，如果它們的聯合密度函式是 $f(x,y)=f_{1}(x)f_{2}(y).$

明天是否會下雪以及凱莉明天是否會贏得彩票是獨立隨機變數。

如果 $f(x,y)$ 是隨機變數 $X$ 和 $Y$ 的機率密度函式，**X 均值** 是 $\mu _{1}={\bar {X}}=\int \int xf\mathrm {d} A$ ，而 **Y 均值** 是 $\mu _{2}={\bar {Y}}=\int \int yf\mathrm {d} A.$

X 均值和 Y 均值分別是 X 和 Y 的期望值。

如果 $f(x,y)$ 是隨機變數 $X$ 和 $Y$ 的機率密度函式，那麼 **X 方差** 為 $\sigma _{1}^{2}={\overline {(X-{\bar {X}})^{2}}}=\int \int (x-{\bar {X}})^{2}f\mathrm {d} A$ ，**Y 方差** 為 $\sigma _{2}^{2}={\overline {(Y-{\bar {Y}})^{2}}}=\int \int (y-{\bar {Y}})^{2}f\mathrm {d} A.$

標準差定義為方差的平方根。

求雪量超過預期雪量的 1.1 倍，並且 Kelly 中獎金額超過預期金額的 1.2 倍的機率表示式。

練習

一個班級採用曲線評分方式。假設該班級是總體的一個代表性樣本，數值分數 $x$ 的機率密度函式由 $f(x)=C\exp \left(-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}\right).$ 為了簡化，我們假設 $x$ 可以取 $-\infty$ 和 $\infty$ ，儘管事實上考試分數在 $0$ 到 $100$ 之間。
1. 使用之前作業的結果確定 $C$ 。
2. 假設這個班級有 240 名學生，並且沒有公佈班級平均分和標準差。作為一個有進取心的學生，你對 60 名同學（假設他們是被隨機選擇的）進行了調查。你發現這 60 名學生的平均分為 55%，標準差為 10%。使用學生 t 分佈 http://en.wikipedia.org/wiki/Student\%27s_t-distribution 估計樣本平均值的 90% 置信區間。在 t 分佈機率密度函式上繪製圖形，並用陰影標記樣本平均值 90% 置信區間所對應的區域。^[5]
**備註** 幸運的是，所有學生都非常勤奮，因此雖然負分數是可能的，但其發生的可能性非常低，因此為了簡化上述計算，我們忽略了這種情況。

黎曼積分

回顧一下，我們可以用黎曼和來近似積分。很多積分無法用解析方法求解，但需要數值解。在本節中，我們將探索一種在計算機上進行此操作的簡單方法。假設我們要找到 $I2d=\int _{0}^{1}\int _{0}^{4}x^{2}+2y^{2}\mathrm {d} y\mathrm {d} x.$ 如果我們用解析方法求解，我們會得到 $I2d=44.$ 假設我們忘記了如何積分，所以我們用數值方法進行。我們可以使用以下 Matlab 程式碼進行操作

( G)

一個用黎曼和來近似積分的 Matlab 程式程式碼下載

% A program to approximate an integral

clear all; format compact; format short;

nx=1000; % number of points in x
xend=1; % last discretization point
xstart=0; % first discretization point
dx=(xend-xstart)/(nx-1); % size of each x sub-interval

ny=4000; % number of points in y
yend=4; % last discretization point
ystart=0; % first discretization point
dy=(yend-ystart)/(ny-1); % size of each y sub-interval

% create vectors with points for x and y
for i=1:nx
    x(i)=xstart+(i-1)*dx;
end
for j=1:ny
    y(j)=ystart+(j-1)*dy;
end

% Approximate the integral by a sum
I2d=0;
for i=1:nx
    for j=1:ny
        I2d=I2d+(x(i)^2+2*y(j)^2)*dy*dx;
    end
end
% print out final answer
I2d

我們可以在三維空間中做類似的事情。假設我們要計算 $I3d=\int _{0}^{1}\int _{0}^{1}\int _{0}^{4}x^{2}+2y^{2}+3z^{2}\mathrm {d} z\mathrm {d} y\mathrm {d} x.$ 用解析方法求解，我們會得到 $I3d=68$

練習

修改 Matlab 程式碼以執行三維積分。
嘗試確定二維或三維方法的精度如何隨著子區間數量的變化而變化。

蒙特卡羅積分

可以將上述積分方案擴充套件到更高維度的積分。這可能會變得計算密集，因此經常使用基於機率的另一種積分方法。我們將討論的方法稱為蒙特卡羅方法，下面的方法描述基於 Michael Corral 的《向量微積分》第 3 章^[6]，可在 http://www.mecmath.net/ 找到，其中可以找到用於進行蒙特卡羅積分的 Java 和 Sage 程式。蒙特卡羅積分背後的想法基於函式平均值的概念，這在單變數微積分中遇到過。回顧一下，對於連續函式 $f(x)$ ，平均值 ${\bar {f}}$ 在區間 $\lbrack a,b\rbrack$ 上定義為

{\bar {f}}~=~{\frac {1}{b-a}}\int _{a}^{b}f(x)\,dx~.

( 1)

數量 $b-a$ 是區間 $\lbrack a,b\rbrack$ 的長度，可以看作是該區間的“體積”。將相同的推理應用於二元或三元函式，我們定義 $f(x,y)$ 在區域 $R$ 上的**平均值**為

{\bar {f}}~=~{\frac {1}{A(R)}}\iint \limits _{R}f(x,y)\,dA~,

( 2)

其中 $A(R)$ 是區域 $R$ 的面積，我們定義 $f(x,y,z)$ 在立體 $S$ 上的**平均值**為

{\bar {f}}~=~{\frac {1}{V(S)}}\iiint \limits _{S}f(x,y,z)\,dV~,

( 3)

其中 $V(S)$ 是立體 $S$ 的體積。因此，例如，我們有

\iint \limits _{R}f(x,y)\,dA~=~A(R){\bar {f}}~.

( 4)

區域 $R$ 上 $f(x,y)$ 的平均值可以看作是 $f$ 所有值之和除以 $R$ 中的點數。不幸的是，任何區域中都有無限多個點（實際上是不可數的），也就是說，它們不能在離散序列中列出。但是，如果我們取 $R$ 區域中非常大量的 $N$ 個隨機點（可以透過計算機生成），然後計算這些點的 $f$ 值的平均值，並使用該平均值作為 ${\bar {f}}$ 的值？這正是蒙特卡洛方法所做的。因此，在公式 4 中，我們得到的近似值是

{\displaystyle \iint \limits _{R}f(x,y)\,dA~\approx ~A(R){\bar {f}}\pm A(R){\sqrt {\frac {{\overline {f^{2}}}-({\bar {f}})^{2}}{N}}}~,}">

( 5)

其中

{\bar {f}}~=~{\frac {\sum _{i=1}^{N}f(x_{i},y_{i})}{N}}\quad {\text{and}}\quad {\overline {f^{2}}}~=~{\frac {\sum _{i=1}^{N}(f(x_{i},y_{i}))^{2}}{N}}~,

( 6)

其中求和是在 $N$ 個隨機點 $(x_{1},y_{1})$ , $\ldots$ , $(x_{N},y_{N})$ 上進行。公式 5 中的 $\pm$ “誤差項”實際上並沒有給出近似值的嚴格界限。它表示積分的預期值的單個標準差。也就是說，它給出了一個可能的誤差界限。由於它使用隨機點，蒙特卡羅方法是一個機率方法的例子（與確定性方法如黎曼和近似方法相反，後者使用特定公式生成點）。

例如，我們可以使用等式 5 中的公式來近似曲面 $z=x^{2}+2y^{2}$ 在矩形 $R=(0,1)\times (0,4)$ 上的體積 $V$ 。回想一下，實際體積是 $44$ 。下面是使用蒙特卡羅積分計算體積的 Matlab 程式碼

( H)

一個 Matlab 程式，演示瞭如何使用蒙特卡羅方法來計算

z=x^{2}+2y^{2}

下方的體積，其中

(x,y)\in (0,1)\times (0,4)

。程式碼下載

% A program to approximate an integral using the Monte Carlos method

% This program can be made much faster by using Matlab's matrix and vector
% operations, however to allow easy translation to other languages we have
% made it as simple as possible.

Numpoints=65536; % number of random points

I2d=0; % Initialize value
I2dsquare=0; % initial variance
for n=1:Numpoints
    % generate random number drawn from a uniform distribution on (0,1)
    x=rand(1);
    y=rand(1)*4;
    I2d=I2d+x^2+2*y^2;
    I2dsquare=I2dsquare+(x^2+2*y^2)^2;
end
% we scale the integral by the total area and divide by the number of
% points used
I2d=I2d*4/Numpoints
% we also output an estimated error
I2dsquare=I2dsquare*4/Numpoints;
EstimError=4*sqrt( (I2d^2-I2dsquare)/Numpoints)

下面顯示了使用不同數量的隨機點執行此程式的結果

N = 16: 41.3026 +/- 30.9791
N = 256: 47.1855 +/- 9.0386
N = 4096: 43.4527 +/- 2.0280
N = 65536: 44.0026 +/- 0.5151

正如你所看到的，近似值相當好。當 $N\to \infty$ 時，可以證明蒙特卡羅近似收斂於實際體積（按 $O({\sqrt {N}})$ 的數量級，在計算複雜度術語中）。

在上面的示例中，區域 $R$ 是一個矩形。要對非矩形（有界）區域 $R$ 使用蒙特卡羅方法，只需要稍作修改。選擇一個包含 $R$ 的矩形 ${\tilde {R}}$ ，並像以前一樣在該矩形中生成隨機點。然後僅當這些點位於 $R$ 內時，才將它們用於 ${\bar {f}}$ 的計算。在這種情況下，無需為公式 ({eqn:monte}) 計算 $R$ 的面積，因為排除 $R$ 外的點允許你使用矩形 ${\tilde {R}}$ 的面積，與之前類似。

例如，可以證明，在非矩形區域 $R=\left\{(x,y):0\leq x^{2}+y^{2}\leq 1\right\}$ 上，曲面 $z=1$ 下的體積是 $\pi$ 。由於矩形 ${\tilde {R}}=[-1,1]\times [-1,1]$ 包含 $R$ ，我們可以使用與我們之前使用的類似程式，最大的變化是檢查 $y^{2}+x^{2}\leq 1$ 是否對 $[-1,1]\times [-1,1]$ 中的隨機點 $(x,y)$ 成立。下面列出了一個展示此功能的 Matlab 程式碼。

( I)

一個演示如何使用蒙特卡羅方法計算不規則區域的面積，以及計算

\pi

的 Matlab 程式。程式碼下載

% A program to approximate an integral using the Monte Carlos method

% This program can be made much faster by using Matlab's matrix and vector
% operations, however to allow easy translation to other languages we have
% made it as simple as possible.

Numpoints=256; % number of random points

I2d=0; % Initialize value
I2dsquare=0; % initial variance
for n=1:Numpoints
    % generate random number drawn from a uniform distribution on (0,1) and
    % scale this to (-1,1)
    x=2*rand(1)-1;
    y=2*rand(1) -1;
    if ((x^2+y^2) <1)
        I2d=I2d+1;
        I2dsquare=I2dsquare+1;
    end
end
% We scale the integral by the total area and divide by the number of
% points used
I2d=I2d*4/Numpoints
% we also output an estimated error
I2dsquare=I2dsquare*4/Numpoints;
EstimError=4*sqrt( (I2d^2-I2dsquare)/Numpoints)

使用不同數量的隨機點執行該程式的結果如下所示

N = 16: 3.5000 +/- 2.9580
N = 256: 3.2031 +/- 0.6641
N = 4096: 3.1689 +/- 0.1639
N = 65536: 3.1493 +/- 0.0407

為了使用蒙特卡羅方法評估三重積分，你需要在平行六面體中生成隨機三元組 $(x,y,z)$ ，而不是在矩形中生成隨機對 $(x,y)$ ，並在公式 5 中使用平行六面體的體積代替矩形的面積。有關數值積分方法的更詳細討論，請參考高階數學課程。

練習

編寫一個程式，使用蒙特卡羅方法逼近二重積分 $\iint \limits _{R}e^{xy}\,dA$ ，其中 $R=\lbrack 0,1\rbrack \times \lbrack 0,1\rbrack$ 。顯示程式在 $N=10$ ， $100$ ， $1000$ ， $10000$ ， $100000$ 和 $1000000$ 個隨機點。
編寫一個程式，使用蒙特卡羅方法來近似三重積分 $\iiint \limits _{S}e^{xyz}\,dV$ ，其中 $S=\lbrack 0,1\rbrack \times \lbrack 0,1\rbrack \times \lbrack 0,1\rbrack$ 。顯示程式輸出，當 $N=10$ ， $100$ ， $1000$ ， $10000$ ， $100000$ 和 $1000000$ 個隨機點。
使用蒙特卡羅方法來近似半徑為 $1$ 的球體的體積。

並行蒙特卡羅積分

你可能已經注意到，這些演算法很簡單，但要獲得準確的結果可能需要非常多的網格點。因此，在平行計算機上執行這些演算法非常有用。我們將演示對 $\pi$ 進行並行蒙特卡羅計算。在我們這樣做之前，我們需要學習如何使用平行計算機^[7]。

我們現在考察一個用於計算 $\pi$ 的 Fortran 程式。這些程式取自http://chpc.wustl.edu/mpi-fortran.html，在那裡可以找到更詳細的解釋。這些程式的原始來源似乎是 Gropp、Lusk 和 Skjellum^[8]

序列

( I)

一個序列 Fortran 程式，演示瞭如何使用蒙特卡羅方法來計算

\pi

程式碼下載

!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program use a monte carlo method to calculate pi
!
! .. Parameters ..
! npts = total number of Monte Carlo points
! xmin = lower bound for integration region
! xmax = upper bound for integration region
! .. Scalars ..
! i = loop counter
! f = average value from summation
! sum = total sum
! randnum = random number generated from (0,1) uniform
! distribution
! x = current Monte Carlo location
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http://chpc.wustl.edu/mpi-fortran.html
! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! address in the references. This internet address was last checked
! on 30 March 2012
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! None
  PROGRAM monte_carlo
    IMPLICIT NONE

INTEGER(kind=8), PARAMETER :: npts = 1e10
    REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
    INTEGER(kind=8) :: i
    REAL(kind=8) :: f,sum, randnum,x

    DO i=1,npts
      CALL random_number(randnum)
      x = (xmax-xmin)*randnum + xmin
      sum = sum + 4.0d0/(1.0d0 + x**2)
    END DO
f = sum/npts
    PRINT*,'PI calculated with ',npts,' points = ',f

    STOP
END

( J)

一個用於編譯清單 I 中程式的示例 makefile 程式碼下載

#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = -O0

# libraries
LIBS =
# source list for main program
SOURCES = montecarloserial.f90

test: $(SOURCES)
${COMPILER} -o montecarloserial $(FLAGS) $(SOURCES)

clean:
rm *.o

clobber:
rm montecarloserial

( K)

一個示例提交指令碼，用於在聖地亞哥超級計算機中心的 Trestles 上使用程式碼下載

#!/bin/bash
# the queue to be used.
#PBS -q shared
# specify your project allocation
#PBS -A mia122
# number of nodes and number of processors per node requested
#PBS -l nodes=1:ppn=1
# requested Wall-clock time.
#PBS -l walltime=00:05:00
# name of the standard out file to be "output-file".
#PBS -o job_output
# name of the job
#PBS -N MCserial
# Email address to send a notification to, change "youremail" appropriately
#PBS -M youremail@umich.edu
# send a notification for job abort, begin and end
#PBS -m abe
#PBS -V
cd $PBS_O_WORKDIR #change to the working directory
mpirun_rsh -np 1 -hostfile $PBS_NODEFILE montecarloserial

並行

( L)

一個並行 Fortran MPI 程式，用於計算

\pi

。程式碼下載

!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program uses MPI to do a parallel monte carlo calculation of pi
!
! .. Parameters ..
! npts = total number of Monte Carlo points
! xmin = lower bound for integration region
! xmax = upper bound for integration region
! .. Scalars ..
! mynpts = this processes number of Monte Carlo points
! myid = process id
! nprocs = total number of MPI processes
! ierr = error code
! i = loop counter
! f = average value from summation
! sum = total sum
! mysum = sum on this process
! randnum = random number generated from (0,1) uniform
! distribution
! x = current Monte Carlo location
! start = simulation start time
! finish = simulation end time
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http://chpc.wustl.edu/mpi-fortran.html
! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! address in the references. This internet address was last checked
! on 30 March 2012
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! MPI library
    PROGRAM monte_carlo_mpi
    USE MPI
    IMPLICIT NONE

    INTEGER(kind=8), PARAMETER :: npts = 1e10
    REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
    INTEGER(kind=8) :: mynpts
    INTEGER(kind=4) :: ierr, myid, nprocs
    INTEGER(kind=8) :: i
    REAL(kind=8) :: f,sum,mysum,randnum
    REAL(kind=8) :: x, start, finish
    
    ! Initialize MPI
    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
    start=MPI_WTIME()

! Calculate the number of points each MPI process needs to generate
    IF (myid .eq. 0) THEN
mynpts = npts - (nprocs-1)*(npts/nprocs)
    ELSE
mynpts = npts/nprocs
    ENDIF
    
    ! set initial sum to zero
    mysum = 0.0d0
! use loop on local process to generate portion of Monte Carlo integral
    DO i=1,mynpts
      CALL random_number(randnum)
      x = (xmax-xmin)*randnum + xmin
      mysum = mysum + 4.0d0/(1.0d0 + x**2)
    ENDDO

! Do a reduction and sum the results from all processes
    CALL MPI_REDUCE(mysum,sum,1,MPI_DOUBLE_PRECISION,MPI_SUM,&
          0,MPI_COMM_WORLD,ierr)
    finish=MPI_WTIME()

    ! Get one process to output the result and running time
    IF (myid .eq. 0) THEN
f = sum/npts
         PRINT*,'PI calculated with ',npts,' points = ',f
         PRINT*,'Program took ', finish-start, ' for Time stepping'
    ENDIF

CALL MPI_FINALIZE(ierr)

    STOP
END PROGRAM

( M)

一個用於編譯清單 L 中程式的示例 makefile 程式碼下載

#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = -O0

# libraries
LIBS =
# source list for main program
SOURCES = montecarloparallel.f90

test: $(SOURCES)
${COMPILER} -o montecarloparallel $(FLAGS) $(SOURCES)

clean:
rm *.o

clobber:
rm montecarloparallel

( N)

這是一個在聖地亞哥超級計算機中心 Trestles 上使用的示例提交指令碼程式碼下載

#!/bin/bash
# the queue to be used.
#PBS -q normal
# specify your project allocation
#PBS -A mia122
# number of nodes and number of processors per node requested
#PBS -l nodes=1:ppn=32
# requested Wall-clock time.
#PBS -l walltime=00:05:00
# name of the standard out file to be "output-file".
#PBS -o job_output
# name of the job, you may want to change this so it is unique to you
#PBS -N MPI_MCPARALLEL
# Email address to send a notification to, change "youremail" appropriately
#PBS -M youremail@umich.edu
# send a notification for job abort, begin and end
#PBS -m abe
#PBS -V

# change to the job submission directory
cd $PBS_O_WORKDIR
# Run the job
mpirun_rsh -np 32 -hostfile $PBS_NODEFILE montecarloparallel

練習

解釋為什麼使用蒙特卡洛方法來評估 $\int _{0}^{1}{\frac {1}{1+x^{2}}}\mathrm {d} x$ 允許您找到 $\pi$ ，並用您自己的話解釋序列程式和並行程式做了什麼。
找出在 32、64、128、256 和 512 個核心上執行並行蒙特卡洛程式所需的時間。
使用並行蒙特卡洛積分程式評估 $\iint x^{2}+y^{6}+\exp(xy)\cos(y\exp(x))\mathrm {d} A$ 在單位圓上。
使用並行蒙特卡洛積分程式來近似橢圓的體積 ${\frac {x^{2}}{9}}+{\frac {y^{2}}{4}}+{\frac {z^{2}}{1}}=1$ 。使用 OpenMP 或 MPI。
編寫並行程式以找到 4 維球體的體積 $1\geq \sum _{i=1}^{4}x_{i}^{2}.$ 嘗試蒙特卡洛方法和黎曼和方法。使用 OpenMP 或 MPI。

筆記

↑ 2decomp&fft
↑ Gropp, Lusk 和 Skjellum (1999)
↑ Gropp, Lusk 和 Thakur (1999)
↑
可以在超過 24 個程序上執行此程式，但是，輸出變得非常多
↑
學生 t 分佈在許多數值軟體包中實現，例如 Maple、Mathematica、Matlab、R、Sage 等，因此如果您需要使用它來獲得數值結果，則可以使用這些軟體包之一。
↑ Corral (2011)
↑ 今天生產的許多計算機和行動電話都有 2 個或更多個核心，因此可以被認為是並行的，但這裡指的是擁有數百個核心的計算機。
↑ Gropp, Lusk 和 Skjellum (1999)

參考文獻

Corral, M. (2011). 向量微積分. {{cite book}}: Cite has empty unknown parameter: |coauthors= (help)

Gropp, W.; Lusk, E.; Skjellum, A. (1999). 使用 MPI. 麻省理工學院出版社。

Gropp, W.; Lusk, E.; Thakur, R. (1999). 使用 MPI-2. 麻省理工學院出版社。

Li, N. "2decomp&fft". {{cite web}}: Cite has empty unknown parameter: |coauthors= (help)

[1] 2decomp&fft

[2] Gropp, Lusk 和 Skjellum (1999)

[3] Gropp, Lusk 和 Thakur (1999)

[4] 
可以在超過 24 個程序上執行此程式，但是，輸出變得非常多

[5] 
學生 t 分佈在許多數值軟體包中實現，例如 Maple、Mathematica、Matlab、R、Sage 等，因此如果您需要使用它來獲得數值結果，則可以使用這些軟體包之一。

[6] Corral (2011)

[7] 今天生產的許多計算機和行動電話都有 2 個或更多個核心，因此可以被認為是並行的，但這裡指的是擁有數百個核心的計算機。

[8] Gropp, Lusk 和 Skjellum (1999)

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]