MINC/教程/NumericPyminc

Numeric Pyminc，或者說，python 如何讓我愛上 fortran，或者，Matlab！滾出我的地盤！

之前的教程介紹了一些 pyminc 的基礎知識，但留下了為什麼使用 python 和 pyminc 的基本問題。如果 C 和 perl 對 Peter Neelin 來說已經足夠好了，為什麼還要在乎？在我看來，主要原因是你可以使用 python 作為一種程式語言，使用起來很愉快（不像 Matlab，令人作嘔），可以自由分發（Matlab？哼！），同時可以讓你獲得接近原生 C 的速度。而且不止一種方法可以做到這一點。繼續閱讀。

皮質厚度，或獻給皮埃爾-西蒙，拉普拉斯侯爵

我們測量皮質厚度的一種方法是使用拉普拉斯方程在皮質內外之間建立流線。因此，對於下面的數值示例，我將採用該問題的一個核心部分並在 python 可訪問的所有不同數值方法中對其進行求解。簡而言之，給定定義的內外邊界（見上面連結中的 labels.mnc），它會在它們之間建立等勢線。這是透過一種簡單的迭代六點平均風格完成的。作為前瞻，樣本標籤和程式碼都可以在這裡找到

http://www.bic.mni.mcgill.ca/users/jason/pyminc-tutorial/

標準 python

給定邊界，簡單的解決方案如下

# the simplest solver in plain old unfettered python
def singleSolveLaplace(grid, output):
    convergence = 0
    for v0 in range(grid.sizes[0]):
        for v1 in range(grid.sizes[1]):
            for v2 in range(grid.sizes[2]):
                if grid.data[v0,v1,v2] > innerValue and \
                       grid.data[v0,v1,v2] < outerValue:
                    oldvalue = output.data[v0,v1,v2]
                    output.data[v0,v1,v2] = (output.data[v0+1,v1,v2] + \
                                             output.data[v0-1,v1,v2] + \
                                             output.data[v0,v1+1,v2] + \
                                             output.data[v0,v1-1,v2] + \
                                             output.data[v0,v1,v2+1] + \
                                             output.data[v0,v1,v2-1]) / 6.0
                    convergence += (oldvalue - output.data[v0,v1,v2])
    return(convergence)

遍歷每個體素，將所有鄰居加起來併除以 6。就是這樣。在示例網格上，這在我的計算機上大約需要 7 秒，這就是問題所在。python 是一種解釋型語言，在像上面描述的那樣遍歷體素時效率並不高。

向量化的 python

正如任何 Matlab 專家都會告訴你的，一個解決方案是將問題向量化。這在 python 中也可以做到，從而產生以下程式碼

# vectorizing all the operations - much faster, not possible for all
# routines.
def vectorizedLaplace(grid, output):
    output.data[1:-1,1:-1,1:-1] = (output.data[0:-2,1:-1,1:-1] + \
                                   output.data[2:,1:-1,1:-1] + \
                                   output.data[1:-1,0:-2,1:-1] + \
                                   output.data[1:-1,2:,1:-1] + \
                                   output.data[1:-1,1:-1,0:-2] + \
                                   output.data[1:-1,1:-1,2:,]) / 6
    output.data[grid.data ==10] = 10
    output.data[grid.data == 0] = 0
    return(0)

快多了 - 大約 0.4 秒。然而，閱讀起來更難看，而且並非所有問題都適合這種技巧。

Weave 登場

因此，你可以用 C 或 C++ 重寫整個程式碼。但這樣一來，你就得處理用這兩種語言進行編碼的痛苦。那麼為什麼不只是重寫內部迴圈呢？在 python 中很簡單 - 進入一個名為 weave 的很好的擴充套件。這裡，一個 C++ 字串可以嵌入到 python 中。它第一次看到它時，就會動態地編譯它，除非程式碼發生變化，否則它會一直重用編譯後的輸出。

    convergence = 0
    nv0 = sizes[0]
    nv1 = sizes[1]
    nv2 = sizes[2]
    code = r"""
#line 45 "laplacian-python-test.py"
for (int v0=0; v0 < nv0; v0++) {
   for (int v1=0; v1 < nv1; v1++) {
      for (int v2=0; v2 < nv2; v2++) {
         if (grid(v0,v1,v2) > 0 && grid(v0,v1,v2) < 10) {
            double oldvalue = output(v0, v1, v2);
            output(v0,v1,v2) = (output(v0+1,v1,v2) +
                                output(v0-1,v1,v2) +
                                output(v0,v1+1,v2) +
                                output(v0,v1-1,v2) +
                                output(v0,v1,v2+1) +
                                output(v0,v1,v2-1))/6;
            convergence += oldvalue - output(v0,v1,v2);
}}}}
"""
    weave.inline(code, ['grid', 'output', 'convergence', 'nv0', 'nv1', 'nv2'],
                 type_converters = converters.blitz,
                 compiler='gcc')
    return(convergence)

同樣大約需要 0.35 秒。不錯。

do 300, v0=1, nv0 - 重返老式風格

除了 weave 中的 C++ 介面，還有一個很棒的 Fortran 介面，f2py，可以連線到 python。你不會相信，python 讓 Fortran 又變得可愛了。雖然可以使用任何 Fortran 方言，但我堅持使用 F77，純粹是為了自虐。以下 Fortran 程式碼

C FILE: laplace.f
      subroutine laplace (grid, output, nv0, nv1, nv2, convergence)
C     
C     single evaluation of laplace's function
C
      INTEGER grid(nv0, nv1, nv2)
      REAL output(nv0, nv1, nv2)
      double precision convergence
      double precision oldvalue

      INTEGER v0, v1, v2
cf2py intent(in) :: grid, n0, nv1, nv2
cf2py intent(inplace) :: output
cf2py intent(out) :: convergence

      do 300, v0 = 1, nv0
         do 200, v1 = 1, nv1
            do 100, v2 = 1, nv2
               if (output(v0,v1,v2) .lt. 10) then
                  if (output(v0,v1,v2) .gt. 0) then
                     oldvalue = output(v0,v1,v2)
                     
                     output(v0,v1,v2) = (output(v0-1, v1, v2) + 
     +                    output(v0+1,v1,v2) + 
     +                    output(v0,v1-1,v2) + 
     +                    output(v0,v1+1,v2) + 
     +                    output(v0,v1,v2-1) + 
     +                    output(v0,v1,v2+1))/6
                     convergence = convergence + 
     +                    (oldvalue - output(v0,v1,v2))
                  endif
               endif
 100        continue
 200     continue
 300  continue
      end

然後，在命令列中，使用 'f2py -c laplace.f -m laplace' 編譯它，並在 python 中像這樣呼叫它

# in fortran (F77, no less - rocking it old school!)
# actual code is in laplace.f
def flaplaceSolve(g,o):
    convergence = 0
    convergence = laplace.laplace(g,o, grid.sizes[0],
                                  grid.sizes[1], grid.sizes[2])
    return(convergence)

完成！現在是 0.315 秒。注意，如果你檢視完整的原始碼，我不得不新增一個轉置，因為行主陣列和列主陣列之間的差異令人愉快。

漂亮的 Cython

但如果你想編寫乾淨的 python 程式碼而不深入研究 Fortran，該怎麼辦？另一個擴充套件來解救你 - Cython。它是一種 python 方言，允許你新增型別資訊以獲得接近原生速度。程式碼如下

# compile as follows:
# cython cython_laplace.pyx
# gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o cython_laplace.so cython_laplace.c

# note - there are smarter ways of doing this in distutils.

import numpy as np
cimport numpy as np

# some type definitions - first for the floats
FDTYPE = np.float64
ctypedef np.float64_t FDTYPE_t

# ... and then for the unsigned bytes
BDTYPE = np.uint8
ctypedef np.uint8_t BDTYPE_t

# the function definition - notice the types and dims, which are key for speed.
def cythonLaplaceStep(np.ndarray[BDTYPE_t, ndim=3] g,
                      np.ndarray[FDTYPE_t, ndim=3] o):

    # die a horrible flaming death if inputs had different types
    assert g.dtype == BDTYPE and o.dtype == FDTYPE

    # get the dimension info - again, notice the importance of defining types
    cdef int nv0 = g.shape[0]
    cdef int nv1 = g.shape[1]
    cdef int nv2 = g.shape[2]

    cdef int v0, v1, v2

    cdef double convergence = 0.0
    cdef double oldvalue

    # the actual loop - looks identical to the python code
    for v0 in range(nv0):
        for v1 in range(nv1):
            for v2 in range(nv2):
                if g[v0,v1,v2] > 0 and g[v0,v1,v2] < 10:
                    oldvalue = o[v0,v1,v2]
                    o[v0,v1,v2] = (o[v0+1,v1,v2] +
                                   o[v0-1,v1,v2] +
                                   o[v0,v1+1,v2] +
                                   o[v0,v1-1,v2] +
                                   o[v0,v1,v2+1] +
                                   o[v0,v1,v2-1])/6.0
                    convergence += oldvalue - o[v0,v1,v2]
    return(convergence)

然後，這段程式碼被編譯 - 注意程式碼看起來幾乎與簡單且緩慢的 python 迴圈完全相同，只是添加了型別 - 然後從主 python 程式碼中呼叫，如下所示

# in cython - the actual code is in cython_laplace.pyx
def cythonLaplace(grid, output):
    convergence = cython_laplace.cythonLaplaceStep(asarray(grid.data),
                                                   asarray(output.data))
    return(convergence)

大約需要 0.32 秒。

結論

以上是使用 pyminc 的所有原因 - 許多選擇可以讓你在一種讓人愉快地使用的語言中編寫高效的程式碼。下面是使用的完整程式碼 - 玩得開心

#!/usr/bin/env python

# lots of imports
from pyminc.volumes.factory import *
from numpy import *
import sys
from optparse import OptionParser
from scipy import weave
from scipy.weave import converters

import laplace # the fortran version
import cython_laplace # and the cython version

# the simplest solver in plain old unfettered python
def singleSolveLaplace(grid, output):
    convergence = 0
    for v0 in range(grid.sizes[0]):
        for v1 in range(grid.sizes[1]):
            for v2 in range(grid.sizes[2]):
                if grid.data[v0,v1,v2] > innerValue and \
                       grid.data[v0,v1,v2] < outerValue:
                    oldvalue = output.data[v0,v1,v2]
                    output.data[v0,v1,v2] = (output.data[v0+1,v1,v2] + \
                                             output.data[v0-1,v1,v2] + \
                                             output.data[v0,v1+1,v2] + \
                                             output.data[v0,v1-1,v2] + \
                                             output.data[v0,v1,v2+1] + \
                                             output.data[v0,v1,v2-1]) / 6.0
                    convergence += (oldvalue - output.data[v0,v1,v2])
    return(convergence)

# vectorizing all the operations - much faster, not possible for all
# routines.
def vectorizedLaplace(grid, output):
    output.data[1:-1,1:-1,1:-1] = (output.data[0:-2,1:-1,1:-1] + \
                                   output.data[2:,1:-1,1:-1] + \
                                   output.data[1:-1,0:-2,1:-1] + \
                                   output.data[1:-1,2:,1:-1] + \
                                   output.data[1:-1,1:-1,0:-2] + \
                                   output.data[1:-1,1:-1,2:,]) / 6
    output.data[grid.data ==10] = 10
    output.data[grid.data == 0] = 0
    return(0)

# in cython - the actual code is in cython_laplace.pyx
def cythonLaplace(grid, output):
    convergence = cython_laplace.cythonLaplaceStep(asarray(grid.data),
                                                   asarray(output.data))
    return(convergence)

# in weave - allows you to embed a C++ string in the code
def weaveLaplace(grid, output, sizes):
    convergence = 0
    nv0 = sizes[0]
    nv1 = sizes[1]
    nv2 = sizes[2]
    code = r"""
#line 45 "laplacian-python-test.py"
for (int v0=0; v0 < nv0; v0++) {
   for (int v1=0; v1 < nv1; v1++) {
      for (int v2=0; v2 < nv2; v2++) {
         if (grid(v0,v1,v2) > 0 && grid(v0,v1,v2) < 10) {
            double oldvalue = output(v0, v1, v2);
            output(v0,v1,v2) = (output(v0+1,v1,v2) +
                                output(v0-1,v1,v2) +
                                output(v0,v1+1,v2) +
                                output(v0,v1-1,v2) +
                                output(v0,v1,v2+1) +
                                output(v0,v1,v2-1))/6;
            convergence += oldvalue - output(v0,v1,v2);
}}}}
"""
    weave.inline(code, ['grid', 'output', 'convergence', 'nv0', 'nv1', 'nv2'],
                 type_converters = converters.blitz,
                 compiler='gcc')
    return(convergence)

# in fortran (F77, no less - rocking it old school!)
# actual code is in laplace.f
def flaplaceSolve(g,o):
    convergence = 0
    convergence = laplace.laplace(g,o, grid.sizes[0],
                                  grid.sizes[1], grid.sizes[2])
    return(convergence)

if __name__ == "__main__":

    usage = "%prog input-grid.mnc output-solved.mnc"
    description = "Simple solver of laplace's equation - designed to help teach people how to use numeric python"


    # argument handling - an option chooses which way to do the computation
    parser = OptionParser(usage=usage, description=description)
    parser.add_option("--slow-python", dest="mode", action="store_const",
                      const="slow-python")
    parser.add_option("--fortran", dest="mode", action="store_const",
                      const="fortran")
    parser.add_option("--vectorized-python", dest="mode", action="store_const",
                      const="vector-python")
    parser.add_option("--weave", dest="mode", action="store_const",
                      const="weave")
    parser.add_option("--cython", dest="mode", action="store_const",
                      const="cython")

    (options, args) = parser.parse_args()
    if len(args) != 2:
        parser.error("Incorrect number of arguments")
        
    gridfile = args[0]
    outputfile = args[1]

    # get the volumes
    grid = volumeFromFile(gridfile, dtype='ubyte')
    output = volumeLikeFile(gridfile, outputfile)
    # initialize the output to the boundaries
    output.data[:,:,:] = grid.data[:,:,:]

    innerValue = 0
    outerValue = 10

    convergence = 0

    # transpose for the fortran bits - the joys of column versus row major
    g = asfortranarray(grid.data)
    o = asfortranarray(output.data)

    for i in range(100):
        if options.mode == "slow-python":
            convergence = singleSolveLaplace(grid, output)
        elif options.mode == "fortran":
            convergence = flaplaceSolve(g,o)
        elif options.mode == "vector-python":
            convergence = vectorizedLaplace(grid, output)
        elif options.mode == "weave":
            # notice the cast to array - no speed penalty, but necessary
            # since weave does not understand the hyperslab subclass
            # that pyminc uses
            convergence = weaveLaplace(asarray(grid.data),
                                       asarray(output.data),
                                       grid.sizes)
        elif options.mode == "cython":
            convergence = cythonLaplace(grid, output)
    if options.mode == "fortran":
        output.data = ascontiguousarray(o)

    output.writeFile()
    output.closeVolume()
    grid.closeVolume()