跳轉到內容

並行譜數值方法/使用ParaView協同處理進行視覺化

來自華夏公益教科書

ParaView協同處理外掛允許應用程式程式碼被檢測以連線到ParaView伺服器,以執行視覺化管道。管道可以生成各種格式的影像或VTK XML並行檔案格式資料集。

使用者使用一個示例(可能解析度較低)資料集在ParaView客戶端中建立一個管道。然後,該管道使用協同處理外掛匯出為Python指令碼,該指令碼將由應用程式載入。

應用程式開發人員還需要編寫一些Fortran/C++程式碼,將應用程式的資料結構轉換為ParaView可以理解的格式。對於完全規則的網格,這是很簡單的。

在單個計算節點上使用ParaView協同處理

[編輯 | 編輯原始碼]

ParaView客戶端設定

[編輯 | 編輯原始碼]

第一個要求是擁有一個可以匯出Python指令碼的ParaView客戶端。截至ParaView 3.14.1,這需要從原始碼構建。 標準說明適用於一些關於構建 指令碼生成外掛的額外說明。構建完成後,第一次啟動ParaView時,轉到工具,管理外掛,並將協同處理外掛設定為在啟動時載入。載入外掛後,應該有兩個新的選單項:寫入器和協同處理

客戶端構建說明

[編輯 | 編輯原始碼]
  • 可能需要安裝CMake
  • 可能需要構建Qt
  • 可能需要關閉構建Manta外掛

現有程式碼修改

[編輯 | 編輯原始碼]

以下說明適用於2012年夏季在NCSA的Forge上測試的Navier-Stokes CUDA Fortran。

模擬程式碼

[編輯 | 編輯原始碼]

 

 

 

 

( A)

navierstokes.cuf添加了4行並刪除了.datbin寫入。 程式碼下載

!--------------------------------------------------------------------
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution. 
!
! Periodic free-slip boundary conditions and Initial conditions:
!       u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
!       v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution (subscript denote derivatives):
!       u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*t/Re)
!       v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!   u_y(x,y,t)=-2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!       v_x(x,y,t)=2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!       omega=v_x-u_y
!
! .. Parameters ..
!  Nx                           = number of modes in x - power of 2 for FFT
!  Ny                           = number of modes in y - power of 2 for FFT
!  nplots                       = number of plots produced
!  plotgap                      = number of timesteps inbetween plots
!  Re                           = dimensionless Renold's number
!  ReInv                        = 1/Re for optimization
!  dt                           = timestep size 
!  dtInv                        = 1/dt for optimization
!  tol                          = determines when convergences is reached
!  scalemodes           = 1/(Nx*Ny), scaling after preforming FFTs
!  numthreads           = number of CPUs used in calculation
! .. Scalars ..
!  i                            = loop counter in x direction
!  j                            = loop counter in y direction
!  n                            = loop counter for timesteps direction  
!  allocatestatus       = error indicator during allocation
!  time                         = times at which data is saved
!  chg                          = error at each iteration
!  temp                         = used for ordering saved omega 
! .. Arrays (gpu) ..
!  u_d                          = velocity in x direction
!  uold_d                               = velocity in x direction at previous timestep
!  v_d                          = velocity in y direction
!  vold_d                               = velocity in y direction at previous timestep
!  omeg_d                               = vorticity     in real space
!  omeghat_d                    = 2D Fourier transform of vorticity
!                                               at next iterate
!  omegoldhat_d         = 2D Fourier transform of vorticity at previous
!                                               iterate
!  nloldhat_d                   = nonlinear term in Fourier space
!                                               at previous iterate
!  omegexact_d          = taylor-green vorticity at
!                                               at final step
!  psihat_d                     = 2D Fourier transform of streamfunction
!                                               at next iteration
!  omegcheck_d          = store of vorticity at previous iterate
!  temp1_d/temp2_d              = reusable complex/real space used for
!                                               calculations. This reduces number of
!                                               arrays stored.
! .. Vectors (gpu) ..
!  kx_d                         = fourier frequencies in x direction
!  ky_d                         = fourier frequencies in y direction
!  x_d                          = x locations
!  y_d                          = y locations
!  name_config          = array to store filename for data to be saved                  
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!               
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
! This program has not been fully optimized.
!--------------------------------------------------------------------   
module precision
  ! Precision control
  
  integer, parameter, public :: Single = kind(0.0) ! Single precision
  integer, parameter, public :: Double = kind(0.0d0) ! Double precision
  
  integer, parameter, public :: fp_kind = Double
  !integer, parameter, public :: fp_kind = Single
  
end module precision

module cufft
  
  integer, public :: CUFFT_FORWARD = -1
  integer, public :: CUFFT_INVERSE = 1
  integer, public :: CUFFT_R2C = Z'2a' ! Real to Complex (interleaved)
  integer, public :: CUFFT_C2R = Z'2c' ! Complex (interleaved) to Real
  integer, public :: CUFFT_C2C = Z'29' ! Complex to Complex, interleaved
  integer, public :: CUFFT_D2Z = Z'6a' ! Double to Double-Complex
  integer, public :: CUFFT_Z2D = Z'6c' ! Double-Complex to Double
  integer, public :: CUFFT_Z2Z = Z'69' ! Double-Complex to Double-Complex
  
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !
  ! cufftPlan2d(cufftHandle *plan, int nx,int ny, cufftType type)
  !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  
  interface cufftPlan2d
     subroutine cufftPlan2d(plan, nx, ny, type) bind(C,name='cufftPlan2d')
       use iso_c_binding
       integer(c_int):: plan
       integer(c_int),value:: nx, ny, type
     end subroutine cufftPlan2d
  end interface
  
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !
  ! cufftDestroy(cufftHandle plan)
  !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  
  interface cufftDestroy
     subroutine cufftDestroy(plan) bind(C,name='cufftDestroy')
       use iso_c_binding
       integer(c_int),value:: plan
     end subroutine cufftDestroy
  end interface
  
  interface cufftExecD2Z
     subroutine cufftExecD2Z(plan, idata, odata) &
          & bind(C,name='cufftExecD2Z')
       use iso_c_binding
       use precision
       integer(c_int),  value  :: plan
       real(fp_kind),   device :: idata(1:nx,1:ny)
       complex(fp_kind),device :: odata(1:nx,1:ny)
     end subroutine cufftExecD2Z
  end interface
  
  interface cufftExecZ2D
     subroutine cufftExecZ2D(plan, idata, odata) &
          & bind(C,name='cufftExecZ2D')
       use iso_c_binding
       use precision
       integer(c_int),value:: plan
       complex(fp_kind),device:: idata(1:nx,1:ny)
       real(fp_kind),device :: odata(1:nx,1:ny)
     end subroutine cufftExecZ2D
  end interface
  
end module cufft

PROGRAM main
  use precision
  use cufft
  
  ! coprocessing:
  use ns2dcnadaptor

  ! declare variables
  IMPLICIT NONE
  INTEGER(kind=4), PARAMETER            ::  Nx=1024
  INTEGER(kind=4), PARAMETER            ::  Ny=1024     
  !INTEGER(kind=8)                                              ::  
  REAL(fp_kind), PARAMETER              ::  dt=0.00025d0
  REAL(fp_kind), PARAMETER              ::  dtInv=1.0d0/REAL(dt,kind(0d0))   
  REAL(fp_kind), PARAMETER      &
       ::  pi=3.14159265358979323846264338327950288419716939937510d0
  REAL(fp_kind), PARAMETER              ::  Re=5000.0d0                 
  REAL(fp_kind), PARAMETER              ::      ReInv=1.0d0/REAL(Re,kind(0d0))
  REAL(fp_kind), PARAMETER              ::      tol=0.1d0**10
  
  !coprocessing: temp is used as an int, runtime crashes on 
  ! floating temp used in write format string on line 376
  !REAL(fp_kind)                                        ::  scalemodes,chg,temp=10000000
  real(fp_kind) :: scalemodes, chg
  integer(kind=4) :: temp=100000000

  INTEGER(kind=4), PARAMETER            ::      nplots=400, plotgap=200                         
  REAL(fp_kind),DIMENSION(:), ALLOCATABLE               ::  x,y
  REAL(fp_kind),DIMENSION(:,:), ALLOCATABLE     ::      omeg,omegexact
  INTEGER(kind=4)                               ::  i,j,n,t, AllocateStatus
  INTEGER(kind=4)                               ::  planz2d,pland2z, kersize
  !variables used for saving data and timing            
  INTEGER(kind=4)                               ::      start, finish, count_rate,count, iol    
  CHARACTER*100                                 ::  name_config
  ! Declare variables for GPU   
  REAL(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE            ::      u_d,v_d,&
       omegcheck_d,omeg_d,&
       nl_d, temp2_d, omegexact_d
  COMPLEX(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE :: omegoldhat_d, nloldhat_d,&
       omeghat_d, nlhat_d, psihat_d,&
       temp1_d                                                          
  COMPLEX(fp_kind), DEVICE, DIMENSION(:), ALLOCATABLE           ::  kx_d,ky_d                                           
  REAL(kind=8),DEVICE, DIMENSION(:), ALLOCATABLE                ::  x_d,y_d
  
  kersize=min(Nx,256)
  PRINT *,'Program starting'
  PRINT *,'Grid:',Nx,'X',Ny
  PRINT *,'dt:',dt      
  ALLOCATE(x(1:Nx),y(1:Ny),omeg(1:Nx,1:Ny),omegexact(1:Nx,1:Ny),&
       stat=AllocateStatus)
  IF (AllocateStatus .ne. 0) STOP               
  PRINT *,'Allocated CPU arrays'
  ALLOCATE(kx_d(1:Nx/2+1),ky_d(1:Ny),x_d(1:Nx),y_d(1:Ny),u_d(1:Nx,1:Ny),&
       v_d(1:Nx,1:Ny),omeg_d(1:Nx,1:Ny),&
       omegexact_d(1:Nx,1:Ny),temp2_d(1:Nx,1:Ny),&
       omegoldhat_d(1:Nx/2+1,1:Ny),nloldhat_d(1:Nx/2+1,1:Ny),&
       omegcheck_d(1:Nx,1:Ny),omeghat_d(1:Nx/2+1,1:Ny),nl_d(1:Nx,1:Ny),&
       nlhat_d(1:Nx/2+1,1:Ny), psihat_d(1:Nx/2+1,1:Ny),temp1_d(1:Nx/2+1,1:Ny),&
       stat=AllocateStatus)
  IF (AllocateStatus .ne. 0) STOP 
  PRINT *,'Allocated GPU arrays'
  
  
  CALL cufftPlan2D(pland2z,nx,ny,CUFFT_D2Z)
  CALL cufftPlan2D(planz2d,nx,ny,CUFFT_Z2D)
  PRINT *,'Setup FFTs'
  
  ! setup fourier frequencies
  !$cuf kernel do <<< *,* >>>
  DO i=1,Nx/2+1
     kx_d(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind=fp_kind)  
  END DO
  kx_d(1+Nx/2)=0.0d0
  !$cuf kernel do <<< *,* >>>
  DO i=1,Nx
     x_d(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind=fp_kind) 
  END DO
  !$cuf kernel do <<< *,* >>>
  DO j=1,Ny/2+1
     ky_d(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind=fp_kind) 
  END DO
  ky_d(1+Ny/2)=0.0d0
  !$cuf kernel do <<< *,* >>>
  DO j = 1,Ny/2 -1
     ky_d(j+1+Ny/2)=-ky_d(1-j+Ny/2)
  END DO
  !$cuf kernel do <<< *, * >>>
  DO j=1,Ny
     y_d(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind=fp_kind) 
  END DO
  scalemodes=1.0d0/REAL(Nx*Ny,kind=fp_kind)
  PRINT *,'Setup grid and fourier frequencies'
  
  !initial data
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        u_d(i,j)=sin(2.0d0*pi*x_d(i))*cos(2.0d0*pi*y_d(j))
     END DO
  END DO
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        v_d(i,j)=-cos(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j))
     END DO
  END DO
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        omeg_d(i,j)=4.0d0*pi*sin(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j))+0.01d0*cos(2.0d0*pi*y_d(j))
     END DO
  END DO
  
  CALL cufftExecD2Z(pland2z,omeg_d,omeghat_d) 
  
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx/2+1
        temp1_d(i,j)=omeghat_d(i,j)*kx_d(i)
     END DO
  END DO
  
  CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
  !first part nonlinear term in real space
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
     END DO
  END DO
  
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx/2+1
        temp1_d(i,j)=omeghat_d(i,j)*ky_d(j)
     END DO
  END DO
  
  CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
  
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
     END DO
  END DO
  
  CALL cufftExecD2Z(pland2z,nl_d,nlhat_d) 
  
  omegcheck_d=omeg_d
  PRINT *,'Got initial data, starting timestepping'     
  CALL system_clock(start,count_rate)

  ! coprocessing: simulation loop
  ! coprocessing: before entering, initialze the ParaView coprocessor
  ! coprocessing: this is from FortranAdaptorAPI.cxx
  ! coprocessing: it takes the name of the pipeline script you created
  ! coprocessing: in the ParaView client and the name's length in chars
  call coprocessorinitialize("pipeline.py", 11 )
  DO t=1,nplots
     DO n=1,plotgap
        chg=1.0d0
        nloldhat_d=nlhat_d
        omegoldhat_d=omeghat_d
        DO WHILE (chg>tol)
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 omeghat_d(i,j)=((dtInv+0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)))&
                      *omegoldhat_d(i,j) - 0.5d0*(nloldhat_d(i,j)+nlhat_d(i,j)))&
                      /(dtInv-0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)))   
              END DO
           END DO
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 psihat_d(i,j)=-omeghat_d(i,j)/(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)+0.10d0**14)
              END DO
           END DO
           CALL cufftExecZ2D(planz2d,omeghat_d,omeg_d)
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=-psihat_d(i,j)*kx_d(i)*scalemodes
              END DO
           END DO
           CALL cufftExecZ2D(planz2d,temp1_d,v_d) 
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>                                
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=psihat_d(i,j)*ky_d(j)*scalemodes
              END DO
           END DO
           CALL cufftExecZ2D(planz2d,temp1_d,u_d)
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=omeghat_d(i,j)*kx_d(i)
              END DO
           END DO
           
           CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx
                 nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
              END DO
           END DO
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=omeghat_d(i,j)*ky_d(j)
              END DO
           END DO
           
           CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx
                 nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
              END DO
           END DO
           
           CALL cufftExecD2Z(pland2z,nl_d,nlhat_d) 
           
           chg=0.0d0
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx
                 chg=chg+(omeg_d(i,j)-omegcheck_d(i,j))*(omeg_d(i,j)-omegcheck_d(i,j))&
                      *scalemodes*scalemodes
              END DO
           END DO
           omegcheck_d=omeg_d
        END DO
     END DO
     PRINT *, t*plotgap*dt
     omeg=omeg_d
     !temp=temp+1
     !write(name_config,'(a,i0,a)') 'omega',temp,'.datbin'
     !INQUIRE(iolength=iol) omeg(1,1)
     !OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol) 
     !count = 1 !coprocessing - there's an intrinsic array function this shadows
     !DO j=1,Ny
     !   DO i=1,Nx
     !      WRITE(11,rec=count) omeg(i,j)*scalemodes
     !      count=count+1
     !   END DO
     !END DO
     !CLOSE(11)

     ! coprocessing: do the coprocessing at the end of the sim loop
     ! grid dims, time step, time, scalar array
     call navierStokesCoprocessor(Nx, Ny, 1, t, t*dt  ,omeg)  
 
  END DO
 
  ! coprocessing: clean up
  call coprocessorfinalize()

 CALL system_clock(finish,count_rate)
  PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),&
       'for Time stepping'
  
  
  ! Copy results back to host
  x=x_d
  y=y_d         
  
!!!!!!!!!!!!!!!!!!!!!!!!
  !copy over data to disk!
!!!!!!!!!!!!!!!!!!!!!!!!
  
  name_config = 'xcoord.dat' 
  OPEN(unit=11,FILE=name_config,status="UNKNOWN")       
  REWIND(11)
  DO i=1,Nx
     WRITE(11,*) x(i)
  END DO
  CLOSE(11)
  
  name_config = 'ycoord.dat' 
  OPEN(unit=11,FILE=name_config,status="UNKNOWN")       
  REWIND(11)
  DO j=1,Ny
     WRITE(11,*) y(j)
  END DO
  CLOSE(11)
!!!!!!!!!!!!!!!!!!!!!!!!
  
  CALL cufftDestroy(planz2d)
  CALL cufftDestroy(pland2z)
  PRINT *,'Destroyed fft plan'
  
  DEALLOCATE(x,y,omeg,omegexact,stat=AllocateStatus)    
  IF (AllocateStatus .ne. 0) STOP
  PRINT *,'Deallocated CPU memory'
  
  DEALLOCATE(kx_d,ky_d,x_d,y_d,u_d,&
       v_d, omegcheck_d, omeg_d,omegoldhat_d,&
       omegexact_d, nloldhat_d,omeghat_d,nl_d, nlhat_d,&
       temp1_d,temp2_d, psihat_d,stat=AllocateStatus)   
  IF (AllocateStatus .ne. 0) STOP
  PRINT *,'Deallocated GPU memory'
  PRINT *,'Program execution complete'
END PROGRAM main

開發的檔案

[編輯 | 編輯原始碼]

ns2dcnadaptor.f90

[編輯 | 編輯原始碼]

 

 

 

 

(B)
定義navierstokescoprocessor子例程。這會呼叫來自ParaView提供的FortranAdaptorAPI.h的一些函式,以及ns2dcnVTKDataSet.cxx中使用者定義的函式。

程式碼下載

! fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE

! subroutine determines if coprocessing needed during the
! current simulation step or not.
      
module ns2dcnadaptor
  implicit none
  public
contains
  subroutine navierstokescoprocessor(nx, ny, nz, step, time, omeg)
    ! nx, ny, nz -- grid dimensions, nz will be 1 since this is a 2D problem.
    ! step       -- current simulation time step
    ! time       -- current simulation time
    ! omega      -- scalar array for the current time step
    integer, intent(in) :: nx, ny, nz, step
    real(kind=8), intent(in) :: time
    real(kind=8), dimension(:,:), intent (in) :: omeg
    integer :: flag

    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag .ne. 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag .ne. 0) then
          ! grid needed
          ! defined in ns2dcnVTKDataSet.cxx
          call createcpimagedata(nx, ny, nz)
       end if
          
       ! defined in ns2dcnVTKDataSet.cxx 
       call addfield(omeg)
       
       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine navierstokescoprocessor
end module ns2dcnadaptor

ns2dcnVTKDataSet.cxx

[編輯 | 編輯原始碼]

 

 

 

 

( C)

定義將模擬資料放入ParaView/VTK庫可以理解的格式的函式。對於完全規則的網格來說非常簡單。擴充套件到3D很簡單。非結構化網格將需要更多努力。 程式碼下載

// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h" 

// CoProcessor specific headers
// ParaView-3.14.1-Source/CoProcessing/CoProcessor/
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"

#include "vtkDoubleArray.h"
#include "vtkPointData.h"

// ParaView-3.14.1-Source/VTK/Filtering/vtkImageData.h
#include "vtkImageData.h"

// These will be called from the Fortran "glue" code"
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz)
{
  if (!ParaViewCoProcessing::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access CoProcessorData.");
    return;
  }

  // The simulation grid is a 2-dimensional topologically and geometrically 
  // regular grid. In VTK/ParaView, this is considered an image data set.
  vtkImageData* Grid = vtkImageData::New();
  
  // assuming dimZ == 1 for now
  Grid->SetDimensions(*nx, *ny, *nz);
  
  // Setting the Origin and Spacing are also options.

  // Name should be consistent between here, Fortran and Python client script.
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(Grid);
}

// Add field(s) to the data container.
// Separate from above because this will be dynamic, grid is static.
// Might be an issue, VTK probably assumes row major, but
// omeg probably passed column major...
// by hand name mangling for fortran
extern "C" void addfield_(double* scalars)
{
  vtkCPInputDataDescription *idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");

  vtkImageData* Image = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!Image) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }


  // field name must match that in the fortran code.
  if (idd->IsFieldNeeded("omeg")) {
    vtkDoubleArray* omega = vtkDoubleArray::New();
    omega->SetName("omeg");
    omega->SetArray(scalars, Image->GetNumberOfPoints(), 1); 
    Image->GetPointData()->AddArray(omega);
    omega->Delete();
  }
}

在ParaView客戶端中建立視覺化管道

[編輯 | 編輯原始碼]

指令碼管道是透過使用協同處理指令碼生成器外掛在ParaView客戶端中生成的。

  • 檔案 > 開啟 > 瀏覽到資料集,該檔案現在應該在管道瀏覽器視窗中可見。
  • 在屬性視窗中,單擊應用,資料現在應該出現在佈局視窗中。
  • 要更改顏色對映,請單擊工具欄上帶有綠色圓圈的彩虹按鈕。
  • 要在佈局中顯示顏色對映,請單擊工具欄上垂直彩虹的按鈕。
  • 在工具 > 鎖定檢視大小自定義中設定視窗的大小。這將幫助你為最終影像排列專案。記下尺寸,稍後你需要它們來解決一個錯誤。
  • 如果要寫出VTK資料集,請轉到寫入器 > 並行影像資料寫入器,這應該會在管道中新增一個寫入器。
  • 要匯出python管道,請轉到協同處理 > 匯出狀態,這應該會啟動一個嚮導。
    • 單擊下一步。
    • 選擇你想要連線到你的模擬的資料集,新增它,單擊下一步。
    • 預設的模擬名稱為“輸入”。如果你更改它,請務必更新ns2dcnVTKDataSet.cxx,單擊下一步。
    • 檢查是否要輸出影像。單擊完成。
    • 儲存檔案。檔名將在navierstokes.cuf中引用,因此請儲存為類似pipeline-test-??.py的名稱,然後將其複製到pipeline.py,以避免為新的管道重新編譯。

Python 指令碼問題/錯誤

[編輯 | 編輯原始碼]

如果管道設定為輸出影像,那麼指令碼中可能存在兩個問題,可以視為錯誤。

  1. 無論在客戶端中設定了什麼視口大小,指令碼都不會使用它。
  2. 顏色對映標籤將以一個包含它們的框繪製。

第一個問題的解決方法是在客戶端中,向ViewSize方法呼叫新增與視口視窗尺寸相同的引數。第二個問題只需要簡單的編輯。這兩個問題都在示例pipeline.py中用#mvm註釋標記了。

設定環境

[編輯 | 編輯原始碼]

Forge 要求載入 ParaView OSMesa 模組。$ module load paraview/3.14.1-OSMesa。在 Nautilus 上,該模組為 paraview/3.14.1。

執行模擬

[編輯 | 編輯原始碼]

按照執行未經檢測版本的程式的方式執行。

將影像轉換為動畫

[編輯 | 編輯原始碼]

有許多方法可以做到這一點,如果 ffmpeg 可用,則可以使用 ffmpeg -sameq -i path/to/images/image%04d.png myanimation.mpg。

或者,如果將 VTK 資料集寫入檔案,則可以直接在 ParaView 中使用它們。

dt 及其對動畫時間的影響。

[編輯 | 編輯原始碼]

流暢影片約為 30 fps 或 dt = 0.033… 如果將較大 dt 寫入 VTK 資料集,則可以在 ParaView 中進行線性時間插值。如果將較大 dt 寫入影像並將其拼接成 mpg,則可用選項會大大降低影片質量。

ParaView 伺服器設定

[編輯 | 編輯原始碼]
  • Forge 沒有 X 支援,因此需要構建離屏 Mesa (OSMesa)。
  • Nautilus 支援 X,因此可以使用提供的 OpenGL 庫進行渲染。

新 API 的更改摘要

[編輯 | 編輯原始碼]

自 ParaView 3.14.1 以來,協同處理 API 有一些細微的更改。以下內容適用於 ParaView 3.98+ 和 Catalyst 1.0 alpha+。這些更改已於 2013 年 8 月 20 日在 NICS 的 Beacon 上透過 offload 模式進行了測試。

程式碼更改如下:

  • 在模擬程式碼中,coprocessorinitialize 現在為 coprocessorinitializewithpython,引數相同。
  • 在 C++ 資料集輔助程式碼中,包含 vtkCPPythonAdaptorAPI.h,而不是 FortranAdaptorAPI.h。
  • 在 C++ 資料集輔助程式碼中,從 ParaViewCoprocessing 名稱空間更改為 vtkCPPythonAdaptorAPI 名稱空間。

要連結的庫也已更改,對於 ParaView 4.0.1 版本,它們為:

  • -lvtkPVCatalystPython26D-pv4.0
  • -lvtkPVCatalyst-pv4.0
  • -lvtkCommonCore-pv4.0
  • -lvtkCommonDataModel-pv4.0
  • -lvtkPVPythonCatalyst-pv4.0

使用新 API 的示例

[編輯 | 編輯原始碼]

 

 

 

 

( A)
模擬:navierstokes.f90 -- 唯一的更改是初始化函式名稱。

程式碼下載

   
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution. 
!
! Periodic free-slip boundary conditions and Initial conditions:
! u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
! v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution:
! u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*nu*t)
! v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*nu*t)
!
! .. Parameters ..
!  Nx    = number of modes in x - power of 2 for FFT
!  Ny    = number of modes in y - power of 2 for FFT
!  Nt    = number of timesteps to take
!  Tmax    = maximum simulation time
!  FFTW_IN_PLACE  = value for FFTW input 
!  FFTW_MEASURE  = value for FFTW input
!  FFTW_EXHAUSTIVE  = value for FFTW input
!  FFTW_PATIENT  = value for FFTW input    
!  FFTW_ESTIMATE  = value for FFTW input
!  FFTW_FORWARD     = value for FFTW input
!  FFTW_BACKWARD = value for FFTW input 
!  pi = 3.14159265358979323846264338327950288419716939937510d0
!  mu    = viscosity
!  rho    = density
! .. Scalars ..
!  i    = loop counter in x direction
!  j    = loop counter in y direction
!  n    = loop counter for timesteps direction 
!  allocatestatus = error indicator during allocation
!  count   = keep track of information written to disk
!  iol    = size of array to write to disk
!  start   = variable to record start time of program
!  finish   = variable to record end time of program
!  count_rate  = variable for clock count rate
!  planfx   = Forward 1d fft plan in x
!  planbx   = Backward 1d fft plan in x
!  planfy   = Forward 1d fft plan in y
!  planby   = Backward 1d fft plan in y
!  dt    = timestep
! .. Arrays ..
!  u    = velocity in x direction
!  uold    = velocity in x direction at previous timestep
!  v    = velocity in y direction
!  vold    = velocity in y direction at previous timestep
!  u_y    = y derivative of velocity in x direction
!  v_x    = x derivative of velocity in y direction
!  omeg    = vorticity in real space
!  omegold   = vorticity in real space at previous
!      iterate
!  omegcheck  = store of vorticity at previous iterate
!  omegoldhat  = 2D Fourier transform of vorticity at previous
!      iterate
!  omegoldhat_x  = x-derivative of vorticity in Fourier space
!      at previous iterate
!  omegold_x  = x-derivative of vorticity in real space 
!      at previous iterate
!  omegoldhat_y  = y-derivative of vorticity in Fourier space 
!      at previous iterate
!  omegold_y  = y-derivative of vorticity in real space
!       at previous iterate
!  nlold   = nonlinear term in real space
!      at previous iterate
!  nloldhat   = nonlinear term in Fourier space
!      at previous iterate
!  omeghat   = 2D Fourier transform of vorticity
!      at next iterate
!  omeghat_x  = x-derivative of vorticity in Fourier space
!      at next timestep
!  omeghat_y  = y-derivative of vorticity in Fourier space
!      at next timestep
!  omeg_x   = x-derivative of vorticity in real space
!      at next timestep
!  omeg_y   = y-derivative of vorticity in real space
!      at next timestep
! .. Vectors ..
!  kx    = fourier frequencies in x direction
!  ky    = fourier frequencies in y direction
!  kxx    = square of fourier frequencies in x direction
!  kyy    = square of fourier frequencies in y direction
!  x    = x locations
!  y    = y locations
!  time    = times at which save data
!  name_config  = array to store filename for data to be saved      
!  fftfx   = array to setup x Fourier transform
!  fftbx   = array to setup y Fourier transform
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!  
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!--------------------------------------------------------------------
! External routines required
! 
! External libraries required
! FFTW3  -- Fast Fourier Transform in the West Library
!   (http://www.fftw.org/)         
! declare variables
PROGRAM main
    use nsadaptor_glue
    IMPLICIT NONE  
    INTEGER(kind=4), PARAMETER  ::  Nx=256   
    INTEGER(kind=4), PARAMETER  ::  Ny=256   
    REAL(kind=8), PARAMETER  ::  dt=0.00125     
    REAL(kind=8), PARAMETER &
        ::  pi=3.14159265358979323846264338327950288419716939937510
    REAL(kind=8), PARAMETER  ::  rho=1.0d0   
    REAL(kind=8), PARAMETER  ::  mu=1.0d0  
    REAL(kind=8), PARAMETER  :: tol=0.1d0**10  
    REAL(kind=8)    :: chg  
    INTEGER(kind=4), PARAMETER :: nplots=50
    REAL(kind=8), DIMENSION(:), ALLOCATABLE   :: time
    COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE  ::  kx,kxx    
    COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE  ::  ky,kyy    
    REAL(kind=8), DIMENSION(:), ALLOCATABLE   ::  x     
    REAL(kind=8), DIMENSION(:), ALLOCATABLE   ::  y      
    COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE :: & 
        u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg,&
        omegoldhat, omegoldhat_x, omegold_x,& 
        omegoldhat_y, omegold_y, nlold, nloldhat,&
        omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
        nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y

    ! added for coprocessing
    real(kind=8), dimension(:,:), allocatable :: dump

    REAL(kind=8),DIMENSION(:,:), ALLOCATABLE :: uexact_y,vexact_x,omegexact
    INTEGER(kind=4)        ::  i,j,k,n, allocatestatus, count, iol
    INTEGER(kind=4)         :: start, finish, count_rate
    INTEGER(kind=4), PARAMETER         ::  FFTW_IN_PLACE = 8, FFTW_MEASURE = 0, &
                FFTW_EXHAUSTIVE = 8, FFTW_PATIENT = 32,    &
                                FFTW_ESTIMATE = 64
    INTEGER(kind=4),PARAMETER          ::  FFTW_FORWARD = -1, FFTW_BACKWARD=1 
    COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE ::  fftfx,fftbx
    INTEGER(kind=8)        ::  planfxy,planbxy
    CHARACTER*100         ::  name_config
       
    CALL system_clock(start,count_rate)  
    ALLOCATE(time(1:nplots),kx(1:Nx),kxx(1:Nx),ky(1:Ny),kyy(1:Ny),x(1:Nx),y(1:Ny),&
        u(1:Nx,1:Ny),uold(1:Nx,1:Ny),v(1:Nx,1:Ny),vold(1:Nx,1:Ny),u_y(1:Nx,1:Ny),&
        v_x(1:Nx,1:Ny),omegold(1:Nx,1:Ny),omegcheck(1:Nx,1:Ny), omeg(1:Nx,1:Ny),&
        omegoldhat(1:Nx,1:Ny),omegoldhat_x(1:Nx,1:Ny), omegold_x(1:Nx,1:Ny), &
        omegoldhat_y(1:Nx,1:Ny),omegold_y(1:Nx,1:Ny), nlold(1:Nx,1:Ny), nloldhat(1:Nx,1:Ny),&
        omeghat(1:Nx,1:Ny), omeghat_x(1:Nx,1:Ny), omeghat_y(1:Nx,1:Ny), omeg_x(1:Nx,1:Ny),&
        omeg_y(1:Nx,1:Ny), nl(1:Nx,1:Ny), nlhat(1:Nx,1:Ny), psihat(1:Nx,1:Ny), &
        psihat_x(1:Nx,1:Ny), psi_x(1:Nx,1:Ny), psihat_y(1:Nx,1:Ny), psi_y(1:Nx,1:Ny),&
        uexact_y(1:Nx,1:Ny), vexact_x(1:Nx,1:Ny), omegexact(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
        fftbx(1:Nx,1:Ny),dump(1:Nx,1:Ny),stat=AllocateStatus) 
    IF (AllocateStatus .ne. 0) STOP 
    PRINT *,'allocated space'
  
    ! set up ffts
    CALL dfftw_plan_dft_2d_(planfxy,Nx,Ny,fftfx(1:Nx,1:Ny),fftbx(1:Nx,1:Ny),&
        FFTW_FORWARD,FFTW_EXHAUSTIVE)
    CALL dfftw_plan_dft_2d_(planbxy,Nx,Ny,fftbx(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
    FFTW_BACKWARD,FFTW_EXHAUSTIVE)
  
     ! setup fourier frequencies in x-direction
    DO i=1,1+Nx/2
        kx(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))     
    END DO
    kx(1+Nx/2)=0.0d0
    DO i = 1,Nx/2 -1
        kx(i+1+Nx/2)=-kx(1-i+Nx/2)
    END DO
    DO i=1,Nx
        kxx(i)=kx(i)*kx(i)
    END DO
    DO i=1,Nx
        x(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)) 
    END DO
  
    ! setup fourier frequencies in y-direction
    DO j=1,1+Ny/2
        ky(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))     
    END DO
    ky(1+Ny/2)=0.0d0
    DO j = 1,Ny/2 -1
        ky(j+1+Ny/2)=-ky(1-j+Ny/2)
    END DO
    DO j=1,Ny
        kyy(j)=ky(j)*ky(j)
    END DO
    DO j=1,Ny
        y(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0)) 
    END DO
    PRINT *,'Setup grid and fourier frequencies'
 
  
    DO j=1,Ny
        DO i=1,Nx
            u(i,j)=sin(2.0d0*pi*x(i))*cos(2.0d0*pi*y(j))
            v(i,j)=-cos(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
            u_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
            v_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
            omeg(i,j)=v_x(i,j)-u_y(i,j)
        END DO
    END DO
 
    ! Vorticity to Fourier Space
    CALL dfftw_execute_dft_(planfxy,omeg(1:Nx,1:Ny),omeghat(1:Nx,1:Ny)) 
 
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!Initial nonlinear term !!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
    ! obtain \hat{\omega}_x^{n,k}
    DO j=1,Ny
        omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx)
    END DO
    ! obtain \hat{\omega}_y^{n,k}
    DO i=1,Nx
        omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
    END DO
    ! convert to real space 
    CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny))
    CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
    ! compute nonlinear term in real space
    DO j=1,Ny
        nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
        v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0)) 
    END DO
    CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny)) 
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
    time(1)=0.0d0
    PRINT *,'Got initial data, starting timestepping'

    ! New Coprocessor API:
    call coprocessorinitializewithpython("pipeline.py", 11)
    
    DO n=1,nplots
        chg=1
        ! save old values
        uold(1:Nx,1:Ny)=u(1:Nx,1:Ny)
        vold(1:Nx,1:Ny)=v(1:Nx,1:Ny)
        omegold(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)
        omegcheck(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)  
        omegoldhat(1:Nx,1:Ny)=omeghat(1:Nx,1:Ny)
        nloldhat(1:Nx,1:Ny)=nlhat(1:Nx,1:Ny)
        DO WHILE (chg>tol) 
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            !!!!!!!!!!!!!!nonlinear fixed (n,k+1)!!!!!!!!!!!!!!!!!!!!!!!!!
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
            ! obtain \hat{\omega}_x^{n+1,k}
            DO j=1,Ny
                omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx) 
            END DO
            ! obtain \hat{\omega}_y^{n+1,k}
            DO i=1,Nx
                omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
            END DO
            ! convert back to real space 
            CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny)) 
            CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
            ! calculate nonlinear term in real space
            DO j=1,Ny
                nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
                v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
            END DO
            ! convert back to fourier
            CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny)) 
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   
            ! obtain \hat{\omega}^{n+1,k+1} with Crank Nicolson timestepping
            DO j=1,Ny
                omeghat(1:Nx,j)=( (1.0d0/dt+0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))&
                *omegoldhat(1:Nx,j) - 0.5d0*(nloldhat(1:Nx,j)+nlhat(1:Nx,j)))/&
                (1.0d0/dt-0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))   
            END DO
   
            ! calculate \hat{\psi}^{n+1,k+1}
            DO j=1,Ny
                psihat(1:Nx,j)=-omeghat(1:Nx,j)/(kxx(1:Nx)+kyy(j))
            END DO
            psihat(1,1)=0.0d0
            psihat(Nx/2+1,Ny/2+1)=0.0d0
            psihat(Nx/2+1,1)=0.0d0
            psihat(1,Ny/2+1)=0.0d0
   
            ! obtain \psi_x^{n+1,k+1} and \psi_y^{n+1,k+1}
            DO j=1,Ny
                psihat_x(1:Nx,j)=psihat(1:Nx,j)*kx(1:Nx)
            END DO
            CALL dfftw_execute_dft_(planbxy,psihat_x(1:Nx,1:Ny),psi_x(1:Nx,1:Ny)) 
            DO i=1,Nx
                psihat_y(i,1:Ny)=psihat(i,1:Ny)*ky(1:Ny)
            END DO
            CALL dfftw_execute_dft_(planbxy,psihat_y(1:Ny,1:Ny),psi_y(1:Ny,1:Ny)) 
            DO j=1,Ny
                psi_x(1:Nx,j)=psi_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
                psi_y(1:Nx,j)=psi_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
            END DO 
   
            ! obtain \omega^{n+1,k+1}
            CALL dfftw_execute_dft_(planbxy,omeghat(1:Nx,1:Ny),omeg(1:Nx,1:Ny)) 
            DO j=1,Ny
                omeg(1:Nx,j)=omeg(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
            END DO
   
            ! obtain u^{n+1,k+1} and v^{n+1,k+1} using stream function (\psi) in real space
            DO j=1,Ny
                u(1:Nx,j)=psi_y(1:Nx,j)
                v(1:Nx,j)=-psi_x(1:Nx,j)
            END DO 
   
            ! check for convergence 
            chg=maxval(abs(omeg-omegcheck)) 
            ! saves {n+1,k+1} to {n,k} for next iteration
            omegcheck=omeg  
        END DO
        time(n+1)=time(n)+dt
        PRINT *,'TIME ',time(n+1)
        ! call adaptor here
        ! not sure if there's a way to cleanly pass complex to C++ code
        do j = 1, Ny
            do i = 1, Nx
                dump(i,j) = real(omeg(i,j), kind(0d0))
            end do
        end do
        call nsadaptor(Nx, Ny, 1, n, time(n), dump)
    END DO
  
    call coprocessorfinalize()

    DO j=1,Ny
        DO i=1,Nx
            uexact_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
            exp(-8.0d0*mu*(pi**2)*nplots*dt)
            vexact_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
            exp(-8.0d0*mu*(pi**2)*nplots*dt)
            omegexact(i,j)=vexact_x(i,j)-uexact_y(i,j)
        END DO
    END DO
  
    name_config = 'omegafinal.datbin' 
    INQUIRE(iolength=iol) omegexact(1,1)
    OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol) 
    count = 1 
    DO j=1,Ny
        DO i=1,Nx
            WRITE(11,rec=count) REAL(omeg(i,j),KIND(0d0))
            count=count+1
        END DO
    END DO
    CLOSE(11)
  
    name_config = 'omegaexactfinal.datbin' 
    OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol) 
    count = 1 
    DO j=1,Ny
        DO i=1,Nx
            WRITE(11,rec=count) omegexact(i,j)
            count=count+1
        END DO
    END DO
    CLOSE(11)

    name_config = 'xcoord.dat' 
    OPEN(unit=11,FILE=name_config,status="UNKNOWN")  
    REWIND(11)
    DO i=1,Nx
        WRITE(11,*) x(i)
    END DO
    CLOSE(11)

    name_config = 'ycoord.dat' 
    OPEN(unit=11,FILE=name_config,status="UNKNOWN")  
    REWIND(11)
    DO j=1,Ny
        WRITE(11,*) y(j)
    END DO
    CLOSE(11)
 
    CALL dfftw_destroy_plan_(planfxy)
    CALL dfftw_destroy_plan_(planbxy)
    CALL dfftw_cleanup_()
  
    DEALLOCATE(time,kx,kxx,ky,kyy,x,y,&
        u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg, &
        omegoldhat, omegoldhat_x, omegold_x,& 
        omegoldhat_y, omegold_y, nlold, nloldhat,&
        omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
        nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y,&
        uexact_y,vexact_x,omegexact, &
        fftfx,fftbx,stat=AllocateStatus) 
    IF (AllocateStatus .ne. 0) STOP 
    PRINT *,'Program execution complete'
END PROGRAM main

 

 

 

 

( A)

Fortran 介面卡:NSadaptor.f90 -- 沒有 API 更改,出於完整性考慮,將其包含在內。 程式碼下載

! Fortran module for interacting with the ParaView/Catalyst coprocessor
module nsadaptor_glue
    implicit none
    public
contains
    subroutine nsadaptor(nx, ny, nz, step, time, scalar)
        ! nx, ny, nz -- grid dimensions
        ! step       -- current time step
        ! time       -- current time
        ! scalar     -- the data
        integer, intent(in) :: nx, ny, nz, step
        real(kind=8), intent(in) :: time
        real(kind=8), dimension(:,:), intent(in) :: scalar
        integer :: flag
        
        ! check if coprocessing this step
        ! defined in adaptor header in ParaView src
        call requestdatadescription(step, time, flag)
        if (flag .ne. 0) then
            ! coprocessing requested
            ! check if grid exists
            ! also from ParaView adaptor header
            call needtocreategrid(flag)

            if (flag .ne. 0) then
                ! grid needed
                ! defined by application developer
                call createcpimagedata(nx, ny, nz)
            end if

            ! also defined by application dev
            call addfield(scalar, "test")
    
            ! from ParaView adaptor header
            call coprocess()
        end if

        return
    end subroutine nsadaptor
end module nsadaptor_glue

 

 

 

 

( A)

C++ VTK 資料輔助程式:PointBasedDataSet.cxx -- 更改包括新的標頭和新的名稱空間。還要注意 vtkSmartPointer 的使用,而不是原始指標處理。 程式碼下載

// VTK Data set creator for ParaView/Catalyst Coprocessing.
// Interfaces with a Fortran adaptor to get data from a Fortran
// simulation.

// Fortran specific header for API called from Fortran glue module
#include "vtkCPPythonAdaptorAPI.h"

// Adaptor specific headers, will need to change if simulation data structure
// changes. This should available from the ParaView install include directory
// if you chose to install development files during configuration.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"

#include <string>
#include <iostream>
// Names are mangled with trailing underscore for linker visibility, along with
// using extern "C"

// Creates the vtkDataSet container for the CoProcessor. The simulation data
// must be compatible.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz)
{
  if (!vtkCPPythonAdaptorAPI::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access coprocessor data.");
    return;
  }

  vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
	
  // Note: expects Fortran indexing
  img->SetExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);

  vtkCPPythonAdaptorAPI::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
}

// Add data to the container.
extern "C" void addfield_(double* simdata, char* name)
{
  vtkSmartPointer<vtkCPInputDataDescription> idd = 
    vtkCPPythonAdaptorAPI::GetCoProcessorData()->GetInputDescriptionByName("input");
  
  vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!img) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }

  std::string fieldPythonName = name;
	
  if (idd->IsFieldNeeded(name)) {
    vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
    field->SetName(name);
    field->SetArray(simdata, img->GetNumberOfPoints(), 1);
    img->GetPointData()->AddArray(field);
  }
}

將 ParaView 協同處理與 MPI 和域分解結合使用

[編輯 | 編輯原始碼]

以下說明是在 NICS 的 Nautilus 上使用 NS3D-MPI、NavierStokes3DfftIMR.f90、KgSemiImp3D.f90 和 NSLsplitting.f90 時得出的。

節點解釋

[編輯 | 編輯原始碼]

在使用單個 CPU 節點(非分割槽資料)時,可以將模擬陣列直接傳遞給 ParaView,並使用 vtkDataSet::GetPointData()->AddArray() 方法將其新增到管道中。然後,vtkImageData 的點將與模擬的網格節點完全對應。該 vtkImageData 將具有隱式細胞結構,每個細胞角點都是模擬節點之一。

如果沒有針對 MPI 的特殊更改,此程式碼將執行,但是 ParaView 將無法建立角點位於不同秩上的細胞。載入 .pvti 檔案時會顯示錯誤,指出範圍不正確。可以載入各個 .vtis 檔案,檢視實際情況確實是這樣的,以及細胞之間是否存在間隙。

例如,假設陣列是 dimension(20,10,10),並且已分解為 2 個鉛筆 (1:10,:,:) 和 (11:20,:,:)。ParaView 認為 (10:11,:,:) 的資料丟失,因此會出現間隙。

使用點光暈

[編輯 | 編輯原始碼]

光暈提供了一種填補間隙的方法。2decomp&fft 光暈 api 建立包含鉛筆資料和周圍光暈元素的陣列。例如,假設鉛筆為 (10:20,10:20,:), 則光暈陣列為 (9:21,9:21,:). 使用光暈會帶來一些問題:

  • 光暈元素可能會超出計算域,這些元素將包含垃圾資料。
  • 並非每個鉛筆都需要將相同的光暈元素髮送給協同處理器。這需要在程式中新增額外的決策邏輯。
  • 從 C++ 的角度來看,光暈陣列部分不會緊密打包,因此會包含垃圾資料。

模擬程式碼中的程式碼更改

[編輯 | 編輯原始碼]

 

 

 

 

( D)
  • 添加了光暈陣列。
  • 添加了用於將打包陣列傳遞給 C++ 的一維陣列。
  • 根據鉛筆範圍添加了用於判斷要傳遞哪些光暈元素的分支。
    1. 如果鉛筆距離索引原點最遠,則僅傳送鉛筆,不傳送光暈元素。
    2. 否則,如果鉛筆位於最遠範圍的任一側,但不位於兩側,則傳送其他範圍中最遠的光暈元素。
    3. 否則,傳送兩側最遠的光暈元素。
  • 添加了對光暈區域的重塑。陣列透過引用從 F90 傳遞到 C++。C++ 不理解 F90 陣列區域,並且只會從引用中遞增讀取(這是任何陣列區域問題,不僅僅是光暈)。解決此問題的一種方法是將區域重塑為一維陣列,這將使所需資料在記憶體中連續。

程式碼下載

PROGRAM main    
  !-----------------------------------------------------------------------------------
  !
  !
  ! PURPOSE
  !
  ! This program numerically solves the 3D incompressible Navier-Stokes
  ! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
  ! Implicit Midpoint rule timestepping. The numerical solution is compared to
  ! an exact solution reported by Shapiro 
  !
  ! Analytical Solution:
  !       u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
  !       v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
  !       w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
  !
  ! .. Parameters ..
  !  Nx                           = number of modes in x - power of 2 for FFT
  !  Ny                           = number of modes in y - power of 2 for FFT
  !  Nz                           = number of modes in z - power of 2 for FFT
  !  Nt                           = number of timesteps to take
  !  Tmax                         = maximum simulation time
  !  FFTW_IN_PLACE        = value for FFTW input 
  !  FFTW_MEASURE         = value for FFTW input
  !  FFTW_EXHAUSTIVE      = value for FFTW input
  !  FFTW_PATIENT         = value for FFTW input    
  !  FFTW_ESTIMATE        = value for FFTW input
  !  FFTW_FORWARD     = value for FFTW input
  !  FFTW_BACKWARD        = value for FFTW input  
  !  pi = 3.14159265358979323846264338327950288419716939937510d0
  !  Re                           = Reynolds number
  ! .. Scalars ..
  !  i                            = loop counter in x direction
  !  j                            = loop counter in y direction
  !  k                            = loop counter in z direction
  !  n                            = loop counter for timesteps direction  
  !  allocatestatus       = error indicator during allocation
  !  count                        = keep track of information written to disk
  !  iol                          = size of array to write to disk
  !  start                        = variable to record start time of program
  !  finish                       = variable to record end time of program
  !  count_rate           = variable for clock count rate
  !  planfxyz                     = Forward 3d fft plan 
  !  planbxyz                     = Backward 3d fft plan
  !  dt                           = timestep
  ! .. Arrays ..
  !  u                            = velocity in x direction
  !  v                            = velocity in y direction
  !  w                            = velocity in z direction
  !  uold                         = velocity in x direction at previous timestep
  !  vold                         = velocity in y direction at previous timestep
  !  wold                         = velocity in z direction at previous timestep
  !  ux                           = x derivative of velocity in x direction
  !  uy                           = y derivative of velocity in x direction
  !  uz                           = z derivative of velocity in x direction
  !  vx                           = x derivative of velocity in y direction
  !  vy                           = y derivative of velocity in y direction
  !  vz                           = z derivative of velocity in y direction
  !  wx                           = x derivative of velocity in z direction
  !  wy                           = y derivative of velocity in z direction
  !  wz                           = z derivative of velocity in z direction
  !  uxold                        = x derivative of velocity in x direction
  !  uyold                        = y derivative of velocity in x direction
  !  uzold                        = z derivative of velocity in x direction
  !  vxold                        = x derivative of velocity in y direction
  !  vyold                        = y derivative of velocity in y direction
  !  vzold                        = z derivative of velocity in y direction
  !  wxold                        = x derivative of velocity in z direction
  !  wyold                        = y derivative of velocity in z direction
  !  wzold                        = z derivative of velocity in z direction
  !  utemp                        = temporary storage of u to check convergence
  !  vtemp                        = temporary storage of u to check convergence
  !  wtemp                        = temporary storage of u to check convergence
  !  temp_r                       = temporary storage for untransformed variables
  !  uhat                         = Fourier transform of u
  !  vhat                         = Fourier transform of v
  !  what                         = Fourier transform of w
  !  rhsuhatfix           = Fourier transform of righthand side for u for timestepping
  !  rhsvhatfix           = Fourier transform of righthand side for v for timestepping
  !  rhswhatfix           = Fourier transform of righthand side for w for timestepping
  !  nonlinuhat           = Fourier transform of nonlinear term for u
  !  nonlinvhat           = Fourier transform of nonlinear term for u
  !  nonlinwhat           = Fourier transform of nonlinear term for u
  !  phat                         = Fourier transform of nonlinear term for pressure, p
  !  temp_c                       = temporary storage for Fourier transforms
  !  realtemp                     = Real storage
  !
  ! .. Vectors ..
  !  kx                           = fourier frequencies in x direction
  !  ky                           = fourier frequencies in y direction
  !  kz                           = fourier frequencies in z direction
  !  x                            = x locations
  !  y                            = y locations
  !  z                            = y locations
  !  time                         = times at which save data
  !  name_config          = array to store filename for data to be saved
  !               
  ! REFERENCES
  !
  ! A. Shapiro " The use of an exact solution of the Navier-Stokes equations 
  ! in a validation test of a three-dimensional nonhydrostatic numerical model"
  ! Monthly Weather Review vol. 121, 2420-2425, (1993).
  !
  ! ACKNOWLEDGEMENTS
  !
  ! ACCURACY
  !               
  ! ERROR INDICATORS AND WARNINGS
  !
  ! FURTHER COMMENTS
  !
  ! This program has not been optimized to use the least amount of memory
  ! but is intended as an example only for which all states can be saved
  !
  !--------------------------------------------------------------------------------
  ! External routines required
  ! 
  ! External libraries required
  ! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
  !                       (http://2decomp.org/)


  !---------------------------------------------------------------------------------

  USE decomp_2d
  USE decomp_2d_fft
  USE decomp_2d_io
  USE MPI
  !coprocessing:
  use NSadaptor_module 
  IMPLICIT NONE   
  ! declare variables
  INTEGER(kind=4), PARAMETER              :: Nx=256
  INTEGER(kind=4), PARAMETER              :: Ny=256
  INTEGER(kind=4), PARAMETER              :: Nz=256
  INTEGER(kind=4), PARAMETER              :: Lx=1
  INTEGER(kind=4), PARAMETER              :: Ly=1
  INTEGER(kind=4), PARAMETER              :: Lz=1
  INTEGER(kind=4), PARAMETER              :: Nt=20
  REAL(kind=8), PARAMETER                 :: dt=0.05d0/Nt
  REAL(kind=8), PARAMETER                 :: Re=1.0d0     
  REAL(kind=8), PARAMETER                 :: tol=0.1d0**10
  REAL(kind=8), PARAMETER                 :: theta=0.0d0

  REAL(kind=8), PARAMETER &
       ::  pi=3.14159265358979323846264338327950288419716939937510d0
  REAL(kind=8), PARAMETER         ::      ReInv=1.0d0/REAL(Re,kind(0d0))
  REAL(kind=8), PARAMETER         ::  dtInv=1.0d0/REAL(dt,kind(0d0)) 
  REAL(kind=8)                            :: scalemodes,chg,factor
  REAL(kind=8), DIMENSION(:), ALLOCATABLE                 :: x, y, z, time,mychg,allchg
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  :: u, v, w,&
       ux, uy, uz,&
       vx, vy, vz,&
       wx, wy, wz,&
       uold, uxold, uyold, uzold,&
       vold, vxold, vyold, vzold,&
       wold, wxold, wyold, wzold,&
       utemp, vtemp, wtemp, temp_r

  COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE      ::      kx, ky, kz                                              
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  ::      uhat, vhat, what,&
       rhsuhatfix, rhsvhatfix,&
       rhswhatfix, nonlinuhat,&
       nonlinvhat, nonlinwhat,&
       phat,temp_c
  !coprocessing: added x,y,z component arrays also halos
  !coprocessing: halos get allocated inside call to update_halo
  REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE     ::  realtemp, realtempx, &
       realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo
  !coprocessing: added 1D array for passing to C++
  real(kind=8), dimension(:), allocatable :: xpass2c, ypass2c, zpass2c
  ! MPI and 2DECOMP variables
  TYPE(DECOMP_INFO)                               ::  decomp
  INTEGER(kind=MPI_OFFSET_KIND)                   ::  filesize, disp
  INTEGER(kind=4)                                 ::  p_row=0, p_col=0, numprocs, myid, ierr      

  ! variables used for saving data and timing
  INTEGER(kind=4)                                 :: count, iol 
  INTEGER(kind=4)                                 :: i,j,k,n,t,allocatestatus
  INTEGER(kind=4)                                 :: ind, numberfile
  CHARACTER*100                                   :: name_config
  INTEGER(kind=4)                                 :: start, finish, count_rate 

  ! initialisation of 2DECOMP&FFT
  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr) 
  ! do automatic domain decomposition
  CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
  ! get information about domain decomposition choosen
  CALL decomp_info_init(Nx,Ny,Nz,decomp)
  ! initialise FFT library
  CALL decomp_2d_fft_init
  IF (myid.eq.0) THEN
     PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
     PRINT *,'dt:',dt
  END IF

  ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
       u(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       v(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       w(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       ux(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       utemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       temp_r(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       kx(1:Nx),ky(1:Ny),kz(1:Nz),&
       uhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       vhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       what(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsuhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsvhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhswhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinuhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinvhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinwhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       phat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       temp_c(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       realtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       realtempx(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempy(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempz(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       stat=AllocateStatus)      
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'allocated space'
  END IF

  ! setup fourier frequencies in x-direction
  DO i=1,Nx/2+1
     kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx                        
  END DO
  kx(1+Nx/2)=0.0d0
  DO i = 1,Nx/2 -1
     kx(i+1+Nx/2)=-kx(1-i+Nx/2)
  END DO
  ind=1
  DO i=-Nx/2,Nx/2-1
     x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in y-direction
  DO j=1,Ny/2+1
     ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly                        
  END DO
  ky(1+Ny/2)=0.0d0
  DO j = 1,Ny/2 -1
     ky(j+1+Ny/2)=-ky(1-j+Ny/2)
  END DO
  ind=1
  DO j=-Ny/2,Ny/2-1
     y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in z-direction
  DO k=1,Nz/2+1
     kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz                        
  END DO
  kz(1+Nz/2)=0.0d0
  DO k = 1,Nz/2 -1
     kz(k+1+Nz/2)=-kz(1-k+Nz/2)
  END DO
  ind=1
  DO k=-Nz/2,Nz/2-1
     z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
     ind=ind+1
  END DO
  scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
  IF (myid.eq.0) THEN
     PRINT *,'Setup grid and fourier frequencies'
  END IF

  !initial conditions for Taylor-Green vortex
  !       factor=2.0d0/sqrt(3.0d0)
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
  !       END DO; END DO; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
  !       END DO ; END DO ; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
  !       END DO ; END DO ; END DO

  time(1)=0.0d0
  factor=sqrt(3.0d0)
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     v(i,j,k)=0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO

  CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)


  ! derivative of u with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

  ! derivative of v with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)         

  ! derivative of w with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)         

  ! save initial data
  n = 0
  !coprocessing: removed savedata calls from coprocessing version
  !coprocessing: could have this after the coprocessor initialize and
  !coprocessing: followed by a call to the coprocessor to write out the initial data
  !coprocessing: image. Otherwise, these are just wasted calls at this spot.
  call coprocessorinitialize("pipeline.py", 11)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
  END DO; END DO ; END DO
  call update_halo(realtempx, tempxhalo, level=1)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
  END DO; END DO ; END DO
  call update_halo(realtempy, tempyhalo, level=1)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
  END DO; END DO ; END DO
  call update_halo(realtempz, tempzhalo, level=1)

  !coprocessing: Four cases of sending halo data to the coprocessor.
  if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
     !coprocessing: case 1: pencil furthest from index origin, send just the pencil
     allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
     allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
     allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))     
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c, ypass2c, zpass2c)
  else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
     !coprocessing: case 2: pencil furthest in z, send just upper y halo
     allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
     allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
     allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c, ypass2c, zpass2c)
  else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
     !coprocessing: case 3: pencil furthest in y, send just upper z halo
     allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
     allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
     allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c, ypass2c, zpass2c)
  else
     !coprocessing: case 4: send both upper y and upper z halo
     allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
     allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
     allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c,ypass2c,zpass2c)
  end if

  !start timer
  CALL system_clock(start,count_rate)
  ! coprocessing: Simulation loop starts here
  DO n=1,Nt
     !fixed point
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        uold(i,j,k)=u(i,j,k)
        uxold(i,j,k)=ux(i,j,k)
        uyold(i,j,k)=uy(i,j,k)
        uzold(i,j,k)=uz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        vold(i,j,k)=v(i,j,k)
        vxold(i,j,k)=vx(i,j,k)
        vyold(i,j,k)=vy(i,j,k)
        vzold(i,j,k)=vz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        wold(i,j,k)=w(i,j,k)
        wxold(i,j,k)=wx(i,j,k)
        wyold(i,j,k)=wy(i,j,k)
        wzold(i,j,k)=wz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k) 
     END DO; END DO ; END DO

     chg=1
     DO WHILE (chg .gt. tol)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
                +ky(j)*nonlinvhat(i,j,k)&
                +kz(k)*nonlinwhat(i,j,k))&
                /(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
        END DO; END DO ; END DO

        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO

        ! derivative of u with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of v with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of w with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD) 

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           utemp(i,j,k)=u(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           vtemp(i,j,k)=v(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           wtemp(i,j,k)=w(i,j,k)
        END DO; END DO ; END DO

        CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)    

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           u(i,j,k)=u(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           v(i,j,k)=v(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           w(i,j,k)=w(i,j,k)*scalemodes
        END DO; END DO ; END DO

        mychg(1) =maxval(abs(utemp-u))
        mychg(2) =maxval(abs(vtemp-v))
        mychg(3) =maxval(abs(wtemp-w))
        CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
        chg=allchg(1)+allchg(2)+allchg(3)
        IF (myid.eq.0) THEN
           PRINT *,'chg:',chg
        END IF
     END DO
     time(n+1)=n*dt

     !goto 5100
     IF (myid.eq.0) THEN     
        PRINT *,'time',n*dt
     END IF

     !coprocessing: populating the arrays sent to the coprocessor
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
     END DO; END DO ; END DO
     call update_halo(realtempx, tempxhalo, level=1)

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
     END DO; END DO ; END DO
     call update_halo(realtempy, tempyhalo, level=1)

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
     END DO; END DO ; END DO
     call update_halo(realtempz, tempzhalo, level=1)

     !coprocessing: same cases as for initial conditions. 
     if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
        ! case 1: send just the pencil
        allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
        allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
        allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))     
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c, ypass2c, zpass2c)
     else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
        ! case 2: send just upper y halo
        allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
        allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
        allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c, ypass2c, zpass2c)
     else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
        ! case 3: send just upper z halo
        allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
        allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
        allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c, ypass2c, zpass2c)
     else
        ! send both upper y and upper z halo
        allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
        allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
        allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c,ypass2c,zpass2c)
     end if

  END DO
  !coprocessing:
  call coprocessorfinalize()

  CALL system_clock(finish,count_rate)

  IF (myid.eq.0) then
     PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
  END IF

  IF (myid.eq.0) THEN
     name_config = './data/tdata.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO n=1,1+Nt
        WRITE(11,*) time(n)
     END DO 
     CLOSE(11)

     name_config = './data/xcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO i=1,Nx
        WRITE(11,*) x(i)
     END DO
     CLOSE(11)       

     name_config = './data/ycoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO j=1,Ny
        WRITE(11,*) y(j)
     END DO
     CLOSE(11)

     name_config = './data/zcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO k=1,Nz
        WRITE(11,*) z(k)
     END DO
     CLOSE(11)
     PRINT *,'Saved data'
  END IF

  ! Calculate error in final numerical solution
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     utemp(i,j,k)=u(i,j,k) -&
          (-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     vtemp(i,j,k)=v(i,j,k) -&
          (0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     wtemp(i,j,k)=w(i,j,k)-&
          (cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  mychg(1) = maxval(abs(utemp))
  mychg(2) = maxval(abs(vtemp))
  mychg(3) = maxval(abs(wtemp))
  CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
  chg=allchg(1)+allchg(2)+allchg(3)
  IF (myid.eq.0) THEN
     PRINT*,'The error at the final timestep is',chg
  END IF

  ! clean up 
  CALL decomp_2d_fft_finalize
  CALL decomp_2d_finalize

  DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
       vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
       temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
       rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
       realtemp,stat=AllocateStatus)         
  !coprocessing: deallocate Coprocessing specific arrays
  deallocate(realtempx, realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo)
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'Program execution complete'
  END IF
  CALL MPI_FINALIZE(ierr)         

END PROGRAM main

f90 介面卡粘合程式碼中的程式碼更改

[編輯 | 編輯原始碼]

 

 

 

 

( E)
  • 更改介面卡以接受當前鉛筆的子範圍。

程式碼下載

! fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaNSadaptor.f
! -- Changed for SESE

! subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.      
module NSadaptor_module 
  use iso_c_binding
  implicit none
  public
  interface NSadaptor
    ! Originally also had a version that accepted 3D arrays, but in all other
    ! respects was identical. Decided this could lead to too much confusion.
    module procedure NSadaptor1D 
  end interface NSadaptor
contains
  subroutine NSadaptor1D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
                                     step, time, omegax, omegay, omegaz)
    ! nx, ny, nz     -- grid dimensions or entire mesh
    !                   used for setting whole extent
    ! xst, xen, etc. -- extents of current subdomain pencil
    ! step           -- current simulation time step
    ! time           -- current simulation time
    ! omega*          -- scalar array for the current time step
    ! flag           -- receives status from API calls
    integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
    real(kind=8), intent(in) :: time
    real(kind=8), dimension(:), intent (in) :: omegax, omegay, omegaz
    integer :: flag

    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag /= 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag /= 0) then
          ! grid needed
          ! defined in VTKPointBasedDataSet.cxx
          ! takes the size of the entire grid, and the extents of the
          ! sub grid.
          call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
       end if
          
       ! defined in VTKPointBasedDataSet.cxx
       ! remember to null-terminate strings for C/C++
       call addfield(omegax, "realtempx"//char(0));
       call addfield(omegay, "realtempy"//char(0));
       call addfield(omegaz, "realtempz"//char(0));       

       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine NSadaptor1D 
end module NSadaptor_module

C++ 包裝程式碼中的程式碼更改

[編輯 | 編輯原始碼]

 

 

 

 

( F)
  • 指定了相對於整個資料的已分割槽資料範圍。
  • 指定了單元格的相對間距。

程式碼下載

// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h" 

// CoProcessor specific headers
// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"

#include <string>

// These will be called from the Fortran "glue" code"
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
	int* yst, int* yen, int* zst, int* zen)
{
  if (!ParaViewCoProcessing::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access CoProcessorData.");
    return;
  }

  // The simulation grid is a 3-dimensional topologically and geometrically 
  // regular grid. In VTK/ParaView, this is considered an image data set.
  vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
  img->SetExtent(*xst - 1, *xen - 1, *yst - 1, *yen - 1, *zst - 1, *zen - 1); 
  img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz); 
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);
}

// Add field(s) to the data container.
// Separate from above because this will be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Expects "name" to point to null-terminated character array, 
// be sure to null-terminate in caller.
extern "C" void addfield_(double* a, char* name)
{

  vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
  vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!img) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }


  if (idd->IsFieldNeeded(name)) {
    vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
    field->SetName(name);
    field->SetArray(a, img->GetNumberOfPoints(), 1); 
    img->GetPointData()->AddArray(field);
  }
}
  • 某些 VTK/ParaView 過濾器可以使用幽靈(光暈)單元格,程式碼可以擴充套件以處理這些單元格。
  • 在模擬程式碼本身中添加了很多程式碼。
  • 光暈和打包陣列需要額外的記憶體。
  • 為 C++ 重塑陣列的額外開銷。
  • 不同的分解庫可能需要更改光暈邏輯。
  • 不能與 3D 分解庫一起使用。

將模擬節點重新解釋為視覺化單元格

[編輯 | 編輯原始碼]

檢視資料的另一種方式是,模擬的每個節點都是視覺化域中的一個單元格。然後,當鉛筆傳送到 ParaView 時,就不會有缺失單元格的間隙。

模擬程式碼中的程式碼更改

[編輯 | 編輯原始碼]

 

 

 

 

( G)
  • 除了常規的 CoProcessing 呼叫之外,沒有其他更改。

程式碼下載

PROGRAM main    
  !-----------------------------------------------------------------------------------
  !
  !
  ! PURPOSE
  !
  ! This program numerically solves the 3D incompressible Navier-Stokes
  ! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
  ! Implicit Midpoint rule timestepping. The numerical solution is compared to
  ! an exact solution reported by Shapiro 
  !
  ! Analytical Solution:
  !       u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
  !       v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
  !       w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
  !
  ! .. Parameters ..
  !  Nx                           = number of modes in x - power of 2 for FFT
  !  Ny                           = number of modes in y - power of 2 for FFT
  !  Nz                           = number of modes in z - power of 2 for FFT
  !  Nt                           = number of timesteps to take
  !  Tmax                         = maximum simulation time
  !  FFTW_IN_PLACE        = value for FFTW input 
  !  FFTW_MEASURE         = value for FFTW input
  !  FFTW_EXHAUSTIVE      = value for FFTW input
  !  FFTW_PATIENT         = value for FFTW input    
  !  FFTW_ESTIMATE        = value for FFTW input
  !  FFTW_FORWARD     = value for FFTW input
  !  FFTW_BACKWARD        = value for FFTW input  
  !  pi = 3.14159265358979323846264338327950288419716939937510d0
  !  Re                           = Reynolds number
  ! .. Scalars ..
  !  i                            = loop counter in x direction
  !  j                            = loop counter in y direction
  !  k                            = loop counter in z direction
  !  n                            = loop counter for timesteps direction  
  !  allocatestatus       = error indicator during allocation
  !  count                        = keep track of information written to disk
  !  iol                          = size of array to write to disk
  !  start                        = variable to record start time of program
  !  finish                       = variable to record end time of program
  !  count_rate           = variable for clock count rate
  !  planfxyz                     = Forward 3d fft plan 
  !  planbxyz                     = Backward 3d fft plan
  !  dt                           = timestep
  ! .. Arrays ..
  !  u                            = velocity in x direction
  !  v                            = velocity in y direction
  !  w                            = velocity in z direction
  !  uold                         = velocity in x direction at previous timestep
  !  vold                         = velocity in y direction at previous timestep
  !  wold                         = velocity in z direction at previous timestep
  !  ux                           = x derivative of velocity in x direction
  !  uy                           = y derivative of velocity in x direction
  !  uz                           = z derivative of velocity in x direction
  !  vx                           = x derivative of velocity in y direction
  !  vy                           = y derivative of velocity in y direction
  !  vz                           = z derivative of velocity in y direction
  !  wx                           = x derivative of velocity in z direction
  !  wy                           = y derivative of velocity in z direction
  !  wz                           = z derivative of velocity in z direction
  !  uxold                        = x derivative of velocity in x direction
  !  uyold                        = y derivative of velocity in x direction
  !  uzold                        = z derivative of velocity in x direction
  !  vxold                        = x derivative of velocity in y direction
  !  vyold                        = y derivative of velocity in y direction
  !  vzold                        = z derivative of velocity in y direction
  !  wxold                        = x derivative of velocity in z direction
  !  wyold                        = y derivative of velocity in z direction
  !  wzold                        = z derivative of velocity in z direction
  !  utemp                        = temporary storage of u to check convergence
  !  vtemp                        = temporary storage of u to check convergence
  !  wtemp                        = temporary storage of u to check convergence
  !  temp_r                       = temporary storage for untransformed variables
  !  uhat                         = Fourier transform of u
  !  vhat                         = Fourier transform of v
  !  what                         = Fourier transform of w
  !  rhsuhatfix           = Fourier transform of righthand side for u for timestepping
  !  rhsvhatfix           = Fourier transform of righthand side for v for timestepping
  !  rhswhatfix           = Fourier transform of righthand side for w for timestepping
  !  nonlinuhat           = Fourier transform of nonlinear term for u
  !  nonlinvhat           = Fourier transform of nonlinear term for u
  !  nonlinwhat           = Fourier transform of nonlinear term for u
  !  phat                         = Fourier transform of nonlinear term for pressure, p
  !  temp_c                       = temporary storage for Fourier transforms
  !  realtemp                     = Real storage
  !
  ! .. Vectors ..
  !  kx                           = fourier frequencies in x direction
  !  ky                           = fourier frequencies in y direction
  !  kz                           = fourier frequencies in z direction
  !  x                            = x locations
  !  y                            = y locations
  !  z                            = y locations
  !  time                         = times at which save data
  !  name_config          = array to store filename for data to be saved
  !               
  ! REFERENCES
  !
  ! A. Shapiro " The use of an exact solution of the Navier-Stokes equations 
  ! in a validation test of a three-dimensional nonhydrostatic numerical model"
  ! Monthly Weather Review vol. 121, 2420-2425, (1993).
  !
  ! ACKNOWLEDGEMENTS
  !
  ! ACCURACY
  !               
  ! ERROR INDICATORS AND WARNINGS
  !
  ! FURTHER COMMENTS
  !
  ! This program has not been optimized to use the least amount of memory
  ! but is intended as an example only for which all states can be saved
  !
  !--------------------------------------------------------------------------------
  ! External routines required
  ! 
  ! External libraries required
  ! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
  !                       (http://2decomp.org/)


  !---------------------------------------------------------------------------------

  USE decomp_2d
  USE decomp_2d_fft
  USE decomp_2d_io
  USE MPI
  !coprocessing:
  use NSadaptor_module 
  IMPLICIT NONE   
  ! declare variables
  INTEGER(kind=4), PARAMETER              :: Nx=256
  INTEGER(kind=4), PARAMETER              :: Ny=256
  INTEGER(kind=4), PARAMETER              :: Nz=256
  INTEGER(kind=4), PARAMETER              :: Lx=1
  INTEGER(kind=4), PARAMETER              :: Ly=1
  INTEGER(kind=4), PARAMETER              :: Lz=1
  INTEGER(kind=4), PARAMETER              :: Nt=20
  REAL(kind=8), PARAMETER                 :: dt=0.05d0/Nt
  REAL(kind=8), PARAMETER                 :: Re=1.0d0     
  REAL(kind=8), PARAMETER                 :: tol=0.1d0**10
  REAL(kind=8), PARAMETER                 :: theta=0.0d0

  REAL(kind=8), PARAMETER &
       ::  pi=3.14159265358979323846264338327950288419716939937510d0
  REAL(kind=8), PARAMETER         ::      ReInv=1.0d0/REAL(Re,kind(0d0))
  REAL(kind=8), PARAMETER         ::  dtInv=1.0d0/REAL(dt,kind(0d0)) 
  REAL(kind=8)                            :: scalemodes,chg,factor
  REAL(kind=8), DIMENSION(:), ALLOCATABLE                 :: x, y, z, time,mychg,allchg
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  :: u, v, w,&
       ux, uy, uz,&
       vx, vy, vz,&
       wx, wy, wz,&
       uold, uxold, uyold, uzold,&
       vold, vxold, vyold, vzold,&
       wold, wxold, wyold, wzold,&
       utemp, vtemp, wtemp, temp_r

  COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE      ::      kx, ky, kz                                              
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  ::      uhat, vhat, what,&
       rhsuhatfix, rhsvhatfix,&
       rhswhatfix, nonlinuhat,&
       nonlinvhat, nonlinwhat,&
       phat,temp_c
  !coprocessing: added x,y,z component arrays 
  REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE     ::  realtemp, realtempx, &
       realtempy, realtempz
  ! MPI and 2DECOMP variables
  TYPE(DECOMP_INFO)                               ::  decomp
  INTEGER(kind=MPI_OFFSET_KIND)                   ::  filesize, disp
  INTEGER(kind=4)                                 ::  p_row=0, p_col=0, numprocs, myid, ierr      

  ! variables used for saving data and timing
  INTEGER(kind=4)                                 :: count, iol 
  INTEGER(kind=4)                                 :: i,j,k,n,t,allocatestatus
  INTEGER(kind=4)                                 :: ind, numberfile
  CHARACTER*100                                   :: name_config
  INTEGER(kind=4)                                 :: start, finish, count_rate 

  ! initialisation of 2DECOMP&FFT
  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr) 
  ! do automatic domain decomposition
  CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
  ! get information about domain decomposition choosen
  CALL decomp_info_init(Nx,Ny,Nz,decomp)
  ! initialise FFT library
  CALL decomp_2d_fft_init
  IF (myid.eq.0) THEN
     PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
     PRINT *,'dt:',dt
  END IF

  ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
       u(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       v(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       w(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       ux(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       utemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       temp_r(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       kx(1:Nx),ky(1:Ny),kz(1:Nz),&
       uhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       vhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       what(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsuhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsvhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhswhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinuhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinvhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinwhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       phat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       temp_c(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       realtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       realtempx(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempy(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempz(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       stat=AllocateStatus)      
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'allocated space'
  END IF

  ! setup fourier frequencies in x-direction
  DO i=1,Nx/2+1
     kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx                        
  END DO
  kx(1+Nx/2)=0.0d0
  DO i = 1,Nx/2 -1
     kx(i+1+Nx/2)=-kx(1-i+Nx/2)
  END DO
  ind=1
  DO i=-Nx/2,Nx/2-1
     x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in y-direction
  DO j=1,Ny/2+1
     ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly                        
  END DO
  ky(1+Ny/2)=0.0d0
  DO j = 1,Ny/2 -1
     ky(j+1+Ny/2)=-ky(1-j+Ny/2)
  END DO
  ind=1
  DO j=-Ny/2,Ny/2-1
     y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in z-direction
  DO k=1,Nz/2+1
     kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz                        
  END DO
  kz(1+Nz/2)=0.0d0
  DO k = 1,Nz/2 -1
     kz(k+1+Nz/2)=-kz(1-k+Nz/2)
  END DO
  ind=1
  DO k=-Nz/2,Nz/2-1
     z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
     ind=ind+1
  END DO
  scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
  IF (myid.eq.0) THEN
     PRINT *,'Setup grid and fourier frequencies'
  END IF

  !initial conditions for Taylor-Green vortex
  !       factor=2.0d0/sqrt(3.0d0)
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
  !       END DO; END DO; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
  !       END DO ; END DO ; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
  !       END DO ; END DO ; END DO

  time(1)=0.0d0
  factor=sqrt(3.0d0)
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     v(i,j,k)=0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO

  CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)


  ! derivative of u with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

  ! derivative of v with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)         

  ! derivative of w with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)         

  ! save initial data
  n = 0
  !coprocessing: removed savedata calls from coprocessing version
  !coprocessing: could have this after the coprocessor initialize and
  !coprocessing: followed by a call to the coprocessor to write out the initial data
  !coprocessing: image. Otherwise, these are just wasted calls at this spot.
  call coprocessorinitialize("pipeline.py", 11)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
  END DO; END DO ; END DO

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
  END DO; END DO ; END DO

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
  END DO; END DO ; END DO

  call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
          realtempx, realtempy, realtempz)

  !start timer
  CALL system_clock(start,count_rate)
  ! coprocessing: Simulation loop starts here
  DO n=1,Nt
     !fixed point
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        uold(i,j,k)=u(i,j,k)
        uxold(i,j,k)=ux(i,j,k)
        uyold(i,j,k)=uy(i,j,k)
        uzold(i,j,k)=uz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        vold(i,j,k)=v(i,j,k)
        vxold(i,j,k)=vx(i,j,k)
        vyold(i,j,k)=vy(i,j,k)
        vzold(i,j,k)=vz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        wold(i,j,k)=w(i,j,k)
        wxold(i,j,k)=wx(i,j,k)
        wyold(i,j,k)=wy(i,j,k)
        wzold(i,j,k)=wz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k) 
     END DO; END DO ; END DO

     chg=1
     DO WHILE (chg .gt. tol)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
                +ky(j)*nonlinvhat(i,j,k)&
                +kz(k)*nonlinwhat(i,j,k))&
                /(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
        END DO; END DO ; END DO

        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO

        ! derivative of u with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of v with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of w with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD) 

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           utemp(i,j,k)=u(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           vtemp(i,j,k)=v(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           wtemp(i,j,k)=w(i,j,k)
        END DO; END DO ; END DO

        CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)    

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           u(i,j,k)=u(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           v(i,j,k)=v(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           w(i,j,k)=w(i,j,k)*scalemodes
        END DO; END DO ; END DO

        mychg(1) =maxval(abs(utemp-u))
        mychg(2) =maxval(abs(vtemp-v))
        mychg(3) =maxval(abs(wtemp-w))
        CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
        chg=allchg(1)+allchg(2)+allchg(3)
        IF (myid.eq.0) THEN
           PRINT *,'chg:',chg
        END IF
     END DO
     time(n+1)=n*dt

     !goto 5100
     IF (myid.eq.0) THEN     
        PRINT *,'time',n*dt
     END IF

     !coprocessing: populating the arrays sent to the coprocessor
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
     END DO; END DO ; END DO

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
     END DO; END DO ; END DO

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
     END DO; END DO ; END DO

     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
             realtempx, realtempy, realtempz) 
  END DO
  !coprocessing:
  call coprocessorfinalize()

  CALL system_clock(finish,count_rate)

  IF (myid.eq.0) then
     PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
  END IF

  IF (myid.eq.0) THEN
     name_config = './data/tdata.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO n=1,1+Nt
        WRITE(11,*) time(n)
     END DO 
     CLOSE(11)

     name_config = './data/xcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO i=1,Nx
        WRITE(11,*) x(i)
     END DO
     CLOSE(11)       

     name_config = './data/ycoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO j=1,Ny
        WRITE(11,*) y(j)
     END DO
     CLOSE(11)

     name_config = './data/zcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO k=1,Nz
        WRITE(11,*) z(k)
     END DO
     CLOSE(11)
     PRINT *,'Saved data'
  END IF

  ! Calculate error in final numerical solution
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     utemp(i,j,k)=u(i,j,k) -&
          (-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     vtemp(i,j,k)=v(i,j,k) -&
          (0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     wtemp(i,j,k)=w(i,j,k)-&
          (cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  mychg(1) = maxval(abs(utemp))
  mychg(2) = maxval(abs(vtemp))
  mychg(3) = maxval(abs(wtemp))
  CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
  chg=allchg(1)+allchg(2)+allchg(3)
  IF (myid.eq.0) THEN
     PRINT*,'The error at the final timestep is',chg
  END IF

  ! clean up 
  CALL decomp_2d_fft_finalize
  CALL decomp_2d_finalize

  DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
       vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
       temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
       rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
       realtemp,stat=AllocateStatus)         
  !coprocessing: deallocate Coprocessing specific arrays
  deallocate(realtempx, realtempy, realtempz)
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'Program execution complete'
  END IF
  CALL MPI_FINALIZE(ierr)         

END PROGRAM main

f90 介面卡粘合程式碼中的程式碼更改

[編輯 | 編輯原始碼]

 

 

 

 

( H)
  • 接受當前鉛筆的子範圍。

程式碼下載

! Fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE

! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.      
module NSadaptor_module 
  use iso_c_binding
  implicit none
  public
  interface NSadaptor 
    module procedure NSadaptor3D 
  end interface NSadaptor 
contains
  subroutine NSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
                                     step, time, omegax, omegay, omegaz)
    ! nx, ny, nz     -- grid dimensions of entire mesh
    !                   used for setting whole extent
    ! xst, xen, etc. -- extents of current subdomain pencil
    ! step           -- current simulation time step
    ! time           -- current simulation time
    ! omega*         -- scalar arrays for the current time step
    integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
    real(kind=8), intent(in) :: time
    real(kind=8), dimension(:,:,:), intent (in) :: omegax, omegay, omegaz
    integer :: flag

    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag /= 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag /= 0) then
          ! grid needed
          ! defined in VTKCellBasedDataSet.cxx
          ! takes the size of the entire grid, and the extents of the
          ! sub grid.
          call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
       end if
          
       ! defined in VTKCellBasedDataSet.cxx 
       ! call for each field of interest. Be sure to null-terminate the
       ! string for C++!
       call addfield(omegax, "realtempx"//char(0))
       call addfield(omegay, "realtempy"//char(0))
       call addfield(omegaz, "realtempz"//char(0))       


       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine NSadaptor3D 
end module NSadaptor_module

C++ 包裝程式碼中的程式碼更改

[編輯 | 編輯原始碼]

 

 

 

 

( I)
  • 更改了範圍的計算方式。
  • 將點資料呼叫更改為單元格資料呼叫。

程式碼下載

// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h" 

// CoProcessor specific headers
// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"
#include <string>

// These will be called from the Fortran adaptor "glue" code.
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
	int* yst, int* yen, int* zst, int* zen)
{
  if (!ParaViewCoProcessing::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access CoProcessorData.");
    return;
  }

  // The simulation grid is a 3-dimensional topologically and geometrically 
  // regular grid. In VTK/ParaView, this is considered an image data set.
  vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
  // Indexing based on change from F90 to C++, and also from nodal to cellular.
  img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen); 
  img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz);
 
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);
}

// Add field to the data container.
// Separate from above because this could be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Assumes "name" points to null-terminated array of chars.
// Easiest way to do that is to concatenate a terminal NULL in caller. 
extern "C" void addfield_(double* a, char* name)
{

  vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
  vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!img) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }

  if (idd->IsFieldNeeded(name)) {
    vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
    field->SetName(name);
    field->SetArray(a, img->GetNumberOfCells(), 1); 
    img->GetCellData()->AddArray(field);
  }
}
  • 最小的程式碼更改,模擬程式碼中沒有更改(除了協處理呼叫)。
  • 不需要光暈或打包陣列。
  • 應該可以在無需更改的情況下使用不同的分解庫。
  • 需要在 ParaView 客戶端中進行額外處理,以處理單元格渲染和點渲染之間的差異。
  • 不能與需要幽靈單元格的 VTK/ParaView 過濾器一起使用。

在 ParaView 客戶端中建立管道的其他注意事項

[編輯 | 編輯原始碼]

使用 Fortran 複數型別

[編輯 | 編輯原始碼]

C++ 沒有本機複數資料型別。這需要在 ParaView 客戶端中進行一些額外的處理

  1. 以減少的網格大小執行原始的非協處理模擬。
  2. 使用 RAW 二進位制閱讀器在 ParaView 客戶端中開啟此資料。
  3. 將標量分量數設定為 2
  4. 建立管道,實部和虛部的預設名稱分別是 ImageFile_X 和 ImageFile_Y。

 

 

 

 

( J)

有關將複數傳遞給協處理器的詳細資訊,請參見 NLSadaptor.f90 的程式碼。 程式碼下載

! Fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE

! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.      
module NLSadaptor_module 
  use iso_c_binding
  implicit none
  public
  ! using an interface incase need arises to overload 1D, 2D versions
  interface NLSadaptor 
    module procedure NLSadaptor3D 
  end interface NLSadaptor 
contains
  subroutine NLSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
                                     step, time, a)
    ! nx, ny, nz     -- grid dimensions of entire mesh
    !                   used for setting whole extent
    ! xst, xen, etc. -- extents of current subdomain pencil
    ! step           -- current simulation time step
    ! time           -- current simulation time
    ! a              -- scalar array for the current time step
    ! flag           -- receives status from API calls
    integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
    real(kind=8), intent(in) :: time
    complex(kind=8), dimension(:,:,:), intent (in) :: a 
    integer :: flag

    flag = 0
    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag /= 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag /= 0) then
          ! grid needed
          ! defined in adaptor.cxx
          ! takes the size of the entire grid, and the extents of the
          ! sub grid.
          call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
       end if
          
       ! defined in adaptor.cxx
       ! Be sure to null-terminate the
       ! string for C++!
       call addfield(a, "u_complex"//char(0))

       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine NLSadaptor3D 
end module NLSadaptor_module

 

 

 

 

 

( K )

這也需要在附帶的 C++ 檔案中進行一些小的程式碼更改。 程式碼下載

// 介面卡,用於將 fortran 模擬程式碼輸入 ParaView 協處理。 // 基於 ParaView 發行版中的 PhastaAdaptor 示例。 // ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran 特定標頭 // ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/

  1. include "FortranAdaptorAPI.h"

// 協處理特定標頭 // 替代 VTK 資料集物件建立的例程。 // 從 Fortran 程式碼呼叫,該程式碼還呼叫與 ParaView 原始碼一起提供的 Fortran Adaptor API // 注意:名稱與尾部下劃線混合,用於連結器可見性。

  1. include "vtkCPDataDescription.h"
  2. include "vtkCPInputDataDescription.h"
  3. include "vtkCPProcessor.h"
  4. include "vtkDoubleArray.h"
  5. include "vtkPointData.h"
  6. include "vtkSmartPointer.h"
  7. include "vtkImageData.h"
  8. include "vtkCellData.h"
  9. include <string>

// 這些將從 Fortran “粘合” 程式碼呼叫 // 完全依賴於資料佈局,結構化與非結構化,等等 // 因為 VTK/ParaView 對每個佈局使用不同的內部佈局。

// 為協處理器建立資料容器。 // 獲取全域性資料集和當前 MPI 秩可見的特定子部分的範圍 // 注意:預期接收 Fortran 基於 1 的索引。 extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen, int* yst, int* yen, int* zst, int* zen) {

 if (!ParaViewCoProcessing::GetCoProcessorData()) {
   vtkGenericWarningMacro("Unable to access CoProcessorData.");
   return;
 }
 // The simulation grid is a 3-dimensional topologically and geometrically 
 // regular grid. In VTK/ParaView, this is considered an image data set.
 vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
 // For Fortram complex, need to set that there are 2 scalar components
 img->SetNumberOfScalarComponents(2);
 // Indexing based on change from F90 to C++, and also from nodal to cellular.
 // Extents are given in terms of nodes, not cells.
 img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen); 
 
 // Setting spacing is important so that the camera position in the pipeline makes
 // sense if using different sized meshes between setting up the pipeline and running
 // the simulation. Origin can often be ignored.
 img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz); // considering passing in as args. 
 ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
 ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);

}

// 向資料容器新增欄位。 // 與上述分開,因為這可能是動態的,網格是靜態的。 // 可能會出現問題,VTK 假設行主序 C 儲存。 // 下劃線是用於 fortran 連結的手動名稱混合。 // 注意:預期接收資料的地址,無法確定 // 陣列是否密集打包。 // 注意 2:假設 “name” 指向以 null 結尾的字元陣列。 // 最簡單的實現方式是在呼叫者中進行連線。 extern "C" void addfield_(double* a, char* name) {

 vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
 vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());
 if (!img) {
   vtkGenericWarningMacro("No adaptor grid to attach field data to.");
   return;
 }
 if (idd->IsFieldNeeded(name)) {
   vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
   field->SetNumberOfComponents(2);
   field->SetName(name);
   field->SetArray(a, 2* img->GetNumberOfCells(), 1); 
   img->GetCellData()->AddArray(field);
 }

}

使用單元格資料

[編輯 | 編輯原始碼]
  • 某些 ParaView 過濾器和表示形式僅適用於點資料,例如體積渲染。使用單元格資料到點資料過濾器進行轉換。請務必選中選項,以將單元格資料透過過濾器傳遞。如果模擬執行沒有錯誤併產生沒有輸出,則可能的罪魁禍首是在建立管道時未能選中該選項。

vtkKDTreeGenerator 警告

[編輯 | 編輯原始碼]
  • 使用體積渲染的管道將生成有關區域 ID 為 0 的警告。通常可以忽略這些警告。

ParaView 協處理資源

[編輯 | 編輯原始碼]
華夏公益教科書