Finite Element Assembly in Nek5000/RS

In this post, I’d like to talk a bit about how Nek treats the connectivity and therefore periodicity. Despise that we are going into Nek’s structure, the general concept can be applied to other finite elements methods. This is one of the main communication we have in the solver handled by gslib ¹ which we will talk about it in the future posts.

1. Formulation

Let’s start with a 1D domain with two elements at polynomial degree $N>0$. The high-order part doesn’t really matter, here we just use that to give each elements $N+1$ points. See the figure below.

The $N$+1 grid points in the left element ($e$=1) are named $x_0^1, x_1^1, \cdots, x_N^1$. As for the right element, we have points $x_0^2, x_1^2, \cdots, x_N^2$. Since two elements are connected, they share a common interface point $x^1_N = x^2_0$. Therefore, there are only $2N+1$ unique points and we order those unique coordinates with global indices from $x_0$ to $x_{2N}$. On the same coordinates, $x_N = x^1_N = x^2_0$, if there is a continuous function $u(x)$ across the elements we will have $u(x_N) = u(x^1_N) = u(x^2_0)$.

Up to this stage, we have two numbering systems.

Name	Notations	#Points	#Unique coordinates
Local index (L)	$x_i^e$, $\ \ e=1$ or $2$, $\ \ i=0,1,\cdots,N$	2N+2	2N+1
Global index (G)	$x_j$, $\ \ j=0,1,\cdots,2N$	2N+1	2N+1

Based on the collocation points and spatial discretization, one can compute the numerical integration, derivatives or whatever algebraic equations based on the local points within an element. To simplify the problem, let’s say we have discretized linear system for each element.
\begin{align} A^e \underline{u^e} = \underline{f^e}, \ \ e=1, 2 \end{align}

1.1 Numerical Integration Viewpoint

We go back to the weak formulation that seeks the solution $u$ in weak space satisfying the integral norm for arbitrary test function $v$.

\[A[u](v) := \int_\Omega v A(u) = \int_\Omega v f(u) =: F[u](v)\]

With two elements $\Omega=\Omega^1\cup\Omega^2$, the integral is simply the sum of each local integrals

\[A[u](v) := \int_{\Omega^1} v A(u) + \int_{\Omega^2} v A(u)\]

From the view of the global grid points, the point $x_N$ will naturally have contribution in both integrals in each element.

1.2 Collocation Viewpoint

Two elements will have $2N+2$ equations but the continuity requirement only yields $2N+1$ degree of freedom. Therefore, it’s natural to merge two equation on the common point $x_N$. By adding two equations together, it will achieve the continuity due to the equal contribution form both elements.

1.3 Gather-Scatter Operations

Either way, the combined linear system becomes the following system on global grid.

\[\begin{pmatrix} a^1_{0,0} & a^1_{0,1} & \cdots & a^1_{0,N} & & & & \\ a^1_{1,0} & a^1_{1,1} & \cdots & a^1_{1,N} & & & & \\ \vdots & \vdots & \ddots & \vdots & & & &\\ a^1_{N,0} & a^1_{N,1} & \cdots & a^1_{N,N} + a^2_{0,0} & a^2_{0,1} & \cdots & a^2_{0,N}\\ & & & a^2_{1,0} & a^2_{1,1} & \cdots & a^2_{1,N}\\ &&& \vdots & \vdots & \ddots & \vdots \\ & & & a^2_{N,0} & a^2_{N,1} & \cdots & a^2_{N,N}\\ \end{pmatrix} \begin{pmatrix} u(x_0) \\ u(x_1) \\ \vdots \\ u(x_N) \\ u(x_{N+1}) \\ \vdots \\ u_(x_{2N}) \end{pmatrix} = \begin{pmatrix} f(x_0) \\ f(x_1) \\ \vdots \\ 2f(x_N) \\ f(x_{N+1}) \\ \vdots \\ f_(x_{2N}) \end{pmatrix}\]

In practical, such global system should never be constructed. The finite elements arrays are naturally stored in distributed memory for parallel computing so the element-wise data are stored in local arrays.

To transfer between coordinates, we first form the “scatter” operator that copies data from global grid to the corresponding local grid, which can be written in to $Q$ matrix

\[\underline{u}_L = Q \underline{u}_G = \begin{pmatrix} 1 & & \\ & \ddots &\\ && 1 & \\ & & 1 \\ &&& \ddots \\ &&& & 1 \end{pmatrix} \underline{u}_G\]

As for the gather operator, it can be treated as the scatter operation from the test function, which is defined as $Q^T$

Solving the global linear system $A_G \underline{u}_G = \underline{f}_G$ is same as solving the local linear system $QQ^T A_L \underline{u}_L = QQ^T\underline{f}_L$ up to the rank = degree of freedom. Despite the values are summed at the interface points, the right-hand-side is also doubled so it still give the same answer after the solve.

One extra detail is, when solving the local system with iterative methods, one has to use either the integral norm or a re-scaled algebraic norm weighted with the inverse of the multiplicity. In other words, the algebraic norm follows the norm in global system.

2. Connectivity

The connectivity is defined as the global index of each points. To construct the $QQ^T$ operator, all it needs is an local array that stores the global index. We assume that both local and global index are continuous and the global one is unique. That is, if it’s 1-base (for MATLAB), the global index starts from $1$ to nUV where nUV is the number of unique points.

  
% Assume both index arrays are continuous
for i=1:length(local_index)
   j = global_index(i);
   Q(i,j) =  Q(i,j) + 1.0;
end

% Or
nV = length(local_index);
nUV = max(global_index);
Q = sparse(local_index,global_index,1.0,nV,nUV);

3. Nek5000’s connectivity

In my opinion, whoever builds the mesh needs to make sure the mesh is water-tight in the sense that there is no leaking holes between elements, or $\Omega = \cup_{e=1}^E \Omega^{(e)}$ where arbitrary two elements has no volume in $d$-th dimension measure $|\Omega^{(i)}\cap\Omega^{(j)}|=0$. This means, one needs to keep track on the connectivity during the meshing. However, meshing is already hard, it will be more flexible if we don’t have to worry about connectivity at each stage. Nek has tools to rebuild the connectivity and user can alter the connectivity in the .usr file as well.

  
# Valid Nek5000 inputs 
.rea + .map      
.re2 + .ma2      
.re2 + .co2 + PPLIST+=PARRSB
.re2 + PPLIST+=PARRSB     

3.1. `.map`/`.ma2` file and `genmap`

The genmap is a robust tool that does two things, connectivity and partitioning. We will talk about partitioning in future post.

The algorithm first sorts the coordinates based on their physical coordinates. With an user-provided relative mesh tolerance (default = 0.2), if the distance of two vertices smaller than the criterion, then they share the same node index.

The tol here is an #\epsilon$-ball that can cover the leaking holes in mesh. Therefore, a larger tol is more forgiving to the bad mesh. However, if tol is too large, it can fuse too many points together, which collapses and degenerates the hex elements and causing zero Jacobian value.

The smaller tol can usually get a valid connectivity for high aspect ratio mesh. The “valid” here means runnable in Nek, which doesn’t mean it’s correct. For example, at the limit $tol \to 0$, all elements are treated as isolated subdomains that is not connected to others.

The “isolated” elements is harder to detect but it usually generates an empty boundary condition. The recommendation workflow for genmap is

Use default tol (0.2) by hitting the return key
If it fails, tighten the tol
Repeat 2 until it works, but user must be very careful if tol is, says, less than $10^{-3}$

Keep in mind that, precision matters. If a mesh has high aspect ratio but only stored in ascii .rea format, it might not have enough of precision to use tol-based connectivity tools.

3.2. `.con`/`.co2` file and `gencon`

The gencon is a subset of genmap that only does the connectivity, so the .co2 file contains only the connectivity. Currently, the .con file is not usable in Nek. We only accept .co2 file.

  
#  symmetric vertex ordering / lexicalgraphical ordering
#  in r-s-t coordinates
#
#      (vec 15)
#      t       ^ s (vec 13)
#      :      .
#      :  7 === 8
#      : /:    /| 
#      :/ :   / | 
#      5 === 6  | 
#      |  3 -|- 4
#      | .   | /
#      |.    |/
#      1 === 2 --> r (vec 12)

3.3. `n2to3co2` tool

n2to3co2 is a new tool that can extrude a 2D .con/.co2 files to a 3D .co2. It follows the same element ordering in the mesh by n2to3. This tool allow us to generate connectivity for a huge mesh (E = 200M ~ 2Bn) for testing and debugging purpose.

3.4. parCon (shipped in parRSB)

Both genmap and gencon are serial tool. As the problem size gets larger, it can take hours to generate a .co2. Plus, it will need a large RAM server to run it. Personally, if the case is larger than 1M or 10M elements, I’d suggest to use parCon and parRSB.

3.5. `setvert` and `usersetvert`

Once the code starts, it reads the global indices of all vertices of the elements and stores it into Nek variable, glo_num. The subroutine setvert in Nek5000 will generate the global indices for each grid points vertex. The indices are numbered in the order of vertices, edges, face, and interior points. At the end of setvert, it calls usersetvert which allows user to modify or overwrite the index. This allows user to do whatever they want to modify the connectivity.

3.6. `gs` functions

Nek5000 will setup the gather scatter operations based on the connectivity. The $QQ^T$ operations can be done by call dssum(u,lx1,ly1,lz1). Instead of sum, one can also do other reduction operations.

dsop(u,'sum',nx,ny,nz)
dsop(u,'mul',nx,ny,nz)
dsop(u,'min',nx,ny,nz)
dsop(u,'max',nx,ny,nz)

One can also look into fgslib_gs_op (gs_op in gslib + prefix for Fortran interface) to see the API in details including data type and many operations.

4. Periodicity

How to do enforce periodicity is one of the most asked questions in Nek5000’s forum.

In the code level, it’s literately zero effort. The periodicity is treated the same as the internal "E " boundary as long as the connectivity is set properly. It’s super easy to achieve (double/triple) periodic, or even complicate topology such as a Möbius strip. Neat!

4.1 `genbox` and `.rea`/`.re2`

The genbox will set the boundary condition "P " in .rea/.re2 and specify the neighbor’s element id and face id into bc(1,f,e,ifield) and bc(2,f,e,ifield). This, however, is a temporary information for the tools like genmap. At runtime, this “face-to-face” connectivity is never used.

One can use this face-to-face connectivity, in my understanding, this is how we deal with the vector-translated periodic, p .

Note: The periodic BC set in the .box file only works when there is a single box. For multi-boxes, one has to fix the periodicity with prenek’s auto-periodic.

4.2 `genmap` (or `gencon`)

genmap will read the “face-to-face” connectivity stored in .rea or .re2.
One can directly modified the generated the .co2 file to have desired connectivity.

Note: genmap require 3 layers of elements to identify the periodic with correct orientation. Think about Möbius strip, face-to-face connectivity can still flip the face orientation in 2D.

4.3 `usrsetvert`

This is called at runtime, which is the last chance to modify the connectivity. In the subroutine usrsetvert (.usr) is called after setup. This allows us get rid of “3 layers of elements” condition. See NekRS example channel

fix_geom will make sure the grid points sharing the same global vertex are at the same location. However, the points that has P will be skipped.

gslib: Gather Scatter Library GitHub ↩

Finite Element Assembly in Nek5000/RS

1. Formulation

1.1 Numerical Integration Viewpoint

1.2 Collocation Viewpoint

1.3 Gather-Scatter Operations

2. Connectivity

3. Nek5000’s connectivity

3.1. .map/.ma2 file and genmap

3.2. .con/.co2 file and gencon

3.3. n2to3co2 tool

3.4. parCon (shipped in parRSB)

3.5. setvert and usersetvert

3.6. gs functions

4. Periodicity

4.1 genbox and .rea/.re2

4.2 genmap (or gencon)

4.3 usrsetvert

Further Reading

Nek5000 簡介

Gauss-Lobatto-Legendre Points

Website Logs