Code Analysis

Complexity

We consider the following matrix-matrix product example

n = int()
n = 2

x = zeros(shape=(n,n), dtype=float)
y = zeros(shape=(n,n), dtype=float)
z = zeros(shape=(n,n), dtype=float)

for i in range(0, n):
    for j in range(0, n):
        for k in range(0, n):
            z[i,j] = z[i,j] + x[i,k]*y[k,j]

now let’s run the following command:

$ pyccel --filename=tests/complexity/inputs/ex3.py --analysis
arithmetic cost         ~ n**3*(ADD + MUL)
memory cost             ~ WRITE + n**3*(3*READ + WRITE)
computational intensity ~ (ADD + MUL)/(3*READ + WRITE)

Let’s now consider the following block version of the matrix-matrix product

n = int()
b = int()
m = int()
n = 10
b = 2
p = n / b

x = zeros(shape=(n,n), dtype=float)
y = zeros(shape=(n,n), dtype=float)
z = zeros(shape=(n,n), dtype=float)

r = zeros(shape=(b,b), dtype=float)
u = zeros(shape=(b,b), dtype=float)
v = zeros(shape=(b,b), dtype=float)

for i in range(0, p):
    for j in range(0, p):
        for k1 in range(0, b):
            for k2 in range(0, b):
                r[k1,k2] = z[i+k1,j+k2]
        for k in range(0, p):
            for k1 in range(0, b):
                for k2 in range(0, b):
                    u[k1,k2] = x[i+k1,k+k2]
                    v[k1,k2] = y[k+k1,j+k2]
            for ii in range(0, b):
                for jj in range(0, b):
                    for kk in range(0, b):
                        r[ii,jj] = r[ii,jj] + u[ii,kk]*v[kk,jj]
        for k1 in range(0, b):
            for k2 in range(0, b):
                z[i+k1,j+k2] = r[k1,k2]

the analysis is done again using:

$ pyccel --filename=tests/complexity/inputs/ex4.py --analysis
arithmetic cost         ~ DIV + b**3*p**3*(ADD + MUL)
memory cost             ~ 2*READ + 3*WRITE + b**2*p**2*(2*READ + 2*WRITE + p*(2*READ + 2*WRITE + b*(3*READ + WRITE)))
computational intensity ~ (ADD + MUL)/(3*READ + WRITE)

Now, let us assume we have two level of memories, the fast memory represents the L2 cache. By giving the variables that live in the cache, using local_vars, the analysis gives:

$ pyccel --filename=tests/complexity/inputs/ex4.py --analysis --local_vars="u,v,r"
arithmetic cost         ~ DIV + b**3*p**3*(ADD + MUL)
memory cost             ~ 2*READ + 3*WRITE + b**2*p**2*(2*READ*p + READ + WRITE)
computational intensity ~ b*(ADD + MUL)/(2*READ)

As we can see, the computational intensity is now a linear function of the block size b. Therefor, this algorithm will take more advantage of the spatial locality of data.

Todo

remove local_vars from pyccel command line and use Annotated Comments instead.

Todo

for the moment, we only cover the for statement. Further work must be done for if and while statements.

Todo

add probability law for the if statement.

Todo

how to handle the while statement?

Arithmetic

Todo

add Fusion Mul-Add (FMA) instruction

Todo

add table of costs for all instructions

Memory

We describe here our Memory model. It follows the work of J. Demmel and his collaborators on the matrix multiplication. More details can be found in J. Demmel’s talk

Here are our assumptions:

  1. Two levels of memory: fast and slow
  2. All data are initially in slow memory