Code Analysis ============= Complexity ********** We consider the following matrix-matrix product example .. code-block:: python n = int() n = 2 x = zeros(shape=(n,n), dtype=float) y = zeros(shape=(n,n), dtype=float) z = zeros(shape=(n,n), dtype=float) for i in range(0, n): for j in range(0, n): for k in range(0, n): z[i,j] = z[i,j] + x[i,k]*y[k,j] now let's run the following command:: $ pyccel --filename=tests/complexity/inputs/ex3.py --analysis arithmetic cost ~ n**3*(ADD + MUL) memory cost ~ WRITE + n**3*(3*READ + WRITE) computational intensity ~ (ADD + MUL)/(3*READ + WRITE) Let's now consider the following block version of the matrix-matrix product .. code-block:: python n = int() b = int() m = int() n = 10 b = 2 p = n / b x = zeros(shape=(n,n), dtype=float) y = zeros(shape=(n,n), dtype=float) z = zeros(shape=(n,n), dtype=float) r = zeros(shape=(b,b), dtype=float) u = zeros(shape=(b,b), dtype=float) v = zeros(shape=(b,b), dtype=float) for i in range(0, p): for j in range(0, p): for k1 in range(0, b): for k2 in range(0, b): r[k1,k2] = z[i+k1,j+k2] for k in range(0, p): for k1 in range(0, b): for k2 in range(0, b): u[k1,k2] = x[i+k1,k+k2] v[k1,k2] = y[k+k1,j+k2] for ii in range(0, b): for jj in range(0, b): for kk in range(0, b): r[ii,jj] = r[ii,jj] + u[ii,kk]*v[kk,jj] for k1 in range(0, b): for k2 in range(0, b): z[i+k1,j+k2] = r[k1,k2] the analysis is done again using:: $ pyccel --filename=tests/complexity/inputs/ex4.py --analysis arithmetic cost ~ DIV + b**3*p**3*(ADD + MUL) memory cost ~ 2*READ + 3*WRITE + b**2*p**2*(2*READ + 2*WRITE + p*(2*READ + 2*WRITE + b*(3*READ + WRITE))) computational intensity ~ (ADD + MUL)/(3*READ + WRITE) Now, let us assume we have two level of memories, the **fast** memory represents the **L2** cache. By giving the variables that live in the cache, using **local_vars**, the analysis gives:: $ pyccel --filename=tests/complexity/inputs/ex4.py --analysis --local_vars="u,v,r" arithmetic cost ~ DIV + b**3*p**3*(ADD + MUL) memory cost ~ 2*READ + 3*WRITE + b**2*p**2*(2*READ*p + READ + WRITE) computational intensity ~ b*(ADD + MUL)/(2*READ) As we can see, the computational intensity is now a linear function of the block size :math:`b`. Therefor, this algorithm will take more advantage of the spatial locality of data. .. todo:: remove local_vars from **pyccel** command line and use Annotated Comments instead. .. todo:: for the moment, we only cover the **for** statement. Further work must be done for **if** and **while** statements. .. todo:: add probability law for the **if** statement. .. todo:: how to handle the **while** statement? Arithmetic ^^^^^^^^^^ .. TODO:: add Fusion Mul-Add (FMA) instruction .. TODO:: add table of costs for all instructions Memory ^^^^^^ We describe here our *Memory model*. It follows the work of J. Demmel and his collaborators on the matrix multiplication. More details can be found in `J. Demmel's talk`_ .. _J. Demmel's talk: https://people.eecs.berkeley.edu/~demmel/cs267_Spr99/Lectures/Lect_02_1999b.pdf Here are our assumptions: 1. Two levels of memory: *fast* and *slow* 2. All data are initially in *slow* memory