Standard application benchmark

This part is dedicated to benchmarking the surface_rec application, which solves the following equation for one set of data.

\int_\Omega \nabla z \cdot \nabla v = \int_\Omega(p,q)\cdot\nabla v
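
One way to read this equation, assuming that z is the unknown height, (p,q) the given slopes along x and y, and v any test function (the symbols are not defined explicitly in this part, so this reading is an assumption): it is the first-order optimality condition of a least-squares fit of \nabla z to (p,q). A short sketch of that derivation:

```latex
J(z) = \frac{1}{2}\int_\Omega \left| \nabla z - (p,q) \right|^2
\quad\Longrightarrow\quad
\frac{d}{d\varepsilon} J(z + \varepsilon v)\Big|_{\varepsilon = 0}
= \int_\Omega \left( \nabla z - (p,q) \right)\cdot\nabla v = 0
\quad \forall v
\quad\Longleftrightarrow\quad
\int_\Omega \nabla z \cdot \nabla v = \int_\Omega (p,q)\cdot\nabla v
```

Only \nabla z appears in the equation, so under this reading z is determined up to an additive constant unless an extra condition is imposed.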

The default data used here are as follows:

  • Image size: 1280*960

  • Number of processors: 4

  • Number of cases: 4 ( only for serial_control )

All the tests are performed with the ML preconditioner, on the Irma Atlas cluster.

1. Reminder

As a reminder, the surface_rec application recovers the height of a surface from its derivatives along the x and y axes. We are especially interested here in how the completion time of the application reacts to the number of processors used and to the image size.

The program can be decomposed into the following main parts:

  • Mesh corresponds to the time needed to build the mesh ( nodes and elements ) and prepare it for use.

auto mesh = boost::make_shared<MeshStructured>( ioption("Scala")*(x.rows()),
                                                Environment::worldComm() );
  • Vh is the part where we build the function space, used to define the elements ( test and trial ).

auto Vh = Pch<1> ( mesh ) ;
  • Elements is the construction of the trial and test elements, which are used across the mesh, as well as of the "data" elements, which help us define our problem.

auto u = Vh->element () ;
auto v = Vh->element () ;
auto px = Vh->element () ;
auto py = Vh->element () ;
  • HbfToFeel is the time needed to transfer the data from the hbf files into our "data" elements.

Hbf2FeelppStruc h2f( nx, ny, Vh );
  • rhs and lhs are respectively the construction of the right-hand ( linear ) and left-hand ( bilinear ) sides of our equation.

auto l = form1( _test=Vh );
l = integrate(_range=elements(mesh),

auto a = form2( _trial=Vh, _test=Vh);
a = integrate( _range=elements(mesh),
  • Finally, the solver part is the time needed to solve our equation, with possible conditions to take into account.
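
The per-part timings listed above could be collected with a small harness like the following. This is a minimal sketch in plain C++ using `std::chrono`; the `PhaseTimer` class and its method names are hypothetical and are not part of Feel++ or of surface_rec, and this part does not say how the timings were actually measured.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of Feel++): accumulates wall-clock time per named phase.
class PhaseTimer
{
public:
    // Runs `work`, adds its wall-clock duration (in seconds) to `phase`,
    // and returns that duration.
    template <typename Work>
    double run( const std::string& phase, Work&& work )
    {
        assert( !phase.empty() );
        const auto start = std::chrono::steady_clock::now();
        std::forward<Work>( work )();
        const auto stop = std::chrono::steady_clock::now();
        const double seconds = std::chrono::duration<double>( stop - start ).count();
        totals_[phase] += seconds;
        return seconds;
    }

    // Total time recorded for `phase`, or 0 if the phase was never run.
    double total( const std::string& phase ) const
    {
        const auto it = totals_.find( phase );
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Sum of the times recorded over all phases.
    double grandTotal() const
    {
        double sum = 0.0;
        for ( const auto& entry : totals_ )
            sum += entry.second;
        return sum;
    }

private:
    std::map<std::string, double> totals_;
};
```

Each part would then be wrapped in a call such as `timer.run( "Mesh", [&] { /* build the mesh */ } );`. In a parallel run, each process measures its own time, so a real harness would probably report the maximum over all processes for each part.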


1.1. Scalability with the number of processors

We first look at how the number of processors affects the time the application needs to finish. A 1280\times 960 image is used as the reference for these tests.

Time for different numbers of processors used

As we can see, the time drops from approximately 70 s in sequential to less than 10 s with 16 processors.
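
These two figures can be turned into a speedup and a parallel efficiency. The sketch below uses the usual definitions (speedup = sequential time / parallel time, efficiency = speedup / number of processors); the function names are illustrative only. With the rounded values quoted above (70 s and 10 s on 16 processors), the speedup is 7 and the efficiency about 44 %. Since the parallel time is reported as less than 10 s, both are lower bounds.

```cpp
#include <cassert>

// Speedup of a parallel run relative to the sequential run.
double speedup( double sequentialSeconds, double parallelSeconds )
{
    assert( sequentialSeconds > 0.0 && parallelSeconds > 0.0 );
    return sequentialSeconds / parallelSeconds;
}

// Parallel efficiency: speedup divided by the number of processors (1.0 is ideal).
double efficiency( double sequentialSeconds, double parallelSeconds, int processors )
{
    assert( processors > 0 );
    return speedup( sequentialSeconds, parallelSeconds ) / processors;
}
```

For example, `efficiency( 70.0, 10.0, 16 )` evaluates the rounded figures from the text.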

1.2. Image size scalability

Time for different sizes of images

As expected, the time rises as the size of the slope data grows. The mesh creation is particularly affected by this effect, as is the construction of the right-hand side.

Time for different numbers of processors, proportionally to the sizes of the given images

For this second part, we test the weak scalability of our program, i.e. whether the time needed is preserved when the size of the slope data and the number of processors used increase by the same factor ( in this case \times 4 ). The first two results seem pretty good, as the time difference between them is small. However, the time explodes for the last one.
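
Weak scalability can be quantified as the ratio between the time of the smallest run and the time of each larger run, with 1.0 meaning that the time is perfectly preserved. The sketch below assumes this common definition; the function name is illustrative, and any timings passed to it in an example are placeholders, not measurements from this benchmark.

```cpp
#include <cassert>
#include <vector>

// Weak-scaling efficiency of each run relative to the first (smallest) run.
// times[i] is the wall-clock time of run i, where the data size and the number
// of processors are both multiplied by the same factor from one run to the next.
std::vector<double> weakEfficiency( const std::vector<double>& times )
{
    assert( !times.empty() && times.front() > 0.0 );
    std::vector<double> result;
    result.reserve( times.size() );
    for ( double t : times )
    {
        assert( t > 0.0 );
        result.push_back( times.front() / t );
    }
    return result;
}
```

With placeholder timings such as `{ 10.0, 12.5, 40.0 }`, the second run keeps an efficiency of 0.8 while the third drops to 0.25, which is the kind of pattern described above.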