Today you will explore non-blocking communication and master/worker parallelism with MPI. The assignment requires you to be able to view PPM image files generated on Setonix. One approach uses X11 forwarding to render the image directly from the remote machine without copying it to your local machine. Start by connecting to Setonix by typing
ssh -Y your_username@setonix.pawsey.org.au
where the -Y flag instructs ssh to forward the X11 output back to your display. macOS users can install the free XQuartz package to provide an X11 display. Windows users should connect via MobaXterm, which has built-in support for X11.
To test that X11 forwarding is working correctly, cd to the directory /software/projects/courses0100 and type the command
./xiv mandelbrot.ppm
A window should appear showing a render of the Mandelbrot set. Use "+" and "-" on the keyboard to zoom, and press the arrow keys to move around the image once zoomed in. Press "q" to quit. To learn more about the functionality of this simple image viewer, type
./xiv --help
Later today we will want to access this program from other directories. To make this easier, set up an alias by typing
alias xiv="/software/projects/courses0100/xiv"
Test that your alias works by copying mandelbrot.ppm to your home directory and running xiv mandelbrot.ppm from there.
If you are unable to get xiv working through X11 (e.g. XQuartz has issues on some recent versions of macOS), or if you find it too slow, you may find it easier to simply scp the image files to your local machine and open them there. If you go down this road, please make sure you are able to open .ppm files on your computer (the default document viewer Preview on macOS can open .ppm files). Copy the mandelbrot.ppm file to your computer and make sure you are able to open it.
On an unrelated note, recall that information on the MPI functions and subroutines is available via the UNIX man pages. For example, typing
man MPI_Recv
provides information on the usage of, and errors associated with, a blocking receive. Details of both the C and Fortran syntax are provided. These pages are available for all MPI routines. Look up a few, and test your memory to see how many functions you can remember from last week.
As was the case last session, note that all source files are also available via git:
git clone https://github.com/liamscarlett/intro-mpi
In the first MPI laboratory you were provided with a simple piece of code which used the standard blocking send (MPI_Send and MPI_Recv) to pass a single integer from Process 0 to Process 1. Download this code (send.f90 - send.c) and modify the program to perform non-blocking communication using MPI_Isend/MPI_Irecv/MPI_Wait.
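As a sketch of what the modified program might look like (variable names and the exact structure of send.c may differ), the non-blocking version of the single-integer exchange could be:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;  /* illustrative payload */
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* other work could be done here while the send is in flight */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* only now is it safe to reuse 'value' */
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* 'value' is valid only after Wait */
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

The key point is that MPI_Isend and MPI_Irecv return immediately; the buffer must not be reused (or read, for a receive) until the matching MPI_Wait has completed.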
Last session you explored the deadlock condition between two processes. Small messages were transmitted via a buffer, but when both arrays exceeded a critical size a deadlock state arose. Download the array program from the previous laboratory (array.f90 - array.c). To remind yourself of the deadlock state, compile and run it with a 400KB message as follows (note that a real/float occupies four bytes of memory).
srun -n 2 ./main 100000
Now modify the code to perform non-blocking communication using MPI_Isend/MPI_Irecv/MPI_Waitall. Show that this avoids deadlock by running the code with 400KB messages.
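One possible shape for the non-blocking exchange on each process is the fragment below (a sketch; the buffer and variable names are illustrative, not those of array.c):

```c
/* Each rank posts its receive and send without blocking, then waits on both. */
MPI_Request reqs[2];
MPI_Irecv(recvbuf, n, MPI_FLOAT, other, 0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(sendbuf, n, MPI_FLOAT, other, 0, MPI_COMM_WORLD, &reqs[1]);
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
/* Neither call blocks before both are posted, so no deadlock regardless of n. */
```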
Your next task is to write a two-process program which performs computation
during non-blocking communication. The program should have the following
structure:
- On Process 0: Use MPI_Isend to send 3 million reals/floats to Process 1. Once the Isend is posted, the program should enter a while loop which (a) increments a counter (initially set equal to zero) and (b) uses MPI_Test to see if the Isend has completed. Exit the while loop when the Isend is done, and then print out the value of the counter.
- On Process 1: Accept the message from Process 0 using a blocking receive (MPI_Recv). Once the message has arrived, print out the last data value received.
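The Process 0 logic described above might be sketched as follows (the buffer name and message count N are illustrative, and this is not the only valid structure):

```c
/* Process 0: count loop iterations until the Isend completes. */
MPI_Request req;
int done = 0;
long counter = 0;
MPI_Isend(data, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &req);
while (!done) {
    counter++;                                /* (a) increment the counter     */
    MPI_Test(&req, &done, MPI_STATUS_IGNORE); /* (b) has the Isend completed?  */
}
printf("Iterations during Isend: %ld\n", counter);
```

MPI_Test is the non-blocking counterpart of MPI_Wait: it sets the flag and returns immediately, which is what allows useful work inside the loop.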
Compile the program and run it several times. You should find that approximately 500,000 iterations are performed during the Isend operation. Now reduce the size of the message sent (e.g. consider 10^5 or 10^4 values). What happens? What are the implications for overlapping communication and computation?
Download once again the deadlock program from last session (array.f90 - array.c) and replace both MPI_Send calls with MPI_Ssend. Compile and run, and exchange a single number between the processes using the command-line argument
srun -n 2 ./main 1
You should find that the code now always deadlocks, regardless of message size. This is because a synchronous send does not complete until the matching receive has been posted; no buffering can hide the circular dependency.
Now replace one of the MPI_Ssend calls with MPI_Send as used in the original version. Run the program with two command-line arguments in order to control the size of both messages. Under some conditions the code will not deadlock. Explain what you find.
When working with arrays it is important to appreciate how data is laid out in memory. To explore the manipulation of sections of an array, download the following program (memory.f90 - memory.c) and run it on two CPUs. If you are running the C version, use the command-line arguments
srun -n 2 ./main 0 10 0
while Fortran users should invoke
srun -n 2 ./main 1 10 1
You will find that an array on Process 0 is sent to Process 1, with the exact size and location of the message determined by the command-line arguments. Note that both arrays are fixed in length and contain ten integers. The three arguments are (in order): starting position for MPI_Send, number of elements for MPI_Send, and receive position for MPI_Recv. Inspect the code to see how this is implemented.
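The essential pattern, sending a contiguous section of a static array starting at a given offset, looks like this in C (the names are illustrative; check memory.c for the actual implementation):

```c
/* Fragment: 'start', 'count' and 'recvpos' come from the command line. */
int a[10], b[10];
if (rank == 0)
    MPI_Send(&a[start], count, MPI_INT, 1, 0, MPI_COMM_WORLD);
else if (rank == 1)
    MPI_Recv(&b[recvpos], count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* MPI reads/writes 'count' contiguous elements from the given address;
   if start+count exceeds the array bounds, adjacent memory is accessed. */
```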
Vary the number of elements sent by changing the parameter 10 to some lower number, and observe the output. Using a message size of 5, vary the start and receive-position parameters. What happens if the start location has a value greater than 5? Explain what you observe. What happens if the receive position is greater than 5? What unpredictable behaviour might occur by sending a message in this way?
Finally, C programmers should download and compile the program memory2.c. This accepts the same arguments as memory.c, and produces the same output (check this!). The key difference, however, is that malloc is used to set up the array structures, and thus the arguments passed to MPI_Send and MPI_Recv are pointer-based. This is a very important subtlety, so make sure you keep track of which arrays are dynamic (i.e. use malloc) and which are static (i.e. defined in the source).
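A sketch of the difference (illustrative names, not the actual memory2.c code):

```c
/* Fragment: 'start' and 'count' come from the command line. */
int a_static[10];                        /* static array: the name decays to a pointer     */
int *a_dyn = malloc(10 * sizeof(int));   /* dynamic array: already a pointer (<stdlib.h>)  */

/* Both forms pass the address of the first element to be sent: */
MPI_Send(&a_static[start], count, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Send(a_dyn + start,    count, MPI_INT, 1, 0, MPI_COMM_WORLD);
/* Note that &a_dyn[start] and a_dyn + start are equivalent. */
```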
Consider the iterative sequence z_{i+1} = (z_i)^2 + κ, where z and κ are complex numbers, and z_0 = κ. When the magnitude of κ is large, this sequence quickly diverges (i.e. the magnitude of z goes to ∞). The Mandelbrot set is defined as the set of values of κ for which this sequence remains bounded.
It is known that the Mandelbrot set is bounded by the disk |κ| < 2, and therefore we only need to iterate until |z_i| > 2 to determine that a point is not in the Mandelbrot set.
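The per-point computation can be expressed as a small escape-time function. The following is a sketch, not the code from mandelbrot.c; the function name and argument layout are our own:

```c
/* Escape-time count for κ = (cr, ci): iterate z <- z^2 + κ with z_0 = κ,
   stopping once |z| > 2 (equivalently |z|^2 > 4) or after maxiter steps. */
int mandel_iters(double cr, double ci, int maxiter) {
    double zr = cr, zi = ci;
    int i;
    for (i = 0; i < maxiter && zr*zr + zi*zi <= 4.0; i++) {
        double t = zr*zr - zi*zi + cr;   /* real part of z^2 + κ      */
        zi = 2.0*zr*zi + ci;             /* imaginary part of z^2 + κ */
        zr = t;
    }
    return i;   /* maxiter means "did not escape": in the set */
}
```

Testing |z|^2 > 4 rather than |z| > 2 avoids a square root on every iteration.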
The following serial code (mandelbrot.f90 - mandelbrot.c) discretises the complex region −2≤Re(z)≤2 and −2≤Im(z)≤2, and computes a 2D array containing the number of iterations required to reach the condition |z|>2, with an upper bound of 1000 iterations. The number of iterations is converted into false colour and written to an image file mandelbrot.ppm. Compile the program in the usual way, run the executable by typing ./main, and wait for about 20 seconds. To view the image, type
xiv mandelbrot.ppm
which runs the program seen at the start of the session. Vary the zoom and explore the structure of the Mandelbrot set, paying close attention to the self-similar (fractal) nature of the image. Note that these calculations should be performed in /scratch/courses0100/username, as the PPM files are quite large. The image size is set to 2000x2000 pixels; try increasing this slightly to explore the self-similarity.
In preparation for the assessable task, examine and compile the following code (mandelbrot2.f90 - mandelbrot2.c), which produces the same output file as the previous version, except that now the array x is one-dimensional. This modification helps greatly when constructing a parallel version of the algorithm. Note particularly the use of integer division and modulo arithmetic, which converts the scalar loop index into an (i,j) pair. Use the UNIX diff command to confirm that the output is exactly the same, being careful to ensure the image size is the same for both codes.
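The index arithmetic in question can be sketched with a hypothetical helper (not the actual code; Fortran's 1-based indexing would shift the formulas by one):

```c
/* Recover the 2D coordinates (row, col) of flat index k on a grid with N columns. */
void unflatten(int k, int N, int *row, int *col) {
    *row = k / N;    /* integer division gives the row      */
    *col = k % N;    /* remainder (modulo) gives the column */
}
/* The inverse mapping is k = row*N + col. */
```

Flattening the loop this way makes it easy to hand out arbitrary contiguous (or strided) chunks of work to different MPI processes.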
Your assessable tasks involve three different approaches to MPI parallelisation of the Mandelbrot problem:
The report is due on Thursday March 26th. It is to be written in a single-column format and should be prepared using LaTeX or a word processor such as Word or Pages. Please follow these instructions exactly to ensure your assignment is marked:
Having the submissions in a uniform format makes my life easier when it comes to collecting and marking assignments. Any variation to the instructions above complicates things. There is no need to make waffly reports with cover pages, table of contents, lists of figures, etc. Strive for brevity. A summary of the rationale for your grade, and a breakdown of the grades for each task will be sent to your Curtin email address.
The report should be precise, understandable and concise. It should include a combination of written responses, timing data, graphical representations of key concepts (load balancing, communication cost, wait times, etc) and code. For the load balancing questions, be sure to include graphs that resemble the example on the "Timing Visualisation" slide in Lecture 2. Take care to answer all questions and address all requests for comment and comparison. Include all code fragments that are essential to understand the logical flow, but there is no need to repetitively include lengthy/identical I/O routines for all three programs. Do not pad out the report with pointless whitespace or meaningless words and figures. As a rough guide, it should not exceed ten pages in length.
The report should convey your insight into the subtleties of MPI programming and explain the logic behind your solution. It should include a critical assessment of the best strategy and the compromises involved. For Tasks 2 and 3, there are many possibilities. Feel free to suggest more than one solution.
A "template" for your report:
In developing your parallel algorithms and MPI programming skills you might find the following advice useful: