Granularity in Parallel Algorithms
Nathan Hottle
granularity - a relative measure of the ratio of the amount of computation to the amount of communication within a parallel algorithm implementation
What Is Granularity?
Parallel algorithms are so called because multiple sections of the algorithm are designed to run simultaneously on more than one processor. These sections compose the parallel part of the algorithm. Within the parallel sections, every active processor is assigned a specific task. Each processor may have its own distinct task, or all may be given identical tasks to perform on their own data. The task may be as simple as incrementing a counter, or it may be a subroutine that involves many operations. The size of these tasks is expressed as the granularity of the parallelism. To emphasize the relation between granularity and size, an alternate definition is offered: the grain size of a parallel instruction is a measure of how much work each processor does, expressed relative to the execution time of an elementary instruction. It is equal to the number of serial instructions executed within a task by one processor.
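The ratio in the definition above can be made concrete with a minimal sketch. The function name `granularity_ratio` and the operation counts are hypothetical, chosen only for illustration; real measurements would depend on the machine and on how operations are counted.

```python
def granularity_ratio(compute_ops: int, comm_ops: int) -> float:
    """Ratio of computation to communication for one task (illustrative measure)."""
    if comm_ops == 0:
        # A task that never communicates is as coarse-grained as possible.
        return float("inf")
    return compute_ops / comm_ops


# Hypothetical task profiles; the counts are made up for illustration.
fine_task = granularity_ratio(compute_ops=4, comm_ops=2)          # a few operations per exchange
coarse_task = granularity_ratio(compute_ops=50_000, comm_ops=2)   # a whole subroutine per exchange

print(fine_task, coarse_task)
```

A larger ratio means more serial work per unit of communication, which is to say a coarser grain. The value is only meaningful relative to other tasks, which is why granularity is described as a relative measure.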
How Is It Measured?
The granularity of a parallel section of an algorithm is generally classified by one of three relative values: fine, medium, or coarse. Notice that I refer to a parallel section of an algorithm, rather than the algorithm itself, when determining granularity. An algorithm may contain many different grain sizes; in fact, even a single section of an algorithm may have one grain size nested within another. Granularity is determined by three characteristics of the algorithm and of the hardware used to run it.
- The structure of the problem
As stated in the opening paragraph, the size of a task can vary tremendously. In data parallel programming, a few operations or possibly a single operation are performed on many pieces of data. These operations are performed in parallel over the data set, often with each processing element (PE) communicating with its neighboring PEs. Referring to the definition of granularity, this task would be considered to have a small granularity (fine-grained). On the other hand, if large subroutines of an algorithm are independent of one another, they can all be executed in parallel. These subroutines require many calculations with little communication and are coarse-grained.
- The size of the problem
Assume an algorithm in which 10 numbers are to be incremented. With 10 PEs, each PE increments one number, so the algorithm should require about 1 clock cycle. Now assume the problem size is increased and 100 numbers are to be incremented. Each PE now has 10 numbers to increment, so the size of its task has increased. The larger task size implies a coarser granularity.
- The number of processors available
This argument corresponds directly to the previous one. If the number of processors is reduced while the problem size is held constant, the task size increases and the granularity becomes coarser. (Mapping a large number of data items onto fewer processors requires the use of virtual processors.)
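The effect of problem size and processor count on grain size can be sketched in a few lines. This is only an illustration under an assumed contiguous block mapping; the names `grain_size` and `block_mapping` are hypothetical, and each data item stands in for one virtual processor.

```python
import math


def grain_size(num_items: int, num_pes: int) -> int:
    """Largest number of data items any single PE handles under a block mapping."""
    if num_pes <= 0:
        raise ValueError("num_pes must be positive")
    return math.ceil(num_items / num_pes)


def block_mapping(num_items: int, num_pes: int) -> list[list[int]]:
    """Assign item indices (virtual processors) to physical PEs in contiguous blocks."""
    size = grain_size(num_items, num_pes)
    return [
        list(range(pe * size, min((pe + 1) * size, num_items)))
        for pe in range(num_pes)
    ]


print(grain_size(10, 10))    # 10 numbers on 10 PEs
print(grain_size(100, 10))   # larger problem, same PEs
print(grain_size(100, 5))    # same problem, fewer PEs
print(block_mapping(10, 4))  # 10 virtual processors on 4 physical PEs
```

With 10 items on 10 PEs each task holds one item; growing the problem to 100 items raises that to 10, and halving the PEs raises it again to 20. In each case the task performed by one PE gets larger, so the grain becomes coarser.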
Why Is It Important?
A study of granularity is important if one is going to choose the most efficient paradigm of parallel hardware for the algorithm at hand. SIMD machines are the best bet for very fine-grained algorithms. These machines are built for efficient communication, usually with neighboring PEs. MIMD machines are less effective on fine-grained algorithms because the message passing system characteristic of these machines causes much time to be wasted in communication. These machines perform best with coarser-grained algorithms. Another parallel paradigm is a network of workstations. Very slow communication characterizes this paradigm, so it is recommended for coarse-grained algorithms only. In fact, it is often more efficient to use fewer workstations than are available, thereby reducing the amount of communication. Being able to recognize the parallelism within an algorithm and analyze its granularity will guide a programmer to the best parallel paradigm for the task at hand.
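The remark that fewer workstations can be faster is easy to see in a toy cost model. The model below is an assumption made for illustration, not something taken from the references: the work divides evenly among the workstations, and each workstation beyond the first adds one fixed-cost message. The names `estimated_time` and `best_workstation_count` and all parameter values are hypothetical.

```python
def estimated_time(work: int, num_ws: int, t_op: float, t_msg: float) -> float:
    """Toy model: evenly divided computation plus one message per extra workstation."""
    compute = work * t_op / num_ws
    communicate = t_msg * (num_ws - 1)
    return compute + communicate


def best_workstation_count(work: int, available: int, t_op: float, t_msg: float) -> int:
    """Number of workstations (1..available) that minimizes the modeled time."""
    return min(
        range(1, available + 1),
        key=lambda p: estimated_time(work, p, t_op, t_msg),
    )


# Slow network: one message costs as much as 40 operations (assumed value).
slow = best_workstation_count(work=1000, available=16, t_op=1.0, t_msg=40.0)

# Fast interconnect: one message costs as much as 1 operation (assumed value).
fast = best_workstation_count(work=1000, available=16, t_op=1.0, t_msg=1.0)

print(slow, fast)
```

Under these assumed costs the slow network is best served by only 5 of the 16 workstations, while the fast interconnect makes it worthwhile to use all 16. The exact numbers mean little; the point is that as communication becomes more expensive, the optimal task size per processor grows.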
Lewis, T. G., and El-Rewini, H. Introduction to Parallel Computing. Prentice-Hall, 1992.
Nevison, Christopher. "Numerical Solution of the Wave Equation: An Example of Data Parallel Computing."