Change default mode from stage time measurement to speedup measurement.
Previously, each stage featured a sleep whose length is proportional to
a fixed total time divided by the number of cores involve in the
computation multiplied to the number of packets handled in order to
measure the length in time of one stage. That is, for a number of core
nc, a number of packets np and a total time T, the time spent to sleep
was t=T/(nc*np). By fixing both the number of cores and the total time,
it was thus possible to measure the time needed to deal with one packets
in one stage.
Now, the default mode is to not do any sleep and thus have a computation
whose complexity is inversely proportional to the number of cores. By
varying the number of cores, it is thus possible to measure the speedup.
pipeline_template.c is an example of a pipeline parallelism friendly code in the
sense that it can't be parallelized by any other known parallelization technique.