Adapt Makefile after the last commit to measure the speedup instead of
trying several values for parameters in order to configure the
pipeline_template benchmark.
Change default mode from stage time measurement to speedup measurement.
Previously, each stage featured a sleep whose length was proportional to
a fixed total time divided by the number of cores involved in the
computation multiplied by the number of packets handled, in order to
measure the duration of one stage. That is, for a number of cores
nc, a number of packets np and a total time T, the time spent sleeping
was t=T/(nc*np). By fixing both the number of cores and the total time,
it was thus possible to measure the time needed to deal with one packet
in one stage.
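As a minimal sketch of the formula above (the helper name and the use of
seconds as the unit are assumptions, not taken from the benchmark source),
the per-packet sleep could be computed like this:

```c
#include <assert.h>

/* Hypothetical helper, not part of pipeline_template.c:
 * per-packet sleep of one stage, t = T / (nc * np). */
static double stage_sleep_seconds(double total_time, unsigned nc, unsigned np)
{
    /* Both counts must be non-zero, otherwise the division is meaningless. */
    assert(nc > 0 && np > 0);
    return total_time / ((double)nc * (double)np);
}
```

With T fixed and nc fixed, the returned value only depends on np, which is
what made the per-packet stage time measurable.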
Now, the default mode is to not do any sleep and thus have a computation
whose complexity is inversely proportional to the number of cores. By
varying the number of cores, it is thus possible to measure the speedup.
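The speedup mentioned here is presumably the usual ratio between the run
time on one core and the run time on nc cores. A small sketch, with a
hypothetical helper name and timings supplied by the caller:

```c
#include <assert.h>

/* Hypothetical helper: speedup of a run on several cores relative to
 * the single-core run, i.e. time_1_core / time_n_cores. */
static double speedup(double time_1_core, double time_n_cores)
{
    /* A non-positive run time would indicate a measurement error. */
    assert(time_n_cores > 0.0);
    return time_1_core / time_n_cores;
}
```

A value close to nc would indicate near-linear scaling of the pipeline.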
Not sure declarations lead to real allocation anyway (it should be done
according to the pragmas instead) but just in case, move the declaration
of variables in main so that they are statically allocated.
Only print a probability for a few possible CRCs of the last packet in
order to reduce the number of logs. This allows the program to be run
through ssh and to easily check whether the result is correct or not.
Creating 2 threads per core for the purpose of receiving while sending is
plain stupid. First, it needs 2 threads synchronizing with each other,
which has a cost. Second, since only one thread can run at a time, the
threads slow each other down (using BatchQueue where the sender is on the
same core as the receiver yields bad performance). This patch removes all
this complexity to have one thread receive, compute and then resend
data, which improves performance dramatically.
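The single-thread stage described above can be sketched as one loop that
receives, computes and resends. Everything below (the callback types, the
array-backed channel, the function names) is hypothetical and only
illustrates the structure; it is not the benchmark's actual code or the
API of any of the communication libraries:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel callbacks. rx returns 0 once the stream is over. */
typedef int (*recv_fn)(void *chan, int *packet);
typedef void (*send_fn)(void *chan, int packet);

/* One thread per stage: receive a packet, compute, resend, repeat. */
static void run_stage(void *in, recv_fn rx, void *out, send_fn tx,
                      int (*compute)(int))
{
    int packet;
    while (rx(in, &packet))
        tx(out, compute(packet));
}

/* Toy array-backed channel, only here to make the sketch self-contained. */
struct array_chan {
    int *data;
    size_t len;
    size_t pos;
};

static int array_recv(void *c, int *packet)
{
    struct array_chan *ch = c;
    if (ch->pos >= ch->len)
        return 0;               /* nothing left to receive */
    *packet = ch->data[ch->pos++];
    return 1;
}

static void array_send(void *c, int packet)
{
    struct array_chan *ch = c;
    assert(ch->pos < ch->len);  /* the toy channel has a fixed capacity */
    ch->data[ch->pos++] = packet;
}

/* Stand-in for the real per-packet computation. */
static int add_one(int x)
{
    return x + 1;
}
```

No synchronization between a receiving and a sending thread is needed in
this layout, since both steps happen in the same loop iteration.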
* Provide for BatchQueue, CSQ, FastForward, MCRingBuffer and GOMP stream
a version using 64 cache lines in total for all buffers.
* Rename common version from _common_comm.h to _common.h to avoid
considering it as a communication technique of its own.
Calc can have several args for useless_loop and line prods and for comm
and barriere bench. Hence:
* Change use_histo to reflect that
* Set list of args per bench/prod instead of globally
* No need for the argument (since there are several) in create_complex_dat_body
This partially reverts commit 65a2ed9357.
It removes all the changes in the configuration variable at the top of
the file which were not supposed to be committed.
pipeline_template.c is an example of a pipeline parallelism friendly code in the
sense that it can't be parallelized by any other known parallelization technique.