Use two mappings of the same structure to prevent the consumer from
prefetching the producer's semi-buffer. Everything is accessed through
mapping 1, except semi-buffer 2, which is accessed through mapping 2.
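
As an illustration of the double-mapping idea, here is a minimal sketch
of how one set of physical pages can be made visible at two virtual
addresses. It is not the code of this commit: the names dual_map,
dual_map_create and dual_map_destroy are hypothetical, and backing the
pages with an unlinked temporary file under /tmp is an assumption.
Whether a second mapping actually defeats prefetching depends on the
hardware and is not demonstrated here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two virtual views of the same physical pages (hypothetical type). */
struct dual_map {
    void *view1;   /* would be used for everything except semi-buffer 2 */
    void *view2;   /* would be used only for semi-buffer 2 */
    size_t len;
};

/* Map `len` bytes twice. Returns 0 on success, -1 on failure. */
int dual_map_create(struct dual_map *dm, size_t len)
{
    char path[] = "/tmp/dualmapXXXXXX";   /* assumed backing location */
    int fd;

    assert(dm != NULL && len > 0);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* pages live until unmapped */

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED makes both views refer to the same pages. */
    dm->view1 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    dm->view2 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);

    if (dm->view1 == MAP_FAILED || dm->view2 == MAP_FAILED) {
        if (dm->view1 != MAP_FAILED)
            munmap(dm->view1, len);
        if (dm->view2 != MAP_FAILED)
            munmap(dm->view2, len);
        return -1;
    }
    dm->len = len;
    return 0;
}

/* Release both views. */
void dual_map_destroy(struct dual_map *dm)
{
    munmap(dm->view1, dm->len);
    munmap(dm->view2, dm->len);
}
```

A byte written through view1 is readable at the same offset through
view2, even though the two pointers differ.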
Add the native algorithm from the OpenMP stream extension. This requires
adding one function to commtech.h: end_producer(). This function does
nothing for any communication algorithm except gomp_stream (the
algorithm added by this commit).
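
The no-op versus real end_producer() split can be pictured as below.
This is only a sketch: the actual signature in commtech.h is not shown
in this log, so the void * parameter, the _noop and _stream suffixes,
and the stream_state structure are all hypothetical. The assumption
that a stream-based technique uses end_producer() to mark the end of
the stream is also mine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream state; the real gomp_stream state is not shown. */
struct stream_state {
    int producer_done;   /* set once the producer has finished */
};

/* Variant for techniques with nothing to finalize: does nothing. */
void end_producer_noop(void *state)
{
    (void)state;
}

/* Variant for a stream-like technique: record that production ended
 * (assumed behavior, for illustration only). */
void end_producer_stream(void *state)
{
    struct stream_state *s = state;

    assert(s != NULL);
    s->producer_done = 1;
}
```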
* Refactor the source to be able to chain more than 2 nodes together
* Compile all binaries by default (binList must be set manually in
lancement.sh to run only a subset of the binaries)
Add a calculation method which adds the values of the first integer of
n consecutive cache lines and writes the result in one of the integers
of these cache lines. The next calculation uses the next n consecutive
cache lines and writes the result in the next integer.
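
The calculation described above can be sketched as follows. This is an
interpretation, not the committed code: the function name
sum_first_ints, the 64-byte cache line size, and the choice of storing
the result in the first line of each group, at a slot that skips index
0 so the inputs are never overwritten, are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64                              /* assumed */
#define INTS_PER_LINE (CACHE_LINE_SIZE / sizeof(int))

/*
 * For each group of n consecutive cache lines in buf, add the first
 * integer of every line and store the sum in the first line of the
 * group. The destination slot advances by one for each group and
 * never uses slot 0. A trailing incomplete group is left untouched.
 */
void sum_first_ints(int *buf, size_t nb_lines, size_t n)
{
    size_t group, i;

    assert(buf != NULL && n > 0);

    for (group = 0; (group + 1) * n <= nb_lines; group++) {
        int *base = buf + group * n * INTS_PER_LINE;
        int sum = 0;

        for (i = 0; i < n; i++)
            sum += base[i * INTS_PER_LINE];   /* first int of line i */

        base[1 + group % (INTS_PER_LINE - 1)] = sum;
    }
}
```

With n = 2 and four lines whose first integers are 1, 2, 3 and 4, the
first group stores 3 in slot 1 of line 0 and the second group stores 7
in slot 2 of line 2.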
* Split CSQ into 2 communication techniques: one with 2 slots (as in
BatchQueue aka c_cache) and one with 64 slots (as in the article)
* Rename the fake communication technique to the none communication
technique and disable all activity (send no longer does anything)
Multiply by 10 the number of cache lines sent from the producer to the
consumer to get a more accurate mean. This requires excluding pipe_comm,
as this bench is far too slow to send that much data.
* Add a matrix calculation as one of the possible calculations
* Modify the makefile to allow compiling the calculation libs
* Reorganize the makefile to be able to execute the default target
* Introduce 3 new variables:
- LOCALDIR: base directory in which to look for papihighlevel
- PAPIHIGHLEVELINCDIR: include directory in $(LOCALDIR)
- PAPIHIGHLEVELLIBDIR: library directory in $(LOCALDIR)