* Source files for FMradio with (i) the OpenMP stream extension and (ii)
the OpenMP stream and data parallelism extensions.
* Input files (small and larger one) to test FMradio.
* Compiled version of FMradio, in case of any later problem with the
toolchain (although the toolchain itself is saved in git).
Use two mappings of the same structure to prevent the consumer from
prefetching the producer's semi-buffer. The idea is to access everything
through mapping 1, except semi-buffer 2, which is accessed through
mapping 2.
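
A minimal sketch of how two mappings of the same memory can be obtained on a
POSIX system, assuming a file-backed shared mapping. The names `struct channel`,
`semi_buf1`, `semi_buf2`, `SEMI_BUF_INTS` and `map_twice()` are hypothetical and
not taken from the benchmark sources. The sketch only shows the aliasing; it
does not demonstrate the effect on prefetching.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout: one structure holding the two semi-buffers. */
#define SEMI_BUF_INTS 1024
struct channel {
    int semi_buf1[SEMI_BUF_INTS];
    int semi_buf2[SEMI_BUF_INTS];
};

/* Map the same backing file twice with MAP_SHARED.
 * On success, *map1 and *map2 are two distinct virtual addresses that
 * alias the same memory. Returns 0 on success, -1 on failure. */
int map_twice(struct channel **map1, struct channel **map2)
{
    assert(map1 != NULL && map2 != NULL);

    FILE *f = tmpfile();            /* anonymous temporary backing file */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, sizeof(struct channel)) != 0) {
        fclose(f);
        return -1;
    }

    void *a = mmap(NULL, sizeof(struct channel), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    void *b = mmap(NULL, sizeof(struct channel), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    fclose(f);                      /* mappings stay valid after close */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, sizeof(struct channel));
        if (b != MAP_FAILED)
            munmap(b, sizeof(struct channel));
        return -1;
    }
    *map1 = a;
    *map2 = b;
    return 0;
}
```

With such a pair, code following the commit's idea would use `map1` for
everything except semi-buffer 2, and `map2->semi_buf2` for semi-buffer 2.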
Add the native algorithm from the OpenMP stream extension. This requires
adding one function to commtech.h: end_producer(). This function does
nothing for every communication algorithm except gomp_stream (the
algorithm added by this commit).
* Refactor the source to be able to chain more than 2 nodes together
* Compile all binaries by default (binList must be set manually in
lancement.sh to run only a subset of the binaries)
Rewrite the creation of the simple gnuplot to handle more than 2 cache
hierarchies (like L2, CPU and mem for sibling cores on the same CPU,
non-sibling cores on the same CPU and non-sibling cores on different
CPUs).
* Force English locales (esp. for numeric values)
* Handle French and English numeric values
* Handle absence of useless_prod log
* Handle unique cache hierarchy
- Convert barrier bench from papi+PapiHighLevel to perf framework
- Remove papihighlevel submodule
- Simplify Makefile (including moving some of the code to a separate
script)
Add a calculation method that adds the values of the first integer of
n consecutive cache lines and writes the result in one of the integers
of these cache lines. The next calculation uses the next n consecutive
cache lines and writes the result in the next integer.
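
A rough sketch of this calculation, under stated assumptions: 64-byte cache
lines, a buffer viewed as an array of `int`, and the result written into the
first cache line of each group. The commit message does not say which line
receives the result or where the "next integer" starts, so the choice here
(slots 1, 2, ... so that the first integers being read are never overwritten)
is a guess. The names `sum_first_ints`, `CACHE_LINE_SIZE` and `INTS_PER_LINE`
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64                      /* assumed line size in bytes */
#define INTS_PER_LINE (CACHE_LINE_SIZE / sizeof(int))

/* For each group of n consecutive cache lines, sum the first integer of
 * every line and store the sum in one integer slot of the group's first
 * line. The slot index advances by one for each group and wraps back to 1,
 * so slot 0 (the value being read) is left untouched. Trailing lines that
 * do not form a full group are ignored. */
void sum_first_ints(int *buf, size_t nb_lines, size_t n)
{
    assert(buf != NULL && n > 0);

    size_t slot = 1;
    for (size_t start = 0; start + n <= nb_lines; start += n) {
        int sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += buf[(start + i) * INTS_PER_LINE];

        buf[start * INTS_PER_LINE + slot] = sum;
        slot = (slot + 1 < INTS_PER_LINE) ? slot + 1 : 1;
    }
}
```

Reading only one integer per line means each addition touches a different
cache line, so the work per received line stays small.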
* Divide CSQ into 2 communication techniques: one with 2 slots (as in
BatchQueue aka c_cache) and one with 64 slots (as in the article)
* Rename the fake communication technique to the none communication
technique and disable any activity (send no longer does anything)
The paper about CSQ uses memcpy in enqueue and dequeue. Although memcpy
cannot be used in enqueue because of the current API, it can be used in
dequeue, hence this commit.
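
As an illustration only, a dequeue that copies a whole slot with memcpy might
look like the sketch below. It assumes cache-line-sized slots guarded by a
per-slot full flag and uses C11 atomics for the flag; `struct slot`,
`struct queue`, `NB_SLOTS` and the `dequeue()` signature are hypothetical and
do not reflect the actual commtech.h API or the exact algorithm of the paper.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_LINE_SIZE 64      /* assumed line size in bytes */
#define NB_SLOTS 64             /* 64-slot variant */

/* Hypothetical slot: one cache line of payload plus a full/empty flag. */
struct slot {
    _Alignas(CACHE_LINE_SIZE) char data[CACHE_LINE_SIZE];
    atomic_bool full;
};

struct queue {
    struct slot slots[NB_SLOTS];
    size_t head;                /* next slot to read, consumer only */
};

/* Copy the next full slot into out (CACHE_LINE_SIZE bytes) with a single
 * memcpy, then hand the slot back to the producer.
 * Returns false without touching out when the slot is still empty. */
bool dequeue(struct queue *q, void *out)
{
    assert(q != NULL && out != NULL);

    struct slot *s = &q->slots[q->head];
    if (!atomic_load_explicit(&s->full, memory_order_acquire))
        return false;

    memcpy(out, s->data, CACHE_LINE_SIZE);
    atomic_store_explicit(&s->full, false, memory_order_release);
    q->head = (q->head + 1) % NB_SLOTS;
    return true;
}
```

The acquire load pairs with a release store that a producer would perform
after filling the slot, so the copied bytes are the ones the producer wrote.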
Multiply by 10 the number of cache lines sent from the producer to the
consumer to get a more accurate mean. This requires excluding pipe_comm,
as this bench is far too slow to send that much data.