Commit graph

200 commits

Author SHA1 Message Commit date
Thomas Preud'homme 6cdff7f5a0 Add copyright/license information 2013-04-22 18:34:41 +02:00
Thomas Preud'homme 7bda1c8ced Fix process_stage comment in pipeline_template
Use consistently E_{} notation to denote the computation done at a given
stage instead of a mix of Stage_{} and E_{}.
2013-04-17 18:06:02 +02:00
Thomas Preud'homme 552c06f295 [pipepar] Update Makefile to template changes
Adapt Makefile after the last commit to measure the speedup instead of
trying several values for parameters in order to configure the
pipeline_template benchmark.
2012-09-21 10:54:08 +02:00
Thomas Preud'homme 16117c551a [pipepar] Default to speedup bench for template
Change default mode from stage time measurement to speedup measurement.

Previously, each stage featured a sleep whose length was proportional to
a fixed total time divided by the number of cores involved in the
computation multiplied by the number of packets handled, in order to
measure the duration of one stage. That is, for a number of cores
nc, a number of packets np and a total time T, the time spent sleeping
was t=T/(nc*np). By fixing both the number of cores and the total time,
it was thus possible to measure the time needed to deal with one packet
in one stage.

Now, the default mode is to not do any sleep and thus have a computation
whose complexity is inversely proportional to the number of cores. By
varying the number of cores, it is thus possible to measure the speedup.
2012-09-07 16:47:18 +02:00
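The sleep length described in commit 16117c551a can be sketched as follows. This is only an illustration of the formula t=T/(nc*np); the function name `stage_sleep_seconds` and its parameter names are hypothetical and are not taken from pipeline_template.c.

```c
#include <assert.h>

/* Hypothetical helper: per-stage sleep time of the old "stage time" mode.
 * total_time is T in seconds, nc the number of cores, np the number of
 * packets. Returns t = T / (nc * np). */
static double stage_sleep_seconds(double total_time, unsigned nc, unsigned np)
{
    assert(nc > 0 && np > 0);   /* the formula is undefined otherwise */
    return total_time / ((double)nc * (double)np);
}
```

With T and nc fixed, the value depends only on np, which is why the old mode measured the cost of one packet in one stage rather than the speedup.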
Thomas Preud'homme eab2df2a38 [pipepar] Compute a CRC 8bit in lattice.c
* Change number of bits in CRC to 8
* Prevent q from being higher than 2^CRC_BIT
* Increase the number of packets when benchmarking lattice.c
2012-09-07 12:03:18 +02:00
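For readers unfamiliar with 8-bit CRCs, here is a minimal bitwise sketch. The commit does not say which polynomial lattice.c uses; the polynomial 0x07 (x^8 + x^2 + x + 1) with a zero initial value is an assumption chosen for illustration, and `crc8` is a hypothetical name. Only `CRC_BIT` appears in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC_BIT 8   /* number of CRC bits, as named in the commit message */

/* Hypothetical bitwise CRC-8, MSB first, polynomial 0x07 (assumed),
 * initial value 0, no final XOR. */
static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < CRC_BIT; bit++) {
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}
```

An 8-bit CRC takes at most 2^CRC_BIT = 256 distinct values, which is consistent with the commit capping q at 2^CRC_BIT.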
Thomas Preud'homme e4450d68c2 [pipepar] Avoid output when using "script"
Call script with -q to avoid any output
2012-09-04 20:02:58 +02:00
Thomas Preud'homme 73f99e761e [pipepar] Fix pragma for 12 cores in pipeline_template
Add state10 to the list of private variables in the last pragma in the
case of 12 cores in pipeline_template computation.
2012-09-04 20:00:29 +02:00
Thomas Preud'homme 4827baead3 [pipepar] Allocate variable statically in lattice
Not sure declarations lead to real allocation anyway (it should be done
according to the pragmas instead) but just in case, move the declaration
of variables in main so that they are statically allocated.
2012-09-04 19:58:26 +02:00
Thomas Preud'homme c32076505e [pipepar] Improve log readability for lattice
Delimit beginning and end of one lattice computation in order to easily
check the logs.
2012-09-04 19:56:44 +02:00
Thomas Preud'homme 7a1801da89 [pipepar] Reduce number of log in lattice.c
Only print a probability for a few possible CRC of the last packet in
order to reduce the number of logs. This allows the program to be run
through ssh and to easily check whether the result is correct or not.
2012-09-04 19:56:16 +02:00
Thomas Preud'homme 01acb467ee [pipepar] Explicit cast in compute_cumulative_metrics_column
Make the cast of the loop variable from uint_fast32_t to uint_fast16_t explicit
2012-09-04 19:51:07 +02:00
Thomas Preud'homme 3d17a4db90 [pipepar] Improve benchmark run
* Ensure benchmarks run with warm cache
* Run benchmarks 10 times
* Log benchmarks
* Factorize code by using macro
2012-09-03 11:47:26 +02:00
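The idea behind commit 3d17a4db90 (one untimed run to warm the cache, then 10 timed runs) can be sketched in C as below. The commit itself most likely implements this in the build scripts, so this is a stand-in rather than the author's code; `bench_warm`, `now_seconds`, `sample_work` and `bench_calls` are all hypothetical names.

```c
#include <assert.h>
#include <time.h>

#define BENCH_RUNS 10   /* the commit runs each benchmark 10 times */

static unsigned long bench_calls;   /* counts workload invocations */

/* Placeholder workload; a real benchmark would run the pipeline here. */
static void sample_work(void)
{
    bench_calls++;
}

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs work() once untimed to warm the cache, then BENCH_RUNS timed runs.
 * Stores each duration in durations[] and returns their mean. */
static double bench_warm(void (*work)(void), double durations[BENCH_RUNS])
{
    double total = 0.0;

    assert(work != NULL);
    work();   /* warm-up run, deliberately not measured */
    for (int i = 0; i < BENCH_RUNS; i++) {
        double start = now_seconds();

        work();
        durations[i] = now_seconds() - start;
        total += durations[i];
    }
    return total / BENCH_RUNS;
}
```

Keeping every individual duration, not only the mean, matches the commit's third point of logging the benchmarks.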
Thomas Preud'homme a578c33577 [commtech] Detect if perf supports -o switch
Use script in the case where perf doesn't support -o switch (old perf
version)
2012-07-07 23:43:30 +02:00
Thomas Preud'homme 619fb7aeba [commtech] Also compile gomp_stream_64_comm
Add gomp_stream_64_comm to the list of communication techniques to
compile.
2012-07-07 23:29:11 +02:00
Thomas Preud'homme 467d0b4122 [commtech] Fixes in gomp_stream
* Stick to the sizes used in gomp_stream
* Release data when they are *all* received
2012-07-07 23:26:24 +02:00
Thomas Preud'homme d8c16a4aa3 Merge branch 'bqv2_buf_end' 2012-07-07 23:14:15 +02:00
Thomas Preud'homme df09d89933 [commtech] Use only 1 thread per core
Creating 2 threads per core for the purpose of receiving while sending is
plain stupid. First, it needs 2 threads synchronizing with each other,
which has a cost. Second, since only one thread can run at a time, the
threads slow each other down (using BatchQueue where the sender is on the
same core as the receiver yields bad performance). This patch removes all
this complexity to have one thread receive, compute and then resend
data, which improves performance dramatically.
2012-07-07 23:14:08 +02:00
Thomas Preud'homme 4914b0dcdd Add CSQ (2/1) and CSQ (2/32), Del CSQ (2/2) 2012-03-27 00:31:16 +02:00
Thomas Preud'homme a80decaef4 [commtech] Provide 64 cache lines version of algos
* Provide for BatchQueue, CSQ, FastForward, MCRingBuffer and GOMP stream
  a version using 64 cache lines in total for all buffers.
* Rename common version from _common_comm.h to _common.h to avoid
  considering them as communication technique on their own
2012-03-26 16:44:30 +02:00
Thomas Preud'homme c37c100355 [commtech] Initialize vector in calc_mat.c 2012-03-26 16:14:23 +02:00
Thomas Preud'homme 09afc1ed2b parsing.sh: Remove assumption about calc args
Calc can have several args for useless_loop and line prods and for comm
and barriere bench. Hence:

* Change use_histo to reflect that
* Set list of args per bench/prod instead of globally
* No need for the argument (since there are several) in create_complex_dat_body
2012-03-26 16:13:23 +02:00
Thomas Preud'homme 74f5176116 parsing.sh Remove a few assumptions
Remove assumptions around barriere bench:
* 2 memory hierarchies are not always tested -> numCacheConfigs
* barriereList -> ${bench}List
* Size of the calc argument -> *
2012-03-26 16:04:31 +02:00
Thomas Preud'homme 40dfd58c86 parsing.sh: support batch_queue_* for barriere
Count batch_queue_* in barriere bench
2012-03-26 13:20:18 +02:00
Thomas Preud'homme 758198c2b0 [commtech] Add missing .c for new CSQ configs 2012-03-20 12:16:10 +01:00
Thomas Preud'homme 7087998fc6 [commtech] Add the new configs for compilation 2012-03-20 12:05:12 +01:00
Thomas Preud'homme 5840b57937 [commtech] Provide more CSQ configs
* Rename CSQ configs to csq_<nbr_buffers>_<size_buffer>_comm.h
* Add several configs
* Default config is csq_comm.h
2012-03-20 11:07:05 +01:00
Thomas Preud'homme b47a17c6da Revert junk from "Fix including perf stat in logs"
This partially reverts commit 65a2ed9357.
It removes all the changes in the configuration variable at the top of
the file which were not supposed to be committed.
2012-03-20 10:38:00 +01:00
Thomas Preud'homme 4cd4df3d1c Remove useless .main.d file 2012-03-19 20:40:24 +01:00
Thomas Preud'homme f9aa3b227a CSQ's article suggest SUB_SLOTS should be 64. 2012-03-19 20:40:13 +01:00
Thomas Preud'homme 65a2ed9357 Fix including perf stat in logs
This commit fixes commit b0441d7a1c
2012-03-14 12:46:47 +01:00
Thomas Preud'homme 30e8b2a2c6 Automate test of pipeline_template 2012-02-21 18:56:02 +01:00
Thomas Preud'homme 4fa9811144 Support NB_CORES between 1 and 12 out of the box
Prepare an "omp parallel" pragma for NB_CORES between 2 and 12. This
avoids needing any change in the file for NB_CORES between 1 and 12.
2012-02-21 18:56:02 +01:00
Thomas Preud'homme dc0931cde0 Remove debugging printf 2012-02-21 18:56:02 +01:00
Thomas Preud'homme 75bd067571 Check the result of the computation
Make sure the result of the computation is always the same
2012-02-21 18:56:02 +01:00
Thomas Preud'homme 8010f34abe Stage time can be made smaller
Allow stage time to be smaller by adjusting after the computing was done
instead of before.
2012-02-21 18:56:02 +01:00
Thomas Preud'homme cacde80b30 Allow automatic test run for lattice 2012-02-21 18:56:02 +01:00
Thomas Preud'homme a5f52a6c58 Add the never run lattice.cpp
Add the never run lattice.cpp upon which lattice.c is based.
2012-02-21 18:56:02 +01:00
Thomas Preud'homme 51cbe32eda Update .gitignore 2012-02-21 18:56:02 +01:00
Thomas Preud'homme 50778ca358 Remove fmr_omp-str_base
Stop worrying about keeping bit identical fmr_omp-str_base
2012-02-21 18:56:02 +01:00
Thomas Preud'homme 502ec92654 Update Makefile for fmr_omp-str_base generation 2012-02-21 18:56:02 +01:00
Thomas Preud'homme e07d4d39ab Add template of pipeline parallelism friendly code
pipeline_template.c is an example of a pipeline parallelism friendly code in the
sense that it can't be parallelized by any other known parallelization technique.
2012-02-21 18:56:02 +01:00
Thomas Preud'homme a9793430f9 Add pipeline computation of lattice 2012-02-21 18:56:02 +01:00
Thomas Preud'homme c7eef474b5 Remove addition of $HOME/local/bin to the PATH
Remove addition of $HOME/local/bin to the PATH since it's already in the PATH now
2012-02-21 18:56:02 +01:00
Thomas Preud'homme 23670f3d72 Revert "Add an implementation to compute n'th digit of pi"
This reverts commit f480a5e3c2dd2bc23422c6a1c0acea9b3df428c2.
2012-02-21 18:56:02 +01:00
Thomas Preud'homme da08852ecc Add an implementation to compute n'th digit of pi 2012-02-21 18:56:02 +01:00
Thomas Preud'homme cf816f0685 Add a less naïve script to compare BatchQueue to GOMP native
communication library *and* to sequential code by performing a
more useful computation.
2012-02-21 18:56:02 +01:00
Thomas Preud'homme 03b32a950a Add a simple test to try automatic usage of BatchQueue through OpenMP 2012-02-21 18:56:02 +01:00
Thomas Preud'homme 9e1b9aa1b1 Make the script work with GOMP_stream* and GOMP_batchQ* functions 2012-02-21 18:56:02 +01:00
Thomas Preud'homme 57820691d2 Use CFLAGS in Makefile 2012-02-21 18:56:02 +01:00
Thomas Preud'homme 31f7d7760f Makefile to compile 'n patch FMradio w/ BatchQueue 2012-02-21 18:56:02 +01:00