Commit Graph

75 Commits

Author SHA1 Message Date
Thomas Preud'homme 6cdff7f5a0 Add copyright/license information 2013-04-22 18:34:41 +02:00
Thomas Preud'homme 467d0b4122 [commtech] Fixes in gomp_stream
* Stick to the sizes used in gomp_stream
* Release data when they are *all* received
2012-07-07 23:26:24 +02:00
Thomas Preud'homme d8c16a4aa3 Merge branch 'bqv2_buf_end' 2012-07-07 23:14:15 +02:00
Thomas Preud'homme df09d89933 [commtech] Use only 1 thread per core
Creating 2 threads per core for the purpose of receiving while sending is
plain stupid. First, it needs 2 threads synchronizing with each other,
which has a cost. Second, since only one thread can run at a time, the
threads slow each other down (using BatchQueue where the sender is on the
same core as the receiver yields bad performance). This patch removes all
this complexity to have one thread receive, compute and then resend
data, which improves performance dramatically.
2012-07-07 23:14:08 +02:00
Thomas Preud'homme 4914b0dcdd Add CSQ (2/1) and CSQ (2/32), Del CSQ (2/2) 2012-03-27 00:31:16 +02:00
Thomas Preud'homme a80decaef4 [commtech] Provide 64 cache lines version of algos
* Provide for BatchQueue, CSQ, FastForward, MCRingBuffer and GOMP stream
  a version using 64 cache lines in total for all buffers.
* Rename common versions from _common_comm.h to _common.h to avoid
  treating them as communication techniques in their own right
2012-03-26 16:44:30 +02:00
Thomas Preud'homme c37c100355 [commtech] Initialize vector in calc_mat.c 2012-03-26 16:14:23 +02:00
Thomas Preud'homme 758198c2b0 [commtech] Add missing .c for new CSQ configs 2012-03-20 12:16:10 +01:00
Thomas Preud'homme 7c4a484989 [commtech] Simplify if's in send()
Test sender_ptr against the end of the current buffer via
channel->sender_ptr_end
2012-01-30 20:14:22 +01:00
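The simplification described above can be sketched as follows. This is only an illustration: the commit names `channel->sender_ptr_end`, but the struct layout, `BUF_WORDS`, the `flushes` counter and the function names `channel_init` and `channel_send` are assumptions made for the example, not the project's code.

```c
#include <stddef.h>

#define BUF_WORDS 8  /* assumed buffer size in words, for illustration only */

/* Hypothetical channel layout; only sender_ptr_end is named in the commit. */
struct channel {
    void *buf[BUF_WORDS];
    void **sender_ptr;      /* next free slot in the current buffer */
    void **sender_ptr_end;  /* one past the last slot of the current buffer */
    int flushes;            /* number of times the buffer filled up */
};

static void channel_init(struct channel *ch)
{
    ch->sender_ptr = ch->buf;
    ch->sender_ptr_end = ch->buf + BUF_WORDS;
    ch->flushes = 0;
}

/* Store one word; a single pointer comparison detects the end of the buffer. */
static void channel_send(struct channel *ch, void *word)
{
    *ch->sender_ptr++ = word;
    if (ch->sender_ptr == ch->sender_ptr_end) {
        ch->flushes++;             /* a real technique would hand the buffer over here */
        ch->sender_ptr = ch->buf;  /* start filling from the beginning again */
    }
}
```

Comparing against a precomputed end pointer keeps the hot path to one store, one increment and one comparison.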
Thomas Preud'homme a20c9a8a21 [commtech] BatchQueue v2
Uses 2 mappings of the same structure to avoid prefetching of the
producer semi-buffer by the consumer. The idea is to access everything
through mapping 1 except semi-buffer 2, which is accessed through mapping
2.
2012-01-30 20:09:52 +01:00
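One way to obtain two mappings of the same memory is sketched below. The commit does not say how the mappings are created, so the temporary backing file, the helper name `map_twice` and its signature are assumptions; the sketch only shows that both virtual ranges refer to the same bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the same len bytes at two different virtual addresses.
 * Returns 0 on success, -1 on failure. */
static int map_twice(size_t len, char **map1, char **map2)
{
    FILE *f = tmpfile();  /* anonymous backing file, removed once closed */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }
    *map1 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *map2 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);  /* existing mappings stay valid after the file is closed */
    if (*map1 == MAP_FAILED || *map2 == MAP_FAILED) {
        if (*map1 != MAP_FAILED)
            munmap(*map1, len);
        if (*map2 != MAP_FAILED)
            munmap(*map2, len);
        return -1;
    }
    return 0;
}
```

A write through one mapping is visible through the other, so code could reach semi-buffer 2 through the second address range while everything else goes through the first.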
Thomas Preud'homme c6786815cd Add native algo from OpenMP stream extension
Add the native algorithm from the OpenMP stream extension. This requires
adding one function in commtech.h: end_producer(). This function does
nothing for all communication algorithms except gomp_stream (the algorithm
added by this commit).
2012-01-30 20:07:11 +01:00
Thomas Preud'homme a30a5bfe06 Make sure all threads are joined
In join_threads, nb_thread is the id of the last thread, not the number
of threads to join. Hence the for loop must include this id.
2011-06-01 15:35:08 +02:00
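The off-by-one described above can be illustrated with a small sketch. The names `join_threads` and the "id of the last thread" meaning come from the commit message; the signature, the `worker` function and the `done` array are assumptions made for the example.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8  /* assumed upper bound, for illustration only */

static int done[MAX_THREADS];

/* Hypothetical worker: marks its own flag and exits. */
static void *worker(void *arg)
{
    int *flag = arg;
    *flag = 1;
    return NULL;
}

/* last_id is the id of the last thread, not a count,
 * so the loop bound must be inclusive. Returns the number of joined threads. */
static int join_threads(pthread_t *tids, int last_id)
{
    int joined = 0;
    for (int i = 0; i <= last_id; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    }
    return joined;
}
```

With `i < last_id` the last thread would never be joined, which is the kind of bug the commit describes.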
Thomas Preud'homme f0c75c7570 SINK thread (not INTERM) notifies its termination
Use !!node_param->type & SINK in likely macro to test whether we are a
SINK node or an INTERM node.
2011-06-01 15:25:08 +02:00
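A note on operator precedence for this kind of test, as a sketch. The actual source is not shown here, so the flag values, the struct and the `likely` definition below are assumptions (and `__builtin_expect` is a GCC/Clang builtin). In C, unary `!` binds tighter than binary `&`, so an expression written literally as `!!type & SINK` means `(!!type) & SINK`.

```c
/* Common definition of a likely() hint; assumed, not taken from the project. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical bit flags for the node type. */
enum node_type { SOURCE = 1, INTERM = 2, SINK = 4 };

struct node_param { int type; };

/* Flag test with explicit parentheses: normalizes (type & SINK) to 0 or 1. */
static int is_sink(const struct node_param *p)
{
    return likely(p->type & SINK) ? 1 : 0;
}

/* What the unparenthesized form computes: (!!type) & SINK. */
static int unparenthesized(const struct node_param *p)
{
    return !!p->type & SINK;
}
```

With these assumed flag values, `unparenthesized()` returns 0 even for a SINK node, because `!!type` is 1 and `1 & 4` is 0; parenthesizing the mask test avoids the surprise.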
Thomas Preud'homme bd7379e73a Propose 2048 and 4096 buffer sizes for BatchQueue. 2011-05-27 15:42:40 +02:00
Thomas Preud'homme f05cfdcd92 Improve pipeline (consumer and producer in parallel) 2011-05-25 14:33:42 +02:00
Thomas Preud'homme f01db158c2 Use multiples of BUF_SIZE when needed
The number of cache lines sent and the size of the reception buffer must
be multiples of BUF_SIZE.
2011-05-10 11:14:28 +02:00
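A minimal sketch of forcing a count to a multiple of BUF_SIZE. The commit does not say whether values are rounded up or down, nor what BUF_SIZE is for each technique; rounding up, the value 64 and the helper name are assumptions.

```c
#include <stddef.h>

#define BUF_SIZE 64  /* assumed value, for illustration only */

/* Round n up to the next multiple of BUF_SIZE (n itself if already a multiple). */
static size_t round_up_to_buf_size(size_t n)
{
    return (n + BUF_SIZE - 1) / BUF_SIZE * BUF_SIZE;
}
```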
Thomas Preud'homme 6fcfd60d2d Fix buffer loop in BatchQueue single data mode
The buffer in BatchQueue's single data mode was not circular because a
variable had not been renamed
2011-05-10 11:02:00 +02:00
Thomas Preud'homme 70f8f95647 Fix option to choose the number of nodes
Option is now in the getopt string and accessible with -l switch.
2011-05-10 11:00:59 +02:00
Thomas Preud'homme f430cc17a7 Fix bugs coming from refactoring 2011-05-05 19:54:44 +02:00
Thomas Preud'homme 372c36155a Fix incorrect usage string: --check -> -k 2011-05-05 14:52:41 +02:00
Thomas Preud'homme 756a701466 [commtech] Refactor to chain more than 2 nodes
* Refactor the source to be able to chain more than 2 nodes together
* Compile all binaries by default (binList must be set manually in
  lancement.sh to run only a subset of the binaries)
2011-05-05 14:34:09 +02:00
Thomas Preud'homme 5d71bc53f1 [commtech] Varying size of buffer for BatchQueue
Create several variations of BatchQueue, each with a different buffer
size: batch_queue_1024, batch_queue_512, ..., batch_queue_2.
2011-05-05 11:30:00 +02:00
Thomas Preud'homme 9c835d4c46 Add a "sent words == received words" check 2011-05-04 19:35:10 +02:00
Thomas Preud'homme 22c97ab418 [commtech] Make BUF_SIZE definition be per tech
Don't define BUF_SIZE globally anymore, but per communication technique
2011-03-02 13:19:22 +01:00
Thomas Preud'homme 7c515200e7 [commtech] Remove asm_cache from the comm techs 2011-03-02 13:19:22 +01:00
Thomas Preud'homme 90b7a8007b [commtech] Rename c_cache to batch_queue 2011-03-02 13:19:22 +01:00
Thomas Preud'homme 7d7ad0c46a [commtech] Make WORDS_PER_BUF indep of BUF_SIZE.
The number of data sent must be independent of the buffer size chosen
by each algorithm.
2011-01-28 04:56:44 +01:00
Thomas Preud'homme c3aad28ad5 [commtech] Add calculation method
Add a calculation method which adds the values of the first integer of
n consecutive cache lines and writes the result in one of the integers of
these cache lines. The next calculation uses the next n consecutive cache
lines and writes the result in the next integer.
2011-01-25 17:24:53 +01:00
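One possible reading of this calculation method, as a sketch. The message leaves several details open (which cache line receives the result, where the "next integer" starts, the cache line size), so the 64-byte line, the choice of the first line of each group as destination, the starting slot and all names below are assumptions.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64  /* assumed cache line size in bytes */
#define INTS_PER_LINE (CACHE_LINE_SIZE / sizeof(int))

struct cache_line { int word[INTS_PER_LINE]; };

/* Sum the first int of lines[start .. start+n-1] and store the sum in
 * integer number `slot` of the first line of the group. */
static int calc_group(struct cache_line *lines, size_t start, size_t n, size_t slot)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += lines[start + i].word[0];
    lines[start].word[slot % INTS_PER_LINE] = sum;
    return sum;
}

/* Process consecutive groups of n lines; each group writes to the next slot. */
static void calc_all(struct cache_line *lines, size_t nb_lines, size_t n)
{
    size_t slot = 1;  /* assumed: start after the input integer */
    for (size_t start = 0; start + n <= nb_lines; start += n)
        calc_group(lines, start, n, slot++);
}
```

Each group reads one integer per cache line, so the work touches n lines per result without scanning their full contents.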
Thomas Preud'homme 975411a824 Split CSQ in 2 communication techniques.
* Divide CSQ in 2 communication techniques: one with 2 slots (as in
  BatchQueue aka c_cache) and one with 64 slots (as in the article)
* Rename the fake communication technique to the none communication
  technique and disable any activity (send no longer does anything)
2011-01-25 17:24:53 +01:00
Thomas Preud'homme 5eb7fb50c7 [commtech] CSQ uses memcpy in dequeue for fairness
The paper about CSQ uses memcpy in enqueue and dequeue. Although it is not
possible to use memcpy in enqueue because of the current API, it is possible
to use memcpy in dequeue, hence this commit.
2011-01-19 12:37:14 +01:00
Thomas Preud'homme 35a81bb736 [commtech] Place volatile on the right qualifier. 2011-01-13 14:58:13 +01:00
Thomas Preud'homme 2d879dc3fc [commtech] Fix idx test in c_cache technique.
c_cache was checking the status value when idx % BUF_SIZE != 0 instead of
when it equals zero.
2011-01-03 11:35:42 +01:00
Thomas Preud'homme 6c2868e20c [commtech bench] Take the mean over 10 runs. 2010-10-13 23:57:58 +02:00
Thomas Preud'homme 006b1b1d94 Simplify and rewrite comm API. 2010-10-01 18:57:46 +02:00
Thomas Preud'homme 5e3a7f6ce0 commtechs: BUGFIX wait for threads to be initialized 2009-07-07 16:08:00 +02:00
Thomas Preud'homme c99d8be100 commtechs: BUGFIX deadlock in thread init 2009-07-07 15:56:20 +02:00
Thomas Preud'homme 698341b99e commtech: Update usage help 2009-07-01 03:10:38 +02:00
Thomas Preud'homme 415004fb4b commtech: Update usage help 2009-07-01 02:45:28 +02:00
Thomas Preud'homme 3341546c75 commtech: calc lib takes 1 argument on command line 2009-07-01 02:36:11 +02:00
Thomas Preud'homme 7a1610961c commtech: BUGFIX unwanted optimization
Replace prod += 42 with prod += fourty_two, where fourty_two is a volatile
variable, to prevent the compiler from replacing the loop with
prod += 42 * nb_loop
2009-07-01 02:34:50 +02:00
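The trick described above can be sketched as follows. The names `prod`, `fourty_two` and `nb_loop` come from the commit message; the function `produce` and the types are assumptions made for the example.

```c
/* Reading a volatile object is a side effect the compiler must preserve,
 * so each iteration performs a real load and a real addition. */
static volatile int fourty_two = 42;

static long produce(long nb_loop)
{
    long prod = 0;
    for (long i = 0; i < nb_loop; i++)
        prod += fourty_two;  /* cannot be folded into 42 * nb_loop */
    return prod;
}
```

The result is the same as with the constant; only the amount of work the compiler is allowed to remove changes, which is what matters for a benchmark loop.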
Thomas Preud'homme 243d8810f1 commtech: avoid a double free corruption
Remove srand and rand function calls as they generate double free
corruption (???)
2009-07-01 01:49:16 +02:00
Thomas Preud'homme e90348b54c commtech: BUGFIX in freeing pages
Don't try to free the middle of an allocation
2009-07-01 01:48:13 +02:00
Thomas Preud'homme ba13c18af7 commtech: Free pages when jikes barrier ends 2009-07-01 00:45:19 +02:00
Thomas Preud'homme e04818645a commtech: Add a new calculation method
This calculation performs only a loop and avoids cache pollution
2009-06-30 22:37:55 +02:00
Thomas Preud'homme c98db4b4ba commtech: do_calc() returns a void **
This matches what we claim to pass to the send() function and allows
reducing FAKE_NURSERY_START. Thus we are sure gcc won't optimize away the
second part of the if in include/jikes_barrier_comm.h
2009-06-30 22:35:11 +02:00
Thomas Preud'homme 7bfc46db78 commtech: Delete pages free
Pages cannot be freed as fast as they are allocated, so this whole
mechanism can only delay the kernel panic. It's wiser to exit badly if
too much memory is consumed
2009-06-30 22:32:59 +02:00
Thomas Preud'homme 6b9777cb9b Align shared_mem and initial jikes buffer 2009-06-25 14:01:18 +02:00
Thomas Preud'homme c9323cd901 BUGFIX: Fix a possible deadlock if an error occurs 2009-06-25 13:47:50 +02:00
Thomas Preud'homme 177e548efe commtechs benchs: fake_comm performs the writes 2009-06-24 23:35:58 +02:00
Thomas Preud'homme 2fe89da8a2 Free memory after 100 MB allocated 2009-06-24 23:26:51 +02:00