Allow shorter stage times by applying the adjustment after the computation has finished instead of before it starts.
pipeline_template.c is an example of code that is friendly to pipeline parallelism, in the sense that it cannot be parallelized by any other known parallelization technique.
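To illustrate what "pipeline friendly" can mean, here is a minimal sketch that is not taken from pipeline_template.c. The stage functions, their arithmetic, and the names run_sequential and run_pipelined_schedule are all hypothetical. Each stage carries state from one iteration to the next, which rules out plain data parallelism across iterations, and stage 2 consumes stage 1's output from the same iteration, which rules out running both stages of one item at once. What remains is overlapping stage 1 of item t with stage 2 of item t-1. The schedule is simulated in a single thread here to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage 1: its state depends on its own previous value,
 * so iteration i cannot start stage 1 before iteration i-1 finished it. */
static uint32_t stage1(uint32_t *s1, uint32_t input)
{
    *s1 = (*s1 ^ input) * 3u + 1u;
    return *s1;
}

/* Hypothetical stage 2: needs stage 1's result for the same item and
 * also carries its own state across iterations. */
static uint32_t stage2(uint32_t *s2, uint32_t from_stage1)
{
    *s2 = (*s2 ^ from_stage1) + 7u;
    return *s2;
}

/* Sequential reference: both stages run back to back for each item. */
uint32_t run_sequential(const uint32_t *in, size_t n)
{
    assert(in != NULL || n == 0);
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t mid = stage1(&s1, in[i]);
        stage2(&s2, mid);
    }
    return s2;
}

/* Pipelined schedule, simulated in one thread. At step t, stage 1 handles
 * item t while stage 2 handles item t-1. The two calls in a step touch
 * disjoint state (s1 vs. s2 and prev), so they could run on two threads. */
uint32_t run_pipelined_schedule(const uint32_t *in, size_t n)
{
    assert(in != NULL || n == 0);
    uint32_t s1 = 0, s2 = 0;
    uint32_t latch = 0;      /* one-slot buffer between the stages */
    int latch_full = 0;

    for (size_t t = 0; t <= n; t++) {
        /* Snapshot what stage 1 produced in the previous step. */
        uint32_t prev = latch;
        int have_prev = latch_full;

        if (t < n) {
            latch = stage1(&s1, in[t]);
            latch_full = 1;
        } else {
            latch_full = 0;  /* drain step: no new input */
        }
        if (have_prev)
            stage2(&s2, prev);
    }
    return s2;
}
```

Both functions return the same final stage 2 state for the same input, which is the property a real two-thread version would have to preserve.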