Pipeliner benchmarking tool
The benchmarking tool uses the ccpem pipeliner to run multiple jobs whilst varying specific parameters so the effects of the parameters on the results and running times can be compared.
Start by generating the template files
Generate a job.star template file for each job to run.
This can be done by taking a job.star file from the job directory of a previous run, or by using the command-line tool:
CL_pipeline --default_jobstar <job type>
Edit the template files
Set all of the parameters for the jobs to be run by editing the template files.
To test multiple values for a parameter, put all the values to try in a comma-separated list (with no spaces) enclosed in square brackets.
For example, to try multiple values for the number of grouped frames in a MotionCorr job, find the line:
group_frames 1
and change it to:
group_frames [1,2,3,4,5,6,7,8]
Make sure to set the running parameters for all the jobs correctly. If every job that can be submitted to a queue is sent to one, the jobs can be run in parallel and the benchmarking will complete much more quickly.
Warning
If multiple parameters have more than one test value, the benchmark tool will make a job for every possible permutation, which could lead to A LOT of jobs being run.
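To get a feel for how quickly the job count grows, the sketch below expands bracketed test values into every combination and counts them. It is only an illustration of the convention described above, not the benchmark tool's own parser: the helper names find_test_values and all_combinations are hypothetical, the template is reduced to simple "name value" lines, and the second test parameter (patch_x) is included purely as an example.

```python
import itertools
import re

# Hypothetical helpers: these are NOT part of the pipeliner. They only
# mimic the "[a,b,c]" convention for test values described above.
BRACKET_LIST = re.compile(r"^\[(.+)\]$")


def find_test_values(template_text):
    """Return {parameter: [values]} for lines whose value is a bracketed list."""
    tests = {}
    for line in template_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not a simple "name value" line
        name, value = parts
        match = BRACKET_LIST.match(value)
        if match:
            tests[name] = match.group(1).split(",")
    return tests


def all_combinations(tests):
    """Return one dict per combination of test values (Cartesian product)."""
    names = list(tests)
    value_lists = [tests[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


# Simplified stand-in for the parameter lines of a job.star template.
template = """\
group_frames [1,2,3,4,5,6,7,8]
patch_x [1,5]
do_dose_weighting Yes
"""

tests = find_test_values(template)
combos = all_combinations(tests)
print(len(combos))  # 8 values x 2 values = 16 jobs
```

Adding a third parameter with five test values to this example would raise the count to 80, so it is worth checking the total before starting a run.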
Organise the template files
Once the template files are edited, they should all be placed in a single directory called benchmark_templates. Rename them so the benchmark tool knows what order to run them in.
For example:
import_job.star
motioncorr_job.star
ctffind_job.star
should be renamed to:
01_import_job.star
02_motioncorr_job.star
03_ctffind_job.star
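The renaming can also be scripted. The following is a minimal sketch, assuming the two-digit, underscore-separated prefix shown in the example above is all that is needed; number_templates is a hypothetical helper, not something provided by the pipeliner. The demonstration runs in a temporary directory so that no real files are touched.

```python
import tempfile
from pathlib import Path


def number_templates(template_dir, ordered_names):
    """Prefix each file in ordered_names with its 1-based, zero-padded position.

    Hypothetical helper: the 'NN_' prefix format is taken from the example
    above and is an assumption about what the benchmark tool expects.
    """
    template_dir = Path(template_dir)
    renamed = []
    for position, name in enumerate(ordered_names, start=1):
        source = template_dir / name
        target = template_dir / f"{position:02d}_{name}"
        source.rename(target)
        renamed.append(target.name)
    return renamed


# Demonstration in a throwaway directory so nothing real is renamed.
with tempfile.TemporaryDirectory() as tmp:
    template_dir = Path(tmp) / "benchmark_templates"
    template_dir.mkdir()
    order = ["import_job.star", "motioncorr_job.star", "ctffind_job.star"]
    for name in order:
        (template_dir / name).touch()
    renamed = number_templates(template_dir, order)
    final_listing = sorted(p.name for p in template_dir.iterdir())
    print(renamed)
```

To use it on real templates, call number_templates with the path to benchmark_templates and the file names in the order the jobs should run.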
Get your data
Place any data that will be needed by the jobs (i.e. not produced by a previous job) in a directory called benchmark_data.
These files will be symlinked into the running directory of each benchmark job.
These files will be at the top of the job directory, so if a job needs one of them as input, use just the file name as the parameter value.
For example:
'fn_in' my_particles_data.star
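Because the templates refer to these files by bare file name, a quick check that every referenced name really exists at the top of benchmark_data can catch typos before any jobs are launched. The sketch below is one way to do that; missing_inputs is a hypothetical helper, and my_mask.mrc is an invented file name used only to show a failing case.

```python
import tempfile
from pathlib import Path


def missing_inputs(data_dir, file_names):
    """Return the names in file_names that are not files at the top of data_dir.

    Hypothetical helper, not part of the pipeliner.
    """
    data_dir = Path(data_dir)
    return [name for name in file_names if not (data_dir / name).is_file()]


# Demonstration with a temporary stand-in for benchmark_data.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "benchmark_data"
    data_dir.mkdir()
    (data_dir / "my_particles_data.star").touch()
    # my_mask.mrc is deliberately absent to show what a problem looks like.
    missing = missing_inputs(data_dir, ["my_particles_data.star", "my_mask.mrc"])
    print(missing)  # ['my_mask.mrc']
```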
Run the benchmark tool
Run the command pipeliner_benchmark_tool
The tool will set up the directory structure and templates and determine if a parallel run can be executed.
Note
Parallel execution will only be allowed if all computationally intensive jobs are being sent to a queue.
The tool will create a directory for each permutation of the test parameters, starting with bmrk000.
The results of the benchmarking tests are written in two files:
benchmark_summary.csv: Contains run times for each job in the benchmarks along with a total run time for each benchmark.
benchmark_full.csv: Contains run times for each individual command in the jobs in the benchmarks.
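Both result files can be inspected with Python's standard csv module. The sketch below makes no assumption about the column names, which are not listed here: it simply reads whatever header the file contains and prints each row. load_rows is a hypothetical helper, and the script expects to be run from the directory that contains benchmark_summary.csv.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Return (column_names, rows); each row is a dict keyed by column name.

    Hypothetical helper: the header is read from the file itself, so no
    particular column layout is assumed.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


summary = Path("benchmark_summary.csv")
if summary.exists():
    columns, rows = load_rows(summary)
    print(columns)
    for row in rows:
        print(row)
```

The same function works for benchmark_full.csv; only the path needs to change.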