Pipeliner benchmarking tool

The benchmarking tool uses the ccpem pipeliner to run multiple jobs whilst varying specific parameters so the effects of the parameters on the results and running times can be compared.

Start by generating the template files

Generate a job.star template file for each job to run.

This can be done by copying a job.star file from the job directory of a previous run, or by using the command-line tool:

CL_pipeline --default_jobstar <job type>

Edit the template files

Set all of the parameters for the jobs to be run by editing the template files.

To test multiple values for a parameter, enter all the values to try as a comma-separated list (with no spaces) enclosed in square brackets.

For example, to try multiple values for the number of grouped frames in a motion correction job, find the line:

group_frames            1

and change it to:

group_frames            [1,2,3,4,5,6,7,8]

Make sure to set the running parameters for all the jobs correctly. If every job that can be submitted to a queue is sent to one, the jobs can be run in parallel and the benchmarking will complete much more quickly.

Warning

If multiple parameters have more than one test value, the benchmark tool will make a job for every possible combination of those values, which could lead to A LOT of jobs being run.
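To get a feel for how quickly the job count grows, the sketch below counts the combinations implied by the bracketed lists in one template. It is an illustration only, not the benchmark tool's own code: the helper names are hypothetical, the parameters bin_factor and dose_per_frame are made-up example entries, and it assumes each template line is a parameter name followed by a single whitespace-free value.

```python
import itertools
import re

# Matches a bracketed, comma-separated list with no spaces, e.g. [1,2,3]
BRACKET_LIST = re.compile(r"^\[([^\[\]\s]+)\]$")


def find_test_values(lines):
    """Return {parameter name: [values]} for lines whose value is a bracketed list."""
    tests = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not a simple "name value" line
        name, value = parts
        match = BRACKET_LIST.match(value)
        if match:
            tests[name] = match.group(1).split(",")
    return tests


def expand_combinations(tests):
    """Return one {parameter: value} dict per combination of test values."""
    names = list(tests)
    value_lists = [tests[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


# Illustrative template lines (assumed layout, hypothetical second parameter)
example_lines = [
    "group_frames            [1,2,3,4,5,6,7,8]",
    "bin_factor              [1,2]",
    "dose_per_frame          1.0",
]

tests = find_test_values(example_lines)
combos = expand_combinations(tests)
print(len(combos))  # 8 values x 2 values = 16 combinations
```

Adding a third parameter with five test values to this example would raise the count to 80, so it is worth checking the total before starting a run.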

Organise the template files

Once the template files are edited, place them all in a single directory called benchmark_templates. Rename them with a numeric prefix so the benchmark tool knows what order to run them in. For example:

import_job.star motioncorr_job.star ctffind_job.star

should be renamed to:

01_import_job.star 02_motioncorr_job.star 03_ctffind_job.star
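The renaming can be done by hand, but for longer pipelines a small script avoids mistakes. The following is a minimal sketch, assuming the two-digit, underscore-separated prefix shown above; the function names are hypothetical and are not part of the pipeliner.

```python
from pathlib import Path


def numbered_names(ordered_names):
    """Return (old name, new name) pairs with a zero-padded run-order prefix."""
    return [
        (name, f"{index:02d}_{name}")
        for index, name in enumerate(ordered_names, start=1)
    ]


def rename_templates(template_dir, ordered_names):
    """Rename the template files inside template_dir according to their run order."""
    template_dir = Path(template_dir)
    for old, new in numbered_names(ordered_names):
        (template_dir / old).rename(template_dir / new)


# Templates listed in the order the jobs should run
run_order = ["import_job.star", "motioncorr_job.star", "ctffind_job.star"]
for old, new in numbered_names(run_order):
    print(old, "->", new)
```

Calling rename_templates("benchmark_templates", run_order) would then apply the new names on disk.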

Get your data

Place any data that will be needed by the jobs (i.e. data not produced by a previous job) in a directory called benchmark_data.

These files will be symlinked into the running directory of each benchmark job.

The links sit at the top level of the job directory, so if a job needs one of these files as input, use just the file name as the parameter value.

For example:

'fn_in'              my_particles_data.star
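The layout described above can be pictured with the short sketch below, which links every file from a data directory into the top level of a run directory. This is only an illustration of the idea under assumed behaviour, not the tool's actual implementation; link_data_files is a hypothetical helper, and the demonstration runs in a temporary directory so it does not touch real data.

```python
import os
import tempfile
from pathlib import Path


def link_data_files(data_dir, run_dir):
    """Symlink each entry of data_dir into the top level of run_dir."""
    data_dir, run_dir = Path(data_dir), Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for source in sorted(data_dir.iterdir()):
        target = run_dir / source.name
        # Skip names that already exist, including dangling links
        if not (target.is_symlink() or target.exists()):
            os.symlink(source.resolve(), target)
        linked.append(target.name)
    return linked


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp, "benchmark_data")
    data.mkdir()
    (data / "my_particles_data.star").write_text("")
    print(link_data_files(data, Path(tmp, "bmrk000")))
```

Because the link carries the same name as the original file, the bare file name is enough to reference it from inside the run directory.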

Run the benchmark tool

Use the command pipeliner_benchmark_tool

The tool will set up the directory structure and templates and determine if a parallel run can be executed.

Note

Parallel execution will only be allowed if all computationally intensive jobs are being sent to a queue.

The tool will create a directory for each combination of the test parameters, starting with bmrk000.

The results of the benchmarking tests are written to two files:

benchmark_summary.csv: Contains run times for each job in the benchmarks along with a total run time for each benchmark.

benchmark_full.csv: Contains run times for each individual command in the jobs in the benchmarks.
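Both files are plain CSV, so they can be inspected with standard tools. The sketch below loads one with Python's csv module and prints whatever columns it finds. The column names are not documented here, so the code deliberately makes no assumption about them, and it assumes the first row of the file is a header; read_benchmark_csv is a hypothetical helper name.

```python
import csv
from pathlib import Path


def read_benchmark_csv(path):
    """Return (column names, rows as dicts) for a benchmark results file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


summary = Path("benchmark_summary.csv")
if summary.exists():  # only present after a benchmark run has finished
    columns, rows = read_benchmark_csv(summary)
    print("columns:", columns)
    for row in rows:
        print(row)
```

The same function works for benchmark_full.csv, since it only relies on the header row.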