Getting started

The CCPEM-Pipeliner provides easy access to a variety of software tools for all steps of cryoEM data processing, from preprocessing through model building and validation. The workflow is tracked, and tools are provided for visualising the full project and analysing the results.

ccpem-pipeliner serves as the back end for its companion software doppio <https://gitlab.com/ccpem/doppio>, which provides a full graphical user interface.

Start with a Project

The Project is contained in a single project directory. Paths used by the various steps in the pipeliner are generally relative to this project directory.

To create a project or access an existing project using the API:

from pipeliner.api.manage_project import PipelinerProject

my_project = PipelinerProject()

To start a new project from the command line

$ CL_pipeline --start_new_project

A Project is made up of Jobs

The project is made up of Jobs. Each job performs one type of operation on the data, although a job can have several steps. Jobs are defined by their jobtype. The format of the jobtype is:

<program>.<function>.<keywords>

with:

<program> as the main piece of external software used by the job, <function> as the task being performed, and an unlimited number of <keywords> that further differentiate the jobtype.
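The three-part structure can be illustrated with a few lines of plain Python. This parse_jobtype helper is not a pipeliner function, just a sketch of how a jobtype string decomposes:

```python
def parse_jobtype(jobtype: str):
    """Split a jobtype string into program, function, and keywords."""
    parts = jobtype.split(".")
    program, function, keywords = parts[0], parts[1], parts[2:]
    return program, function, keywords

print(parse_jobtype("relion.autopick.log"))
# ('relion', 'autopick', ['log'])
```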

To get information about a specific job type from the command line:

$ CL_pipeline --job_info <job type>

Jobs are written in their own job directories with the format:

<function>/job<nnn>/ with the job number automatically incremented as the project progresses.
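As an illustration of the naming scheme (this helper is hypothetical, not part of the pipeliner API), the directory name combines the function's display name with a zero-padded job number:

```python
def job_directory(function: str, job_number: int) -> str:
    """Compose a job directory name, e.g. AutoPick/job004/.

    Job numbers are zero-padded to three digits, and the name
    includes the trailing slash.
    """
    return f"{function}/job{job_number:03d}/"

print(job_directory("AutoPick", 4))  # AutoPick/job004/
```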

Note

A job’s directory also serves as its name, which is used to identify it.

A job’s name, e.g. AutoPick/job004/, must include the trailing slash.

Jobs are created from parameter files

Jobs can be created by reading either of two types of parameter file: run.job or job.star

Both files define the jobtype, whether the job is new or a continuation of an old job, and the parameters (JobOptions).

run.job files are more verbose and easier to manually edit:

job_type == relion.autopick.log
is_continue == false
Pixel size in micrographs (A) == 1.02
Pixel size in references (A) == 3.54
...
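The "key == value" layout of a run.job file is easy to read with standard Python. The pipeliner has its own readers; this sketch is purely illustrative of the format:

```python
def read_runjob(text: str) -> dict:
    """Parse "key == value" lines from a run.job-style file into a dict."""
    options = {}
    for line in text.splitlines():
        if "==" in line:
            key, value = line.split("==", 1)
            options[key.strip()] = value.strip()
    return options

example = """job_type == relion.autopick.log
is_continue == false
Pixel size in micrographs (A) == 1.02"""
print(read_runjob(example)["job_type"])  # relion.autopick.log
```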

job.star files have a more complicated format but have the advantage that the Pipeliner has functions to dynamically edit them:

data_job

_rlnJobType          relion.autopick.log
_rlnJobIsContinue    0

data_joboptions_values
loop_
_rlnJobOptionVariable #1
_rlnJobOptionValue #2
angpix         1.02
angpix_ref     3.54
...

Note

job.star and run.job files can be used interchangeably for almost all applications in a project.

Getting a run.job or job.star file

A parameter file with the default values for any job can be generated with default_runjob() or default_jobstar():

API:

from pipeliner.api.api_utils import default_runjob, default_jobstar
default_runjob("relion.autopick.log")
default_jobstar("relion.autopick.log")

Command line:

$ CL_pipeline --default_runjob <job type>
$ CL_pipeline --default_jobstar <job type>

This will create the files relion_autopick_log_run.job and relion_autopick_log_job.star respectively.

Running a job

With the parameter file created, the job can now be run with run_job():

API:

my_project.run_job("relion_autopick_log_job.star")

Command line:

$ CL_pipeline --run_job relion_autopick_log_job.star

This will create and run the job AutoPick/job001/

Continuing a job

Some jobs can be continued from where they finished. When a job is run, a file continue_job.star is written in its job directory. This file contains only the parameters that are allowed to be modified when the job is continued. Edit this file if any parameters need to be changed and then continue the job with:

API:

my_project.continue_job("AutoPick/job001/")

Command line:

$ CL_pipeline --continue_job AutoPick/job001/

Note

The job’s full name was used to continue it, not the name of the continue_job.star file.

Modifying a parameter file

The python API can modify job.star parameter files on-the-fly using edit_jobstar(). This avoids manual editing of the parameter files when stringing together multiple jobs:

from pipeliner.api.api_utils import edit_jobstar

movie_jobstar = my_project.write_default_jobstar("relion.import.movies")
edit_jobstar(movie_jobstar, {"fn_in_raw": "Movies/*.mrcs"})
movie_job_name = my_project.run_job(movie_jobstar)

mocorr_jobstar = my_project.write_default_jobstar("relion.motioncorr.own")
edit_jobstar(mocorr_jobstar, {"fn_in": movie_job_name + "movies.star"})
mocorr_job_name = my_project.run_job(mocorr_jobstar)

Running schedules

Scheduling allows sets of jobs to be run multiple times via schedule_job() and run_schedule().

Note

When a job is scheduled, placeholder files are created for all of its outputs, so these files can be referenced as if they already existed.

Here the same jobs as above are run, but using the scheduling functions to run the set of import and motion-correction jobs 10 times:

API:

movie_jobstar = my_project.write_default_jobstar("relion.import.movies")
edit_jobstar(movie_jobstar, {"fn_in_raw": "Movies/*.mrcs"})
movie_job_name = my_project.schedule_job(movie_jobstar)

mocorr_jobstar = my_project.write_default_jobstar("relion.motioncorr.own")
edit_jobstar(mocorr_jobstar, {"fn_in": movie_job_name + "movies.star"})
mocorr_job_name = my_project.schedule_job(mocorr_jobstar)

my_project.run_schedule(
        fn_sched="my_schedule",
        job_ids=[movie_job_name, mocorr_job_name],
        nr_repeat=10,
        )

To accomplish this from the command line, the parameter files for the Import and MotionCorr jobs must already have been created:

$ CL_pipeline --schedule_job <import job param file>
$ CL_pipeline --schedule_job <motion corr job param file>
$ CL_pipeline --run_schedule --name my_schedule --jobs job001 job002 --nr_repeat 10

Note

The command-line tool parses job names intelligently: for the job named Import/job001/ it accepts job001 or 1 as well as the full job name.
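The kind of normalisation this implies can be sketched in a few lines. This helper is hypothetical (the pipeliner's actual parsing may differ); it just shows how "1", "job001", and a full job name can resolve to the same job:

```python
import re

def normalise_job_id(raw: str, known_jobs: list) -> str:
    """Resolve a job identifier ("1", "job001", or a full name)
    to the full job name from a list of known jobs."""
    match = re.search(r"(\d+)", raw)
    if match is None:
        raise ValueError(f"No job number found in {raw!r}")
    number = int(match.group(1))
    for name in known_jobs:
        if f"job{number:03d}/" in name:
            return name
    raise ValueError(f"No job numbered {number} in project")

jobs = ["Import/job001/", "MotionCorr/job002/"]
print(normalise_job_id("1", jobs))       # Import/job001/
print(normalise_job_id("job002", jobs))  # MotionCorr/job002/
```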

Other job tools

A variety of other tools exist for modifying jobs in the project. See the API documentation for how to use these functions.