Getting started
The CCPEM-Pipeliner provides easy access to a variety of software tools for all steps of cryoEM data processing, from preprocessing through model building and validation. The workflow is tracked, and tools are provided for visualising the full project and analysing the results.
ccpem-pipeliner serves as the back end for its companion software doppio (https://gitlab.com/ccpem/doppio), which provides a full graphical user interface.
Start with a Project
The Project is contained in a single project directory. Paths used by the various steps in the pipeliner are generally relative to this project directory.
To create a project or access an existing project using the API:
from pipeliner.api.manage_project import PipelinerProject
my_project = PipelinerProject()
To start a new project from the command line:
$ CL_pipeline --start_new_project
A Project is made up of Jobs
The project is made up of Jobs. Each job is one type of operation on the data, although jobs can have several steps. Jobs are defined by their jobtype. The format of the jobtype is:
<program>.<function>.<keywords>
where <program> is the main piece of external software used by the job, <function> is the task being performed, and any number of <keywords> further differentiate the jobtype. For example, in relion.autopick.log, relion is the program, autopick is the function, and log is a keyword.
To get information about a specific job type from the command line:
$ CL_pipeline --job_info <job type>
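For example, using a jobtype from this page:
$ CL_pipeline --job_info relion.autopick.log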
Jobs are written in their own job directories with the format:
<function>/job<nnn>/
The job number is automatically incremented as the project progresses.
Note
A job’s directory is also its name, which is used to identify it.
The job’s name, e.g. AutoPick/job004/, requires the trailing slash at the end.
Jobs are created from parameter files
Jobs can be created by reading from either of two types of parameter files: run.job or job.star
Both files define the jobtype, whether the job is new or a continuation of an old job, and the parameters (JobOptions).
run.job files are more verbose and easier to manually edit:
job_type == relion.autopick.log
is_continue == false
Pixel size in micrographs (A) == 1.02
Pixel size in references (A) == 3.54
...
job.star files have a more complicated format but have the advantage that the Pipeliner has functions to dynamically edit them:
data_job
_rlnJobType relion.autopick.log
_rlnJobIsContinue 0
data_joboptions_values
loop_
_rlnJobOptionVariable #1
_rlnJobOptionValue #2
angpix 1.02
angpix_ref 3.54
...
Note
job.star and run.job files can be used interchangeably for almost all applications in a project.
Getting a run.job or job.star file
A parameter file with the default values for any job can be generated with default_runjob() or default_jobstar().
API:
from pipeliner.api.api_utils import default_runjob, default_jobstar
default_runjob("relion.autopick.log")
default_jobstar("relion.autopick.log")
Command line:
$ CL_pipeline --default_runjob <job type>
$ CL_pipeline --default_jobstar <job type>
This will create the files relion_autopick_log_job.star and relion_autopick_log_run.job
Running a job
With the parameter file created, the job can now be run with run_job().
API:
my_project.run_job("relion_autopick_log_job.star")
Command line:
$ CL_pipeline --run_job relion_autopick_log_job.star
This will create and run the job AutoPick/job001/
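As the examples further below show, run_job() also returns the new job's name, so it can be captured for chaining jobs together:
job_name = my_project.run_job("relion_autopick_log_job.star")
# job_name holds the job's directory name, e.g. "AutoPick/job001/"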
Continuing a job
Some jobs can be continued from where they finished. When a job is run, a file continue_job.star is written in its job directory. This file contains only the parameters that are allowed to be modified when the job is continued. Edit this file if any parameters need to be changed, then continue the job with:
API:
my_project.continue_job("AutoPick/job001/")
Command line:
$ CL_pipeline --continue_job AutoPick/job001/
Note
The job’s full name was used to continue it, not the name of the continue_job.star file.
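A minimal sketch of changing a parameter before continuing, using edit_jobstar() (introduced in the next section); this assumes continue_job.star uses the same star format that edit_jobstar() accepts, and the joboption name shown is hypothetical:
from pipeliner.api.api_utils import edit_jobstar

# Assumption: continue_job.star can be edited like a job.star file;
# "threshold" is a hypothetical joboption name used for illustration
edit_jobstar("AutoPick/job001/continue_job.star", {"threshold": "0.5"})
my_project.continue_job("AutoPick/job001/")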
Modifying a parameter file
The Python API can modify job.star parameter files on-the-fly using edit_jobstar(). This avoids manual editing of the parameter files when stringing together multiple jobs:
from pipeliner.api.api_utils import edit_jobstar
movie_jobstar = my_project.write_default_jobstar("relion.import.movies")
edit_jobstar(movie_jobstar, {"fn_in_raw": "Movies/*.mrcs"})
movie_job_name = my_project.run_job(movie_jobstar)
mocorr_jobstar = my_project.write_default_jobstar("relion.motioncorr.own")
edit_jobstar(mocorr_jobstar, {"fn_in": movie_job_name + "movies.star"})
mocorr_job_name = my_project.run_job(mocorr_jobstar)
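Because job names end with the trailing slash (see the Note above), output file paths can be built by simple concatenation, as in movie_job_name + "movies.star".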
Running schedules
Scheduling allows sets of jobs to be run multiple times via schedule_job() and run_schedule().
Note
When a job is scheduled, placeholder files are created for all of its outputs, so these files can be referenced by subsequently scheduled jobs as if they already existed.
The following runs the same jobs as above, but uses the scheduling functions to run the set of import and motion correction jobs 10 times:
API:
movie_jobstar = my_project.write_default_jobstar("relion.import.movies")
edit_jobstar(movie_jobstar, {"fn_in_raw": "Movies/*.mrcs"})
movie_job_name = my_project.schedule_job(movie_jobstar)
mocorr_jobstar = my_project.write_default_jobstar("relion.motioncorr.own")
edit_jobstar(mocorr_jobstar, {"fn_in": movie_job_name + "movies.star"})
mocorr_job_name = my_project.schedule_job(mocorr_jobstar)
my_project.run_schedule(
fn_sched="my_schedule",
job_ids=[movie_job_name, mocorr_job_name],
nr_repeat=10,
)
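Here fn_sched names the schedule, job_ids lists the scheduled jobs to include, and nr_repeat sets how many times the set of jobs is run.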
To accomplish this from the command line, the parameter files for the Import and MotionCorr jobs must already have been created:
$ CL_pipeline --schedule_job <import job param file>
$ CL_pipeline --schedule_job <motion corr job param file>
$ CL_pipeline --run_schedule --name my_schedule --jobs job001 job002 --nr_repeat 10
Note
The command line tool intelligently parses job names; for the job named Import/job001/ it would accept job001 or 1 as well as the full job name.
Other job tools
A variety of other tools exist for modifying jobs in the project (an illustrative sketch follows the list below). See the API documentation for how to use these functions:
set_alias - Give a job a more descriptive name
run_cleanup - Move intermediate files from jobs into the trash to save disk space
delete_job - Move a job to the trash
undelete_job - Remove a job from the trash and restore it to the project
empty_trash - Permanently delete files in the trash
draw_flowcharts - Draw flowcharts for visualising the workflow of a project
get_job_metadata - Get metadata about a specific job
get_network_metadata - Get metadata about an entire project
create_archive - Make archives for storing and reproducing projects
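As an illustration, here is a minimal sketch of a few of these tools. The function names come from the list above, but the exact signatures, and whether they are methods of PipelinerProject, are assumptions; consult the API documentation for the authoritative forms.
from pipeliner.api.manage_project import PipelinerProject

my_project = PipelinerProject()

# Give a job a more descriptive name (assumed arguments: job name, alias)
my_project.set_alias("AutoPick/job001/", "first_picking_run")

# Move a job to the trash, then restore it (assumed argument: job name)
my_project.delete_job("AutoPick/job001/")
my_project.undelete_job("AutoPick/job001/")

# Permanently delete all files in the trash
my_project.empty_trash()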