===============
Getting started
===============

The CCPEM-Pipeliner provides easy access to a variety of software tools for all
steps of processing cryoEM data, from data preprocessing through model building
and validation. The workflow is tracked, and tools for visualising the full
project and analysing the results are provided.

The ccpem-pipeliner serves as the back end for its companion software Doppio,
which provides a full graphical user interface.

Start with a *Project*
----------------------

The **Project** is contained in a single project directory.

.. note::
    Paths used in the pipeliner, such as paths to files in job parameters, are
    generally relative to this project directory.

To create a project or access an existing project using the API:

.. code-block:: python

    from pipeliner.api.manage_project import PipelinerProject

    my_project = PipelinerProject()

To start a new project from the command line:

.. code-block:: console

    $ pipeliner --new_project

A *Project* is made up of *Jobs*
--------------------------------

The project is made up of **Jobs**. Each job is one type of operation on the
data, although jobs can have several steps.

Jobs are defined by their **jobtype**. The format of the jobtype is:

::

    <program>.<function>.<keywords>

with:

| `<program>`: the main piece of external software used by the job
| `<function>`: the task being performed
| `<keywords>`: serve to further differentiate the jobtype; an unlimited number are allowed

To get information about a specific job type from the command line:

.. code-block:: console

    $ pipeliner --job_info <jobtype>

Jobs are written in their own job directories with the format:

::

    <Type>/job<number>/

with the job number automatically incremented as the project progresses.

.. note::
    A job's directory is also its name, which is used to identify it. The job's
    name, for example `AutoPick/job004/`, **requires** the trailing slash.

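The naming rules above can be illustrated with a small sketch. It splits a
jobtype such as `relion.autopick.log` into the software, the task, and any
keywords, and makes sure a job name ends with a trailing slash. The helpers
`parse_jobtype` and `normalise_job_name` are hypothetical and are not part of
the pipeliner API; the sketch also assumes that a jobtype always has at least
the first two parts.

```python
from typing import NamedTuple, Tuple


class JobType(NamedTuple):
    """Parts of a dotted jobtype string (illustrative container only)."""

    program: str
    function: str
    keywords: Tuple[str, ...]


def parse_jobtype(jobtype: str) -> JobType:
    """Split a jobtype on dots (hypothetical helper, not pipeliner API)."""
    parts = jobtype.split(".")
    # Assumption: the software and the task are both mandatory and non-empty
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"Not a valid jobtype: {jobtype!r}")
    return JobType(parts[0], parts[1], tuple(parts[2:]))


def normalise_job_name(name: str) -> str:
    """Return the job name with exactly one trailing slash."""
    return name.rstrip("/") + "/"


print(parse_jobtype("relion.autopick.log"))
print(normalise_job_name("AutoPick/job004"))
```

Normalising the name before passing it on is one way to avoid a lookup failing
because of a missing trailing slash.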
Jobs are created from parameter files
-------------------------------------

Jobs can be created by reading from either of two types of **Parameter Files**:
`run.job` or `job.star`.

Both files define the jobtype, whether the job is new or a continuation of an
old job, and the parameters or **JobOptions**.

`run.job` files are more verbose and easier to edit manually:

.. code-block:: text

    job_type == relion.autopick.log
    is_continue == false
    Pixel size in micrographs (A) == 1.02
    Pixel size in references (A) == 3.54
    ...

`job.star` files have a more complicated format but have the advantage that the
Pipeliner has functions to edit them dynamically:

.. code-block:: text

    data_job

    _rlnJobTypeLabel    relion.autopick.log
    _rlnJobIsContinue    0

    data_joboptions_values

    loop_
    _rlnJobOptionVariable #1
    _rlnJobOptionValue #2
    angpix    1.02
    angpix_ref    3.54
    ...

.. note::
    `job.star` and `run.job` files can be used interchangeably for almost all
    applications in a project.

Getting a `run.job` or `job.star` file
--------------------------------------

A parameter file with the default values for any job can be generated with
:meth:`~pipeliner.api.api_utils.write_default_runjob` or
:meth:`~pipeliner.api.api_utils.write_default_jobstar`.

API:

.. code-block:: python

    from pipeliner.api.api_utils import write_default_runjob, write_default_jobstar

    write_default_runjob("relion.autopick.log")
    write_default_jobstar("relion.autopick.log")

Command line:

.. code-block:: console

    $ pipeliner --default_runjob relion.autopick.log
    $ pipeliner --default_jobstar relion.autopick.log

This will create the files `relion_autopick_log_run.job` and
`relion_autopick_log_job.star`.

Running a job
-------------

With the parameter file created, the job can now be run with
:meth:`~pipeliner.api.manage_project.PipelinerProject.run_job`:

API:

.. code-block:: python

    my_project.run_job("relion_autopick_log_job.star")

Command line:

.. code-block:: console

    $ pipeliner --run_job relion_autopick_log_job.star

This will create and run the job `AutoPick/job001/`.

Alternatively, a job can be run from a :class:`dict` containing its parameters,
which can be generated with the function
:meth:`pipeliner.api.api_utils.job_default_parameters_dict`. This dict can be
edited in place before using it to run a job.

.. code-block:: python

    from pipeliner.api.manage_project import PipelinerProject
    from pipeliner.api.api_utils import job_default_parameters_dict

    proj = PipelinerProject(make_new_project=True)
    params = job_default_parameters_dict("relion.autopick.log")
    params["fn_input_autopick"] = "Path/to/new/input_file.mrc"
    proj.run_job(params)

Continuing a job
----------------

Some jobs can be continued from where they finished. When a job is run, a file
`continue_job.star` is written in its job directory. This file contains only
the parameters that are allowed to be modified when the job is continued. Edit
this file if any parameters need to be changed and then continue the job with:

API:

.. code-block:: python

    my_project.continue_job("AutoPick/job001/")

Command line:

.. code-block:: console

    $ pipeliner --continue_job AutoPick/job001/

.. note::
    The job's full name is used to continue it, *not* the name of the
    `continue_job.star` file.

Submitting jobs to a queue
--------------------------

Pipeliner jobs can be submitted to a queuing system using a submission script
template that incorporates values from the job's JobOptions. The pipeliner
replaces the variables bracketed by `XXX` in the submission template with
values from the job's JobOptions and then runs the submission script using the
command specified in the job's `qsub` JobOption.

.. list-table:: Template variables updated from JobOptions
   :header-rows: 1

   * - Script Variable
     - JobOption
     - GUI Field
   * - XXXmpinodesXXX :sup:`see note`
     - nr_mpi
     - Number of MPI procs:
   * - XXXthreadsXXX :sup:`see note`
     - nr_threads
     - Number of threads:
   * - XXXdedicatedXXX
     - min_dedicated
     - Minimum dedicated cores per node:
   * - XXXqueueXXX
     - queuename
     - Queue name:
   * - XXXextra1XXX
     - qsub_extra_1
     - Set from environment variable PIPELINER_QSUB_EXTRA1
   * - XXXextra2XXX
     - qsub_extra_2
     - Set from environment variable PIPELINER_QSUB_EXTRA2
   * - XXXextra3XXX
     - qsub_extra_3
     - Set from environment variable PIPELINER_QSUB_EXTRA3
   * - XXXextra4XXX
     - qsub_extra_4
     - Set from environment variable PIPELINER_QSUB_EXTRA4

There are some additional variables available for submission scripts that are
not drawn from the JobOptions:

.. list-table:: Additional template variables
   :header-rows: 1

   * - Script Variable
     - Substitution
   * - XXXnameXXX
     - The job's name; the same as its output directory
   * - XXXcoresXXX
     - The number of MPI processes multiplied by the number of threads
   * - XXXerrfileXXX
     - Path to the job's run.err file
   * - XXXoutfileXXX
     - Path to the job's run.out file
   * - XXXcommandXXX :sup:`see note`
     - The full commands list for the job.

.. note::
    The variable `XXXcommandXXX` will already have the mpirun command specified
    by the `mpirun_com` JobOption prepended to commands where necessary, so the
    mpirun command generally does NOT need to be included in the submission
    script. The default for `mpirun_com` is `mpi_run -n XXXmpinodesXXX`, meaning
    the `XXXmpinodesXXX` variable generally also does not need to be included in
    the commands section of the submission script template. Similarly, the
    number of threads used by a job is usually set in the commands, so the
    `XXXthreadsXXX` variable rarely needs to be used.

The submission script template must be written for your specific system. Here
is an example submission script template for a cluster running SLURM:

.. code-block:: console

    #!/bin/bash
    #SBATCH --ntasks=XXXmpinodesXXX
    #SBATCH --partition=XXXqueueXXX
    #SBATCH --cpus-per-task=XXXthreadsXXX
    #SBATCH --error=XXXerrfileXXX
    #SBATCH --output=XXXoutfileXXX
    #SBATCH --gres=gpu:2

    XXXcommandXXX

Modifying parameters
--------------------

The python API can modify `job.star` parameter files on-the-fly using
:meth:`~pipeliner.api.api_utils.edit_jobstar`. This avoids manual editing of
the parameter files when stringing together multiple jobs:

.. code-block:: python

    from pipeliner.api.api_utils import write_default_jobstar, edit_jobstar

    # Write a default job.star file for the import job and set its input
    movie_jobstar = write_default_jobstar("relion.import.movies")
    edit_jobstar(movie_jobstar, {"fn_in_raw": "Movies/*.mrcs"}, movie_jobstar)
    movie_job = my_project.run_job(movie_jobstar)

    # Use the import job's output as the input of the motion correction job
    mocorr_jobstar = write_default_jobstar("relion.motioncorr.own")
    edit_jobstar(
        mocorr_jobstar,
        {"input_star_mics": movie_job.output_dir + "movies.star"},
        mocorr_jobstar,
    )
    mocorr_job = my_project.run_job(mocorr_jobstar)

Alternatively, this can be done solely with dicts:

.. code-block:: python

    from pipeliner.api.api_utils import job_default_parameters_dict

    import_params = job_default_parameters_dict("relion.import.movies")
    import_params["fn_in_raw"] = "Movies/*.mrcs"
    movie_job = my_project.run_job(import_params)

    mocorr_params = job_default_parameters_dict("relion.motioncorr.own")
    mocorr_params["input_star_mics"] = movie_job.output_dir + "movies.star"
    mocorr_job = my_project.run_job(mocorr_params)

Running schedules
-----------------

Scheduling allows sets of jobs to be run multiple times via
:meth:`~pipeliner.api.manage_project.PipelinerProject.schedule_job` and
:meth:`~pipeliner.api.manage_project.PipelinerProject.run_schedule`.

.. note::
    When a job is scheduled, placeholder files are created for all of its
    outputs, so these files can be used as if they already exist.

The following example runs the same jobs as above, except that it uses the
scheduling functions to run the set of import and motion correction jobs 10
times:

API:

.. code-block:: python

    from pipeliner.api.manage_project import PipelinerProject
    from pipeliner.api.api_utils import write_default_jobstar, edit_jobstar

    my_project = PipelinerProject(make_new_project=True)

    movie_jobstar = write_default_jobstar("relion.import.movies")
    edit_jobstar(movie_jobstar, {"fn_in_raw": "Movies/*.mrcs"}, movie_jobstar)
    movie_job = my_project.schedule_job(movie_jobstar)

    mocorr_jobstar = write_default_jobstar("relion.motioncorr.own")
    edit_jobstar(
        mocorr_jobstar,
        {"input_star_mics": movie_job.output_dir + "movies.star"},
        mocorr_jobstar,
    )
    mocorr_job = my_project.schedule_job(mocorr_jobstar)

    my_project.run_schedule(
        fn_sched="my_schedule",
        job_ids=[movie_job.output_dir, mocorr_job.output_dir],
        nr_repeat=10,
    )

To accomplish this from the command line, the parameter files for the Import
and MotionCorr jobs must already have been created with the correct file names
as inputs:

.. code-block:: console

    $ pipeliner --schedule_job <Import job parameter file>
    $ pipeliner --schedule_job <MotionCorr job parameter file>
    $ pipeliner --run_schedule --name my_schedule --jobs job001 job002 --nr_repeat 10

.. note::
    The command line tool intelligently parses job names, so for the job named
    `Import/job001/` it would accept `job001` or `1` as well as the full job
    name.

Other job tools
---------------

A variety of other tools exist for modifying jobs in the project.
See the API documentation for how to use these functions:

- :meth:`~pipeliner.api.manage_project.PipelinerProject.set_alias` - Give a job a more descriptive name
- :meth:`~pipeliner.api.manage_project.PipelinerProject.run_cleanup` - Move intermediate files from jobs into the trash to save disk space
- :meth:`~pipeliner.api.manage_project.PipelinerProject.delete_job` - Move a job to the trash
- :meth:`~pipeliner.api.manage_project.PipelinerProject.undelete_job` - Remove a job from the trash and restore it to the project
- :meth:`~pipeliner.api.manage_project.PipelinerProject.empty_trash` - Permanently delete files in the trash
- :meth:`~pipeliner.api.manage_project.PipelinerProject.prepare_metadata_report` - Get metadata about an entire project

Logging
-------

Logging in the pipeliner is performed using Python's standard :mod:`logging`
module. If you are using the pipeliner as a library, propagation of log
messages can be disabled like this:

.. code-block:: python

    import logging

    pipeliner_logger = logging.getLogger("pipeliner")
    pipeliner_logger.propagate = False
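
With propagation disabled, messages no longer reach the handlers of the root
logger, so an application that still wants to keep them can attach its own
handler. The following is a minimal sketch using only the standard library. It
assumes that pipeliner modules log through child loggers of `pipeliner`; the
logger name `pipeliner.example` and the message are made up for illustration,
and the in-memory buffer could be swapped for a `logging.FileHandler`.

```python
import io
import logging

# Collect messages in memory; a logging.FileHandler would work the same way
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))

pipeliner_logger = logging.getLogger("pipeliner")
pipeliner_logger.setLevel(logging.INFO)
pipeliner_logger.propagate = False  # keep messages away from the root logger
pipeliner_logger.addHandler(handler)

# Hypothetical child logger standing in for a pipeliner module
logging.getLogger("pipeliner.example").info("job finished")

captured = log_buffer.getvalue()
print(captured, end="")
```

Because child loggers pass their records up to `pipeliner`, a single handler on
that logger is enough to capture them, while the application's own logging
configuration stays untouched.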