CCPEM-Pipeliner API

The pipeliner API provides access to all of the main functions of the pipeliner

PipelinerProject

To interact with a pipeliner project, it must first be created as a PipelinerProject object

class pipeliner.api.manage_project.PipelinerProject(pipeline_name: str = 'default', project_name: Optional[str] = None, description: Optional[str] = None)

Bases: object

This class forms the basis for a project.

pipeline_name

The name of the pipeline. Defaults to 'default' if not set. There is really no good reason to give the pipeline any other name.

Type

str

pipeline

The ProjectGraph containing all the info about the project

Type

ProjectGraph

project_name

A short descriptive name for the project, editable by the user

Type

str

description

A verbose description of the project, editable by the user

Type

str
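As a minimal sketch of how the class might be instantiated, the snippet below wraps construction in a small helper. The helper name open_project is illustrative and not part of the pipeliner; only the class path and its three constructor arguments are taken from the signature above. The import is deferred so the snippet can be loaded on a machine where the pipeliner is not installed, and what the constructor does in a directory with no existing project is not covered here.

```python
from typing import Optional


def open_project(
    pipeline_name: str = "default",
    project_name: Optional[str] = None,
    description: Optional[str] = None,
):
    """Return a PipelinerProject for the current working directory.

    open_project is a hypothetical convenience wrapper; only the class and
    its constructor arguments come from the documented signature.
    """
    # Deferred import: the ccpem-pipeliner package is only needed at call time.
    from pipeliner.api.manage_project import PipelinerProject

    return PipelinerProject(
        pipeline_name=pipeline_name,
        project_name=project_name,
        description=description,
    )
```

The returned object exposes the methods documented below, such as run_job() and delete_job().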

cleanup_all(harsh: bool = False) bool

Runs cleanup on all jobs in a project

Parameters

harsh (bool) – Should harsh cleaning be performed?

Returns

True

compare_job_parameters(jobs_list: List[str]) dict

Compare the running parameters of multiple jobs

Parameters

jobs_list (list) – The jobs to compare

Returns

{parameter: [value, value, value]}

Return type

dict

Raises
  • ValueError – If any of the jobs is not found

  • ValueError – If the jobs being compared are not of the same type
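The returned dict can be post-processed with plain Python. The following sketch keeps only the parameters whose values differ between the compared jobs. The helper name differing_parameters and the sample parameter names are made up for illustration; only the {parameter: [value, value, value]} shape comes from the description above.

```python
from typing import Any, Dict, List


def differing_parameters(comparison: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Keep only the parameters whose values are not identical across jobs."""
    return {
        name: values
        for name, values in comparison.items()
        if any(value != values[0] for value in values[1:])
    }


# Made-up data in the documented {parameter: [value, value, value]} shape.
example = {
    "nr_iter": [25, 25, 40],
    "do_ctf": ["Yes", "Yes", "Yes"],
}
print(differing_parameters(example))  # {'nr_iter': [25, 25, 40]}
```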

continue_job(job_to_continue: str, wait_for_queued: bool = True, comment: Optional[str] = None) str

Continue a job that has already been run

To change the parameters in a continuation the user needs to edit the continue_job.star file in the job’s directory

Parameters
  • job_to_continue (str) – The name of the job to continue

  • wait_for_queued (bool) – If the job is being sent to a queue, should the pipeliner wait for the job to finish before starting the next job. Jobs run locally always wait for the job to finish before the next is started

  • comment (str) – Comments for the job’s jobinfo file

Returns

The name of the job that will be continued

Return type

str

Raises
  • ValueError – If the continue_job.star file is not found and there is no job.star file in the job’s directory to use as a backup

  • ValueError – If the job is of a type that needs an optimizer file to continue and this file is not found

  • ValueError – The job has iterations but the parameters specified would result in no additional iterations being run

create_archive(job: str, full: bool = False, tar: bool = True) str

Creates an archive

Archives can be full or simple. Simple archives contain the directory structure of the project, the parameter files for each job and a script to rerun the project through the terminal job. The full archive contains the full job dirs for the terminal job and all of its children

Parameters
  • job (str) – The name of the terminal job in the workflow

  • full (bool) – If True a full archive is written else a simple archive is written

  • tar (bool) – Should the newly written archive be compressed?

Returns

A message telling the type of archive and its name

Return type

str

delete_job(job: str) bool

Delete a job

Removes the job from the main project and moves it and its children to the Trash

Parameters

job (str) – The name of the job to be deleted

Returns

True if a job was deleted, False if no jobs were deleted

Return type

bool

draw_flowcharts(job: Optional[str] = None, do_upstream: bool = False, do_downstream: bool = False, do_full: bool = False, save: bool = False, show: bool = False) Union[bool, Tuple[Optional[str], Optional[str], Optional[str]]]

Prepare flowcharts for visualizing a project

Parameters
  • job (str) – The job for the flowchart to start or end on.

  • do_upstream (bool) – Should an upstream flowchart from the job be prepared?

  • do_downstream (bool) – Should a downstream flowchart from the job be prepared?

  • do_full (bool) – Should a full project flowchart be drawn?

  • save (bool) – Should the flowchart be saved as a file?

  • show (bool) – Should an interactive flowchart be shown?

Returns

Names of the upstream, downstream, and full flowchart files written.

An entry is "Not saved" if the flowchart was drawn but only shown in the interactive viewer, and None if no flowchart was drawn for that option

Return type

tuple
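One way to interpret the returned tuple is sketched below. The helper name describe_flowcharts and the example file name are hypothetical; the three-entry order (upstream, downstream, full) and the "Not saved" and None markers follow the description above. The sketch assumes the tuple form of the return value and does not handle the bool form that the signature also allows.

```python
from typing import List, Optional, Sequence

# Order of the entries as described in the Returns section.
LABELS = ("upstream", "downstream", "full")


def describe_flowcharts(result: Sequence[Optional[str]]) -> List[str]:
    """Turn the (upstream, downstream, full) tuple into readable lines."""
    lines = []
    for label, entry in zip(LABELS, result):
        if entry is None:
            lines.append(f"{label}: not drawn")
        elif entry == "Not saved":
            lines.append(f"{label}: shown interactively, no file written")
        else:
            lines.append(f"{label}: saved to {entry}")
    return lines


# Made-up result: upstream saved, downstream not requested, full only shown.
for line in describe_flowcharts(("upstream_chart.png", None, "Not saved")):
    print(line)
```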

edit_comment(job_name: str, comment: Optional[str] = None, overwrite: bool = False, new_rank: Optional[int] = None)

Edit the comment of a job

Parameters
  • job_name (str) – The name of the job to edit the comment for

  • comment (str) – The comment to add/append

  • overwrite (bool) – If True overwrites the original comment, otherwise appends to the current comment

  • new_rank (int) – New rank to assign to job, use -1 to revert the rank to None

Raises

ValueError – If the new rank is not None or an integer

empty_trash()

Deletes all the files and dirs in the Trash directory

Returns

True if any files were deleted, False if no files were deleted

Return type

bool

find_job_by_comment(contains: Optional[List[str]] = None, not_contains: Optional[List[str]] = None, job_type: Optional[str] = None, command: bool = False) List[str]

Find jobs by their comments or command history

Parameters
  • contains (list) – Find jobs that contain all of the strings in this list

  • not_contains (list) – Find jobs that do not contain any of these strings

  • job_type (str) – Only consider jobs whose type contains this string

  • command (bool) – If True searches the job’s command history rather than its comments

Returns

Names of all the jobs found

Return type

list

Raises

ValueError – If nothing is specified for contains and not_contains

find_job_by_rank(equals: Optional[int] = None, less_than: Optional[int] = None, greater_than: Optional[int] = None, job_type: Optional[str] = None) List[str]

Find jobs by their rank

Ignores jobs that are unranked

Parameters
  • equals (int) – Find jobs with this exact rank

  • less_than (int) – Find jobs with ranks less than this number

  • greater_than (int) – Find jobs with ranks higher than this number

  • job_type (str) – Only consider jobs that contain this string in their job type

Returns

Names of the matching jobs

Return type

list

Raises
  • ValueError – If nothing is specified to search for

  • ValueError – If both equals and less_than/greater_than are specified
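The two documented error conditions can be expressed as a small pre-check. This is only a sketch of the stated constraints, not the pipeliner's own code; the function name check_rank_query and the error messages are invented for illustration.

```python
from typing import Optional


def check_rank_query(
    equals: Optional[int] = None,
    less_than: Optional[int] = None,
    greater_than: Optional[int] = None,
) -> None:
    """Raise ValueError for the argument combinations documented as invalid."""
    if equals is None and less_than is None and greater_than is None:
        raise ValueError("Nothing specified to search for")
    if equals is not None and (less_than is not None or greater_than is not None):
        raise ValueError("equals cannot be combined with less_than/greater_than")


# A range query (both bounds, no exact rank) passes the check.
check_rank_query(less_than=5, greater_than=1)
```

Running such a check before calling find_job_by_rank() lets a script report a bad query without touching the project.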

get_job_metadata(jobname: str, output_name: Optional[str] = None) dict

Runs the gather_metadata function of a single job

Parameters
  • jobname – The name of the job to run on

  • output_name – File to write json to. If None, no file is written

Returns

Metadata dict for the job

Return type

dict

get_job_runtime(job: str) tuple

Returns info about how long a job took to run

Parameters

job (str) – The name of the job to run on

Returns

total times and list of steps and their times

(total_real, total_user, total_sys, {step: (real, user, sys)}, job status)

Return type

Tuple
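To illustrate the shape of the returned tuple, the sketch below unpacks it and picks the step with the largest real time. The helper name slowest_step, the step names, and the numbers are invented; the time units are not stated above, so none are assumed.

```python
from typing import Tuple


def slowest_step(runtime: tuple) -> Tuple[str, float]:
    """Return (step name, real time) for the step with the largest real time."""
    # Documented layout:
    # (total_real, total_user, total_sys, {step: (real, user, sys)}, job status)
    _total_real, _total_user, _total_sys, steps, _status = runtime
    name = max(steps, key=lambda step: steps[step][0])
    return name, steps[name][0]


# Made-up runtime info in the documented layout.
example = (
    12.5,
    10.0,
    1.5,
    {"step_a": (4.0, 3.5, 0.5), "step_b": (8.5, 6.5, 1.0)},
    "Succeeded",
)
print(slowest_step(example))  # ('step_b', 8.5)
```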

get_network_metadata(jobname: str, output_name: Optional[str] = None) dict

Returns a full metadata trace for a job and all upstream jobs

Parameters
  • jobname – The name of the job to run on

  • output_name – File to write json to. If None, no file is written

Returns

Metadata dict for all the jobs

Return type

dict

initialize_existing_project()

Make sure the pipeline for the project is current

This function is called by most other functions before running. There is usually no need to call it directly.

parse_proclist(list_o_procs: list, search_trash: bool = False) list

Finds full process names for multiple processes

Returns full process names, e.g. Import/job001/ from job001 or 1

Parameters
  • list_o_procs (list) – A list of string process names

  • search_trash (bool) – Should the trash also be searched?

Returns

All of the full process names

Return type

list

parse_procname(in_proc: str, search_trash: bool = False) Optional[str]

Find a process name, with the ability to parse ambiguous input.

Returns full process names, e.g. Import/job001/ from job001 or 1. Can look in both active processes and the Trash. Accepts inputs containing only a job number, or a process type and alias, e.g. Import/my_alias

Parameters
  • in_proc (str) – The text that is being checked against the list of processes

  • search_trash (bool) – Should it return the process name if the process is in the trash?

Returns

the process name

Return type

str

Raises
  • ValueError – if the process was in the trash but search_trash is false

  • ValueError – if the process name is not in the pipeliner format, jobxxx, or a number. IE: An unrelated string

  • ValueError – if the process name is not found

run_cleanup(jobs: list, harsh: bool = False) bool

Run the cleanup function for multiple jobs

Each job defines its own method for cleanup and harsh cleanup

Parameters
  • jobs (list) – List of string job names to operate on

  • harsh (bool) – Should harsh cleaning be performed

Returns

True if cleanup is successful, otherwise False

Return type

bool

run_job(jobinput: Union[str, dict, pipeliner.pipeliner_job.PipelinerJob], overwrite: Optional[str] = None, wait_for_queued: bool = True, comment: Optional[str] = None) str

Run a new job in the project

If a file is specified, the job will be created from the parameters in that file. If a dict is input, the job will be created with defaults for all options except those specified in the dict.

If a dict is used for input it MUST contain at minimum {“_rlnJobTypeLabel”: <the jobtype>}

Parameters
  • jobinput (str, dict, PipelinerJob) – The path to a run.job or job.star file that defines the parameters for the job or a dict specifying job parameters or a PipelinerJob object

  • overwrite (str) – The name of a job to overwrite, if None a new job will be created. A job can only be overwritten by a job of the same type

  • wait_for_queued (bool) – If the job is being sent to a queue, should the pipeliner wait for the job to finish before starting the next job. Jobs run locally always wait for the job to finish before the next is started

  • comment (str) – Comments to be added to the job’s info file

Returns

The name of the job that was run

Return type

str

Raises

ValueError – If this method is used to continue a job
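When a dict is used as input, the required job type key can be guaranteed by building the dict through a helper. The sketch below is an illustration only: make_job_input, the job type string "example.jobtype", and the option name nr_threads are placeholders, not real pipeliner names. Only the "_rlnJobTypeLabel" key comes from the text above.

```python
from typing import Any, Dict

# Key that the documentation says a dict input must contain.
JOBTYPE_KEY = "_rlnJobTypeLabel"


def make_job_input(job_type: str, **overrides: Any) -> Dict[str, Any]:
    """Build a run_job() dict that always carries the job type label."""
    if not job_type:
        raise ValueError("A job type is required")
    if JOBTYPE_KEY in overrides:
        raise ValueError(f"Pass the job type positionally, not as {JOBTYPE_KEY}")
    job_input: Dict[str, Any] = {JOBTYPE_KEY: job_type}
    job_input.update(overrides)
    return job_input


# Placeholder job type and option name, for illustration only.
params = make_job_input("example.jobtype", nr_threads=4)
print(params)
```

The resulting dict could then be passed as the jobinput argument of run_job(); options not named in the dict keep their defaults, as described above.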

run_schedule(fn_sched: str, job_ids: List[str], nr_repeat: int = 1, minutes_wait: int = 0, minutes_wait_before: int = 0, seconds_wait_after: int = 5) str

Runs a list of scheduled jobs

Parameters
  • fn_sched (str) – A name to assign to the schedule

  • job_ids (list) – A list of string job names to run

  • nr_repeat (int) – Number of times to repeat the entire schedule

  • minutes_wait (int) – Minimum number of minutes to wait between running each subsequent job

  • minutes_wait_before (int) – Initial number of minutes to wait before starting to run the schedules.

  • seconds_wait_after (int) – Time to wait after running each job

Returns

The name of the schedule that is run

Return type

str

Raises

ValueError – If the schedule name is already in use

schedule_continue_job(job_to_continue: str, params_dict: Optional[dict] = None, comments: Optional[str] = None) str

Schedule a continuation of a job

Adds the job to the pipeline with scheduled status, does not run it

Parameters
  • job_to_continue (str) – the name of the job to continue

  • params_dict (dict) – Parameters to change in the continuation job.star file. {param name: value}

  • comments (str) – comments to add to the job’s jobinfo file

Returns

The name of the scheduled job

Return type

str

schedule_job(job_input: str, comment: Optional[str] = None) str

Schedule a job to run

Adds the job to the pipeline with scheduled status, does not run it

Parameters
  • job_input (str) – The path to a run.job or job.star file that defines the parameters for the job or a dictionary containing job parameters

  • comment (str) – Comments to put in the job’s jobinfo file

Returns

The name of the scheduled job

Return type

str

set_alias(job: str, new_alias: str) bool

Set the alias for a job

Parameters
  • job (str) – The name of the job to set the alias for

  • new_alias (str) – The new alias

Returns

True if the alias was changed, else False

Return type

bool

stop_schedule(schedule_name: str) bool

Stops a currently running schedule

Kills the process running the schedule and marks the currently running job as aborted. Works to stop schedules that were started using the RELION GUI or pipeliner.

Parameters

schedule_name (str) – The name of the schedule to stop

Returns

True if the schedule was stopped, False if the schedule could not be found to stop

Return type

bool

undelete_job(job: str) bool

Restores a job from the Trash back into the project

Also restores the job’s alias if one existed

Parameters

job (str) – The job to undelete

Returns

True if a job was restored, otherwise False

Return type

bool

update_job_status(job: str, new_status: str) bool

Mark a job as finished, failed or aborted

If is_failed and is_aborted are both False the job is marked as finished.

Parameters
  • job (str) – The name of the job to update

  • new_status (str) – The new status for the job. Choose from “Running”, “Scheduled”, “Succeeded”, “Failed” or “Aborted”. Status names are not case sensitive

Returns

True if the status was updated, otherwise False

Return type

bool

Raises

ValueError – If the new status is not one of the options

pipeliner.api.manage_project.convert_pipeline(pipeline_file: str) bool

Converts a pipeline file from the RELION 2.0-3.1 format

This format has integer node, process, and status IDs. The pipeliner format uses string IDs

Parameters

pipeline_file (str) – The name of the file to be converted

Returns

The result of the conversion

True if the pipeline was converted, False if the pipeline was already in pipeliner format

Return type

bool

pipeliner.api.manage_project.get_commands_and_nodes(job_file: str) tuple

Report the commands a job file would generate and the nodes that would be created

Parameters

job_file (str) – The path to a run.job or job.star file

Returns

Three lists

  • A list of commands. Each item in the commands list is a list of command arguments, e.g. [[com1-arg1, com1-arg2],[com2-arg1]]

  • A list of input nodes that would be created. Each item in the list is a tuple: [(name, type), (name, type)]

  • A list of output nodes that would be created. Each item in the list is a tuple: [(name, type), (name, type)]

Return type

tuple
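The three lists can be turned into a readable preview with the standard library. In the sketch below, the helper name summarize and all sample commands, file names, and node types are invented; only the list-of-argument-lists and list-of-(name, type) shapes come from the description above.

```python
import shlex
from typing import List, Sequence, Tuple


def summarize(
    commands: Sequence[Sequence[str]],
    input_nodes: Sequence[Tuple[str, str]],
    output_nodes: Sequence[Tuple[str, str]],
) -> str:
    """Format commands as shell-style lines, followed by the node lists."""
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    lines: List[str] = [shlex.join(str(arg) for arg in command) for command in commands]
    lines += [f"input:  {name} ({kind})" for name, kind in input_nodes]
    lines += [f"output: {name} ({kind})" for name, kind in output_nodes]
    return "\n".join(lines)


# Made-up data in the documented shapes.
print(
    summarize(
        [["prog", "--in", "my file.star"]],
        [("my file.star", "TypeA")],
        [("Out/job002/out.star", "TypeB")],
    )
)
```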

pipeliner.api.manage_project.look_for_project(pipeline_name: str = 'default') Optional[dict]

See if a pipeliner project exists in the current directory

Parameters

pipeline_name (str) – The name of the pipeline to look for. This is the same as the pipeline file name with _pipeline.star removed

Returns

(bool: was the pipeline found?, dict: info about the project)

Return type

tuple

api_utilities

Utility functions that do not require an existing project

pipeliner.api.api_utils.edit_jobstar(fn_template: str, params_to_change: dict, out_fn: str) str

Modify one or more parameters in a job.star file

Parameters
  • fn_template (str) – The name of the job.star file to use as a template

  • params_to_change (dict) – The parameters to change in the format {param_name: new_value}

  • out_fn (str) – Name for the new file to be written

Returns

The name of the output file written

Return type

str

pipeliner.api.api_utils.get_available_jobs(search_term: str = '*ALL*') List[Tuple[str, bool, pipeliner.pipeliner_job.JobInfo]]

Returns all the available job types and info about them

Parameters

search_term (str) – Only return jobs with this string in their jobtype name

Returns

A list with a tuple for each job; (jobtype, can it run?, JobInfo object)

(str, bool, pipeliner.pipeliner_job.JobInfo)

Return type

list
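A list in this shape is easy to filter. The sketch below selects the job types flagged as able to run. The helper name runnable_job_types and the sample job type strings are made up, and None stands in for the JobInfo objects, which the helper never inspects.

```python
from typing import Any, Iterable, List, Tuple


def runnable_job_types(available: Iterable[Tuple[str, bool, Any]]) -> List[str]:
    """Return the sorted job types whose 'can it run?' flag is True."""
    return sorted(jobtype for jobtype, can_run, _info in available if can_run)


# Made-up entries in the documented (jobtype, can it run?, JobInfo) shape.
example_jobs = [
    ("example.refine", True, None),
    ("example.external", False, None),
    ("example.import", True, None),
]
print(runnable_job_types(example_jobs))  # ['example.import', 'example.refine']
```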

pipeliner.api.api_utils.get_job_info(job_type: str) Optional[pipeliner.pipeliner_job.JobInfo]

Get information about a job

Parameters

job_type (str) – The type of job to return info on

Returns

JobInfo object with info about the job and its references

Return type

JobInfo

pipeliner.api.api_utils.job_parameters_dict(jobtype: str) dict

Get dictionary of a job’s parameters

Parameters

jobtype (str) – The type of job to get the dict for

Returns

The parameters dict, suitable for running a job with run_job()

Return type

dict

pipeliner.api.api_utils.job_success(job_name: str, search_time: float = 0, raise_error: bool = False, error_message: str = '') bool

Check that a finished job has produced the expected control files

Parameters
  • job_name (str) – The name of the job to be checked in the format JobType/jobxxx/

  • search_time (float) – Time in minutes to spend looking for the control files before giving up

  • raise_error (bool) – Should an error be raised if the job is failed/aborted rather than returning a bool

  • error_message (str) – Additional text (if any) to print before the error reason

Returns

True if the job was successful, False if the job was failed, aborted, or no control files have appeared after the set search time and raise_error is False

Return type

bool

Raises
  • RuntimeError – If the job was failed, aborted, or no control files have appeared after the set search time and raise_error is True

pipeliner.api.api_utils.validate_starfile(fn_in: str)

Checks for inappropriate use of reserved words in starfiles

Writes a corrected version with proper quotation if possible using the StarfileCheck class

Parameters

fn_in (str) – The name of the file to check

pipeliner.api.api_utils.write_default_jobstar(job_type: str, out_fn: Optional[str] = None, relionstyle: bool = False)

Write a job.star file for the specified type of job

The default jobstar contains all the job options with their values set as the defaults

Parameters
  • job_type (str) – The type of job

  • out_fn (str) – Name of the file to write the output to. If left blank defaults to <job_type>_job.star

  • relionstyle (bool) – Should the job.star file be written in the RELION format? RELION files are compatible with the pipeliner, but the pipeliner versions are not backward compatible with RELION. If this option is selected a RELION job type should be used for job_type

Returns

The name of the output file written

Return type

str

pipeliner.api.api_utils.write_default_runjob(job_type: str, out_fn: Optional[str] = None) str

Write a run.job file for the specified type of job

The default runjob contains all the job option labels with their values set as the defaults

Parameters
  • job_type (str) – The type of job

  • out_fn (str) – Name of the file to write the output to. If left blank defaults to <job_type>_run.job

Returns

The name of the output file written

Return type

str