Pipeliner Jobs

class pipeliner.pipeliner_job.JobInfo(display_name: str = 'Pipeliner job', version: str = '0.0', job_author: Optional[str] = None, short_desc: str = 'No short description for this job', long_desc: str = 'No long description for this job', documentation: str = 'No online documentation available', programs: Optional[list] = None, references: Optional[list] = None, software_vers: str = 'No version info available')

Bases: object

Class for storing info about jobs.

This is used to generate documentation for the job within the pipeliner

display_name

A user-friendly name to describe the job in a GUI. It should not include the software used, because that information is pulled from the job type

Type

str

version

The version number of the pipeliner job

Type

str

software_vers

The version of the outside executables that will be used

Type

str

job_author

Who wrote the pipeliner job

Type

str

short_desc

A one line “title” for the job

Type

str

long_desc

A detailed description of what the job does

Type

str

documentation

A URL for online documentation

Type

str

programs

A list of third-party software used by the job. The pipeliner uses these to determine whether the job can be run, so they need to be the names of all executables the job might call. If any program on this list cannot be found with which, the job will be marked as unable to run.

Type

list
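The availability check described for programs can be approximated with the standard library. The sketch below is an illustration only, not the pipeliner's own code: the function names missing_programs and can_run are hypothetical, and it assumes the check is equivalent to looking up each name on the PATH with shutil.which.

```python
import shutil
from typing import List


def missing_programs(programs: List[str]) -> List[str]:
    """Return the names in `programs` that cannot be found on the PATH."""
    return [prog for prog in programs if shutil.which(prog) is None]


def can_run(programs: List[str]) -> bool:
    """A job is treated as runnable only if every listed executable is found."""
    return not missing_programs(programs)
```

Listing every executable a job might call, rather than only the main one, means a missing helper program is reported before the job is started.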

references

A list of Ref objects

Type

list

class pipeliner.pipeliner_job.PipelinerJob

Bases: object

Super-class for job objects.

Each job type has its own sub-class.

WARNING: do not instantiate this class directly, use the factory functions in this module.

jobinfo

Contains information about the job such as references

Type

JobInfo

output_name

The path of the output directory created by this job

Type

str

alias

The alias for the job, if one has been assigned

Type

str

is_continue

Whether this job is a continuation of an older job or a new one

Type

bool

input_nodes

A list of Node objects, one for each file used as an input

Type

list

output_nodes

A list of Node objects for files produced by the job

Type

list

joboptions

A dict of JobOption objects specifying the parameters for the job

Type

dict

final_commands

A list of commands to be run by the job. Each item is a list of arguments

Type

list

is_mpi

Does the job use MPI?

Type

bool

is_tomo

Is the job a tomography job?

Type

bool

vers_com

[0] The command to run, which will print the program’s version info to stdout. [1] The lines from stdout to display; if () all lines will be displayed. Make sure that the value is a tuple, i.e. (1,) rather than (1)

Type

tuple(list, tuple)
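The tuple requirement for the second element of vers_com is easy to get wrong, so a short illustration may help. This is a sketch under assumptions, not the pipeliner's implementation: select_version_lines is a hypothetical helper, the sample output is made up, and line indices are assumed to be 0-based.

```python
from typing import List, Tuple


def select_version_lines(stdout: str, lines: Tuple[int, ...]) -> List[str]:
    """Pick the requested lines from a program's version output.

    An empty tuple means "keep every line". Indices are assumed 0-based.
    """
    all_lines = stdout.splitlines()
    if not lines:
        return all_lines
    return [all_lines[i] for i in lines if 0 <= i < len(all_lines)]


# (1) is just the integer 1 in parentheses; the trailing comma makes a tuple.
not_a_tuple = (1)
one_item_tuple = (1,)

# Made-up version output from a hypothetical program.
example_output = "myprog\nversion 4.0\nbuild abc123"
second_line_only = select_version_lines(example_output, one_item_tuple)
every_line = select_version_lines(example_output, ())
```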

OUT_DIR = ''
PROCESS_NAME = ''
clear()

Clear all attributes of the job

create_input_nodes()

Automatically add the job’s input nodes to its node list

create_results_display()

This function creates the objects to be displayed by the GUI

Placeholder for individual jobs to have their own

Returns

A ResultsDisplayText saying there is no specific method for this job

Return type

list

default_params_dict() dict

Get a dict with the job’s parameters and default values

Returns

All of the job’s parameters {parameter: default value}

Return type

dict

gather_metadata()

Placeholder function for metadata gathering

Each job class should define this individually

Returns

A placeholder “No metadata available” and the reason why

Return type

dict

get_commands()
get_current_output_nodes() list

Get the current output nodes if the job was stopped prematurely

For most jobs there will not be any, but for jobs with many iterations the most recent iteration can be used if the job is aborted or failed and then later marked as successful

Parameters

new_status (str) – The new status - what actions are performed will be dependent on this

Returns

A list of Node objects

Return type

list

get_extra_options()

Get user specified extra queue submission options

get_job_vers()

Get the current version of the software available in this system

Returns

The version info, or No version info available if no version command was specified

Return type

str

get_runtab_options(mpi: bool = True, threads: bool = True)

Get the options found in the Run tab of the GUI, which are common to all job types

Adds entries to the joboptions dict for queueing, MPI, threading, and additional arguments. This method should be used when initialising a PipelinerJob subclass

Parameters
  • mpi (bool) – Should MPI options be included?

  • threads (bool) – Should multi-threading options be included?

initialise_pipeline(outputname: str, defaultname: str, job_counter: int) str

Gets the pipeline ready to add a new job

Sets the output name and clears the input and output nodes

Parameters
  • outputname (str) – Where the job should write its results. If blank it is set to the next job number based on the job counter

  • defaultname (str) – The name of the job type

  • job_counter (int) – The number that job will get

Returns

The output name

Return type

str

make_additional_args()

Get the additional arguments job option

make_queue_options()

Get options related to queueing and queue submission, which are common to all job types

parameter_validation() list

Advanced validation of job parameters

This is a placeholder function for additional validation to be done by individual job subtypes, such as comparing JobOption values, e.g. JobOption A must be > JobOption B

Returns

A list of error messages. If no errors are found, an empty list should be returned

Return type

list
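A subclass override of parameter_validation might look roughly like the sketch below. It is illustrative only: ExampleJob is a stand-in rather than the real PipelinerJob base class, the option names min_value and max_value are hypothetical, and plain numbers are used in place of JobOption objects so the example is self-contained.

```python
from typing import Dict, List


class ExampleJob:
    """Stand-in for a PipelinerJob subclass (not the real base class)."""

    def __init__(self, values: Dict[str, float]) -> None:
        # The real joboptions dict holds JobOption objects; plain numbers
        # are used here to keep the sketch self-contained.
        self.joboptions = values

    def parameter_validation(self) -> List[str]:
        """Cross-check options against each other and collect error messages."""
        errors: List[str] = []
        if self.joboptions["max_value"] <= self.joboptions["min_value"]:
            errors.append("max_value must be greater than min_value")
        return errors


valid_job = ExampleJob({"min_value": 1.0, "max_value": 5.0})
invalid_job = ExampleJob({"min_value": 5.0, "max_value": 1.0})
```

Returning an empty list for the valid case, rather than None, lets callers treat the result uniformly.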

parse_additional_args()

Parse the additional arguments job option and return a list

Returns

A list ready to append to the command. Quoted strings are preserved as quoted strings; all others are split into individual items

Return type

list
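One way to obtain the splitting behaviour described above is with the standard library's shlex module in non-POSIX mode, which keeps the quote characters on quoted items. This is a sketch of the idea only; whether the pipeliner uses shlex internally is an assumption, and split_additional_args and the example arguments are hypothetical.

```python
import shlex
from typing import List


def split_additional_args(arg_string: str) -> List[str]:
    """Split an argument string on whitespace, keeping quoted items whole."""
    # posix=False leaves the surrounding quote characters in place.
    return shlex.split(arg_string, posix=False)


parsed = split_additional_args('--angpix 1.05 --label "my run 1"')
```

The resulting list can be appended to a command's argument list in the list-of-arguments format used elsewhere in this module.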

post_run_actions() bool

Placeholder function for actions to do after the job has finished

Each job class should define this individually. This is used for jobs where something needs to be done after the job has completed, such as jobs where the number or names of output nodes are not known until the job has finished

prepare_clean_up_lists(do_harsh: bool = False)

Placeholder function for preparation of list of files to cleanup

Each job class should define this individually

Parameters

do_harsh (bool) – Should a harsh cleanup be performed

Returns

Two empty lists ([files, to, delete], [dirs, to, delete])

Return type

tuple

prepare_final_command(outputname: str, commands: list, do_makedir: bool, ignore_queue: bool = False) list

Assemble commands to be run for a job

The commands are in a list of lists format. Each item in the main list is a single command, composed of a list of the arguments for that command.

An additional command to run the check completion script is added to the commands list.

Decides if a queue submission script is needed. If so it is written and the commands list is changed to the queue submission command

Parameters
  • outputname (str) – The job’s output directory

  • commands (list) – The commands to run as a list of lists

  • do_makedir (bool) – Should the output directory be created if it doesn’t already exist?

  • ignore_queue (bool) – Do not make a submission script, even if the job is sent to the queue, used for generating commands for display

Returns

[[[Actual, command], [to, be, run]], [[the, Job, commands]]]. If the job is being submitted to a queue, [0] will be the qsub command and [1] will be the actual job commands. For local jobs they will be identical

Return type

list
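The list-of-lists command format and the shape of the return value can be pictured with a small example. The commands below are placeholders using echo, the helper commands_for_display is hypothetical, and the local-job result is constructed by hand from the description above rather than by calling the pipeliner.

```python
import shlex
from typing import List

# Each inner list is one command: the program followed by its arguments.
job_commands: List[List[str]] = [
    ["echo", "step one"],
    ["echo", "step two", "--flag"],
]


def commands_for_display(commands: List[List[str]]) -> List[str]:
    """Join each command's arguments into a single shell-quoted string."""
    return [shlex.join(command) for command in commands]


# For a local job, both entries of the returned list are described as identical.
local_result = [job_commands, job_commands]
display = commands_for_display(local_result[1])
```

Keeping each argument as a separate list item avoids quoting problems when the commands are eventually executed; joining is only needed for display.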

prepare_onedep_data() list

Placeholder for function to return deposition data objects

The specific list returned should be defined by each jobtype

Returns

The deposition object(s) returned by the specific job. These need to be of the types defined in pipeliner.onedep_deposition

Return type

list

read(filename: str)

Reads parameters from a run.job or job.star file

Parameters

filename (str) – The file to read. Can be a run.job or job.star file

Raises

ValueError – If the file is a job.star file and a job option from the PipelinerJob is missing from the input file

save_job_submission_script(output_script: str, outputname: str, commands: list, nmpi: int) str

Writes a submission script for jobs submitted to a queue

Parameters
  • output_script (str) – The name for the script to be written

  • outputname (str) – The job’s output name

  • commands (list) – The job’s commands. In a list of lists format

  • nmpi (int) – The number of MPI processes used by the job. Should be 1 if the job is not multi-threaded

Returns

The name of the submission script that was written

Return type

str

Raises
  • ValueError – If no submission script template was specified in the job’s joboptions

  • ValueError – If the submission script template is not found

  • RuntimeError – If the output script could not be written

set_option(line: str)

Sets a value in the joboptions dict from a run.job file

Parameters

line (str) – A line from a run.job file

Raises
  • RuntimeError – If the line does not contain ‘==’

  • RuntimeError – If the value of the line does not match any of the joboptions keys
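The parsing and error behaviour described for set_option can be sketched as follows. This is an approximation written from the description above, not the actual method: set_option_sketch is a hypothetical free function, the joboptions dict holds plain strings instead of JobOption objects, and the option label is made up.

```python
from typing import Dict


def set_option_sketch(joboptions: Dict[str, str], line: str) -> None:
    """Update `joboptions` from a single 'label == value' line."""
    if "==" not in line:
        raise RuntimeError(f"No '==' found in line: {line!r}")
    # Split only on the first '==' so the value itself may contain '=='.
    key, value = (part.strip() for part in line.split("==", 1))
    if key not in joboptions:
        raise RuntimeError(f"No job option matches: {key!r}")
    joboptions[key] = value


options = {"Number of threads": "1"}
set_option_sketch(options, "Number of threads == 8")
```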

validate_dynamically_required_joboptions()

Validate joboptions that only become required in relation to others

For example if job option A is True, job option B is now required

Returns

A list of pipeliner.job_options.JobOptionValidationResult objects for any errors found

Return type

list

validate_input_files() list

Check that files specified as inputs actually exist

Returns

A list of pipeliner.job_options.JobOptionValidationResult objects

Return type

list

validate_joboptions() list

Make sure all of the joboptions meet their validation criteria

Returns

A tuple for each joboption that had errors: [(joboption, desc, error)]

Return type

list

write_jobstar(output_dir: str, output_fn: str = 'job.star', is_continue: bool = False)

Write a job.star file.

Parameters
  • output_fn (str) – The name of the file to write. Defaults to job.star

  • is_continue (bool) – Is the file for a continuation of a previously run job? If so, only the parameters that can be changed on continuation are written. Overrules the is_continue attribute of the job

write_runjob(fn: Optional[str] = None)

Writes a run.job file

Parameters

fn (str) – The name of the file to write. Defaults to the file the pipeliner uses for storing GUI parameters. A directory can also be given, in which case the file name ‘run.job’ is appended to it

class pipeliner.pipeliner_job.Ref(authors: Optional[Union[str, List[str]]] = None, title: str = '', journal: str = '', year: str = '', volume: str = '', issue: str = '', pages: str = '', doi: str = '', **kwargs)

Bases: object

Class to hold metadata about a citation or reference, typically a journal article.

authors

The authors of the reference.

Type

list

title

The reference’s title.

Type

str

journal

The journal.

Type

str

year

The year of publication.

Type

str

volume

The volume number.

Type

str

issue

The issue number.

Type

str

pages

The page numbers.

Type

str

doi

The reference’s Digital Object Identifier.

Type

str

other_metadata

Other metadata as needed. Gathered from kwargs

Type

dict
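How the constructor arguments of Ref map onto its attributes can be pictured with a stand-in class. RefSketch below is hypothetical and is not the pipeliner's Ref; in particular, wrapping a single author string in a list is an assumption based on the authors attribute being documented as a list, and the example values are placeholders rather than a real citation.

```python
from typing import Any, Dict, List, Optional, Union


class RefSketch:
    """Stand-in showing how Ref's arguments could map to its attributes."""

    def __init__(
        self,
        authors: Optional[Union[str, List[str]]] = None,
        title: str = "",
        journal: str = "",
        year: str = "",
        doi: str = "",
        **kwargs: Any,
    ) -> None:
        # Assumption: a single author string is wrapped in a list.
        if authors is None:
            self.authors: List[str] = []
        elif isinstance(authors, str):
            self.authors = [authors]
        else:
            self.authors = list(authors)
        self.title = title
        self.journal = journal
        self.year = year
        self.doi = doi
        # Any extra keyword arguments are collected as other metadata.
        self.other_metadata: Dict[str, Any] = dict(kwargs)


ref = RefSketch(authors="A. Author", title="Example title", year="2000", editor="E. Editor")
```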