Pipeliner Jobs

Pipeliner jobs

class pipeliner.pipeliner_job.JobInfo(display_name: str = 'Pipeliner job', version: str = '0.0', job_author: str | None = None, short_desc: str = 'No short description for this job', long_desc: str = 'No long description for this job', documentation: str = 'No online documentation available', external_programs: List[ExternalProgram] | None = None, references: List[Ref] | None = None)

Bases: object

Class for storing info about jobs.

This is used to generate documentation for the job within the pipeliner

display_name

A user-friendly name to describe the job in a GUI. This should not include the software used, because that information is pulled from the job type.

Type:: str

version

The version number of the pipeliner job

Type:: str

job_author

Who wrote the pipeliner job

Type:: str

short_desc

A one line “title” for the job

Type:: str

long_desc

A detailed description of what the job does

Type:: str

documentation

A URL for online documentation

Type:: str

programs

A list of 3rd party software used by the job. These are used by the pipeliner to determine if the job can be run, so they need to include all executables the job might call. If any program on this list cannot be found with which then the job will be marked as unable to run.

Type:: List[ExternalProgram]

references

A list of Ref objects used

Type:: List[Ref]

alternative_unavailable_reason

This can be set to a string explaining why the job is unavailable if other checks for the job to be available (besides programs missing from the $PATH) have failed, e.g. a necessary library is missing.

Type:: str

can_continue

Can this job be continued?

Type:: bool

property is_available

Is the job available to run?

True if executables were found for all the job’s programs and alternative_unavailable_reason has not been set, or False otherwise.

property unavailable_message: str | None

Gives the reason the pipeliner has marked the job as unavailable

Returns:: The reason the job is marked unavailable, or None if it is available
Return type:: Optional[str]
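The availability rule described above can be sketched in plain Python. This is a minimal illustration and not the pipeliner's own code: the function names, their parameters and the message wording are hypothetical, and the sketch assumes that a set alternative_unavailable_reason always makes the job unavailable. Only shutil.which is a real standard-library call.

```python
import shutil
from typing import List, Optional


def sketch_unavailable_message(
    commands: List[str], alternative_reason: Optional[str] = None
) -> Optional[str]:
    """Return a reason the job cannot run, or None if it looks runnable.

    Hypothetical stand-in for the checks described above; the real
    pipeliner logic and message wording may differ.
    """
    # Assumption: an explicitly set reason (e.g. a missing library)
    # marks the job as unavailable regardless of the executables.
    if alternative_reason:
        return alternative_reason
    # Otherwise every executable must be discoverable on the $PATH.
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    if missing:
        return "Missing executables: " + ", ".join(missing)
    return None


def sketch_is_available(
    commands: List[str], alternative_reason: Optional[str] = None
) -> bool:
    """True when no reason for unavailability was found."""
    return sketch_unavailable_message(commands, alternative_reason) is None
```

Keeping the message and the boolean in step, by deriving one from the other, avoids the two ever disagreeing.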

class pipeliner.pipeliner_job.PipelinerCommand(cmd: Sequence[str | float | int], *, relion_control: bool = False, allow_failure: bool = False)

Bases: object

Holds a command that will be run by the pipeliner

cmd

The command that will be run. Each list item is one arg.

Type:: List[str, float, int]

relion_control

Does the command need the Relion ‘--pipeline_control’ argument appended before being run?

Type:: bool

allow_failure

If True, then the job will be allowed to continue even if this command fails. Be cautious if you use this option! You will need to consider all other places where the output of this command might be used and make the code suitably robust to handle possible missing files. In general, you should not set this option for any command that is required to produce an output node that might be used as input by another job.

Type:: bool
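To illustrate what allow_failure implies for the rest of a job, here is a minimal sketch of a runner loop. It is not the pipeliner's job runner: sketch_run_all and the (argv, allow_failure) tuple layout are hypothetical, and only subprocess.run is a real call.

```python
import subprocess
import sys
from typing import List, Tuple


def sketch_run_all(commands: List[Tuple[List[str], bool]]) -> bool:
    """Run (argv, allow_failure) pairs in order; return True on success."""
    for argv, allow_failure in commands:
        result = subprocess.run(argv)
        if result.returncode != 0 and not allow_failure:
            # A failure that is not explicitly allowed stops the job.
            return False
        # An allowed failure falls through: later commands still run,
        # so they must cope with any files this command did not write.
    return True


if __name__ == "__main__":
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    passing = [sys.executable, "-c", "pass"]
    print(sketch_run_all([(failing, True), (passing, False)]))
```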

get_final_command(output_dir: str) → List[str]

Get the final command to run from this PipelinerCommand object.

This makes a number of changes to the original command in self.cmd:

All args are converted to strings.
If the command is “python” or “python3”, the command name is replaced with the correct command to run the current Python executable.
The command name is replaced by the path from shutil.which if one is found, otherwise it is left as a simple name.
If self.relion_control is set, the Relion pipeline control arguments are appended to the command.
If a $ sign is found in an argument, any environment variables in that argument are expanded with os.path.expandvars.

Parameters:: output_dir – The output dir for the job that runs this command. This value is only used if self.relion_control is set.
Returns:: The final command and its arguments, in a list ready to pass to subprocess.run.
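The steps listed above can be approximated with standard-library calls. This is a sketch under stated assumptions rather than the actual method: sketch_final_command is a hypothetical name, "the current Python executable" is assumed to mean sys.executable, and the Relion control arguments are passed in as a placeholder because their exact form is not given here.

```python
import os
import shutil
import sys
from typing import List, Sequence, Union


def sketch_final_command(
    cmd: Sequence[Union[str, float, int]],
    control_args: Sequence[str] = (),
) -> List[str]:
    """Approximate the documented steps; not the pipeliner's own code."""
    # 1. Convert every argument to a string.
    args = [str(arg) for arg in cmd]
    # 2. Run Python commands with the interpreter that is running now
    #    (assumed here to mean sys.executable).
    if args and args[0] in ("python", "python3"):
        args[0] = sys.executable
    # 3. Use the full path of the executable when one can be found,
    #    otherwise keep the simple name.
    if args:
        args[0] = shutil.which(args[0]) or args[0]
    # 4. Append any pipeline control arguments (hypothetical placeholder:
    #    the real arguments depend on relion_control and output_dir).
    args.extend(control_args)
    # 5. Expand environment variables only in arguments containing "$".
    return [os.path.expandvars(a) if "$" in a else a for a in args]
```

The returned list can be passed directly to subprocess.run without shell=True, which is why every item must already be a string.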

class pipeliner.pipeliner_job.PipelinerJob

Bases: object

Super-class for job objects.

Each job type has its own sub-class.

WARNING: do not instantiate this class directly, use the factory functions in this module.

jobinfo

Contains information about the job such as references

Type:: JobInfo

output_dir

The path of the output directory created by this job

Type:: str

alias

The alias for the job, if one has been assigned

Type:: str

is_continue

Whether this job is a continuation of an older job (True) or a new one (False)

Type:: bool

input_nodes

A list of Node objects for each file used as an input

Type:: list

output_nodes

A list of Node objects for files produced by the job

Type:: list

joboptions

A dict of JobOption objects specifying the parameters for the job

Type:: dict

is_tomo

Is the job a tomography job?

Type:: bool

working_dir

The working directory to be used when running the job. This should normally be left as None, meaning the job will be run in the project directory. Jobs that write files in their working directory should instead work somewhere within the job’s output directory, and take care to adjust the paths of input and output files accordingly.

Type:: str

raw_options

A dict of all raw joboptions as they were read in

Type:: dict

CATEGORY_LABEL = ''

OUT_DIR = ''

PROCESS_NAME = ''

add_compatibility_joboptions() → None

Write additional joboptions for back compatibility

Some JobOptions are needed by the original program (e.g. Relion 4) but not by the pipeliner. They are added here so that the files the pipeliner writes remain backwards compatible with the original program.

add_main_emdb_processing_data() → List[OneDepDepositionObject]

Make an em_image_processing DepositionObject for a job

This is the base DepositionObject for all jobs that generate results. Most jobs should generate this along with the associated more specific deposition objects.

This deposition object should ALWAYS be created last, as it will update the cross-references in the associated depobjs when it is created.

Returns:: The em_image_processing DepositionObject
Return type:: DepositionObject

add_onedep_deposition_data(category: str, data_dict: Dict[str, str | None]) → OneDepDepositionObject

Create a deposition object specific to this job

Parameters:

category (str) – the type of deposition object
data_dict (Dict[str, str]) – The fields and values in the resulting depobj’s data attr that need to be updated. Values not specified will be left as default

add_onedep_metadata_import()

add_output_node(file_name: str, node_type: str, keywords: List[str] | None = None) → None

Helper function to add a new Node for a file in the job’s output directory.

This is a wrapper around node_factory.create_node which simply adds self.output_dir to the start of the file name before creating the node and adding it to self.output_nodes.

Parameters:

file_name – The name of the file that the new node will refer to. It is assumed that the file will be written to the job’s output directory. Note that the existence of the file is not checked, because this method will usually be called before the job has run.
node_type – The top-level type for the new node. This should almost always be one of the constants defined in pipeliner.nodes.
keywords – A list of keywords to append to the node type.

additional_joboption_validation() → List[JobOptionValidationResult]

Advanced validation of job parameters

This is a placeholder function for additional validation to be done by individual job subtypes, such as comparing JobOption values, e.g. JobOption A must be > JobOption B.

Avoid using self.get_string or self.get_number in this function as they may raise an error if the JobOption is required and has no value. Use self.joboptions["jobopname"].value instead.

Returns:

A list of JobOptionValidationResult objects

Return type:

list
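The kind of cross-option check described above might look like the following sketch. It uses stand-ins only: StandInOption models nothing but the .value attribute, the option names are invented, and plain strings are returned in place of JobOptionValidationResult objects, whose constructor is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StandInOption:
    """Hypothetical stand-in for a JobOption; only .value is modelled."""

    value: str


def sketch_check_a_greater_than_b(options: Dict[str, StandInOption]) -> List[str]:
    """Return error messages; an empty list means the check passed."""
    errors: List[str] = []
    # Read the raw values directly instead of using getter methods that
    # might raise when a required option is empty.
    raw_a = options["option_a"].value
    raw_b = options["option_b"].value
    try:
        a, b = float(raw_a), float(raw_b)
    except ValueError:
        # Empty or non-numeric values are left to the standard validation.
        return errors
    if a <= b:
        errors.append(f"option_a ({a:g}) must be greater than option_b ({b:g})")
    return errors
```

Returning an empty list for unparseable values keeps this check from duplicating errors that the per-option validation is expected to report.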

check_joboption_is_now_deactivated(jo: str) → bool

Check if a joboption has become deactivated in relation to others

For example, if job option A is False, job option B is now deactivated

Parameters:: jo (str) – The name of the JobOption to test
Returns:: Has the JobOption been deactivated
Return type:: bool

check_joboption_is_now_required(jo: str) → list

Check if a joboption has become required in relation to others

For example if job option A is True, job option B is now required

Parameters:

jo (str) – The name of the joboption to test

Returns:

A list of pipeliner.job_options.JobOptionValidationResult objects for any errors found

Return type:

list

continuation_joboption_updates() → None

Modifications to existing JobOptions for when a job is continued

Set a JobOption’s value for the continuation job.

This function can also do things like change the hard or suggested min/max, limit or expand multiple choice options, or make the regex validation in a StringJobOption more restrictive.

create_input_nodes(existing_nodes: Dict[str, Node] | None = None) → None

Automatically add the job’s input nodes to its input node list.

Input nodes are created from each of the job’s job options.

Parameters:: existing_nodes – A dict of {node_name: Node object} for the current nodes in the pipeline, if available. This allows jobs to re-use existing nodes rather than creating new ones, which can be necessary if the type of the node is uncertain.

create_output_nodes() → None

Make the job’s output nodes.

This method should be overridden by PipelinerJob subclasses.

The output nodes should be added to the list in the output_nodes attribute. The add_output_node function is helpful to create and add a new node in a single call.

If your job doesn’t make any output nodes, or doesn’t know what their names will be until the job has been run, you still need to override this method but your implementation can simply pass and do nothing. If you need to add output nodes at the end of the job, create them in create_post_run_output_nodes.

Note that this method is called by the job manager (via PipelinerJob.prepare_to_run) before the job is added to the pipeline. The job’s output directory does exist when this method is called, but that could change in future versions of the pipeliner and jobs should avoid making any file system changes in this method.

create_post_run_output_nodes()

Placeholder function for post run node creation

Some jobs have output nodes that can only be created after the job has run because their names are not known until after they have been created. They can be added here. This function should ONLY add output nodes; any other work should be done in commands run by the job.

create_results_display() → Sequence[ResultsDisplayObject]

Create results display objects to be displayed by the GUI

This default implementation simply creates the default results display object for each of the job’s output nodes. Subclasses that want customised results should override this method.

Returns:: A list of ResultsDisplayObject

evaluate_qsub_template(sub_text: str) → None

Check that the qsub template is appropriate for the pipeliner

Looks for the common differences between relion style sub script and pipeliner style. Issues warnings if suspected problems are found but still allows it to run.

Parameters:: sub_text (str) – The text of the submission script

format_onedep_cif(input_file) → Document | None

Format a cif returned by the job that is going to be used as a deposition model

Do whatever needs to be done to make the cif compliant with both gemmi and the onedep format.

Parameters:: input_file – The cif file produced by the job
Returns:: A cif document for the file that can be used for appending
Return type:: gemmi.cif.Document

Placeholder function for metadata gathering

Each job class should define this individually

Returns:: A placeholder “No metadata available” and the reason why
Return type:: dict

get_additional_onedep_files() → Dict[str, str]

Files from this job that should be uploaded with the deposition files

These files will only be included in the deposition if the job produced the primary map

ToDo: This may be expanded to include the model if it becomes necessary to add any model-job associated files to the uploads, but none are currently needed

Returns:: Dictionary of file names and their descriptions
Return type:: Dict[str, str]

get_additional_reference_info() → List[Ref]

A placeholder function for jobs that need to return additional references

This is for references that are not included in self.jobinfo, such as ones pulled from the EMDB/PDB in fetch jobs

get_category_label() → str

Get a label for the category that this job belongs to.

If the job defines a CATEGORY_LABEL attribute, its value is simply returned. Otherwise, the second part of the process name is processed to produce a label by replacing underscores with spaces and converting to title case.
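The fallback rule can be written in a few lines. This sketch is hypothetical (the function name and the example process name are invented) and assumes that the parts of a process name are separated by dots, which is not stated above.

```python
def sketch_category_label(process_name: str, category_label: str = "") -> str:
    """Derive a display label for a job category."""
    # An explicit label, when defined, is returned unchanged.
    if category_label:
        return category_label
    # Assumption: the parts of a process name are separated by dots.
    second_part = process_name.split(".")[1]
    # Underscores become spaces and each word is capitalised.
    return second_part.replace("_", " ").title()


if __name__ == "__main__":
    print(sketch_category_label("someprog.map_sharpening.auto"))
```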

get_commands() → List[PipelinerCommand]

Get the commands to be run for a specific job.

This method should be overridden by PipelinerJob subclasses.

Jobs are normally run with the project directory as the working directory. If your job needs to run in a different working directory (for example if it calls a program which always writes files into the current directory), set the self.working_dir attribute in this method.

Note that this method should run quickly! Any long-running actions should be done in one of the job’s commands instead. (If necessary, put Python code that needs to be run into a separate script in pipeliner.scripts.job_scripts and then call it as a command.)

Returns:: The commands as a list of PipelinerCommand objects

get_current_output_nodes() → List[Node]

Get the current output nodes if the job was stopped prematurely

For most jobs there will not be any, but for jobs with many iterations the most recent iteration can be used if the job is aborted or fails and is then later marked as successful

Returns:: A list of Node objects
Return type:: list

get_default_params_dict() → Dict[str, str]

Get a dict with the job’s parameters and default values

Returns:: All the job’s parameters {parameter: default value}
Return type:: dict

get_extra_options() → None

Get user specified extra queue submission options

get_job_subdirs()

Get all the subdirectories contained in the jobdir excluding the NodeDisplay

Returns:: The dirs
Return type:: List[str]

get_joboption_groups() → Dict[str, List[str]]

Put the joboptions in groups according to their jobop_group attribute

Assumes that the joboptions have already been put in order of priority by self.set_joboption_order() or were in order to begin with.

Groups are ordered based on the highest priority joboption in that group from the order of the joboptions, except that “Main” is always the first group. Joboptions within the groups are ordered by priority.

Returns:: The joboptions groupings {group: [jobop, … jobop]}
Return type:: Dict[str, List[str]]
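The ordering rule described above can be sketched with an ordinary dict, which preserves insertion order. The function name and the (name, group) input format are hypothetical simplifications; the real method works on JobOption objects and their jobop_group attribute.

```python
from typing import Dict, List, Tuple


def sketch_group_joboptions(ordered: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (joboption name, group name) pairs already in priority order."""
    groups: Dict[str, List[str]] = {}
    # "Main" always comes first, if any joboption belongs to it.
    if any(group == "Main" for _, group in ordered):
        groups["Main"] = []
    # Other groups appear in order of their highest-priority joboption;
    # dicts keep insertion order, so the first sighting fixes the position.
    for name, group in ordered:
        groups.setdefault(group, []).append(name)
    return groups
```

For example, if the first joboption belongs to a group other than “Main”, that group still appears second, directly after “Main”.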

get_mpi_command() → List[int | float | str]

get_nr_mpi() → int

get_nr_threads() → int

get_onedep_symmetry() → str | None

If the job needs to return a onedep symmetry entry get the symmetry name

This is the function that should be edited for each specific PipelinerJob

Returns:: The symmetry name (Cn, Dn, T, I, or O)
Return type:: Optional[str]

get_onedep_symmetry_object() → OneDepDepositionObject | None

Get the one dep symmetry object for this job

calls self.onedep_symmetry_data

Returns:: The one dep symmetry object if one was created
Return type:: Optional[DepositionObject]

get_program(search_name: str) → ExternalProgram

Get the object for a program from this job’s programs

Parameters:: search_name (str) – program name or exe
Returns:: The program object for the program
Return type:: ExternalProgram
Raises:: ValueError – if the program name or exe is not found

get_program_version_commands(program: str = '') → List[PipelinerCommand]

Get commands to print a program’s version at the top of the log file

Some programs don’t print the version when run. Version info in the logs is necessary for deposition and metadata tools to get the actual program version at runtime

Parameters:: program (str) – The program name. If an empty string is given, commands are returned for all the programs in the job’s jobinfo that have a version command defined
Returns:: The program version command(s), ready to add to the command list
Return type:: List[PipelinerCommand]
Raises:: ValueError – if a program name is given and it cannot be found

get_runtab_options(mpi: bool = False, threads: bool = False, addtl_args: bool = False, mpi_default_min: int = 1, mpi_must_be_odd: bool = False) → None

Get the options found in the Run tab of the GUI, which are common to all job types

Adds entries to the joboptions dict for queueing, MPI, threading, and additional arguments. This method should be used when initialising a PipelinerJob subclass

Parameters:

mpi (bool) – Should MPI options be included?
threads (bool) – Should multi-threading options be included?
addtl_args (bool) – Should an ‘additional arguments’ option be added?
mpi_default_min (int) – The minimum for the default number of MPIs, will be used if mpi_default_min > user defined min number of MPI
mpi_must_be_odd (bool) – Does the number of mpis have to be odd, like for relion refine_jobs.

handle_doppio_uploads(dry_run=False) → None

Tasks that have to be performed to deal with Doppio file uploads.

Move files from DoppioUploads to the job dir:
DoppioUploads/tmpdir/file -> JobType/jobNNN/InputFiles/file
Update the job option values to point to the new file locations, so when the job input nodes are created they refer to the moved files

Parameters:: dry_run – If True, do not actually try to move any files, just update the job option values. This option is only intended for use in testing.
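The path mapping shown above can be expressed as a small helper. This is only a sketch of the path arithmetic: sketch_moved_upload_path is a hypothetical name, no files are moved, and the example paths are invented.

```python
import posixpath


def sketch_moved_upload_path(upload_path: str, job_dir: str) -> str:
    """Map DoppioUploads/tmpdir/file to JobType/jobNNN/InputFiles/file."""
    # Only the file name is kept; the temporary upload directory is dropped.
    file_name = posixpath.basename(upload_path)
    return posixpath.join(job_dir, "InputFiles", file_name)


if __name__ == "__main__":
    print(sketch_moved_upload_path("DoppioUploads/tmp1234/map.mrc", "Refine3D/job012"))
```

Computing the new value without touching the file system corresponds to what the dry_run option is described as doing.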

is_submit_to_queue() → bool

load_results_display_files() → Sequence[ResultsDisplayObject]

Load the job’s results display objects from files on disk.

This method must be fast because it is used by the GUI to load job results. Therefore, if a display object fails to load properly, no attempt is made to recalculate it and a ResultsDisplayPending object is returned instead.

If there are no results display files yet, an empty list is returned.

Returns:: A list of ResultsDisplayObject

make_additional_args() → None

Get the additional arguments job option

make_complete_onedep_deposition_objects() → List[OneDepDepositionObject]

Get the deposition objects for a job

Gets the depobjs generated by the job’s specific methods and adds the general ones for its reference(s) and software

make_queue_options() → None

Get options related to queueing and queue submission, which are common to all job types

parse_additional_args() → List[str]

Parse the additional arguments job option and return a list

Returns:

A list ready to append to the command. Quoted strings are preserved as quoted strings; all others are split into individual items

Return type:

list
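One way to get the splitting behaviour described above is the standard-library shlex module in non-POSIX mode, which keeps the quote characters around quoted strings. Whether the pipeliner uses shlex is an assumption; the function name below is hypothetical.

```python
import shlex
from typing import List


def sketch_parse_additional_args(text: str) -> List[str]:
    """Split an additional-arguments string into command-line items."""
    # posix=False keeps the quote characters around quoted strings, so
    # '"a b"' stays a single item; everything else splits on whitespace.
    return shlex.split(text, posix=False)


if __name__ == "__main__":
    print(sketch_parse_additional_args('--flag 3 --label "my sample"'))
```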

prepare_clean_up_lists(do_harsh: bool = False) → Tuple[List[str], List[str]]

Placeholder function for preparation of list of files to clean up

Each job class should define this individually

Parameters:: do_harsh (bool) – Should a harsh cleanup be performed
Returns:: Two empty lists ([files, to, delete], [dirs, to, delete])
Return type:: tuple

prepare_empiar_deposition_data() → List[EMPIARDepositionObject]

This function makes the deposition objects aside from citations and software

This is the function that should be edited for each specific PipelinerJob

Returns:

All the deposition objects, except software and citations

Return type:

List[DepositionObject]

prepare_for_continuation() → None

Prepares this job to be continued

This method should be run on an existing job, i.e. one that has a matching Process in the ProjectGraph. This means self.output_dir, self.input_nodes, and self.output_nodes should be assigned.

Updates the parameters and values of each JobOption using continuation_joboption_updates()

Raises:: NotImplementedError – If the job cannot be continued

prepare_onedep_deposition_data() → List[OneDepDepositionObject]

This function makes the deposition objects specific to this job

This is the function that should be edited for each specific PipelinerJob. The main em_image_processing, citations, and citation_author objects will be created automatically, and don’t need to be included in this function.

Returns:

All the deposition objects, except software and citations

Return type:

List[DepositionObject]

prepare_to_run(ignore_invalid_joboptions: bool = False, existing_nodes: Dict[str, Node] | None = None) → None

Prepare the job to run.

This function is intended to be called by the pipeliner before the job file is saved to disk. It does several things including:
- Validate the job options
- Make the job directory
- Move uploaded Doppio user files into the job directory

Parameters:

ignore_invalid_joboptions (bool) – Prepare the job to run anyway even if the job options appear to be invalid
existing_nodes – A dict of {node_name: Node object} for the current nodes in the pipeline, if available. This allows jobs to re-use existing nodes rather than creating new ones, which can be necessary if the type of the node is uncertain.

Raises:

ValueError – If the job options appear to be invalid and ignore_invalid_joboptions is not set
RuntimeError – If the job does not already have an output directory assigned

read(filename: str) → None

Reads parameters from a run.job or job.star file

Parameters:: filename (str) – The file to read. Can be a run.job or job.star file
Raises:: ValueError – If the file is a job.star file and a job option from the PipelinerJob is missing from the input file

relion_joboption_conversion() → None

Any updates that need to be made to the JobOptions if it was converted from relion

This function should modify job options and not return anything. It may need to be updated at a later time to accept a list of raw options read from the file if it is needed to convert jobs where relion has JobOptions that are not present in the pipeliner version of the job

save_job_submission_script(commands: list) → str

Writes a submission script for jobs submitted to a queue

Parameters:

commands (list) – The commands to save, in a list of lists format. In Relion this would be the actual job commands, but in the pipeliner it will just be a single command to run the job with the job_runner.

Returns:

The name of the submission script that was written

Return type:

str

Raises:

ValueError – If no submission script template was specified in the job’s joboptions
ValueError – If the submission script template is not found
RuntimeError – If the output script could not be written

save_results_display_files() → Sequence[ResultsDisplayObject]

Create new results display objects and save them to disk.

This method removes any existing results display files first, and returns the new display objects after they have been created and saved.

Returns:: The newly-created results display objects.

set_joboption_order(new_order: List[str]) → None

Replace the joboptions dict with an ordered dict

Use this to set the order the joboptions will appear in the GUI. If a joboption is not specified in the list it will be tagged on to the end of the list.

Parameters:: new_order (list[str]) – A list of joboption keys, in the order they should appear
Raises:: ValueError – If a nonexistent joboption is specified
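The reordering described above can be sketched on a plain dict. The function name is hypothetical, and it is an assumption that unlisted joboptions keep their existing relative order when they are added at the end.

```python
from typing import Dict, List


def sketch_reorder(options: Dict[str, object], new_order: List[str]) -> Dict[str, object]:
    """Return a new dict with keys in new_order, then any remaining keys."""
    unknown = [key for key in new_order if key not in options]
    if unknown:
        raise ValueError(f"Nonexistent joboption(s): {', '.join(unknown)}")
    # Listed keys come first, in the requested order.
    reordered = {key: options[key] for key in new_order}
    # Keys that were not listed are added at the end (assumed here to
    # keep their original relative order).
    for key, value in options.items():
        reordered.setdefault(key, value)
    return reordered
```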

set_option(line: str) → None

Sets a value in the joboptions dict from a run.job file

Parameters:

line (str) – A line from a run.job file

Raises:

RuntimeError – If the line does not contain ‘==’
RuntimeError – If the value of the line does not match any of the joboptions keys
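The first check described above can be illustrated with a small parser for a single line. This sketch assumes a run.job line has the form “label == value” and that only the first ‘==’ separates the two parts; the function name and the example line are hypothetical, and the lookup of the label in the joboptions dict is not shown.

```python
from typing import Tuple


def sketch_split_runjob_line(line: str) -> Tuple[str, str]:
    """Split one 'label == value' line into its two stripped parts."""
    if "==" not in line:
        raise RuntimeError(f"No '==' found in line: {line!r}")
    # Split on the first '==' only, so the value itself may contain '=='.
    label, value = line.split("==", 1)
    return label.strip(), value.strip()


if __name__ == "__main__":
    print(sketch_split_runjob_line("Number of MPI procs: == 4\n"))
```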

validate_dynamically_required_joboptions() → List[JobOptionValidationResult]

Check whether any joboptions have become required because of if_required

For example if job option A is True, job option B is now required

Returns:

A list of pipeliner.job_options.JobOptionValidationResult objects for any errors found

Return type:

list

validate_input_files() → List[JobOptionValidationResult]

Check that files specified as inputs actually exist

Returns:

A list of pipeliner.job_options.JobOptionValidationResult objects

Return type:

list

validate_joboptions() → List[JobOptionValidationResult]

Make sure all the joboptions meet their validation criteria

Returns:

A list of JobOptionValidationResult objects

Return type:

list

validate_runtab_joboptions() → List[JobOptionValidationResult]

write_jobstar(output_dir: str, output_fn: str = 'job.star', is_continue: bool = False)

Write a job.star file.

Parameters:

output_dir (str) – The output directory.
output_fn (str) – The name of the file to write. Defaults to job.star
is_continue (bool) – Is the file for a continuation of a previously run job? If so only the parameters that can be changed on continuation are written. Overrules is_continue attribute of the job

write_runjob(fn: str | None = None) → None

Writes a run.job file

Parameters:: fn (str) – The name of the file to write. Defaults to the file the pipeliner uses for storing GUI parameters. A directory can also be entered and it will add on the file name ‘run.job’

class pipeliner.pipeliner_job.Ref(authors: str | List[str] | None = None, title: str = '', journal: str = '', year: str = '', volume: str = '', issue: str = '', pages: str = '', doi: str = '', **kwargs)

Bases: object

Class to hold metadata about a citation or reference, typically a journal article.

authors

The authors of the reference.

Type:: list

title

The reference’s title.

Type:: str

journal

The journal.

Type:: str

year

The year of publication.

Type:: str

volume

The volume number.

Type:: str

issue

The issue number.

Type:: str

pages

The page numbers.

Type:: str

doi

The reference’s Digital Object Identifier.

Type:: str

other_metadata

Other metadata as needed. Gathered from kwargs

Type:: dict

Display tools

Use these functions to create the ResultsDisplayObject objects that the pipeliner GUI, Doppio, uses to create graphical outputs for each job.

pipeliner.display_tools.create_results_display_object(dobj_type: str, **kwargs) → ResultsDisplayObject

Safely create a results display object

Returns a ResultsDisplayPending if there are any problems. Give it the type of display object as the first argument followed by the kwargs for that specific type of ResultsDisplayObject

Parameters:: dobj_type (str) – The type of DisplayObject to create

pipeliner.display_tools.get_ordered_classes_arrays(model_file: str, ncols: int, boxsize: int, output_dir: str, output_filename: str, parts_file: str | None = None, title: str = '2D class averages', start_collapsed: bool = False, flag: str = '', base64_output: bool = False, optimiser_info: dict | None = None) → ResultsDisplayObject

Return a 3D array of class averages from a Relion Class2D model file

Parameters:

model_file (str) – Name of the model file
ncols (int) – number of columns desired in the file montage
boxsize (int) – Size of the class averages in the final montage
output_dir (str) – The output dir of the pipeliner job creating this object
output_filename (str) – The name for the output montage file
parts_file (str) – Path of the file containing the particles, for counting
title (str) – A title for the DisplayObject
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results display this message
base64_output (bool) – flag for a JSON file output with a list of objects holding base64 images and class IDs
optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.

Returns:

An object for the GUI to use to render the graph

Return type:

ResultsDisplayMontage

pipeliner.display_tools.graph_from_starfile_cols(title: str, starfile: str, block: str, ycols: list, xcols: list | None = None, xrange: list | None = None, yrange: list | None = None, data_series_labels: List[str] | None = None, xlabel: str = '', ylabel: str = '', assoc_data: List[str] | None = None, modes: List[str] | None = None, start_collapsed: bool = False, flag: str = '') → ResultsDisplayGraph | ResultsDisplayPending

Automatically generate a ResultsDisplayGraph object from a STAR file

Can use one or two columns and third column for labels if desired

Parameters:

title (str) – The title of the final graph
starfile (str) – Path to the STAR file to use
block (str) – The block to use in the STAR file, use None for a STAR file with only a single block
ycols (list) – Column label(s) from the STAR file to use for the y data series
xcols (list) – Column label(s) from the STAR file to use for the x data series; if None, a simple count from 1 will be used
xlabel (str) – Label for the x axis. If no x data are specified the label will be ‘Count’; if x data are specified and the xlabel is None, the x axis label will be the name of the STAR file column used
xrange (list) – Range for x values to be displayed, full range if None
yrange (list) – Range for y values to be displayed, full range if None
data_series_labels (list) – Names for the data series
ylabel (str) – Label for the y axis, if None the y axis label will be the name of the STAR file column used
assoc_data (list) – List of data file(s) associated with this graph
modes (list) – Controls the appearance of each data series, choose from ‘lines’, ‘markers’ or ‘lines+markers’
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results display this message

Returns:

A ResultsDisplayGraph object for the created graph

Return type:

ResultsDisplayGraph

pipeliner.display_tools.histogram_from_starfile_col(title: str, starfile: str, block: str, data_col: str, xlabel: str = '', ylabel: str = 'Count', assoc_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '') → ResultsDisplayHistogram | ResultsDisplayPending

Automatically generate a ResultsDisplayHistogram object from a STAR file

Parameters:

title (str) – The title of the final graph
starfile (str) – Path to the STAR file to use
block (str) – The block to use in the STAR file, use None for a STAR file with only a single block
data_col (str) – Column label from the STAR file to use for the data series
xlabel (str) – Label for the x axis. If no x data are specified the label will be ‘Count’; if x data are specified and the xlabel is None, the x axis label will be the name of the STAR file column used
ylabel (str) – Label for the y axis, if None the y axis label will be the name of the STAR file column used
assoc_data (list) – List of data file(s) associated with this graph
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results display this message

pipeliner.display_tools.make_image_carousel(starfile: str, block: str, column: str, title: str = '', nimg: int = 500, start_collapsed: bool = False, flag: str = '') → ResultsDisplayImageCarousel | ResultsDisplayPending

Make a new image carousel display object from a STAR file.

Parameters:

starfile (str) – The STAR file to use
block (str) – The name of the block with the images. If the STAR file contains only a single block, a blank string can be given here.
column (str) – The name of the column that has the images
title (str) – The title for the object, automatically generated if “”
nimg (int) – Number of images to use in the carousel, default 500, or all if < 0. Beware that using all images could lead to enormous results display files if the data set is large, so you should usually leave this at its default.
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results display this message

Returns:

The DisplayObject for displaying STAR file images in the interactive carousel.

Return type:

ResultsDisplayImageCarousel

Raises:

ValueError – If no images are listed in the STAR file.
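The nimg behaviour described above can be sketched as follows. This is a minimal illustration and not the pipeliner implementation: select_carousel_images is a hypothetical helper, and taking the first nimg entries (rather than sampling them some other way) is an assumption.

```python
from typing import List


def select_carousel_images(images: List[str], nimg: int = 500) -> List[str]:
    """Hypothetical helper: choose which images a carousel would show."""
    if not images:
        # Mirrors the documented ValueError for a STAR file with no images
        raise ValueError("No images are listed in the STAR file")
    if nimg < 0:
        # A negative value means "use every image"
        return list(images)
    # Assumption: the first nimg images are kept
    return images[:nimg]


print(select_carousel_images(["a.mrc", "b.mrc", "c.mrc"], nimg=2))
```

Keeping the default of 500 bounds the size of the results display file regardless of how large the data set is.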

pipeliner.display_tools.make_map_model_thumb_and_display(outputdir: str, maps: List[str] | None = None, maps_opacity: List[float] | None = None, maps_colours: List[str] | None = None, models: List[str] | None = None, models_colours: List[str] | None = None, bild_files: List[str] | None = None, title: str | None = None, maps_data: str = '', models_data: str = '', assoc_data: List | None = None, flag: str = '', start_collapsed: bool | None = True) → ResultsDisplayMapModel | ResultsDisplayPending

Make a display object for an atomic model overlaid over a map

Makes a binned map and a ResultsDisplayMapModel display object

Parameters:

outputdir (str) – Name of the job’s output directory
maps (list) – List of map files to use
models (list) – List of model files to use
maps_opacity (list) – List of opacities for the maps, from 0-1. If None, 0.5 is used for all
maps_colours (list) – Colors for the maps if specific ones are desired, otherwise mol* will assign them
title (str) – The title for the ResultsDisplayMapModel object, if None the name of the map and model will be used
maps_data (str) – Any additional data to be included about the map
models_data (str) – Any additional data to be included about the models
models_colours (list) – Colors for the models if specific ones are desired, otherwise mol* will assign them
assoc_data (list) – List of associated data, if left as None then just uses the file itself
flag (str) – If the results are considered scientifically dubious explain in this string
start_collapsed (bool) – Should the display start out collapsed when displayed

Returns:

The DisplayObject for the map and model

Return type:

ResultsDisplayMapModel

pipeliner.display_tools.make_maps_slice_montage_and_3d_display(in_maps: Dict[str, str], output_dir: str, combine_montages: bool = True, cmap: str = '', base64_output: bool = False, optimiser_info: dict | None = None, bild_files: List[str] | None = None) → List[ResultsDisplayObject]

Make a set of display objects for 3D maps

Returns separate 3D viewer display objects for each map and either a combined slices montage or a slices montage for each.

Parameters:

in_maps (dict) – {input file: label}. If the label is “”, the filename will be used
output_dir (str) – The job’s output dir where the thumbnails dir will be created if necessary
combine_montages (bool) – Should a single montage be made with slices for all maps, otherwise a separate montage is made for each
cmap (str) – what color map to use for the montage, if any.
base64_output (bool) – Whether to create base64 thumbnails and gallery display object.
optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.

Returns:

The display objects: the montage and then the 3D viewers if combine_montages is False, otherwise the montage followed by the 3D viewer for each map, in the order they were given.

Return type:

List

pipeliner.display_tools.make_mollweide_angular_distribution(starfile_path: str, output_dir: str, title: str = 'Angular distribution of particles', block: str = 'particles', n_phi_bins: int = 72, n_theta_bins: int = 36) → ResultsDisplayImage | ResultsDisplayPending

Create a Mollweide projection heatmap of particle orientations

Reads _rlnAngleRot (phi) and _rlnAngleTilt (theta) from a RELION STAR file and produces a 2-D histogram on a Mollweide projection, saved as a PNG. For C1 symmetry, the plot will show the full distribution of particle orientations. For higher symmetries the plot might occupy a subsection of the full projection, depending on the symmetry axes, e.g. quarter of the plot for D2.

Parameters:

starfile_path (str) – Path to a STAR file containing particle data with _rlnAngleRot and _rlnAngleTilt columns
output_dir (str) – Job output directory (PNG is saved under Thumbnails/)
title (str) – Title for the plot and display object
block (str) – Block name in the STAR file
n_phi_bins (int) – Number of bins in the azimuthal direction
n_theta_bins (int) – Number of bins in the polar direction

Returns:

ResultsDisplayImage or ResultsDisplayPending on error
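The binning step behind such a heatmap can be sketched in plain Python. This is only an illustration under stated assumptions: bin_orientations is a hypothetical function, the rot angle is assumed to lie in [-180, 180] degrees and the tilt angle in [0, 180] degrees, and the actual projection and plotting are left out.

```python
from typing import List, Sequence


def bin_orientations(
    rot_deg: Sequence[float],
    tilt_deg: Sequence[float],
    n_phi_bins: int = 72,
    n_theta_bins: int = 36,
) -> List[List[int]]:
    """Count orientations on a (theta, phi) grid; rows are theta bins."""
    counts = [[0] * n_phi_bins for _ in range(n_theta_bins)]
    for rot, tilt in zip(rot_deg, tilt_deg):
        # Shift rot into [0, 360) so that -180 and 180 land in the same bin
        phi = (rot + 180.0) % 360.0
        i_phi = min(int(phi / 360.0 * n_phi_bins), n_phi_bins - 1)
        # Clamp so that tilt == 180 falls in the last bin
        i_theta = min(int(tilt / 180.0 * n_theta_bins), n_theta_bins - 1)
        counts[i_theta][i_phi] += 1
    return counts


counts = bin_orientations([0.0, -180.0, 180.0], [90.0, 0.0, 180.0])
print(sum(sum(row) for row in counts))
```

With the default bin numbers each cell covers 5 degrees in both directions.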

pipeliner.display_tools.make_moorhen_display(maps: List[str] | None = None, models: List[str] | None = None, title: str | None = None, maps_data: str = '', models_data: str = '', assoc_data: List | None = None, flag: str = '', session_file: str | None = None) → ResultsDisplayMapModel | ResultsDisplayPending

Make a Moorhen display object for maps and/or models

Creates a ResultsDisplayMoorhen display object. If no title is provided, one is generated from the map and model file names.

Parameters:

maps (list) – List of map files to use
models (list) – List of model files to use
title (str) – The title for the display object. If None, a title is generated from the map and model file names.
maps_data (str) – Any additional data to be included about the maps. If empty, defaults to a comma-separated list of map paths.
models_data (str) – Any additional data to be included about the models. If empty, defaults to a comma-separated list of model paths.
assoc_data (list) – List of associated data. If None, defaults to the combined list of maps and models.
flag (str) – If the results are considered scientifically dubious explain in this string
session_file (str) – Path to a Moorhen session file

Returns:

The Moorhen DisplayObject for the maps and models, or a ResultsDisplayPending if an error occurs

Return type:

ResultsDisplayMoorhen

pipeliner.display_tools.make_mrcs_central_slices_montage(in_files: Dict[str, str], output_dir: str, cmap: str = '', base64_output: bool = False, optimiser_info: dict | None = None) → ResultsDisplayObject

Make a montage of x,y,z central slices of maps

Parameters:

in_files (Dict[str, str]) – {file name: label if different from file name}
output_dir (str) – Where to make the Thumbnails dir (if necessary) and put the montage image
cmap (str) – What colormap to use, if any
optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.
Returns:

The montage ResultsDisplayObject

Return type:

ResultsDisplayMontage

pipeliner.display_tools.make_particle_coords_thumb(in_mrc, in_coords, out_dir, thumb_size=640, pad=5, start_collapsed=False, title: str = 'Example picked particles', flag: str = '', markers: bool = False) → ResultsDisplayImage | ResultsDisplayPending

Create a thumbnail of picked particle coords on their micrograph

Because the extraction box size is not known, boxes will be a percentage of the total image size.

Parameters:

in_mrc (str) – Path to the merged micrograph mrc file
in_coords (str) – Path to the .star coordinates file
out_dir (str) – Name of the output directory
thumb_size (int) – Size of the x dimension of the final thumbnail image
pad (int) – Thickness of the particle box borders before binning in px
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
title (str) – What title to use for the displayobj created
flag (str) – If this display object contains scientifically dubious results, display this message
markers (bool) – Instead of making boxes make markers

pipeliner.display_tools.mini_montage_from_many_files(filelist: List[str], outputdir: str, nimg: int = 5, montagesize: int = 640, title: str = '', ncols: int = 5, associated_data: List[str] | None = None, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') → ResultsDisplayMontage | ResultsDisplayPending

Make a mini montage from a list of images

Merge and flatten image stacks

Parameters:

filelist (list) – A list of the files to use
outputdir (str) – The output dir of the pipeliner job
nimg (int) – Number of images to use in the montage
montagesize (int) – Desired size of the final montage image
title (str) – Title for the ResultsDisplay object that will be output
ncols (int) – Number of columns to make in the montage
associated_data (list) – Data files associated with these images, if None then all of the selected images
labels (list) – The labels for the items in the montage
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message

Returns:

The DisplayObject for the montage

Return type:

ResultsDisplayMontage

Raises:

ValueError – If a non mrc or tiff image is used

pipeliner.display_tools.mini_montage_from_stack(stack_file: str, outputdir: str, nimg: int = 40, ncols: int = 10, montagesize: int = 640, title: str = '', labels: List[str | int] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') → ResultsDisplayMontage | ResultsDisplayPending

Make a montage from a mrcs or tiff file

Parameters:

stack_file (str) – The path to the stack_file
outputdir (str) – The output dir of the pipeliner job
nimg (int) – Number of images to use in the montage, if < 1 uses all of them
ncols (int) – Number of columns to use
montagesize (int) – Desired size of the final montage image
title (str) – Title for the ResultsDisplay object that will be output
labels (list) – Labels for the images
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message

Returns:

The DisplayObject for the montage

Return type:

ResultsDisplayMontage

Raises:

ValueError – If a non mrc or tiff image is used

pipeliner.display_tools.mini_montage_from_starfile(starfile: str, block: str, column: str, outputdir: str, title: str = '', nimg: int = 20, montagesize: int = 640, ncols: int = 10, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') → ResultsDisplayMontage | ResultsDisplayPending

Make a montage from a list of images in a STAR file column

Merge and flatten image stacks if they are encountered.

Parameters:

starfile (str) – The STAR file to use
block (str) – The name of the block with the images
column (str) – The name of the column that has the images
outputdir (str) – The output dir of the pipeliner job
title (str) – The title for the object, automatically generated if “”
nimg (int) – Number of images to use in the montage, uses all if < 0
montagesize (int) – Desired size of the final montage image
ncols (int) – number of columns to use
labels (list) – Labels for the images in the montage, in order
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message

Returns:

The DisplayObject for the montage

Return type:

ResultsDisplayMontage

Raises:

ValueError – If a non mrc or tiff image is encountered

ResultsDisplay Objects

These objects generally should not be instantiated directly; they should instead be created using the functions above.

class pipeliner.results_display_objects.ResultsDisplayGallery(*, title: str, images: str, labels: List[str] | None = None, associated_nodes: List[Dict[str, str]], associated_data: List[str], start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

Display object for Doppio’s interactive image gallery

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the object/graph

Type:: str

images

Path to a .json file containing a list of image objects (base64 images and class IDs)

Type:: str

labels

Data labels for the images

Type:: list

associated_nodes

{name: str, type: str} A list of nodes associated with this gallery along with their full node types

Type:: list[dict]

associated_data

A list of files that contributed the data used in the image/graph

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayGraph(*, xvalues: List[List[float | int]], yvalues: List[List[float | int]], title: str, associated_data: List[str], data_series_labels: List[str], xaxis_label: str = '', xrange: List[float] | None = None, yaxis_label: str = '', yrange: List[float] | None = None, modes: List[Literal['lines', 'markers', 'lines+markers']] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A simple graph for the GUI to display

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the object/graph

Type:: str

xvalues: (list): list of x coordinate data series, can have multiple data series

xaxis_label

Label for the x axis if a graph

Type:: str

xrange

Range of x to be displayed, displays the full range if None. If the x axis needs to be reversed then enter the values backwards [max, min]

Type:: list

yvalues: (list): List y coordinate data series can have multiple data series

yaxis_label

Label for the y axis if a graph

Type:: str

yrange

Range of y to be displayed, displays the full range if None. If the y axis needs to be reversed then enter the values backwards [max, min]

Type:: list

data_series_labels

List of names of the different data series

Type:: list

associated_data

A list of files that contributed the data used in the image/graph

Type:: list

modes

Controls the appearance of each data series, choose from ‘lines’, ‘markers’ or ‘lines+markers’

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool
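The shape of the graph inputs can be illustrated with a small consistency check. This is a sketch only: check_graph_series is a hypothetical helper, and whether the library enforces these particular checks is an assumption. The example data also shows a reversed x axis given as [max, min].

```python
from typing import List, Optional, Sequence

ALLOWED_MODES = {"lines", "markers", "lines+markers"}


def check_graph_series(
    xvalues: Sequence[Sequence[float]],
    yvalues: Sequence[Sequence[float]],
    data_series_labels: Sequence[str],
    modes: Optional[List[str]] = None,
) -> int:
    """Return the number of data series, or raise ValueError if inconsistent."""
    n_series = len(xvalues)
    if len(yvalues) != n_series or len(data_series_labels) != n_series:
        raise ValueError("Need one x list, one y list and one label per series")
    for xs, ys in zip(xvalues, yvalues):
        if len(xs) != len(ys):
            raise ValueError("x and y lengths differ within a series")
    if modes is not None:
        if len(modes) != n_series or not set(modes) <= ALLOWED_MODES:
            raise ValueError("modes must hold one allowed value per series")
    return n_series


# Two series sharing the same x values; the x axis is reversed with [max, min]
xvalues = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
yvalues = [[0.9, 0.5, 0.1], [0.8, 0.4, 0.05]]
xrange_reversed = [3.0, 1.0]
n_series = check_graph_series(
    xvalues, yvalues, ["half maps", "masked"], modes=["lines", "lines+markers"]
)
print(n_series, xrange_reversed)
```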

class pipeliner.results_display_objects.ResultsDisplayHistogram(*, title: str, associated_data: List[str], data_to_bin: List[float] | None = None, xlabel: str = '', ylabel: str = '', bins: List[int] | None = None, bin_edges: List[float] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class for the GUI to display a histogram

It is best to not instantiate this class directly. Instead, create it using create_results_display_object

Parameters:

title (str) – The title of the histogram
data_to_bin (list) – The data to bin
xlabel (str) – Label for the x axis
ylabel (str) – Label for the y axis
associated_data (list) – List of data files associated with the histogram
bins (list) – A list of bin counts, if they are known
bin_edges (list) – A list of the bin edges, if they are already known
start_collapsed (bool) – Should the object start out collapsed when displayed in the GUI

Raises:

ValueError – If no data or bins are specified
ValueError – If an attempt is made to specify bins or bin edges when data to bin are being provided
ValueError – If the associated data is not a list, or not provided
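The three error conditions listed above amount to a choice between raw data and pre-binned data. A minimal sketch of that logic, using a hypothetical function histogram_input_mode that is not part of pipeliner:

```python
from typing import List, Optional


def histogram_input_mode(
    associated_data: Optional[List[str]],
    data_to_bin: Optional[List[float]] = None,
    bins: Optional[List[int]] = None,
    bin_edges: Optional[List[float]] = None,
) -> str:
    """Return 'raw' or 'prebinned', raising ValueError for invalid input."""
    if not isinstance(associated_data, list):
        raise ValueError("associated_data must be provided as a list")
    if data_to_bin is None and bins is None:
        raise ValueError("Specify either data to bin or bin counts")
    if data_to_bin is not None and (bins is not None or bin_edges is not None):
        raise ValueError("Do not give bins or bin edges together with raw data")
    return "raw" if data_to_bin is not None else "prebinned"


print(histogram_input_mode(["particles.star"], data_to_bin=[1.2, 1.4, 2.0]))
print(histogram_input_mode(["particles.star"], bins=[2, 1], bin_edges=[1.0, 1.5, 2.0]))
```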

class pipeliner.results_display_objects.ResultsDisplayHtml(*, title: str, associated_data: List[str], html_dir: str = '', html_file: str = '', html_str: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display html

It is best to not instantiate this class directly. Instead create it using create_results_display_object

This can be used for general HTML display in Doppio. Either provide a directory with index.html or specify a html file or provide a html string as input.

html_dir

Path to the html directory (optional)

Type:: str

html_file

Path to a standalone html file or in the given html_dir (optional)

Type:: str

html_str

Input html as string (optional)

Type:: str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayImage(*, title: str, image_path: str, image_desc: str, associated_data: List[str], start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class for the GUI to display a single image

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title for the image

Type:: str

image_path

The path to the image

Type:: str

image_desc

A description of the image

Type:: str

associated_data

Data files associated with the image

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayImageCarousel(*, title: str, image_names: List[str], image_index_file: str, associated_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class for Doppio’s interactive image carousel

It is best to not instantiate this class directly. Instead create it using create_results_display_object or display_tools.make_image_carousel

title

The title for display

Type:: str

image_names

The paths to the images

Type:: list[str]

image_index_file

Source STAR file

Type:: str

associated_data

Data files associated with the images

Type:: list[str]

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayJson(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display JSON files

It is best to not instantiate this class directly. Instead create it using create_results_display_object

file_path

Path to the file

Type:: str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayMapModel(title: str, associated_data: List[str], maps: List[str] | None = None, models: List[str] | None = None, maps_data: str = '', models_data: str = '', maps_opacity: List[float] | None = None, maps_colours: List[str] | None = None, models_colours: List[str] | None = None, start_collapsed: bool = True, bild_files: List[str] | None = None, flag: str = '')

Bases: ResultsDisplayObject

An object for overlaying multiple maps and/or models

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title that appears at the top of the accordion in the GUI

Type:: str

associated_data

A list of associated data files

Type:: list

maps

List of map paths, mrc format

Type:: list

models

List of model paths, pdb or mmcif format

Type:: list

maps_opacity

Opacity for each map, from 0-1. If not specified, it is set at 0.5 for all maps

Type:: list

models_data

Any extra info about the models

Type:: str

maps_data

Any extra info about the maps

Type:: str

maps_colours

Hex values for colouring the maps specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used

Type:: list

models_colours

Hex values for colouring the models specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used

Type:: list

bild_files

Optional list of .bild file paths to overlay in the 3D viewer (e.g. angular distribution plots)

Type:: list

Raises:

ValueError – If no maps or models were specified
ValueError – If the map is not .mrc format
ValueError – If models are not in pdb or mmcif format
ValueError – If the number of maps and map opacities don’t match
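The checks listed above, together with the default opacity of 0.5, can be sketched as below. check_map_model_args is a hypothetical helper, and the exact file extensions accepted for "pdb or mmcif" are an assumption made for the example.

```python
import os
from typing import List, Optional

MAP_EXTS = {".mrc"}
# Assumed spellings for "pdb or mmcif"; the library may accept others
MODEL_EXTS = {".pdb", ".cif", ".mmcif"}


def check_map_model_args(
    maps: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    maps_opacity: Optional[List[float]] = None,
) -> List[float]:
    """Validate the inputs and return one opacity value per map."""
    maps = maps or []
    models = models or []
    if not maps and not models:
        raise ValueError("No maps or models were specified")
    for map_file in maps:
        if os.path.splitext(map_file)[1].lower() not in MAP_EXTS:
            raise ValueError(f"{map_file} is not in .mrc format")
    for model_file in models:
        if os.path.splitext(model_file)[1].lower() not in MODEL_EXTS:
            raise ValueError(f"{model_file} is not in pdb or mmcif format")
    if maps_opacity is None:
        # Default described in the attribute list: 0.5 for every map
        return [0.5] * len(maps)
    if len(maps_opacity) != len(maps):
        raise ValueError("The number of maps and map opacities don't match")
    return list(maps_opacity)


print(check_map_model_args(maps=["full.mrc", "masked.mrc"], models=["fit.pdb"]))
```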

class pipeliner.results_display_objects.ResultsDisplayMontage(*, xvalues: List[int], yvalues: List[int], img: str, title: str, associated_data: List[str], labels: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object to send to the GUI to make an image montage

This one is an image montage with info about the specific images.

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the object/graph

Type:: str

xvalues: (list): The x coordinates by image

yvalues: (list): The y coordinates by image

labels

Data labels for the images

Type:: list

associated_data

A list of files that contributed the data used in the image/graph

Type:: list

img

Path to an image to display

Type:: str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool
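One way to picture the per-image coordinates is as grid positions in the montage. The sketch below assumes that xvalues and yvalues are column and row indices counted from the first image; the coordinate convention the library actually uses may differ, and montage_grid_coords is a hypothetical helper.

```python
from typing import List, Tuple


def montage_grid_coords(n_images: int, ncols: int) -> Tuple[List[int], List[int]]:
    """Return (xvalues, yvalues) for images laid out row by row."""
    if ncols < 1:
        raise ValueError("ncols must be at least 1")
    xvalues = [i % ncols for i in range(n_images)]   # column of each image
    yvalues = [i // ncols for i in range(n_images)]  # row of each image
    return xvalues, yvalues


xs, ys = montage_grid_coords(n_images=5, ncols=3)
print(xs, ys)
```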

class pipeliner.results_display_objects.ResultsDisplayMoorhen(title: str, associated_data: List[str], maps: List[str] | None = None, models: List[str] | None = None, session_file: str | None = None, maps_data: str = '', models_data: str = '', start_collapsed: bool = True, flag: str = '')

Bases: ResultsDisplayObject

An object for displaying Maps, Models, and Moorhen job sessions.

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title that appears at the top of the accordion in the GUI.

Type:: str

associated_data

A list of associated data files

Type:: list

maps

List of map paths in .mrc, .map, .ccp4 or .mtz format

Type:: list

models

List of model paths in .pdb, .cif, .mmcif, .pdbx or .ent format

Type:: list

session_file

Path to a Moorhen session file, which contains the state of the Moorhen session. If provided, the expected maps and models should match those of the session file. If not provided, a warning is appended to the flag.

Type:: str

maps_data

Any extra info about the maps

Type:: str

models_data

Any extra info about the models

Type:: str

start_collapsed

Whether the accordion starts collapsed in the GUI

Type:: bool

flag

Flag string for additional status information

Type:: str

Raises:

ValueError – If a map is not in .mrc, .map, .ccp4 or .mtz format
ValueError – If a model is not in .pdb, .cif, .mmcif, .pdbx or .ent format

class pipeliner.results_display_objects.ResultsDisplayObject(title: str, start_collapsed: bool = False, flag='')

Bases: object

Abstract super-class for results display objects

title

The title

Type:: str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

dobj_type

Used to identify what kind of ResultsDisplayObject it is

Type:: str

flag

A message that is displayed if the results display object is showing something scientifically dubious.

Type:: str

write_displayobj_file(outdir) → None

Write a json file from a ResultsDisplayObject object

Parameters:: outdir (str) – The directory to write the output in
Raises:: NotImplementedError – If a write attempt is made from the superclass
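The relationship between the superclass and its subclasses can be sketched with stand-in classes. Everything here is hypothetical (class names, the output file name, and the choice to dump all attributes); it only illustrates the pattern in which the base class refuses to write and each subclass writes its own JSON file.

```python
import json
import os
import tempfile


class SketchDisplayObject:
    """Stand-in for an abstract display-object superclass (hypothetical)."""

    def __init__(self, title: str, start_collapsed: bool = False, flag: str = ""):
        self.title = title
        self.start_collapsed = start_collapsed
        self.flag = flag
        self.dobj_type = "base"

    def write_displayobj_file(self, outdir: str) -> None:
        # The superclass has nothing concrete to write
        raise NotImplementedError("Cannot write a file from the superclass")


class SketchText(SketchDisplayObject):
    """Stand-in subclass that knows how to serialise itself."""

    def __init__(self, title: str, display_data: str):
        super().__init__(title)
        self.dobj_type = "text"
        self.display_data = display_data

    def write_displayobj_file(self, outdir: str) -> None:
        # The file name pattern is invented for this example
        path = os.path.join(outdir, f"sketch_{self.dobj_type}.json")
        with open(path, "w") as fh:
            json.dump(vars(self), fh, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    SketchText("Run log", "All steps finished").write_displayobj_file(tmp)
    with open(os.path.join(tmp, "sketch_text.json")) as fh:
        written = json.load(fh)
print(written["dobj_type"], written["title"])
```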

class pipeliner.results_display_objects.ResultsDisplayPdfFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display pdf files

It is best to not instantiate this class directly. Instead create it using create_results_display_object

file_path

Path to the file

Type:: str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayPending(*, title: str = 'Results pending...', message: str = 'The result not available yet', reason: str = 'unknown', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A placeholder class for when a job is not able to produce results yet

class pipeliner.results_display_objects.ResultsDisplayPlotlyFigure(title: str, plotlyfig: str, associated_data: List[str] | None = None, start_collapsed: bool = False)

Bases: ResultsDisplayObject

This class displays an existing Plotly Figure object.

Call fig.to_json() on your Figure and then pass the JSON string to the plotlyfig argument when creating this object.

write_displayobj_file(outdir)

Write a json file from a ResultsDisplayObject object

Parameters:: outdir (str) – The directory to write the output in
Raises:: NotImplementedError – If a write attempt is made from the superclass

Bases: ResultsDisplayObject

A class that generates a plotly.graph_objects.Figure object to display a histogram.

Uses plotly express histogram: https://plotly.com/python-api-reference/generated/plotly.express.histogram.html
Examples here: https://plotly.com/python/histograms/

data: The data to bin. The following types are allowed:

list – list of values to be binned
array and dict – converted to a pandas dataframe internally
pandas dataframe – ensure column names are added if ‘x’ indicates a column name

More details: https://plotly.com/python/px-arguments/

title

The title of the plot

Type:: str

plotlyfig

plotly.graph_objects.Figure object generated from input data

Type:: plotly.graph_objects.Figure

associated_data

A list of the associated data files

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

Bases: ResultsDisplayObject

This uses the plotly express class to create a plotly.graph_objects.Figure object: https://plotly.com/python/plotly-express/

Use this class to generate plotly Figure objects for custom plots including:

facet-plots: https://plotly.com/python/facet-plots/
subplots: https://plotly.com/python/subplots/
multi_series: e.g. https://plotly.com/python/creating-and-updating-figures/#adding-traces

data: The data to plot. For a single plot, the following types are allowed:

list – list of values to be binned
array and dict – converted to a pandas dataframe internally
pandas dataframe – ensure column names are added if ‘x’ indicates a column name

More details: https://plotly.com/python/px-arguments/

For subplots and/or multi_series:

list – list with dictionary of arguments for each plot/series

plot_type

Required, type of plot.

For a single plot, it is the plotly express function to call: https://plotly.com/python-api-reference/plotly.express.html
For subplots and/or multi_series, it is the plotly.graph_objects function to call: https://plotly.com/python-api-reference/plotly.graph_objects.html

Type:: str

title

The title of the plot

Type:: str

plotlyfig

plotly.graph_objects.Figure object generated from input data

Type:: plotly.graph_objects.Figure

associated_data

A list of the associated data files

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

check_multiseries_arguments(data, plot_type, series_args) → None

check_plottype_list(plot_type, data) → None

check_singleplot_arguments(plot_type) → None

check_subplot_arguments(data, subplot_size, subplot_order, plot_type, subplot_args, xaxes_args, yaxes_args) → None

generate_multiseries_plots(plot_type, plot_args) → Figure

generate_subplots(subplot_size, plot_type, subplot_order, plot_args, make_subplot_args) → Figure

set_multiplot_data(data) → None

set_singleplot_data(data) → None

Bases: ResultsDisplayObject

A class that generates a plotly.graph_objects.Figure object to display a scatter plot.

Uses plotly express scatter: https://plotly.com/python-api-reference/generated/plotly.express.scatter.html
Examples here: https://plotly.com/python/line-and-scatter/

data: The data to bin. The following types are allowed:

list – list of values to be binned
array and dict – converted to a pandas dataframe internally
pandas dataframe – ensure column names are added if ‘x’ indicates a column name

More details: https://plotly.com/python/px-arguments/

title

The title of the plot

Type:: str

plotlyfig

plotly.graph_objects.Figure object generated from input data

Type:: plotly.graph_objects.Figure

associated_data

A list of the associated data files

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayRvapi(*, title: str, rvapi_dir: str, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display rvapi objects

It is best to not instantiate this class directly. Instead create it using create_results_display_object

This can be used for general HTML display in Doppio. Create a directory with index.html and it will be shown in the results display tab

rvapi_dir

Path to the rvapi directory

Type:: str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayTable(*, title: str, headers: List[str], table_data: List[List[str]], associated_data: List[str], header_tooltips: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display a table

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the table

Type:: str

headers

The column headers for the table

Type:: list

table_data

A list of lists, one per row

Type:: list

associated_data

A list of the associated data files

Type:: list

header_tooltips

Tooltips for each column. Column header by default

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool
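The table inputs fit together as one list per row plus one header per column. The sketch below uses a hypothetical helper, table_tooltips, to show the documented default of using the column headers as tooltips; the row-length check is an assumption and may not match what the library enforces.

```python
from typing import List, Optional


def table_tooltips(
    headers: List[str],
    table_data: List[List[str]],
    header_tooltips: Optional[List[str]] = None,
) -> List[str]:
    """Return the tooltips to show, defaulting to the column headers."""
    for row in table_data:
        # Assumed check: every row has one cell per column
        if len(row) != len(headers):
            raise ValueError("Each row needs one entry per header")
    if header_tooltips is None:
        return list(headers)
    if len(header_tooltips) != len(headers):
        raise ValueError("Need one tooltip per header")
    return list(header_tooltips)


headers = ["Class", "Particles", "Resolution (A)"]
rows = [["1", "10432", "3.4"], ["2", "8120", "4.1"]]
print(table_tooltips(headers, rows))
```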

class pipeliner.results_display_objects.ResultsDisplayText(*, title: str, display_data: str, associated_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class to display general text in the GUI results tab

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

the title of the section

Type:: str

display_data

The text to display

Type:: str

associated_data

Data files associated with this result

Type:: list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

class pipeliner.results_display_objects.ResultsDisplayTextFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display ASCII text files

It is best to not instantiate this class directly. Instead create it using create_results_display_object

This can be used for default display of files that have ascii encoded text but the formats are too variable to make a more complex ResultsDisplayFile

file_path

Path to the file

Type:: str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:: bool

pipeliner.results_display_objects.get_next_resultsfile_name(dir: str, search_str: str) → str

Get the name of the next results file

Takes into account existing files of this type in the output directory to prevent overwriting existing ones.

Parameters:

dir (str) – The output directory
search_str (str) – The full name for the file with * in place of the number

Returns:

The name of the file

Return type:

str
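The idea of picking the next free numbered file name can be sketched with the standard library. This is not the pipeliner implementation: next_numbered_name is a hypothetical function, and the three-digit zero padding, the "highest existing number plus one" rule, and returning the path joined to the directory are all assumptions.

```python
import glob
import os
import re
import tempfile


def next_numbered_name(directory: str, search_str: str) -> str:
    """Return a file name for search_str with '*' replaced by the next number."""
    # Turn e.g. 'results_*.json' into the regex 'results_(\d+)\.json'
    pattern = re.compile(re.escape(search_str).replace(r"\*", r"(\d+)"))
    used = []
    for path in glob.glob(os.path.join(directory, search_str)):
        match = pattern.fullmatch(os.path.basename(path))
        if match:
            used.append(int(match.group(1)))
    next_number = max(used, default=0) + 1
    return os.path.join(directory, search_str.replace("*", f"{next_number:03d}"))


with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 3):
        open(os.path.join(tmp, f"results_{n:03d}.json"), "w").close()
    next_name = os.path.basename(next_numbered_name(tmp, "results_*.json"))
    first_name = os.path.basename(next_numbered_name(tmp, "table_*.json"))
print(next_name, first_name)
```

Using the highest existing number rather than the file count avoids a collision when earlier files have been deleted.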

External program utilities

class pipeliner.external_program_utils.ExternalProgram(command: str, name: str | None = None, vers_com: List[str] | None = None, onedep_categories: Dict[str, str] | None = None, onedep_name: str = '', reference_doi: str = '', vers_regex: Pattern[str] | None = None)

Bases: object

Class to store info about external programs called by the pipeliner

command

The command that will be used to run the program

Type:: str

name

The name for the program; command will be used unless this is specified. This is the name pipeliner looks for when checking a program’s version, so it should be identical to what is printed when the vers_com is run

Type:: str

exe_path

The path to the executable for the program

Type:: str

vers_com

The command that needs to be run to get the version

Type:: List[str]

onedep_categories

How the software should be classified in the EMDB

Type:: List[str]

doi

A DOI for the reference for the program. If the same as the first reference for the job, it can be left blank.

Type:: str

onedep_name

The name to use for the software in OneDep if it is different from the actual name of the program

Type:: str

vers_regex

A compiled regular expression used to find the version of the program from the output of the vers_com. It must contain a capture group that captures the version number.

Type:: Pattern[AnyStr]

get_version(timeout: float = 3.0) → str | None

get_version_from_logs(job_name: str) → str | None

Get the version of the program used at runtime from the log files

Parameters:: job_name (str) – The job the program was run in
Returns:: The program version or None
Return type:: Optional[str]

pipeliner.external_program_utils.make_version_regex(pname: str) → Pattern[str]

Writes the regex for finding a program’s version in the log files

Parameters:: pname (str) – the name of the program to find. This must be the name it uses in ExternalProgram.name
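
As a rough illustration of the vers_regex requirement above (a compiled pattern with one capture group around the version number), here is a standalone sketch. sketch_version_regex and find_version are hypothetical helpers, "myprog" is an invented program name, and the assumed output layout (program name followed by an optional "version" or "v" and a dotted number) is not necessarily the pattern make_version_regex produces.

```python
import re
from typing import Optional, Pattern


def sketch_version_regex(pname: str) -> Pattern[str]:
    """Hypothetical pattern: '<pname> [version |v]<dotted number>'."""
    return re.compile(
        re.escape(pname) + r"\s+(?:version\s+|v)?(\d+(?:\.\d+)*)",
        re.IGNORECASE,
    )


def find_version(output: str, vers_regex: Pattern[str]) -> Optional[str]:
    """Return the first capture group (the version) or None if nothing matches."""
    match = vers_regex.search(output)
    return match.group(1) if match else None


# Invented example output of a version command
example_output = "myprog version 4.0.1\nSome other banner text"
found = find_version(example_output, sketch_version_regex("myprog"))
missing = find_version("no version here", sketch_version_regex("myprog"))
print(found, missing)
```

Returning None when nothing matches mirrors the str | None return type of get_version and get_version_from_logs.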

Deposition Objects

Lists of DepositionObject are returned by a PipelinerJob. The prepare_onedep_deposition_data and prepare_empiar_deposition_data functions are used to prepare automated depositions to the EMDB and EMPIAR.

class pipeliner.deposition_tools.deposition_tools.DepositionObject(category: str, parent_job: str | None = None, ancestor_jobs: List[str] | None = None): Bases: object

pipeliner.deposition_tools.deposition_tools.get_ebi_parser(use_cached: bool) → DictionaryParser

Get the ebi dict parser

Parameters:: use_cached – whether to use cached dictionary or download the latest one
Returns:: The ebi dict parser
Return type:: DictionaryParser

Functions that support these methods are:

EMPIAR DepositionObjects

Bases: DepositionObject

This object contains data that will be used in an EMPIAR deposition

class pipeliner.deposition_tools.empiar_deposition_objects.Micrograph(file: str, ext: str, n_frames: int, dimx: int, dimy: int, dtype: str, headtype: str, apix: float, voltage: float, spherical_aberration: float)

Bases: object

apix: float

dimx: int

dimy: int

dtype: str

ext: str

file: str

headtype: str

n_frames: int

spherical_aberration: float

voltage: float

pipeliner.deposition_tools.empiar_deposition_objects.get_imgfile_info(imgfile: str, blockname: str, img_block_col: str) → Tuple[Dict[str, Tuple[float, float, float]], List[List[str]]]

Get information from the STAR file containing image info

Parameters:

imgfile (str) – The path to the image file
blockname (str) – Name of the images block in the STAR file
img_block_col (str) – The name of the column for the images in the image data block of the STAR file

Returns:

A tuple of (dict with info about the optics groups {og_number: (apix, voltage, spherical aberration)}, list of full paths (relative to the working dir) for all the images in the file; for movies the paths are relative to the import dir)

Return type:

tuple

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_corrparts(in_file: str, job: str | None = None) → List[EMPIARDepositionObject]

Prepare the corrected particles deposition objects for an EMPIAR deposition

Parameters:

in_file (str) – Path to STAR file containing the particles
job (Optional[str]) – The job creating the object

Returns:

The DepositionObjects

Return type:

List[EMPIARDepositionObject]

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics(in_file: str, job: str | None = None)

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics_parts_data(mpfile: str, is_parts: bool, is_cor_parts: bool, job: str | None = None) → List[EMPIARDepositionObject]

Prepare the micrographs or particles portion of an EMPIAR deposition

Parameters:

mpfile (str) – The name of the file containing the micrographs or particles
is_parts (bool) – Is the image set particles? This will affect the info in the details
is_cor_parts (bool) – Is the image set corrected (polished particles)?
job (Optional[str]) – The job the particles/mics file came from

Returns:

A list of deposition objects

Return type:

list

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_parts(in_file: str, job: str | None = None) → List[EMPIARDepositionObject]

Prepare the particles deposition objects for an empiar deposition

Parameters:

in_file (str) – Path to STAR file containing the particles
job (Optional[str]) – The job creating the object

Returns:

The DepositionObjects

Return type:

List[EMPIARDepositionObject]

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_raw_mics(movfile: str, job: str | None = None) → List[EMPIARDepositionObject]

Prepare the raw micrographs portion of an EMPIAR deposition

Parameters:

movfile (str) – Movies STAR file to operate on
job (Optional[str]) – The job the movies came from

Returns:

A DepositionObject used to create a deposition

Return type:

List[EMPIARDepositionObject]

pipeliner.deposition_tools.empiar_deposition.get_citation_data(joboptions: Dict[str, str]) → List[Dict[str, object]]

Gets the data for an EMPIAR citation

Parameters:

joboptions (Dict[str, str]) – The joboption keys and values, where all values are strings

Returns:

The citation data formatted for an EMPIAR deposition

Return type:

List[Dict[str, Union[str, Dict[str, str]]]]

pipeliner.deposition_tools.empiar_deposition.get_deposition_objects_empiar(terminal_job: str, do_parts: bool = True, do_rparts: bool = True, do_movs: bool = True, do_mics: bool = True) → List[EMPIARDepositionObject]

pipeliner.deposition_tools.empiar_deposition.merge_empiar_dep_objs(in_depobjs: List[EMPIARDepositionObject]) → List[EMPIARDepositionObject]

Merges together the list of DepositionObjects for an empiar job

For movies all deposition objects are kept. This is because multiple movie sets may be imported and combined. For corrected mics, particles, and corrected particles only the newest ones (the ones that contributed to the final job) are retained. This is to prevent duplications.

Parameters:: in_depobjs (List[EMPIARDepositionObject]) – The depobjs from the chain of jobs
Returns:: The merged deposition objects
Return type:: List[EMPIARDepositionObject]
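
The retention rule described above (keep every movie set, keep only the newest of each other kind) can be sketched without the pipeliner. SketchDepObj, its kind labels, and merge_sketch are hypothetical, and the sketch assumes the input list is ordered from oldest to newest job; how the real function decides which object is newest is not stated here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SketchDepObj:
    """Hypothetical stand-in for an EMPIAR deposition object."""
    kind: str  # invented labels: "movies", "mics", "parts", "corr_parts"
    job: str


def merge_sketch(in_depobjs: List[SketchDepObj]) -> List[SketchDepObj]:
    # All movie sets are kept, since several may have been imported and combined
    movies = [d for d in in_depobjs if d.kind == "movies"]
    # For every other kind, a later entry replaces an earlier one
    newest: Dict[str, SketchDepObj] = {}
    for d in in_depobjs:
        if d.kind != "movies":
            newest[d.kind] = d
    return movies + list(newest.values())


chain = [
    SketchDepObj("movies", "Import/job001"),
    SketchDepObj("movies", "Import/job002"),
    SketchDepObj("mics", "MotionCorr/job003"),
    SketchDepObj("parts", "Extract/job005"),
    SketchDepObj("parts", "Extract/job009"),
]
merged_jobs = [d.job for d in merge_sketch(chain)]
print(merged_jobs)
```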

pipeliner.deposition_tools.empiar_deposition.parse_empiar_jobstar(jobstar_file: str) → Dict[str, object]

Parse a job.star file from an EMPIAR deposition job and format it for deposition

Parameters:: jobstar_file (str) – Path to the file
Returns:: The data formatted for deposition
Return type:: Dict[str, Union[str, Dict[str, str]]]

pipeliner.deposition_tools.empiar_deposition.prepare_empiar_deposition(terminal_job: str, jobstar_file: str | None = None, do_parts: bool = True, do_rparts: bool = True, do_movs: bool = True, do_mics: bool = True) → str

Prepare a deposition for empiar

Returns:: The name of the deposition file
Return type:: str

EMDB DepositionObjects

EMDB Deposition objects correspond to the schema here: http://ftp.ebi.ac.uk/pub/databases/emdb/doc/XML-schemas/emdb-schemas/v3/current_v3/doc/Untitled.html

class pipeliner.deposition_tools.onedep_deposition_objects.OneDepDepositionObject(category: str, data: Mapping[str, str | None], parent_job: str | None = None, ancestor_jobs: List[str] | None = None)

Bases: DepositionObject

This object contains data that will be used in an EMPIAR, EMDB, or PDB deposition

category

The category of deposition object. This should match the name of the dictionary key for the field in the database schema

Type:: str

parent_job

The name of the job that produced the data

Type:: Optional[str]

data

Contains the actual deposition data. Keys must exactly match the field names in the database schema

Type:: Dict[str, Optional[str]]

uuid

A uuid4 string that identifies the object

Type:: str

ancestor_jobs

Some fields reference upstream jobs, e.g. an em_3d_fitting created by an emplacement job has initial_refinement_model fields that reference a pdbx_initial_refinement_model depobj created by a fetch or model_angelo job. This will usually be the job that created the input for the parent job, and is only necessary if the specific OneDep category created has this sort of upstream reference

Type:: str

pipeliner.deposition_tools.onedep_deposition_objects.deposition_object_from_map(the_map: str, map_type: str, id: str = '', number: int = 1) → OneDepDepositionObject

Make the deposition object for a map

Parameters:

the_map (str) – The map to deposit
map_type (str) – The type of map
id (str) – The emdb id of the deposition object
number (int) – The object number if multiple maps of this type are being deposited

Returns:

The deposition objects for the map

Return type:

DepositionObject

pipeliner.deposition_tools.onedep_deposition_objects.get_onedep_hm_notation(pg_sym: str) → str | None

Get the H-M notation of a symmetry for OneDep

Parameters:: pg_sym (str) – The point group symmetry to get the H-M notation for, in Schoenflies notation
Returns:: The H-M notation of the symmetry
Return type:: Optional[str]
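
For orientation, the conversion from Schoenflies to Hermann-Mauguin (H-M) symbols for the chiral point groups can be sketched as follows. This uses the standard crystallographic convention (Cn → n, Dn → n2 for odd n and n22 for even n, T → 23, O → 432, I → 532); schoenflies_to_hm is a hypothetical helper, and the set of symbols that get_onedep_hm_notation actually accepts or returns may differ.

```python
import re
from typing import Optional


def schoenflies_to_hm(pg_sym: str) -> Optional[str]:
    """Hypothetical sketch: chiral point groups only, None for anything else."""
    sym = pg_sym.strip().upper()

    # Polyhedral rotation groups have fixed symbols
    polyhedral = {"T": "23", "O": "432", "I": "532"}
    if sym in polyhedral:
        return polyhedral[sym]

    match = re.fullmatch(r"([CD])(\d+)", sym)
    if match is None:
        return None
    kind, n = match.group(1), int(match.group(2))

    if kind == "C":
        return str(n) if n >= 1 else None
    # Dihedral groups: odd n gives "n2", even n gives "n22"
    if n < 2:
        return None
    return f"{n}2" if n % 2 else f"{n}22"


print(schoenflies_to_hm("C4"), schoenflies_to_hm("D3"), schoenflies_to_hm("D2"))
```

Unrecognised input yields None, in keeping with the optional return type documented above.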

pipeliner.deposition_tools.onedep_deposition_objects.get_onedep_symmetry_entries(pg_sym: str) → Tuple[str | None, str | None, str | None]

Get the symmetry entries for OneDep

Parameters:: pg_sym (str) – The point group symmetry, in Schoenflies notation
Returns:: (Schoenflies symbol, circular symmetry, and H-M notation)
Return type:: Tuple[Optional[str], Optional[str], Optional[str]]

pipeliner.deposition_tools.onedep_deposition_objects.make_em_software_depobj(the_prog: ExternalProgram, onedep_software_class, parent_ids: Dict[str, str], details: str = '', jobname: str = '', doi: str | None = None) → OneDepDepositionObject

Make a deposition object for a specific piece of software

This function generally doesn’t need to be explicitly called because the PipelinerJob will do it for each piece of software during creation of the deposition.

Parameters:

the_prog (ExternalProgram) – The program to get the depobj for
onedep_software_class (str) – The task performed by the software from the options in em_software.category in the OneDep schema
parent_ids (Dict[str, str]) – The IDs associated with depobjs that are parents to the one being created
jobname (str) – The job that the depobj is being created for
details (str) – Details about what the software was doing
doi (Optional[str]) – The DOI of the depobj, will normally be the reference from the job’s jobinfo

Returns:

The depobj for the program

Return type:

DepositionObject

pipeliner.deposition_tools.onedep_deposition_objects.parse_onedep_cif(input_file: str, jobname: str | None = None) → List[OneDepDepositionObject]

Parse a cif in the onedep pdb/emdb format and return deposition objects

If a depobj cannot be created, skip it and raise a warning

Parameters:

input_file (str) – The cif file to parse
jobname (Optional[str]) – The job that is reading the file

Returns:

A dep obj for each entry in the cif file

Return type:

List[OneDepDepositionObject]

pipeliner.deposition_tools.onedep_deposition.add_endpoint_ondep_deposition_objects(depobjs: List[OneDepDepositionObject], emdb_id: str, primary_map: str, additional_maps: List[str], halfmaps: List[str], sequences: List[str], masks: List[str], additional_onedep_cifs: List[str]) → None

pipeliner.deposition_tools.onedep_deposition.clean_dobj_cross_references(dobjs: List[OneDepDepositionObject]) → List[OneDepDepositionObject]

Remove non-top level deposition objects not referenced by others

This will only occur in ‘short’ depositions where early steps have been removed

Parameters:

dobjs (List[OneDepDepositionObject]) – The DepositionObjects to operate on

Returns:

Contains the most recent of each em_image_processing depobj and all others

Return type:

List[OneDepDepositionObject]

pipeliner.deposition_tools.onedep_deposition.create_deposition_cif(depobjs: List[OneDepDepositionObject], emdb_id: str) → Document

Write a cif file for the deposition

Parameters:

depobjs (List[OneDepDepositionObject]) – The deposition objects with all the ids updated including the EMDB ID
emdb_id (str) – The EMDB deposition ID, provided by the EMDB

Returns:

The cif document for the deposition

Return type:

gemmi.cif.Document

pipeliner.deposition_tools.onedep_deposition.evaluate_depofile(ciffile: str) → Tuple[List[ValidationError], bool]

Validate the cif file against the most current PDBe schema dict

Parameters:: ciffile (str) – path to cif file
Returns:: A list of ValidationErrors and a bool indicating if the cif file validation worked properly.
Return type:: Tuple[List[ValidationError], bool]

pipeliner.deposition_tools.onedep_deposition.get_deposition_objs_onedep(terminal_job: str) → List[OneDepDepositionObject]

Get the deposition objects for each job in a workflow

Parameters:: terminal_job (str) – The job to use
Returns:: The gathered DepositionObject objects.
Return type:: List[OneDepDepositionObject]
Raises:: ValueError – If the terminal job is not found.

pipeliner.deposition_tools.onedep_deposition.get_most_recent_processing_depobjs(depobjs: List[OneDepDepositionObject]) → List[OneDepDepositionObject]

Keep only the latest em_image_processing depobj from each type

The type is defined by the details parameter, which is the same as the job description. Non em_image_processing depobjs will be dealt with later using cross-reference checking

Parameters:

depobjs (List[OneDepDepositionObject]) – The DepositionObjects to operate on

Returns:

Contains the most recent of each em_image_processing depobj and all others

Return type:

List[OneDepDepositionObject]
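
The filtering rule above can be sketched in isolation. SketchDepObj and keep_latest_processing are hypothetical stand-ins, and the sketch assumes the list is ordered from oldest to newest job, which is not stated for the real function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchDepObj:
    """Hypothetical stand-in with only the fields this sketch needs."""
    category: str
    data: Dict[str, Optional[str]] = field(default_factory=dict)


def keep_latest_processing(depobjs: List[SketchDepObj]) -> List[SketchDepObj]:
    # Record the last position at which each 'details' value occurs
    latest_index: Dict[Optional[str], int] = {}
    for i, d in enumerate(depobjs):
        if d.category == "em_image_processing":
            latest_index[d.data.get("details")] = i
    keep = set(latest_index.values())
    # Other categories pass through untouched; they are handled elsewhere
    return [
        d
        for i, d in enumerate(depobjs)
        if d.category != "em_image_processing" or i in keep
    ]


objs = [
    SketchDepObj("em_image_processing", {"details": "3D refinement"}),
    SketchDepObj("em_software", {"name": "myprog"}),
    SketchDepObj("em_image_processing", {"details": "2D classification"}),
    SketchDepObj("em_image_processing", {"details": "3D refinement"}),
]
kept = keep_latest_processing(objs)
print([d.data.get("details") for d in kept])
```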

pipeliner.deposition_tools.onedep_deposition.get_onedep_software_options() → List[str]

Get the allowed values for the em_software.name parameter

Returns:: The allowed values for the em_software.name parameter
Return type:: List[str]

pipeliner.deposition_tools.onedep_deposition.make_uploads(outdir: str, deposition_id: str, main_map: str = '', masks: List[str] | None = None, halfmaps: List[str] | None = None, additional_maps: List[str] | None = None) → None

pipeliner.deposition_tools.onedep_deposition.make_upstream_crossrefs(depobjs: List[OneDepDepositionObject]) → None

Make sure depobjs with references to upstream depobjs have the _id fields filled

Parameters:: depobjs (List[OneDepDepositionObject]) – The deposition objects from the workflow
Returns:: The list of deposition objects, updated
Return type:: List[OneDepDepositionObject]

pipeliner.deposition_tools.onedep_deposition.prepare_onedep_deposition(model_file: str = '', primary_map: str = '', masks: List[str] | None = None, additional_maps: List[str] | None = None, sequences: List[str] | None = None, halfmaps: List[str] | None = None, emdb_id: str = '', outfile: str = '', verbose: bool = False, do_temp_onedep_update: bool = True, additional_onedep_cifs: List[str] | None = None) → Tuple[str, List[ValidationError]]

Prepare an EMDB deposition cif file

Go over the image_processing deposition objects and decide which are to be kept. This should be the one from the most recent job for each main processing category. If verbose, don’t throw away anything; this will lead to multiple image_processing entries for intermediate reconstructions, classifications, etc.

Throw away any other deposition objects that cross-reference obsolete ones, first doing image_processing

Parameters:

primary_map (str) – The main map that will be deposited, generally the postprocessed masked map
model_file (str) – Cif file for the fit atomic model
masks (Optional[List[str]]) – List of masks applied to the primary map
halfmaps (Optional[List[str]]) – List of halfmaps generated during refinement of the primary map
sequences (Optional[List[str]]) – List of fasta files for sequences associated with the fit atomic model
additional_maps (Optional[List[str]]) – List of additional maps, such as unmasked maps or 3D classes
emdb_id (str) – The EMDB id to give the deposition
outfile (str) – The output filename
verbose (bool) – Should the deposition contain every processing step?
additional_onedep_cifs (list[str]) – cif file(s) in the OneDep format that contain additional information to add to the deposition.
do_temp_onedep_update (bool) – If true, do temp deposition updates that correct for some quirks in the current onedep system. These are applied after generating the deposition so they can be easily rolled back when these onedep issues are fixed

pipeliner.deposition_tools.onedep_deposition.sort_depobjs_by_type(depobjs: List[OneDepDepositionObject]) → Dict[str, List[OneDepDepositionObject]]

Sort a list of deposition objects by type

Parameters:

depobjs (List[OneDepDepositionObject]) – The deposition objects with all the ids updated including the EMDB ID

Returns:

A dict of types with a list of deposition objects for each

Return type:

Dict[str, List[OneDepDepositionObject]]
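
A grouping of this shape can be sketched with a defaultdict. SketchDepObj and group_by_category are hypothetical, and treating the object's category as its "type" is an assumption based on the category attribute documented for OneDepDepositionObject.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchDepObj:
    """Hypothetical stand-in: a category plus a label to tell objects apart."""
    category: str
    label: str


def group_by_category(depobjs: List[SketchDepObj]) -> Dict[str, List[SketchDepObj]]:
    grouped: Dict[str, List[SketchDepObj]] = defaultdict(list)
    for d in depobjs:
        # Original order is preserved within each category
        grouped[d.category].append(d)
    return dict(grouped)


demo = [
    SketchDepObj("em_software", "a"),
    SketchDepObj("em_image_processing", "b"),
    SketchDepObj("em_software", "c"),
]
grouped = group_by_category(demo)
print({cat: [d.label for d in objs] for cat, objs in grouped.items()})
```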

pipeliner.deposition_tools.onedep_deposition.temp_onedep_deposition_object_alterations(depobjs: List[OneDepDepositionObject]) → List[OneDepDepositionObject]

Modify the list of deposition objects for current limitations of the OneDep system

Add a single em_image_processing object, remove the others
Update all other DepositionObjects to point their image_processing_id to the new one
Remove any software that reference programs not in the official list

These changes will be rolled back when the OneDep system is updated to accept multiple em_image_processing_id objects

Parameters:: depobjs (List[OneDepDepositionObject])
Returns:: The updated deposition objects
Return type:: List[OneDepDepositionObject]

pipeliner.deposition_tools.onedep_deposition.update_and_format_deposition_objects(depobjs: List[OneDepDepositionObject]) → List[OneDepDepositionObject]

Update the .id fields of deposition objects, changing UUIDs to int ids

Parameters:: depobjs (List[OneDepDepositionObject]) – The deposition objects from the workflow
Returns:: The Deposition objects with the IDs updated
Return type:: List[OneDepDepositionObject]
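
One way to picture the UUID-to-integer step is the sketch below. Everything in it is an assumption made for illustration: SketchDepObj and uuids_to_int_ids are hypothetical, ids are numbered per category starting at 1, they are stored as strings because the data mapping holds strings, and cross-references are taken to be data values equal to another object's uuid.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import uuid4


@dataclass
class SketchDepObj:
    """Hypothetical stand-in carrying a category, data, and a uuid4 string."""
    category: str
    data: Dict[str, Optional[str]]
    uuid: str = field(default_factory=lambda: str(uuid4()))


def uuids_to_int_ids(depobjs: List[SketchDepObj]) -> List[SketchDepObj]:
    counters: Dict[str, int] = {}
    new_id: Dict[str, str] = {}
    # First pass: hand out sequential ids within each category
    for d in depobjs:
        counters[d.category] = counters.get(d.category, 0) + 1
        new_id[d.uuid] = str(counters[d.category])
    # Second pass: swap any value that is a known uuid for its integer id
    for d in depobjs:
        d.data = {
            k: new_id.get(v, v) if isinstance(v, str) else v
            for k, v in d.data.items()
        }
        d.data["id"] = new_id[d.uuid]
    return depobjs


proc = SketchDepObj("em_image_processing", {"id": None, "details": "3D refinement"})
soft = SketchDepObj(
    "em_software", {"id": None, "name": "myprog", "image_processing_id": proc.uuid}
)
soft2 = SketchDepObj(
    "em_software", {"id": None, "name": "otherprog", "image_processing_id": proc.uuid}
)
updated = uuids_to_int_ids([proc, soft, soft2])
print([d.data for d in updated])
```

Doing the work in two passes means a reference can be resolved even when the object it points to appears later in the list.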

PDB deposition objects

PDB depositions are not currently supported