Pipeliner Jobs

Pipeliner jobs

class pipeliner.pipeliner_job.ExternalProgram(command: str, name: str | None = None, vers_com: List[str] | None = None, vers_lines: List[int] | None = None, emdb_categories: List[str] | None = None)

Bases: object

Class to store info about external programs called by the pipeliner

command

The command that will be used to run the program

Type:

str

name

The name for the program, command will be used unless this is specified

Type:

str

exe_path

The path to the executable for the program

Type:

str

vers_com

The command that needs to be run to get the version

Type:

List[str]

vers_lines

The lines from the output of the version command that contain the version info

Type:

List[int]

emdb_categories

How the software should be classified in the EMDB

Type:

List[str]

get_version(timeout: float = 1.0) str | None
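
As an illustration of how vers_com, vers_lines and the timeout could fit together, the following minimal sketch runs a version command and keeps only the selected output lines. The function name sketch_get_version and its error handling are assumptions made for this example; the pipeliner's own get_version may behave differently.

```python
import subprocess
import sys
from typing import List, Optional


def sketch_get_version(
    vers_com: List[str], vers_lines: List[int], timeout: float = 1.0
) -> Optional[str]:
    """Hypothetical stand-in: run vers_com and return the chosen output lines."""
    try:
        proc = subprocess.run(
            vers_com, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Program missing or too slow: report "no version" instead of failing
        return None
    # Some programs print their version to stderr, so look at both streams
    lines = (proc.stdout + proc.stderr).splitlines()
    picked = [lines[i] for i in vers_lines if 0 <= i < len(lines)]
    return "\n".join(picked) if picked else None


# Usage: ask the running Python interpreter for its version (first line only)
print(sketch_get_version([sys.executable, "--version"], [0], timeout=10.0))
```

Returning None on a timeout keeps a slow or hanging program from blocking whatever is collecting the version information.
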
class pipeliner.pipeliner_job.JobInfo(display_name: str = 'Pipeliner job', version: str = '0.0', job_author: str | None = None, short_desc: str = 'No short description for this job', long_desc: str = 'No long description for this job', documentation: str = 'No online documentation available', external_programs: List[ExternalProgram] | None = None, references: List[Ref] | None = None)

Bases: object

Class for storing info about jobs.

This is used to generate documentation for the job within the pipeliner

display_name

A user-friendly name to describe the job in a GUI, this should not include the software used, because that info is pulled from the job type

Type:

str

version

The version number of the pipeliner job

Type:

str

job_author

Who wrote the pipeliner job

Type:

str

short_desc

A one line “title” for the job

Type:

str

long_desc

A detailed description of what the job does

Type:

str

documentation

A URL for online documentation

Type:

str

programs

A list of 3rd party software used by the job. These are used by the pipeliner to determine if the job can be run, so they need to include all executables the job might call. If any program on this list cannot be found with which then the job will be marked as unable to run.

Type:

List[ExternalProgram]

references

A list of Ref objects used

Type:

List[Ref]

alternative_unavailable_reason

This can be set to a string explaining why the job is unavailable if other checks for the job to be available (besides programs missing from the $PATH) have failed, e.g. a necessary library is missing.

Type:

str

can_continue

Can this job be continued?

Type:

bool

property is_available

Is the job available to run?

True if executables were found for all the job’s programs and alternative_unavailable_reason has not been set, or False otherwise.

property unavailable_message: str | None

Gives the reason the pipeliner has marked the job as unavailable

Returns:

The reason the job is marked unavailable, or None if it is available

Return type:

Optional[str]
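
The availability check can be pictured with a small sketch built on shutil.which. It assumes that a job is unavailable when any program is missing from the $PATH, or when alternative_unavailable_reason has been set, as the attribute descriptions suggest. The function names and the message wording are hypothetical and are not taken from the pipeliner.

```python
import shutil
from typing import List, Optional


def sketch_unavailable_message(
    commands: List[str], alternative_unavailable_reason: Optional[str] = None
) -> Optional[str]:
    """Return a reason the job cannot run, or None if it can (illustrative)."""
    # A program counts as missing if shutil.which cannot find it on the $PATH
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    if missing:
        return "Programs not found on the $PATH: " + ", ".join(missing)
    # ASSUMPTION: any other reason (e.g. a missing library) also blocks the job
    return alternative_unavailable_reason


def sketch_is_available(
    commands: List[str], alternative_unavailable_reason: Optional[str] = None
) -> bool:
    """The job is available only when there is no reason for it not to be."""
    return sketch_unavailable_message(commands, alternative_unavailable_reason) is None


print(sketch_is_available(["no-such-program-xyz-123"]))
```
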

class pipeliner.pipeliner_job.PipelinerCommand(cmd: Sequence[str | float | int], *, relion_control: bool = False, allow_failure: bool = False)

Bases: object

Holds a command that will be run by the pipeliner

cmd

The command that will be run. Each list item is one arg.

Type:

Sequence[str | float | int]

relion_control

Does the command need the Relion ‘--pipeline_control’ argument appended before being run?

Type:

bool

allow_failure

If True, then the job will be allowed to continue even if this command fails. Be cautious if you use this option! You will need to consider all other places where the output of this command might be used and make the code suitably robust to handle possible missing files. In general, you should not set this option for any command that is required to produce an output node that might be used as input by another job.

Type:

bool
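
To make the effect of allow_failure concrete, here is a minimal sketch of a runner that stops at the first failing command unless that command tolerates failure. The runner sketch_run_commands and its (args, allow_failure) tuples are hypothetical stand-ins, not the pipeliner's job runner.

```python
import subprocess
import sys
from typing import List, Tuple


def sketch_run_commands(commands: List[Tuple[List[str], bool]]) -> List[int]:
    """Run (args, allow_failure) pairs in order and return their exit codes."""
    return_codes = []
    for args, allow_failure in commands:
        code = subprocess.run(args).returncode
        return_codes.append(code)
        if code != 0 and not allow_failure:
            # A required command failed, so the remaining commands are not run
            raise RuntimeError(f"Command failed with exit code {code}: {args}")
    return return_codes


# Two tiny commands: one succeeds, one exits with code 3
ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "import sys; sys.exit(3)"]

# The failing command is tolerated, so the second command still runs
print(sketch_run_commands([(bad, True), (ok, False)]))
```

Anything that later reads the output of a tolerated command has to cope with that output being absent, which is why the option should be used sparingly.
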

get_final_command(output_dir: str) List[str]

Get the final command to run from this PipelinerCommand object.

This makes a number of changes to the original command in self.cmd:

  1. All args are converted to strings.

  2. If the command is “python” or “python3”, the command name is replaced with the correct command to run the current Python executable.

  3. The command name is replaced by the path from shutil.which if one is found, otherwise it is left as a simple name.

  4. If self.relion_control is set, the Relion pipeline control arguments are appended to the command.

  5. If a $ sign is found in an argument, any environment variables in that argument are expanded with os.path.expandvars.

Parameters:

output_dir – The output dir for the job that runs this command. This value is only used if self.relion_control is set.

Returns:

The final command and its arguments, in a list ready to pass to subprocess.run.
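
The five steps listed for get_final_command can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pipeliner's implementation: the function name sketch_final_command is hypothetical, and the exact form of the Relion pipeline control arguments (here `--pipeline_control` followed by the output directory) is an assumption.

```python
import os
import shutil
import sys
from typing import List, Sequence, Union


def sketch_final_command(
    cmd: Sequence[Union[str, float, int]],
    output_dir: str,
    relion_control: bool = False,
) -> List[str]:
    # 1. Convert every argument to a string
    args = [str(arg) for arg in cmd]
    # 2. Run "python"/"python3" with the interpreter that is running this code
    if args[0] in ("python", "python3"):
        args[0] = sys.executable
    # 3. Use the full path to the program if one can be found
    found = shutil.which(args[0])
    if found is not None:
        args[0] = found
    # 4. ASSUMPTION: the control arguments are "--pipeline_control <output_dir>"
    if relion_control:
        args += ["--pipeline_control", output_dir]
    # 5. Expand environment variables only in arguments that contain a "$"
    return [os.path.expandvars(arg) if "$" in arg else arg for arg in args]


# Usage: numbers become strings and $SKETCH_OUT is expanded
os.environ["SKETCH_OUT"] = "/tmp/sketch"
print(sketch_final_command(["python", "script.py", 3, "$SKETCH_OUT/map.mrc"], "Job/job001/"))
```
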

class pipeliner.pipeliner_job.PipelinerJob

Bases: object

Super-class for job objects.

Each job type has its own sub-class.

WARNING: do not instantiate this class directly, use the factory functions in this module.

jobinfo

Contains information about the job such as references

Type:

JobInfo

output_dir

The path of the output directory created by this job

Type:

str

alias

The alias for the job, if one has been assigned

Type:

str

is_continue

Whether this job is a continuation of an older job (True) or a new one (False)

Type:

bool

input_nodes

A list of Node objects for each file used as an input

Type:

list

output_nodes

A list of Node objects for files produced by the job

Type:

list

joboptions

A dict of JobOption objects specifying the parameters for the job

Type:

dict

is_tomo

Is the job a tomography job?

Type:

bool

working_dir

The working directory to be used when running the job. This should normally be left as None, meaning the job will be run in the project directory. Jobs that write files in their working directory should instead work somewhere within the job’s output directory, and take care to adjust the paths of input and output files accordingly.

Type:

str

raw_options

A dict of all raw joboptions as they were read in

Type:

dict

CATEGORY_LABEL = ''
OUT_DIR = ''
PROCESS_NAME = ''
add_compatibility_joboptions() None

Write additional joboptions for back compatibility

Some JobOptions are needed by the original program (hey Relion 4), but not the pipeliner, they are added here so the files pipeliner writes will be back compatible with their original program.

add_emdb_deposition_data(category: str, data_dict: Dict[str, str]) DepositionObject

Create a deposition object specific to this job

Parameters:
  • category (str) – the type of deposition object

  • data_dict (Dict[str, str]) – The fields and values in the resulting depobj’s data attr that need to be updated. Values not specified will be left as default

add_main_emdb_processing_data() DepositionObject

Make an em_image_processing DepositionObject for a job

This is the base DepositionObject for all jobs that generate results. Most jobs should generate this along with the associated more specific deposition objects.

This deposition object should ALWAYS be created last, as it will update the cross-references in the associated depobjs when it is created.

Returns:

The em_image_processing DepositionObject

Return type:

DepositionObject

add_onedep_metadata_import()
add_output_node(file_name: str, node_type: str, keywords: List[str] | None = None) None

Helper function to add a new Node for a file in the job’s output directory.

This is a wrapper around node_factory.create_node which simply adds self.output_dir to the start of the file name before creating the node and adding it to self.output_nodes.

Parameters:
  • file_name – The name of the file that the new node will refer to. It is assumed that the file will be written to the job’s output directory. Note that the existence of the file is not checked, because this method will usually be called before the job has run.

  • node_type – The top-level type for the new node. This should almost always be one of the constants defined in pipeliner.nodes.

  • keywords – A list of keywords to append to the node type.
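
The wrapper behaviour described for add_output_node can be sketched with stand-in classes. SketchNode and SketchJob are hypothetical and do not reproduce pipeliner.nodes or node_factory.create_node; in particular, joining the node type and keywords with dots is an assumption made only for this example.

```python
import posixpath
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchNode:
    """Stand-in for a pipeliner Node: just a file name and a type string."""
    name: str
    type: str


@dataclass
class SketchJob:
    output_dir: str
    output_nodes: List[SketchNode] = field(default_factory=list)

    def add_output_node(
        self, file_name: str, node_type: str, keywords: Optional[List[str]] = None
    ) -> None:
        # Prefix the file name with the job's output directory
        full_name = posixpath.join(self.output_dir, file_name)
        # ASSUMPTION: keywords are appended to the node type, separated by dots
        full_type = ".".join([node_type, *(keywords or [])])
        # The file is not checked for existence: the job has usually not run yet
        self.output_nodes.append(SketchNode(full_name, full_type))


job = SketchJob("Refine3D/job012/")
job.add_output_node("run_class001.mrc", "DensityMap", ["refine3d"])
print(job.output_nodes[0])
```
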

additional_joboption_validation() List[JobOptionValidationResult]

Advanced validation of job parameters

This is a placeholder function for additional validation to be done by individual job subtypes, such as comparing JobOption values, e.g. JobOption A must be > JobOption B.

Avoid using self.get_string or self.get_number in this function as they may raise an error if the JobOption is required and has no value. Use self.joboptions[“jobopname”].value.

Returns:

A list of JobOptionValidationResult objects

Return type:

list
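
A cross-option check of the "A must be > B" kind might look like the sketch below. SketchOption and SketchValidationResult are hypothetical stand-ins for JobOption and JobOptionValidationResult, whose real constructors are not shown in this page. The sketch reads the raw .value of each option, in line with the advice above to avoid self.get_string and self.get_number.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchOption:
    """Stand-in for a JobOption: only the raw value matters here."""
    value: Optional[float]


@dataclass
class SketchValidationResult:
    """Stand-in for JobOptionValidationResult (fields are assumptions)."""
    raised_by: List[str]
    message: str


def sketch_additional_validation(
    joboptions: Dict[str, SketchOption],
) -> List[SketchValidationResult]:
    errors = []
    # Read the raw values directly so an unset option does not raise an error
    a = joboptions["option_a"].value
    b = joboptions["option_b"].value
    if a is not None and b is not None and not a > b:
        errors.append(
            SketchValidationResult(
                raised_by=["option_a", "option_b"],
                message="option_a must be greater than option_b",
            )
        )
    return errors


print(sketch_additional_validation(
    {"option_a": SketchOption(1.0), "option_b": SketchOption(2.0)}
))
```
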

check_joboption_is_now_deactivated(jo: str) bool

Check if a joboption has become deactivated in relation to others

For example if job option A is False, job option B is now deactivated

Parameters:

jo (str) – The name of the JobOption to test

Returns:

Has the JobOption been deactivated

Return type:

bool

check_joboption_is_now_required(jo: str) list

Check if a joboption has become required in relation to others

For example if job option A is True, job option B is now required

Parameters:

jo (str) – The name of the joboption to test

Returns:

A list of pipeliner.job_options.JobOptionValidationResult objects for any errors found

Return type:

list

continuation_joboption_updates() None

Modifications to existing JobOptions for when a job is continued

Set a JobOption’s value for the continuation job.

This function can also do things like change the hard or suggested min/max, limit or expand multiple choice options, or make the regex validation in a StringJobOption more restrictive.

create_input_nodes(existing_nodes: Dict[str, Node] | None = None) None

Automatically add the job’s input nodes to its input node list.

Input nodes are created from each of the job’s job options.

Parameters:

existing_nodes – A dict of {node_name: Node object} for the current nodes in the pipeline, if available. This allows jobs to re-use existing nodes rather than creating new ones, which can be necessary if the type of the node is uncertain.

create_output_nodes() None

Make the job’s output nodes.

This method should be overridden by PipelinerJob subclasses.

The output nodes should be added to the list in the output_nodes attribute. The add_output_node function is helpful to create and add a new node in a single call.

If your job doesn’t make any output nodes, or doesn’t know what their names will be until the job has been run, you still need to override this method but your implementation can simply pass and do nothing. If you need to add output nodes at the end of the job, create them in create_post_run_output_nodes.

Note that this method is called by the job manager (via PipelinerJob.prepare_to_run) before the job is added to the pipeline. The job’s output directory does exist when this method is called, but that could change in future versions of the pipeliner and jobs should avoid making any file system changes in this method.

create_post_run_output_nodes()

Placeholder function for post run node creation

Some jobs have output nodes that can only be created after the job has run because their names are not known until after they have been created. They can be added here. This function should ONLY add output nodes; any other work should be done in commands run by the job.

create_results_display() Sequence[ResultsDisplayObject]

Create results display objects to be displayed by the GUI

This default implementation simply creates the default results display object for each of the job’s output nodes. Subclasses that want customised results should override this method.

Returns:

A list of ResultsDisplayObject

evaluate_qsub_template(sub_text: str) None

Check that the qsub template is appropriate for the pipeliner

Looks for the common differences between relion style sub script and pipeliner style. Issues warnings if suspected problems are found but still allows it to run.

Parameters:

sub_text (str) – The text of the submission script

gather_metadata() Dict[str, int | float | str | bool | dict | list | None]

Placeholder function for metadata gathering

Each job class should define this individually

Returns:

A placeholder “No metadata available” and the reason why

Return type:

dict

get_additional_reference_info() List[Ref]

A placeholder function for jobs that need to return additional references

This is for references that are not included in self.jobinfo, such as ones pulled from the EMDB/PDB in fetch jobs

get_category_label() str

Get a label for the category that this job belongs to.

If the job defines a CATEGORY_LABEL attribute, its value is simply returned. Otherwise, the second part of the process name is processed to produce a label by replacing underscores with spaces and converting to title case.
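
The fallback rule for the category label is simple enough to sketch directly. The function below is a hypothetical stand-in that assumes process names are dot-separated (for example "relion.ctf_refine.anisotropic", an illustrative name), so that the "second part" is the text between the first and second dots.

```python
def sketch_category_label(process_name: str, category_label: str = "") -> str:
    """Return the explicit label if set, else derive one from the process name."""
    if category_label:
        return category_label
    # ASSUMPTION: the parts of a process name are separated by dots
    second_part = process_name.split(".")[1]
    # Underscores become spaces and each word is capitalised
    return second_part.replace("_", " ").title()


print(sketch_category_label("relion.ctf_refine.anisotropic"))  # Ctf Refine
```
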

get_commands() List[PipelinerCommand]

Get the commands to be run for a specific job.

This method should be overridden by PipelinerJob subclasses.

Jobs are normally run with the project directory as the working directory. If your job needs to run in a different working directory (for example if it calls a program which always writes files into the current directory), set the self.working_dir attribute in this method.

Note that this method should run quickly! Any long-running actions should be done in one of the job’s commands instead. (If necessary, put Python code that needs to be run into a separate script in pipeliner.scripts.job_scripts and then call it as a command.)

Returns:

The commands as a list of PipelinerCommand objects

get_current_output_nodes() List[Node]

Get the current output nodes if the job was stopped prematurely

For most jobs there will not be any, but for jobs with many iterations the most recent iteration can be used if the job is aborted or failed and then later marked as successful

Returns:

A list of Node objects

Return type:

list

get_default_params_dict() Dict[str, str]

Get a dict with the job’s parameters and default values

Returns:

All the job’s parameters {parameter: default value}

Return type:

dict

get_extra_options() None

Get user specified extra queue submission options

get_job_subdirs()

Get all the subdirectories contained in the jobdir excluding the NodeDisplay

Returns:

The subdirectories

Return type:

List[str]

get_joboption_groups() Dict[str, List[str]]

Put the joboptions in groups according to their jobop_group attribute

Assumes that the joboptions have already been put in order of priority by self.set_joboption_order() or were in order to begin with.

Groups are ordered based on the highest priority joboption in that group from the order of the joboptions, except that “Main” is always the first group. Joboptions within the groups are ordered by priority.

Returns:

The joboptions groupings {group: [jobop, … jobop]}

Return type:

Dict[str, List[str]]
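
The grouping rule can be sketched as follows, taking a list of (joboption name, group) pairs that is already in priority order. The function name and the input format are assumptions made for this illustration; the real method reads each JobOption’s jobop_group attribute.

```python
from typing import Dict, List, Tuple


def sketch_joboption_groups(ordered: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (name, group) pairs, keeping priority order and putting Main first."""
    groups: Dict[str, List[str]] = {}
    for name, group in ordered:
        # A group takes its position from its highest-priority joboption
        groups.setdefault(group, []).append(name)
    # "Main" is always the first group, wherever its joboptions appeared
    main = groups.pop("Main", None)
    if main is not None:
        groups = {"Main": main, **groups}
    return groups


example = [("fn_in", "I/O"), ("angpix", "Main"), ("do_ctf", "CTF"), ("fn_out", "I/O")]
print(sketch_joboption_groups(example))
```
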

get_mpi_command() List[int | float | str]
get_nr_mpi() int
get_nr_threads() int
get_runtab_options(mpi: bool = False, threads: bool = False, addtl_args: bool = False, mpi_default_min: int = 1, mpi_must_be_odd: bool = False) None

Get the options found in the Run tab of the GUI, which are common to all job types

Adds entries to the joboptions dict for queueing, MPI, threading, and additional arguments. This method should be used when initialising a PipelinerJob subclass

Parameters:
  • mpi (bool) – Should MPI options be included?

  • threads (bool) – Should multi-threading options be included?

  • addtl_args (bool) – Should an ‘additional arguments’ option be added?

  • mpi_default_min (int) – The minimum for the default number of MPIs, will be used if mpi_default_min > user defined min number of MPI

  • mpi_must_be_odd (bool) – Does the number of mpis have to be odd, like for relion refine_jobs.

handle_doppio_uploads(dry_run=False) None

Tasks that have to be performed to deal with Doppio file uploads.

  • Move files from DoppioUploads to the job dir:

    DoppioUploads/tmpdir/file -> JobType/jobNNN/InputFiles/file

  • Update the job option values to point to the new file locations, so when the job input nodes are created they refer to the moved files

Parameters:

dry_run – If True, do not actually try to move any files, just update the job option values. This option is only intended for use in testing.
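
The path rewrite described above (DoppioUploads/tmpdir/file -> JobType/jobNNN/InputFiles/file) amounts to keeping the file name and replacing its directory. The helper below is a hypothetical illustration of that mapping only; like a dry run, it does not move any files.

```python
import posixpath


def sketch_upload_destination(upload_path: str, output_dir: str) -> str:
    """Map an uploaded file's path to its new location inside the job dir."""
    # Keep only the file name; the temporary upload directory is discarded
    file_name = posixpath.basename(upload_path)
    return posixpath.join(output_dir, "InputFiles", file_name)


print(sketch_upload_destination("DoppioUploads/tmp123/map.mrc", "Import/job001/"))
```
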

is_submit_to_queue() bool
load_results_display_files() Sequence[ResultsDisplayObject]

Load the job’s results display objects from files on disk.

This method must be fast because it is used by the GUI to load job results. Therefore, if a display object fails to load properly, no attempt is made to recalculate it and a ResultsDisplayPending object is returned instead.

If there are no results display files yet, an empty list is returned.

Returns:

A list of ResultsDisplayObject

make_additional_args() None

Get the additional arguments job option

make_final_emdb_deposition_objects() List[DepositionObject]

Get the deposition objects for a job

Gets the depobjs generated by the job’s specific methods and adds the general ones for its reference(s) and software

make_queue_options() None

Get options related to queueing and queue submission, which are common to all job types

parse_additional_args() List[str]

Parse the additional arguments job option and return a list

Returns:

A list ready to append to the command. Quoted strings are preserved as quoted strings; all others are split into individual items

Return type:

list
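
One way to picture this kind of splitting is with shlex.split from the standard library, which keeps a quoted string together as a single item. This is an assumption about the general approach, not the pipeliner’s code. Note that shlex.split drops the quote characters themselves, which may differ from how parse_additional_args preserves quoted strings.

```python
import shlex
from typing import List


def sketch_parse_additional_args(additional_args: str) -> List[str]:
    """Split an argument string on whitespace, keeping quoted text together."""
    return shlex.split(additional_args)


print(sketch_parse_additional_args('--angpix 1.07 --label "my map"'))
# ['--angpix', '1.07', '--label', 'my map']
```
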

prepare_clean_up_lists(do_harsh: bool = False) Tuple[List[str], List[str]]

Placeholder function for preparation of list of files to clean up

Each job class should define this individually

Parameters:

do_harsh (bool) – Should a harsh cleanup be performed

Returns:

Two empty lists ([files, to, delete], [dirs, to, delete])

Return type:

tuple

prepare_emdb_deposition_data() List[DepositionObject]

This function makes the deposition objects specific to this job

This is the function that should be edited for each specific PipelinerJob. The main em_image_processing, citations, and citation_author objects will be created automatically, and don’t need to be included in this function.

Returns:

All the deposition objects, except software and citations

Return type:

List[DepositionObject]

prepare_empiar_deposition_data() List[DepositionObject]

This function makes the deposition objects aside from citations and software

This is the function that should be edited for each specific PipelinerJob

Returns:

All the deposition objects, except software and citations

Return type:

List[DepositionObject]

prepare_for_continuation() None

Prepares this job to be continued

This method should be run on an existing job, i.e. one that has a matching Process in the ProjectGraph. This means self.output_dir, self.input_nodes, and self.output_nodes should be assigned.

Updates the parameters and values of each JobOption using continuation_joboption_updates()

Raises:

NotImplementedError – If the job cannot be continued

prepare_to_run(ignore_invalid_joboptions: bool = False, existing_nodes: Dict[str, Node] | None = None) None

Prepare the job to run.

This function is intended to be called by the pipeliner before the job file is saved to disk. It does several things, including:

  • Validate the job options

  • Make the job directory

  • Move uploaded Doppio user files into the job directory

Parameters:
  • ignore_invalid_joboptions (bool) – Prepare the job to run anyway even if the job options appear to be invalid

  • existing_nodes – A dict of {node_name: Node object} for the current nodes in the pipeline, if available. This allows jobs to re-use existing nodes rather than creating new ones, which can be necessary if the type of the node is uncertain.

Raises:
  • ValueError – If the job options appear to be invalid and ignore_invalid_joboptions is not set

  • RuntimeError – If the job does not already have an output directory assigned

read(filename: str) None

Reads parameters from a run.job or job.star file

Parameters:

filename (str) – The file to read. Can be a run.job or job.star file

Raises:

ValueError – If the file is a job.star file and a job option from the PipelinerJob is missing from the input file

relion_joboption_conversion() None

Any updates that need to be made to the JobOptions if it was converted from relion

This function should modify job options and not return anything. It may need to be updated at a later time to accept a list of raw options read from the file if it is needed to convert jobs where relion has JobOptions that are not present in the pipeliner version of the job

save_job_submission_script(commands: list) str

Writes a submission script for jobs submitted to a queue

Parameters:

commands (list) – The commands to save, in a list of lists format. In Relion this would be the actual job commands, but in the pipeliner it will just be a single command to run the job with the job_runner.

Returns:

The name of the submission script that was written

Return type:

str

Raises:
  • ValueError – If no submission script template was specified in the job’s joboptions

  • ValueError – If the submission script template is not found

  • RuntimeError – If the output script could not be written

save_results_display_files() Sequence[ResultsDisplayObject]

Create new results display objects and save them to disk.

This method removes any existing results display files first, and returns the new display objects after they have been created and saved.

Returns:

The newly-created results display objects.

set_joboption_order(new_order: List[str]) None

Replace the joboptions dict with an ordered dict

Use this to set the order the joboptions will appear in the GUI. If a joboption is not specified in the list it will be tagged on to the end of the list.

Parameters:

new_order (list[str]) – A list of joboption keys, in the order they should appear

Raises:

ValueError – If a nonexistent joboption is specified
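
The reordering rule can be sketched with a plain dict, since Python dicts preserve insertion order. The function below is a hypothetical stand-in that returns a new dict instead of replacing self.joboptions.

```python
from typing import Dict, List, TypeVar

T = TypeVar("T")


def sketch_set_joboption_order(joboptions: Dict[str, T], new_order: List[str]) -> Dict[str, T]:
    """Return the joboptions in new_order, with unlisted ones added at the end."""
    unknown = [key for key in new_order if key not in joboptions]
    if unknown:
        raise ValueError(f"Nonexistent joboptions specified: {unknown}")
    # Joboptions missing from new_order keep their relative order at the end
    remaining = [key for key in joboptions if key not in new_order]
    return {key: joboptions[key] for key in [*new_order, *remaining]}


options = {"fn_in": 1, "angpix": 2, "do_ctf": 3}
print(list(sketch_set_joboption_order(options, ["angpix", "fn_in"])))
# ['angpix', 'fn_in', 'do_ctf']
```
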

set_option(line: str) None

Sets a value in the joboptions dict from a run.job file

Parameters:

line (str) – A line from a run.job file

Raises:
  • RuntimeError – If the line does not contain ‘==’

  • RuntimeError – If the value of the line does not match any of the joboptions keys
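
The two error conditions can be illustrated with a small parser. It assumes a run.job line has the layout "key == value" and that the joboptions are held in a plain dict of strings. Both are simplifications made for this sketch, and the key used in the usage example is made up.

```python
from typing import Dict


def sketch_set_option(line: str, joboptions: Dict[str, str]) -> None:
    """Update joboptions in place from one 'key == value' line (illustrative)."""
    if "==" not in line:
        raise RuntimeError(f"No '==' found in line: {line!r}")
    # Split on the first '==' only, so a value may itself contain '=='
    key, _, value = line.partition("==")
    key, value = key.strip(), value.strip()
    if key not in joboptions:
        raise RuntimeError(f"{key!r} does not match any of the joboptions keys")
    joboptions[key] = value


options = {"Number of threads:": "1"}
sketch_set_option("Number of threads: == 4", options)
print(options)  # {'Number of threads:': '4'}
```
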

validate_dynamically_required_joboptions() List[JobOptionValidationResult]

Check all joboptions if they have become required because of if_required

For example if job option A is True, job option B is now required

Returns:

A list of pipeliner.job_options.JobOptionValidationResult objects for any errors found

Return type:

list

validate_input_files() List[JobOptionValidationResult]

Check that files specified as inputs actually exist

Returns:

A list of pipeliner.job_options.JobOptionValidationResult objects

Return type:

list

validate_joboptions() List[JobOptionValidationResult]

Make sure all the joboptions meet their validation criteria

Returns:

A list of JobOptionValidationResult objects

Return type:

list

validate_runtab_joboptions() List[JobOptionValidationResult]
write_jobstar(output_dir: str, output_fn: str = 'job.star', is_continue: bool = False)

Write a job.star file.

Parameters:
  • output_dir (str) – The output directory.

  • output_fn (str) – The name of the file to write. Defaults to job.star

  • is_continue (bool) – Is the file for a continuation of a previously run job? If so only the parameters that can be changed on continuation are written. Overrules is_continue attribute of the job

write_runjob(fn: str | None = None) None

Writes a run.job file

Parameters:

fn (str) – The name of the file to write. Defaults to the file the pipeliner uses for storing GUI parameters. A directory can also be entered and it will add on the file name ‘run.job’

class pipeliner.pipeliner_job.Ref(authors: str | List[str] | None = None, title: str = '', journal: str = '', year: str = '', volume: str = '', issue: str = '', pages: str = '', doi: str = '', **kwargs)

Bases: object

Class to hold metadata about a citation or reference, typically a journal article.

authors

The authors of the reference.

Type:

list

title

The reference’s title.

Type:

str

journal

The journal.

Type:

str

year

The year of publication.

Type:

str

volume

The volume number.

Type:

str

issue

The issue number.

Type:

str

pages

The page numbers.

Type:

str

doi

The reference’s Digital Object Identifier.

Type:

str

other_metadata

Other metadata as needed. Gathered from kwargs

Type:

dict
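
The shape of a Ref can be pictured with the stand-in class below: authors may be given as a string or a list but is stored as a list, and any extra keyword arguments are collected in other_metadata. How the real class converts a single string of authors is not stated here, so wrapping it in a one-item list is an assumption. The reference values in the usage example are placeholders, not a real citation.

```python
from typing import Any, Dict, List, Optional, Union


class SketchRef:
    """Hypothetical stand-in mirroring the documented Ref attributes."""

    def __init__(
        self,
        authors: Optional[Union[str, List[str]]] = None,
        title: str = "",
        journal: str = "",
        year: str = "",
        volume: str = "",
        issue: str = "",
        pages: str = "",
        doi: str = "",
        **kwargs: Any,
    ) -> None:
        # ASSUMPTION: a single string is stored as a one-item list
        if authors is None:
            self.authors: List[str] = []
        elif isinstance(authors, str):
            self.authors = [authors]
        else:
            self.authors = list(authors)
        self.title, self.journal, self.year = title, journal, year
        self.volume, self.issue, self.pages, self.doi = volume, issue, pages, doi
        # Anything not covered by the named fields is kept as extra metadata
        self.other_metadata: Dict[str, Any] = dict(kwargs)


ref = SketchRef(authors="Author A", title="Placeholder title", editor="Editor E")
print(ref.authors, ref.other_metadata)
```
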

Display tools

Use these methods to create the ResultsDisplayObject instances that Doppio, the pipeliner GUI, uses to create graphical outputs for each job.

pipeliner.display_tools.create_results_display_object(dobj_type: str, **kwargs) ResultsDisplayObject

Safely create a results display object

Returns a ResultsDisplayPending if there are any problems. Give it the type of display object as the first argument followed by the kwargs for that specific type of ResultsDisplayObject

Parameters:

dobj_type (str) – The type of DisplayObject to create

pipeliner.display_tools.get_ordered_classes_arrays(model_file: str, ncols: int, boxsize: int, output_dir: str, output_filename: str, parts_file: str | None = None, title: str = '2D class averages', start_collapsed: bool = False, flag: str = '', base64_output: bool = False, optimiser_info: dict | None = None) ResultsDisplayObject

Return a 3D array of class averages from a Relion Class2D model file

Parameters:
  • model_file (str) – Name of the model file

  • ncols (int) – number of columns desired in the file montage

  • boxsize (int) – Size of the class averages in the final montage

  • output_dir (str) – The output dir of the pipeliner job creating this object

  • output_filename (str) – The name for the output montage file

  • parts_file (str) – Path of the file containing the particles, for counting

  • title (str) – A title for the DisplayObject

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • flag (str) – If this display object contains scientifically dubious results display this message

  • base64_output (bool) – flag for a JSON file output with a list of objects holding base64 images and class IDs

  • optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.

Returns:

An object for the GUI to use to render the montage

Return type:

ResultsDisplayMontage

pipeliner.display_tools.graph_from_starfile_cols(title: str, starfile: str, block: str, ycols: list, xcols: list | None = None, xrange: list | None = None, yrange: list | None = None, data_series_labels: List[str] | None = None, xlabel: str = '', ylabel: str = '', assoc_data: List[str] | None = None, modes: List[str] | None = None, start_collapsed: bool = False, flag: str = '') ResultsDisplayGraph | ResultsDisplayPending

Automatically generate a ResultsDisplayGraph object from a STAR file

Can use one or two columns and third column for labels if desired

Parameters:
  • title (str) – The title of the final graph

  • starfile (str) – Path to the STAR file to use

  • block (str) – The block to use in the STAR file, use None for a STAR file with only a single block

  • ycols (list) – Column label(s) from the STAR file to use for the y data series

  • xcols (list) – Column label(s) from the STAR file to use for the x data series; if None, a simple count from 1 will be used

  • xlabel (str) – Label for the x axis; if no x data are specified the label will be ‘Count’, and if x data are specified and the xlabel is None the x axis label will be the name of the STAR file column used

  • xrange (list) – Range for x values to be displayed, full range if None

  • yrange (list) – Range for y values to be displayed, full range if None

  • data_series_labels (list) – Names for the data series

  • ylabel (str) – Label for the y axis, if None the y axis label will be the name of the STAR file column used

  • assoc_data (list) – List of data file(s) associated with this graph

  • modes (list) – Controls the appearance of each data series, choose from ‘lines’, ‘markers’, or ‘lines+markers’

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • flag (str) – If this display object contains scientifically dubious results display this message

Returns:

A ResultsDisplayGraph object for the created graph

Return type:

ResultsDisplayGraph

pipeliner.display_tools.histogram_from_starfile_col(title: str, starfile: str, block: str, data_col: str, xlabel: str = '', ylabel: str = 'Count', assoc_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '') ResultsDisplayHistogram | ResultsDisplayPending

Automatically generate a ResultsDisplayHistogram object from a STAR file

Parameters:
  • title (str) – The title of the final graph

  • starfile (str) – Path to the STAR file to use

  • block (str) – The block to use in the STAR file, use None for a STAR file with only a single block

  • data_col (str) – Column label from the STAR file to use for the data series

  • xlabel (str) – Label for the x axis; if no x data are specified the label will be ‘Count’, and if x data are specified and the xlabel is None the x axis label will be the name of the STAR file column used

  • ylabel (str) – Label for the y axis, if None the y axis label will be the name of the STAR file column used

  • assoc_data (list) – List of data file(s) associated with this graph

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • flag (str) – If this display object contains scientifically dubious results display this message

Make a new image carousel display object from a STAR file.

Parameters:
  • starfile (str) – The STAR file to use

  • block (str) – The name of the block with the images. If the STAR file contains only a single block, a blank string can be given here.

  • column (str) – The name of the column that has the images

  • title (str) – The title for the object, automatically generated if “”

  • nimg (int) – Number of images to use in the carousel, default 500, or all if < 0. Beware that using all images could lead to enormous results display files if the data set is large, so you should usually leave this at its default.

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • flag (str) – If this display object contains scientifically dubious results display this message

Returns:

The DisplayObject for displaying STAR file images in the interactive carousel.

Return type:

ResultsDisplayImageCarousel

Raises:

ValueError – If no images are listed in the STAR file.

pipeliner.display_tools.make_map_model_thumb_and_display(outputdir: str, maps: List[str] | None = None, maps_opacity: List[float] | None = None, maps_colours: List[str] | None = None, models: List[str] | None = None, models_colours: List[str] | None = None, bild_files: List[str] | None = None, title: str | None = None, maps_data: str = '', models_data: str = '', assoc_data: List | None = None, flag: str = '', start_collapsed: bool | None = True) ResultsDisplayMapModel | ResultsDisplayPending

Make a display object for an atomic model overlaid over a map

Makes a binned map and a ResultsDisplayMapModel display object

Parameters:
  • outputdir (str) – Name of the job’s output directory

  • maps (list) – List of map files to use

  • models (list) – List of model files to use

  • maps_opacity (list) – List of opacities for the maps, from 0 to 1; if None, 0.5 is used for all

  • maps_colours (list) – Colors for the maps if specific ones are desired, otherwise mol* will assign them

  • title (str) – The title for the ResultsDisplayMapModel object, if None the name of the map and model will be used

  • maps_data (str) – Any additional data to be included about the map

  • models_data (str) – Any additional data to be included about the model

  • models_colours (list) – Colors for the models if specific ones are desired, otherwise mol* will assign them

  • assoc_data (list) – List of associated data, if left as None then just uses the file itself

  • flag (str) – If the results are considered scientifically dubious explain in this string

  • start_collapsed (bool) – Should the display start out collapsed when displayed

Returns:

The DisplayObject for the map and model

Return type:

ResultsDisplayMapModel

pipeliner.display_tools.make_maps_slice_montage_and_3d_display(in_maps: Dict[str, str], output_dir: str, combine_montages: bool = True, cmap: str = '', base64_output: bool = False, optimiser_info: dict | None = None, bild_files: List[str] | None = None) List[ResultsDisplayObject]

Make a set of display objects for 3D maps

Returns separate 3D viewer display objects for each map and either a combined slices montage or a slices montage for each.

Parameters:
  • in_maps (dict) – {input file: label}. If the label is “”, the filename will be used

  • output_dir (str) – The job’s output dir where the thumbnails dir will be created if necessary

  • combine_montages (bool) – Should a single montage be made with slices for all maps, otherwise a separate montage is made for each

  • cmap (str) – what color map to use for the montage, if any.

  • base64_output (bool) – Whether to create base64 thumbnails and gallery display object.

  • optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.

Returns:

The display objects montage and then the 3D viewers if combine_montages is False, otherwise the montage followed by the 3D viewer for each map in the order they were given.

Return type:

List

pipeliner.display_tools.make_mollweide_angular_distribution(starfile_path: str, output_dir: str, title: str = 'Angular distribution of particles', block: str = 'particles', n_phi_bins: int = 72, n_theta_bins: int = 36) ResultsDisplayImage | ResultsDisplayPending

Create a Mollweide projection heatmap of particle orientations

Reads _rlnAngleRot (phi) and _rlnAngleTilt (theta) from a RELION STAR file and produces a 2-D histogram on a Mollweide projection, saved as a PNG. For C1 symmetry, the plot will show the full distribution of particle orientations. For higher symmetries the plot might occupy a subsection of the full projection, depending on the symmetry axes, e.g. quarter of the plot for D2.

Parameters:
  • starfile_path (str) – Path to a STAR file containing particle data with _rlnAngleRot and _rlnAngleTilt columns

  • output_dir (str) – Job output directory (PNG is saved under Thumbnails/)

  • title (str) – Title for the plot and display object

  • block (str) – Block name in the STAR file

  • n_phi_bins (int) – Number of bins in the azimuthal direction

  • n_theta_bins (int) – Number of bins in the polar direction

Returns:

ResultsDisplayImage or ResultsDisplayPending on error
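The binning and projection described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the pipeliner's implementation: the function names are hypothetical, rot is assumed to lie in [-180, 180) and tilt in [0, 180] degrees, and a plotting step would need its own mapping from (rot, tilt) to longitude and latitude (for example latitude = 90° minus tilt, which is also an assumption).

```python
import math
from typing import List, Sequence, Tuple


def bin_orientations(
    rots: Sequence[float],
    tilts: Sequence[float],
    n_phi_bins: int = 72,
    n_theta_bins: int = 36,
) -> List[List[int]]:
    """Count particles per (theta, phi) bin; angles are in degrees."""
    counts = [[0] * n_phi_bins for _ in range(n_theta_bins)]
    for rot, tilt in zip(rots, tilts):
        phi = (rot + 180.0) % 360.0          # shift rot into [0, 360)
        theta = min(max(tilt, 0.0), 180.0)   # clamp tilt into [0, 180]
        # min() keeps the upper edge (and float round-off) in the last bin
        i_phi = min(int(phi / 360.0 * n_phi_bins), n_phi_bins - 1)
        i_theta = min(int(theta / 180.0 * n_theta_bins), n_theta_bins - 1)
        counts[i_theta][i_phi] += 1
    return counts


def mollweide_xy(lon: float, lat: float) -> Tuple[float, float]:
    """Project longitude/latitude (radians) onto Mollweide x, y.

    Solves 2*aux + sin(2*aux) = pi*sin(lat) for the auxiliary angle
    by Newton iteration, then applies the standard Mollweide formulas.
    """
    if abs(abs(lat) - math.pi / 2) < 1e-12:
        aux = math.copysign(math.pi / 2, lat)  # poles: closed-form solution
    else:
        aux = lat
        for _ in range(100):
            residual = 2 * aux + math.sin(2 * aux) - math.pi * math.sin(lat)
            step = residual / (2 + 2 * math.cos(2 * aux))
            aux -= step
            if abs(step) < 1e-12:
                break
    x = 2 * math.sqrt(2) / math.pi * lon * math.cos(aux)
    y = math.sqrt(2) * math.sin(aux)
    return x, y
```

The resulting counts grid has n_theta_bins rows and n_phi_bins columns, so the defaults give 5-degree bins in both directions.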

pipeliner.display_tools.make_moorhen_display(maps: List[str] | None = None, models: List[str] | None = None, title: str | None = None, maps_data: str = '', models_data: str = '', assoc_data: List | None = None, flag: str = '', session_file: str | None = None) ResultsDisplayMapModel | ResultsDisplayPending

Make a Moorhen display object for maps and/or models

Creates a ResultsDisplayMoorhen display object. If no title is provided, one is generated from the map and model file names.

Parameters:
  • maps (list) – List of map files to use

  • models (list) – List of model files to use

  • title (str) – The title for the display object. If None, a title is generated from the map and model file names.

  • maps_data (str) – Any additional data to be included about the maps. If empty, defaults to a comma-separated list of map paths.

  • models_data (str) – Any additional data to be included about the models. If empty, defaults to a comma-separated list of model paths.

  • assoc_data (list) – List of associated data. If None, defaults to the combined list of maps and models.

  • flag (str) – If the results are considered scientifically dubious explain in this string

  • session_file (str) – Path to a Moorhen session file

Returns:

The Moorhen DisplayObject for the maps and models, or a ResultsDisplayPending if an error occurs

Return type:

ResultsDisplayMoorhen
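The documented fallbacks for maps_data, models_data and assoc_data can be pictured with a small sketch. The helper name moorhen_defaults and the ", " separator are assumptions made for illustration; the title fallback is left out because its exact format is not described here.

```python
from typing import List, Optional, Tuple


def moorhen_defaults(
    maps: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    maps_data: str = "",
    models_data: str = "",
    assoc_data: Optional[List[str]] = None,
) -> Tuple[str, str, List[str]]:
    """Hypothetical helper that fills in the documented default values."""
    maps = maps or []
    models = models or []
    if not maps_data:
        # Empty string: fall back to a comma-separated list of map paths
        maps_data = ", ".join(maps)
    if not models_data:
        # Empty string: fall back to a comma-separated list of model paths
        models_data = ", ".join(models)
    if assoc_data is None:
        # None: fall back to the combined list of maps and models
        assoc_data = maps + models
    return maps_data, models_data, assoc_data
```

Values supplied by the caller are passed through unchanged, so only missing entries are replaced.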

pipeliner.display_tools.make_mrcs_central_slices_montage(in_files: Dict[str, str], output_dir: str, cmap: str = '', base64_output: bool = False, optimiser_info: dict | None = None) ResultsDisplayObject

Make a montage of x,y,z central slices of maps

Parameters:
  • in_files (Dict[str, str]) – {file name: label if different from file name}

  • output_dir (str) – Where to make the Thumbnails dir (if necessary) and put the montage image

  • cmap (str) – What colormap to use, if any

  • optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.

Returns:

The montage ResultsDisplayObject

Return type:

ResultsDisplayMontage
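To make "central slices" concrete, here is a minimal sketch that takes the middle plane along each axis of a 3D volume. It uses plain nested lists indexed as volume[z][y][x]; that axis order and the function name central_slices are assumptions for illustration, and real code would more likely operate on an array read from the map file.

```python
from typing import List, Tuple

Image = List[List[float]]
Volume = List[Image]


def central_slices(volume: Volume) -> Tuple[Image, Image, Image]:
    """Return the central planes of a volume indexed as volume[z][y][x]."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Fixed z: a copy of the middle section, indexed [y][x]
    fixed_z = [row[:] for row in volume[nz // 2]]
    # Fixed y: the middle row of every section, indexed [z][x]
    fixed_y = [section[ny // 2][:] for section in volume]
    # Fixed x: the middle column of every section, indexed [z][y]
    fixed_x = [[row[nx // 2] for row in section] for section in volume]
    return fixed_z, fixed_y, fixed_x
```

The three images can then be placed side by side to form one montage row per map.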

pipeliner.display_tools.make_particle_coords_thumb(in_mrc, in_coords, out_dir, thumb_size=640, pad=5, start_collapsed=False, title: str = 'Example picked particles', flag: str = '', markers: bool = False) ResultsDisplayImage | ResultsDisplayPending

Create a thumbnail of picked particle coords on their micrograph

Because the extraction box size is not known, boxes will be drawn as a percentage of the total image size.

Parameters:
  • in_mrc (str) – Path to the merged micrograph mrc file

  • in_coords (str) – Path to the .star coordinates file

  • out_dir (str) – Name of the output directory

  • thumb_size (int) – Size of the x dimension of the final thumbnail image

  • pad (int) – Thickness of the particle box borders before binning in px

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • title (str) – What title to use for the displayobj created

  • flag (str) – If this display object contains scientifically dubious results, display this message

  • markers (bool) – Instead of making boxes make markers

pipeliner.display_tools.mini_montage_from_many_files(filelist: List[str], outputdir: str, nimg: int = 5, montagesize: int = 640, title: str = '', ncols: int = 5, associated_data: List[str] | None = None, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending

Make a mini montage from a list of images

Merge and flatten image stacks

Parameters:
  • filelist (list) – A list of the files to use

  • outputdir (str) – The output dir of the pipeliner job

  • nimg (int) – Number of images to use in the montage

  • montagesize (int) – Desired size of the final montage image

  • title (str) – Title for the ResultsDisplay object that will be output

  • ncols (int) – Number of columns to make in the montage

  • associated_data (list) – Data files associated with these images, if None then all of the selected images

  • labels (list) – The labels for the items in the montage

  • cmap (str) – colormap to apply, if any

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • flag (str) – If this display object contains scientifically dubious results, display this message

Returns:

The DisplayObject for the montage

Return type:

ResultsDisplayMontage

Raises:

ValueError – If a non mrc or tiff image is used

pipeliner.display_tools.mini_montage_from_stack(stack_file: str, outputdir: str, nimg: int = 40, ncols: int = 10, montagesize: int = 640, title: str = '', labels: List[str | int] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending

Make a montage from a mrcs or tiff file

Parameters:
  • stack_file (str) – The path to the stack_file

  • outputdir (str) – The output dir of the pipeliner job

  • nimg (int) – Number of images to use in the montage, if < 1 uses all of them

  • ncols (int) – Number of columns to use

  • montagesize (int) – Desired size of the final montage image

  • title (str) – Title for the ResultsDisplay object that will be output

  • labels (list) – Labels for the images

  • cmap (str) – colormap to apply, if any

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • flag (str) – If this display object contains scientifically dubious results, display this message

Returns:

The DisplayObject for the montage

Return type:

ResultsDisplayMontage

Raises:

ValueError – If a non mrc or tiff image is used
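How nimg, ncols and montagesize interact can be sketched as a simple grid calculation. The helper below is hypothetical: it assumes montagesize is the montage width in pixels and that tiles are square, neither of which is stated above.

```python
import math
from typing import List, Tuple


def montage_layout(
    n_available: int, nimg: int = 40, ncols: int = 10, montagesize: int = 640
) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Return (rows, tile size in px, top-left pixel position of each tile)."""
    # nimg < 1 means "use every image in the stack"
    n_used = n_available if nimg < 1 else min(nimg, n_available)
    # Never use more columns than there are images, and at least one
    ncols = max(1, min(ncols, n_used))
    nrows = math.ceil(n_used / ncols)
    tile = montagesize // ncols
    positions = [((i % ncols) * tile, (i // ncols) * tile) for i in range(n_used)]
    return nrows, tile, positions
```

With the defaults, a stack of 25 images fills two complete rows of ten 64 px tiles and half of a third row.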

pipeliner.display_tools.mini_montage_from_starfile(starfile: str, block: str, column: str, outputdir: str, title: str = '', nimg: int = 20, montagesize: int = 640, ncols: int = 10, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending

Make a montage from a list of images in a STAR file column

Merge and flatten image stacks if they are encountered.

Parameters:
  • starfile (str) – The STAR file to use

  • block (str) – The name of the block with the images

  • column (str) – The name of the column that has the images

  • outputdir (str) – The output dir of the pipeliner job

  • title (str) – The title for the object, automatically generated if “”

  • nimg (int) – Number of images to use in the montage, uses all if < 0

  • montagesize (int) – Desired size of the final montage image

  • ncols (int) – number of columns to use

  • labels (list) – Labels for the images in the montage, in order

  • cmap (str) – colormap to apply, if any

  • start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI

  • flag (str) – If this display object contains scientifically dubious results, display this message

Returns:

The DisplayObject for the montage

Return type:

ResultsDisplayMontage

Raises:

ValueError – If a non mrc or tiff image is encountered

ResultsDisplay Objects

These objects generally should not be instantiated directly; they should instead be created using the functions above.

class pipeliner.results_display_objects.ResultsDisplayGallery(*, title: str, images: str, labels: List[str] | None = None, associated_nodes: List[Dict[str, str]], associated_data: List[str], start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

Display object for Doppio’s interactive image gallery

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the object/graph

Type:

str

images

Path to a .json file containing a list of image objects (base64 images and class IDs)

Type:

str

labels

Data labels for the images

Type:

list

associated_nodes

{name: str, type: str} A list of nodes associated with this gallery along with their full node types

Type:

list[dict]

associated_data

A list of files that contributed the data used in the image/graph

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayGraph(*, xvalues: List[List[float | int]], yvalues: List[List[float | int]], title: str, associated_data: List[str], data_series_labels: List[str], xaxis_label: str = '', xrange: List[float] | None = None, yaxis_label: str = '', yrange: List[float] | None = None, modes: List[Literal['lines', 'markers', 'lines+markers']] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A simple graph for the GUI to display

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the object/graph

Type:

str

xvalues

(list): list of x coordinate data series, can have multiple data series

xaxis_label

Label for the x axis if a graph

Type:

str

xrange

Range of x to be displayed, displays the full range if None. If the x axis needs to be reversed, enter the values backwards: [max, min]

Type:

list

yvalues

(list): List of y coordinate data series, can have multiple data series

yaxis_label

Label for the y axis if a graph

Type:

str

yrange

Range of y to be displayed, displays the full range if None. If the y axis needs to be reversed, enter the values backwards: [max, min]

Type:

list

data_series_labels

List of names of the different data series

Type:

list

associated_data

A list of files that contributed the data used in the image/graph

Type:

list

modes

Controls the appearance of each data series, choose from ‘lines’, ‘markers’ or ‘lines+markers’

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool
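The shape of the graph arguments is easiest to see in a concrete example. The dictionary below mirrors the keyword names in the signature above; the values, the file path and the series_are_consistent helper are invented for illustration, and the helper is a sanity check a caller might run, not something the class is documented to do.

```python
from typing import Any, Dict

# Two data series, so xvalues, yvalues, data_series_labels and modes
# each hold two entries. The path in associated_data is hypothetical.
graph_kwargs: Dict[str, Any] = {
    "title": "Example curves",
    "xvalues": [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
    "yvalues": [[0.9, 0.5, 0.1], [0.8, 0.4, 0.05]],
    "data_series_labels": ["series A", "series B"],
    "xaxis_label": "x",
    "yaxis_label": "y",
    "xrange": [3.0, 1.0],  # [max, min] displays the x axis reversed
    "modes": ["lines", "lines+markers"],
    "associated_data": ["Job001/data.star"],
}


def series_are_consistent(kwargs: Dict[str, Any]) -> bool:
    """Check that every series has a label and matching x/y lengths."""
    xs, ys = kwargs["xvalues"], kwargs["yvalues"]
    labels = kwargs["data_series_labels"]
    if not (len(xs) == len(ys) == len(labels)):
        return False
    return all(len(x) == len(y) for x, y in zip(xs, ys))
```

A dictionary like this would typically be passed on as keyword arguments when the display object is created.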

class pipeliner.results_display_objects.ResultsDisplayHistogram(*, title: str, associated_data: List[str], data_to_bin: List[float] | None = None, xlabel: str = '', ylabel: str = '', bins: List[int] | None = None, bin_edges: List[float] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class for the GUI to display a histogram

It is best to not instantiate this class directly. Instead, create it using create_results_display_object

Parameters:
  • title (str) – The title of the histogram

  • data_to_bin (list) – The data to bin

  • xlabel (str) – Label for the x axis

  • ylabel (str) – Label for the y axis

  • associated_data (list) – List of data files associated with the histogram

  • bins (list) – A list of bin counts, if they are known

  • bin_edges (list) – A list of the bin edges, if they are already known

  • start_collapsed (bool) – Should the object start out collapsed when displayed in the GUI

Raises:
  • ValueError – If no data or bins are specified

  • ValueError – If an attempt is made to specify bins or bin edges when data to bin are being provided

  • ValueError – If the associated data is not a list, or not provided
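The three error conditions listed above amount to a choice between raw data and pre-computed bins. The following sketch restates those rules as a standalone function; the name check_histogram_inputs, the return values and the error messages are hypothetical.

```python
from typing import List, Optional


def check_histogram_inputs(
    associated_data: Optional[List[str]],
    data_to_bin: Optional[List[float]] = None,
    bins: Optional[List[int]] = None,
    bin_edges: Optional[List[float]] = None,
) -> str:
    """Return which input mode applies, or raise ValueError."""
    # associated_data must be supplied, and must be a list
    if not isinstance(associated_data, list):
        raise ValueError("associated_data must be provided as a list")
    # Either raw data or pre-computed bin counts are needed
    if data_to_bin is None and bins is None:
        raise ValueError("No data or bins were specified")
    # Raw data and pre-computed bins are mutually exclusive
    if data_to_bin is not None and (bins is not None or bin_edges is not None):
        raise ValueError("bins/bin_edges cannot be combined with data_to_bin")
    return "raw" if data_to_bin is not None else "prebinned"
```

In other words, pass data_to_bin to have the values binned, or pass bins and bin_edges when the counts are already known.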

class pipeliner.results_display_objects.ResultsDisplayHtml(*, title: str, associated_data: List[str], html_dir: str = '', html_file: str = '', html_str: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display html

It is best to not instantiate this class directly. Instead create it using create_results_display_object

This can be used for general HTML display in Doppio. Either provide a directory with index.html or specify a html file or provide a html string as input.

html_dir

Path to the html directory (optional)

Type:

str

html_file

Path to a standalone html file or in the given html_dir (optional)

Type:

str

html_str

Input html as string (optional)

Type:

str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayImage(*, title: str, image_path: str, image_desc: str, associated_data: List[str], start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class for the GUI to display a single image

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title for the image

Type:

str

image_path

The path to the image

Type:

str

image_desc

A description of the image

Type:

str

associated_data

Data files associated with the image

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayImageCarousel(*, title: str, image_names: List[str], image_index_file: str, associated_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class for Doppio’s interactive image carousel

It is best to not instantiate this class directly. Instead create it using create_results_display_object or display_tools.make_image_carousel

title

The title for display

Type:

str

image_names

The paths to the images

Type:

list[str]

image_index_file

Source STAR file

Type:

str

associated_data

Data files associated with the images

Type:

list[str]

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayJson(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display JSON files

It is best to not instantiate this class directly. Instead create it using create_results_display_object

file_path

Path to the file

Type:

str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayMapModel(title: str, associated_data: List[str], maps: List[str] | None = None, models: List[str] | None = None, maps_data: str = '', models_data: str = '', maps_opacity: List[float] | None = None, maps_colours: List[str] | None = None, models_colours: List[str] | None = None, start_collapsed: bool = True, bild_files: List[str] | None = None, flag: str = '')

Bases: ResultsDisplayObject

An object for overlaying multiple maps and/or models

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title that appears at the top of the accordion in the GUI

Type:

str

associated_data

A list of associated data files

Type:

list

maps

List of map paths, mrc format

Type:

list

models

List of model paths, pdb or mmcif format

Type:

list

maps_opacity

Opacity for each map, from 0 to 1. If not specified, it is set to 0.5 for all maps

Type:

list

models_data

Any extra info about the models

Type:

str

maps_data

Any extra info about the maps

Type:

str

maps_colours

Hex values for colouring the maps specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used

Type:

list

models_colours

Hex values for colouring the models specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used

Type:

list

bild_files

Optional list of .bild file paths to overlay in the 3D viewer (e.g. angular distribution plots)

Type:

list

Raises:
  • ValueError – If no maps or models were specified

  • ValueError – If the map is not .mrc format

  • ValueError – If models are not in pdb or mmcif format

  • ValueError – If the number of maps and map opacities don’t match
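The checks listed above, together with the colour format described for maps_colours and models_colours, can be sketched as follows. Everything here is illustrative: the function names are hypothetical, the set of accepted model extensions is an assumption, and the colour check is a separate helper because no colour-related error is documented.

```python
import re
from pathlib import Path
from typing import List, Optional

# "#XXXXXX" where X is a hex digit, as described for the colour attributes
HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}")
# Assumed extensions for "pdb or mmcif format"
MODEL_EXTS = {".pdb", ".cif", ".mmcif"}


def is_hex_colour(value: str) -> bool:
    """True if the value has the form '#XXXXXX' with six hex digits."""
    return HEX_COLOUR.fullmatch(value) is not None


def check_map_model_inputs(
    maps: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    maps_opacity: Optional[List[float]] = None,
) -> List[float]:
    """Validate the inputs and return the opacity to use for each map."""
    maps = maps or []
    models = models or []
    if not maps and not models:
        raise ValueError("No maps or models were specified")
    for map_file in maps:
        if Path(map_file).suffix != ".mrc":
            raise ValueError(f"{map_file} is not in .mrc format")
    for model_file in models:
        if Path(model_file).suffix not in MODEL_EXTS:
            raise ValueError(f"{model_file} is not in pdb or mmcif format")
    if maps_opacity is None:
        # Documented default: 0.5 for every map
        maps_opacity = [0.5] * len(maps)
    if len(maps_opacity) != len(maps):
        raise ValueError("The number of maps and map opacities don't match")
    return maps_opacity
```

Running such checks before building the display object gives an early, readable error instead of a failure in the viewer.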

class pipeliner.results_display_objects.ResultsDisplayMontage(*, xvalues: List[int], yvalues: List[int], img: str, title: str, associated_data: List[str], labels: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object to send to the GUI to make an image montage

This one is an image montage with info about the specific images. It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the object/graph

Type:

str

xvalues

(list): The x coordinates by image

yvalues

(list): The y coordinates by image

labels

Data labels for the images

Type:

list

associated_data

A list of files that contributed the data used in the image/graph

Type:

list

img

Path to an image to display

Type:

str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayMoorhen(title: str, associated_data: List[str], maps: List[str] | None = None, models: List[str] | None = None, session_file: str | None = None, maps_data: str = '', models_data: str = '', start_collapsed: bool = True, flag: str = '')

Bases: ResultsDisplayObject

An object for displaying Maps, Models, and Moorhen job sessions.

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title that appears at the top of the accordion in the GUI.

Type:

str

associated_data

A list of associated data files

Type:

list

maps

List of map paths in .mrc, .map, .ccp4 or .mtz format

Type:

list

models

List of model paths in .pdb, .cif, .mmcif, .pdbx or .ent format

Type:

list

session_file

Path to a Moorhen session file. This contains the state of the Moorhen session. If provided, the expected maps and models should match those of the session file. If not provided, a warning is appended to the flag.

Type:

str

maps_data

Any extra info about the maps

Type:

str

models_data

Any extra info about the models

Type:

str

start_collapsed

Whether the accordion starts collapsed in the GUI

Type:

bool

flag

Flag string for additional status information

Type:

str

Raises:
  • ValueError – If a map is not in .mrc, .map, .ccp4 or .mtz format

  • ValueError – If a model is not in .pdb, .cif, .mmcif, .pdbx or .ent format

class pipeliner.results_display_objects.ResultsDisplayObject(title: str, start_collapsed: bool = False, flag='')

Bases: object

Abstract super-class for results display objects

title

The title

Type:

str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

dobj_type

Used to identify what kind of ResultsDisplayObject it is

Type:

str

flag

A message that is displayed if the results display object is showing something scientifically dubious.

Type:

str

write_displayobj_file(outdir) None

Write a json file from a ResultsDisplayObject object

Parameters:

outdir (str) – The directory to write the output in

Raises:

NotImplementedError – If a write attempt is made from the superclass
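The pattern described here, an abstract superclass that refuses to write while its subclasses serialise themselves to JSON, can be illustrated with stand-in classes. The class names, the output file name and the JSON layout below are all hypothetical and are not the pipeliner's actual format.

```python
import json
import os
import tempfile


class SketchDisplayObject:
    """Hypothetical stand-in for the abstract superclass."""

    dobj_type = "base"

    def __init__(self, title: str, start_collapsed: bool = False, flag: str = ""):
        self.title = title
        self.start_collapsed = start_collapsed
        self.flag = flag

    def write_displayobj_file(self, outdir: str) -> None:
        # Only concrete subclasses may be written to disk
        if type(self) is SketchDisplayObject:
            raise NotImplementedError("Cannot write a file from the superclass")
        payload = dict(vars(self), dobj_type=self.dobj_type)
        out_path = os.path.join(outdir, f"sketch_{self.dobj_type}.json")
        with open(out_path, "w") as handle:
            json.dump(payload, handle, indent=2)


class SketchTextDisplay(SketchDisplayObject):
    """Hypothetical concrete subclass holding a block of text."""

    dobj_type = "text"

    def __init__(self, title: str, display_data: str, **kwargs):
        super().__init__(title, **kwargs)
        self.display_data = display_data


# Usage: write a subclass instance to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp:
    SketchTextDisplay("Notes", "hello").write_displayobj_file(tmp)
    with open(os.path.join(tmp, "sketch_text.json")) as handle:
        written = json.load(handle)
```

The dobj_type value stored in the file is what would let a reader decide which kind of display object to rebuild.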

class pipeliner.results_display_objects.ResultsDisplayPdfFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display pdf files

It is best to not instantiate this class directly. Instead create it using create_results_display_object

file_path

Path to the file

Type:

str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayPending(*, title: str = 'Results pending...', message: str = 'The result not available yet', reason: str = 'unknown', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A placeholder class for when a job is not able to produce results yet

class pipeliner.results_display_objects.ResultsDisplayPlotlyFigure(title: str, plotlyfig: str, associated_data: List[str] | None = None, start_collapsed: bool = False)

Bases: ResultsDisplayObject

This class displays an existing Plotly Figure object.

Call fig.to_json() on your Figure and then pass the JSON string to the plotlyfig argument when creating this object.

write_displayobj_file(outdir)

Write a json file from a ResultsDisplayObject object

Parameters:

outdir (str) – The directory to write the output in

Raises:

NotImplementedError – If a write attempt is made from the superclass

class pipeliner.results_display_objects.ResultsDisplayPlotlyHistogram(*, data: List[float] | DataFrame | ndarray | dict | None = None, title: str, x: str | list | None = None, y: str | list | None = None, color: str | int | list | None = None, nbins: int | None = None, range_x: list | None = None, range_y: list | None = None, category_orders: dict | None = None, labels: dict | None = None, bin_counts: List[float] | None = None, bin_centres: List[float] | None = None, associated_data: List[str], start_collapsed: bool = False, flag: str = '', **kwargs)

Bases: ResultsDisplayObject

A class that generates a plotly.graph_objects.Figure object to display a histogram.

Uses plotly express histogram: https://plotly.com/python-api-reference/generated/plotly.express.histogram.html

Examples here: https://plotly.com/python/histograms/

data

The data to bin. The following types are allowed:

  • list – list of values to be binned

  • array and dict – converted to a pandas dataframe internally

  • pandas dataframe – ensure column names are added if ‘x’ indicates a column name

More details: https://plotly.com/python/px-arguments/

title

The title of the plot

Type:

str

plotlyfig

plotly.graph_objects.Figure object generated from input data

Type:

plotly.graph_objects.Figure

associated_data

A list of the associated data files

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayPlotlyObj(*, data: list | DataFrame | ndarray | dict, plot_type: list | str, title: str, associated_data: List[str], multi_series: bool = False, subplot: bool = False, make_subplot_args: dict | None = None, subplot_order: str | List[tuple] | None = None, subplot_size: Sequence[int] | None = None, subplot_args: List[dict] | None = None, series_args: List[dict] | None = None, layout_args: dict | None = None, trace_args: dict | None = None, xaxes_args: List[dict] | dict | None = None, yaxes_args: List[dict] | dict | None = None, start_collapsed: bool = False, flag: str = '', **kwargs)

Bases: ResultsDisplayObject

This uses plotly express to create a plotly.graph_objects.Figure object: https://plotly.com/python/plotly-express/

Use this class to generate plotly Figure objects for custom plots, including:

  • facet-plots: https://plotly.com/python/facet-plots/

  • subplots: https://plotly.com/python/subplots/

  • multi_series: e.g. https://plotly.com/python/creating-and-updating-figures/#adding-traces

data

The data to plot. For a single plot, the following types are allowed:

  • list – list of values to be binned

  • array and dict – converted to a pandas dataframe internally

  • pandas dataframe – ensure column names are added if ‘x’ indicates a column name

More details: https://plotly.com/python/px-arguments/

For subplots and/or multi_series:

  • list – list with a dictionary of arguments for each plot/series

plot_type

Required, type of plot.

For a single plot, it is the plotly express function to call: https://plotly.com/python-api-reference/plotly.express.html

For subplots and/or multi_series, it is the plotly.graph_objects function to call: https://plotly.com/python-api-reference/plotly.graph_objects.html

Type:

str

title

The title of the plot

Type:

str

plotlyfig

plotly.graph_objects.Figure object generated from input data

Type:

plotly.graph_objects.Figure

associated_data

A list of the associated data files

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

check_multiseries_arguments(data, plot_type, series_args) None
check_plottype_list(plot_type, data) None
check_singleplot_arguments(plot_type) None
check_subplot_arguments(data, subplot_size, subplot_order, plot_type, subplot_args, xaxes_args, yaxes_args) None
generate_multiseries_plots(plot_type, plot_args) Figure
generate_subplots(subplot_size, plot_type, subplot_order, plot_args, make_subplot_args) Figure
set_multiplot_data(data) None
set_singleplot_data(data) None
class pipeliner.results_display_objects.ResultsDisplayPlotlyScatter(*, data: List[List[float]] | DataFrame | ndarray | dict | None = None, title: str, x: str | List[float] | None = None, y: str | List[float] | None = None, color: str | int | Sequence[str] | None = None, size: str | int | Sequence[str] | None = None, symbol: str | int | Sequence[str] | None = None, hover_name: str | int | Sequence[str] | None = None, range_color: list | None = None, range_x: list | None = None, range_y: list | None = None, category_orders: dict | None = None, labels: dict | None = None, associated_data: List[str], start_collapsed: bool = False, flag: str = '', **kwargs)

Bases: ResultsDisplayObject

A class that generates a plotly.graph_objects.Figure object to display a scatter plot.

Uses plotly express scatter: https://plotly.com/python-api-reference/generated/plotly.express.scatter.html

Examples here: https://plotly.com/python/line-and-scatter/

data

The data to plot. The following types are allowed:

  • list – list of values to be plotted

  • array and dict – converted to a pandas dataframe internally

  • pandas dataframe – ensure column names are added if ‘x’ indicates a column name

More details: https://plotly.com/python/px-arguments/

title

The title of the plot

Type:

str

plotlyfig

plotly.graph_objects.Figure object generated from input data

Type:

plotly.graph_objects.Figure

associated_data

A list of the associated data files

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayRvapi(*, title: str, rvapi_dir: str, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display rvapi objects

It is best to not instantiate this class directly. Instead create it using create_results_display_object

This can be used for general HTML display in Doppio. Create a directory with index.html and it will be shown in the results display tab

rvapi_dir

Path to the rvapi directory

Type:

str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayTable(*, title: str, headers: List[str], table_data: List[List[str]], associated_data: List[str], header_tooltips: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display a table

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the table

Type:

str

headers

The column headers for the table

Type:

list

table_data

A list of lists, one per row

Type:

list

associated_data

A list of the associated data files

Type:

list

header_tooltips

Tooltips for each column. Column header by default

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayText(*, title: str, display_data: str, associated_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

A class to display general text in the GUI results tab

It is best to not instantiate this class directly. Instead create it using create_results_display_object

title

The title of the section

Type:

str

display_data

The text to display

Type:

str

associated_data

Data files associated with this result

Type:

list

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

class pipeliner.results_display_objects.ResultsDisplayTextFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')

Bases: ResultsDisplayObject

An object for the GUI to display ASCII text files

It is best to not instantiate this class directly. Instead create it using create_results_display_object

This can be used for default display of files that have ascii encoded text but the formats are too variable to make a more complex ResultsDisplayFile

file_path

Path to the file

Type:

str

start_collapsed

Should the object start out collapsed when displayed in the GUI

Type:

bool

pipeliner.results_display_objects.get_next_resultsfile_name(dir: str, search_str: str) str

Get the name of the next results file

Existing files of this type in the output dir are taken into account, so that they are not overwritten.

Parameters:
  • dir (str) – The output directory

  • search_str (str) – The full name for the file with * in place of the number

Returns:

The name of the file

Return type:

str
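One way such a lookup could work is sketched below. It is not the pipeliner's implementation: the function name next_results_file, the example file names, the three-digit zero padding, starting the count at 0, and returning a path joined to the directory are all assumptions for illustration.

```python
import os
import re
import tempfile


def next_results_file(directory: str, search_str: str) -> str:
    """Return the first unused numbered file name matching search_str."""
    # search_str contains a single * where the number goes
    prefix, suffix = search_str.split("*", 1)
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    used = []
    if os.path.isdir(directory):
        for name in os.listdir(directory):
            match = pattern.fullmatch(name)
            if match:
                used.append(int(match.group(1)))
    # Continue after the highest number found so nothing is overwritten
    next_number = max(used) + 1 if used else 0
    return os.path.join(directory, f"{prefix}{next_number:03d}{suffix}")


# Usage: an empty directory, then one that already holds files 000, 001 and 004
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.basename(next_results_file(tmp, "display*_text.json"))
    for n in (0, 1, 4):
        open(os.path.join(tmp, f"display{n:03d}_text.json"), "w").close()
    later = os.path.basename(next_results_file(tmp, "display*_text.json"))
```

Taking the maximum plus one, rather than filling gaps, keeps the numbering in creation order even if earlier files were deleted.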

Deposition Objects

Lists of DepositionObject are returned by a PipelinerJob. The prepare_emdb_deposition_data and prepare_empiar_deposition_data functions are used to prepare automated depositions to the EMDB and EMPIAR.

class pipeliner.deposition_tools.deposition_tools.DepositionObject(category: str = '', parent_job: str | None = None, alt_dict: Dict[str, Dict[str, Dict[str, str | Sequence[str]]]] | None = None)

Bases: object

This object contains data that will be used in an EMPIAR, EMDB, or PDB deposition

category

The category of deposition object. This should match the name of the dictionary key for the field in the database schema

Type:

str

parent_job

The name of the job that produced the data

Type:

Optional[str]

alt_dict

The dict that contains the information about the deposition objects for the database being deposited to if the EMDB schema are not being used. See pipeliner.deposition_tools.emdb_cats for the format

Type:

Dict[str, Dict[str, Dict[str, Union[str, List[str]]]]]

data

Contains the actual deposition data. Keys must exactly match the field names in the database schema

Type:

Dict[str, Optional[Union[str, int, float]]]

uuid

A uuid4 string that identifies the object

Type:

str

validate_depobj_data() List[str]

Check all the values in self.data are valid

  • They must satisfy the regex for that field

  • They must be in the list of options if one was provided

Returns:

All the fields that failed validation

Return type:

List[str]
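
The two checks above (regex match and membership in an options list) can be sketched as follows. The FieldRule type, the RULES table, and the field names are hypothetical and exist only for this example. In the pipeliner the rules come from the database schema.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Sequence, Union

Value = Optional[Union[str, int, float]]


class FieldRule(NamedTuple):
    """Hypothetical validation rule for one schema field."""

    regex: str
    options: Optional[Sequence[str]] = None


# Illustrative rules only; the field names and patterns are assumptions
RULES: Dict[str, FieldRule] = {
    "name": FieldRule(regex=r"[A-Za-z0-9_ .-]+"),
    "category": FieldRule(
        regex=r".+", options=("RECONSTRUCTION", "CLASSIFICATION")
    ),
}


def failed_fields(data: Dict[str, Value]) -> List[str]:
    """Return the names of all fields whose values fail validation."""
    failed = []
    for name, value in data.items():
        rule = RULES.get(name)
        if rule is None or value is None:
            continue  # assumption: unknown fields and unset values are skipped
        text = str(value)
        bad_regex = re.fullmatch(rule.regex, text) is None
        bad_option = rule.options is not None and text not in rule.options
        if bad_regex or bad_option:
            failed.append(name)
    return failed
```

Returning the failing field names, rather than raising on the first failure, matches the documented return value and lets a caller report every problem at once.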

pipeliner.deposition_tools.deposition_tools.parse_onedep_cif(input_file: str, jobname: str | None = None) List[DepositionObject]

Parse a cif in the onedep pdb/emdb format and return deposition objects

If a depobj cannot be created, skip it and raise a warning

Parameters:
  • input_file (str) – The cif file to parse

  • jobname (Optional[str]) – The job that is reading the file

Returns:

A dep obj for each entry in the cif file

Return type:

Sequence[DepositionObject]

Functions that support these methods are:

EMPIAR DepositionObjects

class pipeliner.deposition_tools.empiar_deposition_objects.Micrograph(file: str, ext: str, n_frames: int, dimx: int, dimy: int, dtype: str, headtype: str, apix: float, voltage: float, spherical_aberration: float)

Bases: object

apix: float
dimx: int
dimy: int
dtype: str
ext: str
file: str
headtype: str
n_frames: int
spherical_aberration: float
voltage: float
pipeliner.deposition_tools.empiar_deposition_objects.get_imgfile_info(imgfile: str, blockname: str, img_block_col: str) Tuple[Dict[str, Tuple[float, float, float]], List[List[str]]]

Get information from the STAR file containing image info

Parameters:
  • imgfile (str) – The path to the image file

  • blockname (str) – Name of the images block in the STAR file

  • img_block_col (str) – The name of the column for the images in the image data block of the STAR file

Returns:

(Dict with info about the optics groups {og_number: (apix, voltage, spherical aberration)}, List of full paths for all the images in the file). The paths are relative to the working dir, except for movies, where they are relative to the import dir.

Return type:

tuple

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_corrparts(in_file: str, job: str | None = None) List[DepositionObject]

Prepare the particles deposition objects for an empiar deposition

Parameters:
  • in_file (str) – Path to STAR file containing the particles

  • job (Optional[str]) – The job creating the object

Returns:

The DepositionObjects

Return type:

List[DepositionObject]

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics(in_file: str, job: str | None = None)
pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics_parts_data(mpfile: str, is_parts: bool, is_cor_parts: bool, job: str | None = None) List[DepositionObject]

Prepare the micrographs or particles portion of an EMPIAR deposition

Parameters:
  • mpfile (str) – The name of the file containing the micrographs or particles

  • is_parts (bool) – Is the image set particles? This will affect the info in the details

  • is_cor_parts (bool) – Is the image set corrected (polished particles)?

  • job (Optional[str]) – The job the particles/mics file came from

Returns:

A list of deposition objects

Return type:

list

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_parts(in_file: str, job: str | None = None) List[DepositionObject]

Prepare the particles deposition objects for an empiar deposition

Parameters:
  • in_file (str) – Path to STAR file containing the particles

  • job (Optional[str]) – The job creating the object

Returns:

The DepositionObjects

Return type:

List[DepositionObject]

pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_raw_mics(movfile: str, job: str | None = None) List[DepositionObject]

Prepare the raw micrographs portion of an EMPIAR deposition

Parameters:
  • movfile (str) – Movies STAR file to operate on

  • job (Optional[str]) – The job the movies came from

Returns:

A DepositionObject used to create a deposition

Return type:

List[DepositionObject]

pipeliner.deposition_tools.empiar_deposition.get_citation_data(joboptions: Dict[str, str]) List[Dict[str, object]]

Gets the data for an EMPIAR citation

Parameters:

joboptions (Dict[str, str]) – The joboption keys and values, where all values are strings

Returns:

The citation data formatted for an EMPIAR deposition

Return type:

List[Dict[str, Union[str, Dict[str, str]]]]

pipeliner.deposition_tools.empiar_deposition.get_deposition_objects_empiar(terminal_job: str, do_parts: bool = True, do_rparts: bool = True, do_movs: bool = True, do_mics: bool = True) List[DepositionObject]
pipeliner.deposition_tools.empiar_deposition.merge_empiar_dep_objs(in_depobjs: List[DepositionObject]) List[DepositionObject]

Merges together the list of DepositionObjects for an empiar job

For movies all deposition objects are kept. This is because multiple movie sets may be imported and combined. For corrected mics, particles, and corrected particles only the newest ones (the ones that contributed to the final job) are retained. This is to prevent duplications.

Parameters:

in_depobjs (List[DepositionObject]) – The depobjs from the chain of jobs

Returns:

The merged deposition objects

Return type:

List[DepositionObject]
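
The retention rule described above (keep every movie set, keep only the newest of everything else) can be sketched as below. DepObj is a hypothetical stand-in for DepositionObject, the category names are invented for the example, and the input list is assumed to be ordered from oldest to newest job.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DepObj:
    """Hypothetical stand-in for DepositionObject, for illustration only."""

    category: str
    parent_job: str


def merge_keep_newest(
    depobjs: List[DepObj], keep_all: str = "movies"
) -> List[DepObj]:
    """Keep every object in the keep_all category, newest of each other one.

    Assumes depobjs is ordered from the oldest job to the newest.
    """
    kept = [d for d in depobjs if d.category == keep_all]
    newest: Dict[str, DepObj] = {}
    for d in depobjs:
        if d.category != keep_all:
            newest[d.category] = d  # later entries replace earlier ones
    return kept + list(newest.values())
```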

pipeliner.deposition_tools.empiar_deposition.parse_empiar_jobstar(jobstar_file: str) Dict[str, object]

Parse a job.star file from an empiar deposition job and format it for deposition

Parameters:

jobstar_file (str) – Path to the file

Returns:

The data formatted for deposition

Return type:

Dict[str, Union[str, Dict[str, str]]]

pipeliner.deposition_tools.empiar_deposition.prepare_empiar_deposition(terminal_job: str, jobstar_file: str | None = None, do_parts: bool = True, do_rparts: bool = True, do_movs: bool = True, do_mics: bool = True) str

Prepare a deposition for empiar

Returns:

The name of the deposition file

Return type:

str

EMDB DepositionObjects

EMDB Deposition objects correspond to the schema here: http://ftp.ebi.ac.uk/pub/databases/emdb/doc/XML-schemas/emdb-schemas/v3/current_v3/doc/Untitled.html

pipeliner.deposition_tools.emdb_deposition_objects.make_em_software_depobj(the_prog: ExternalProgram, emdb_software_class, details: str = '', jobname: str = '') DepositionObject

Make a deposition object for a specific piece of software

This function generally doesn’t need to be explicitly called because the PipelinerJob will do it for each piece of software during creation of the deposition.

Parameters:
  • the_prog (ExternalProgram) – The program to get the depobj for

  • jobname (str) – The job that the depobj is being created for

  • details (str) – Details about what the software was doing

Returns:

The depobj for the program

Return type:

DepositionObject

pipeliner.deposition_tools.emdb_deposition.clean_dobj_cross_references(dobjs: List[DepositionObject]) List[DepositionObject]

Remove non em_image_processing deposition objects that are not referenced by others

Parameters:

dobjs (List[DepositionObject]) – The DepositionObjects to operate on

Returns:

Contains the most recent of each em_image_processing depobj and all others

Return type:

List[DepositionObject]
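
One literal reading of this cleanup step is sketched below, under stated assumptions: DepObj is a hypothetical stand-in for DepositionObject, a cross-reference is taken to be any data value equal to another object's uuid, and the software_id field name is invented. The real function may detect references differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DepObj:
    """Hypothetical stand-in for DepositionObject, for illustration only."""

    category: str
    uuid: str
    data: Dict[str, str] = field(default_factory=dict)


def drop_unreferenced(depobjs: List[DepObj]) -> List[DepObj]:
    """Keep em_image_processing objects and anything another object points at.

    Assumption: a reference is a data value that equals another object's uuid.
    This is a single pass, so chains of references are not followed.
    """
    all_ids: Set[str] = {d.uuid for d in depobjs}
    referenced: Set[str] = set()
    for d in depobjs:
        for value in d.data.values():
            if value in all_ids and value != d.uuid:
                referenced.add(value)
    return [
        d
        for d in depobjs
        if d.category == "em_image_processing" or d.uuid in referenced
    ]
```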

pipeliner.deposition_tools.emdb_deposition.get_deposition_objs_emdb(terminal_job: str) List[DepositionObject]

Get the deposition objects for each job in a workflow

Parameters:

terminal_job (str) – The job to use

Returns:

The gathered DepositionObject objects.

Return type:

List[DepositionObject]

Raises:

ValueError – If the terminal job is not found.

pipeliner.deposition_tools.emdb_deposition.get_most_recent_processing_depobjs(depobjs: List[DepositionObject]) List[DepositionObject]

Keep only the latest em_image_processing depobj from each type

The type is defined by the details parameter, which is the same as the job description. Non em_image_processing depobjs will be dealt with later using cross-reference checking.

Parameters:

depobjs (List[DepositionObject]) – The DepositionObjects to operate on

Returns:

Contains the most recent of each em_image_processing depobj and all others

Return type:

List[DepositionObject]

pipeliner.deposition_tools.emdb_deposition.prepare_emdb_deposition(terminal_job: str, emdb_id: str = '', outfile: str = '', verbose: bool = False, do_temp_ondep_update: bool = True) str

Prepare an emdb deposition cif file

Go over the image_processing deposition objects and decide which are to be kept:
  • This should be the one from the most recent job for each main processing category

  • If verbose, don’t throw away anything. This will lead to multiple image_processing entries for intermediate reconstructions, classifications, etc.

Throw away any other deposition objects that cross-reference obsolete ones, first doing image_processing, then references, and finally ref authors

pipeliner.deposition_tools.emdb_deposition.sort_depobjs_by_type(depobjs: List[DepositionObject]) Dict[str, List[DepositionObject]]

Sort a list of deposition objects by type

Parameters:

depobjs (List[DepositionObject]) – The deposition objects with all the ids updated including the EMDB ID

Returns:

A dict of types with a list of deposition objects for each

Return type:

Dict[str, List[DepositionObject]]
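
The shape of the returned dict can be illustrated with a short sketch. It assumes that the "type" of a deposition object is its category attribute. DepObj is a hypothetical stand-in for DepositionObject, and the category names are examples only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List


@dataclass
class DepObj:
    """Hypothetical stand-in for DepositionObject, for illustration only."""

    category: str
    parent_job: str


def group_by_category(depobjs: List[DepObj]) -> Dict[str, List[DepObj]]:
    """Group objects by category, preserving input order within each group."""
    grouped: DefaultDict[str, List[DepObj]] = defaultdict(list)
    for d in depobjs:
        grouped[d.category].append(d)
    return dict(grouped)
```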

pipeliner.deposition_tools.emdb_deposition.temp_EMDB_deposition_object_alterations(depobjs: List[DepositionObject]) List[DepositionObject]

Modify the list of deposition objects for current limitations of the OneDep system

  1. Add a single em_image_processing object, remove the others

  2. Update all other DepositionObjects to point their image_processing_id to the new one

  3. Change all specific relion program names to RELION

  4. Change ctffind4 program name to CTFFIND

These changes will be rolled back when the OneDep system is updated to accept multiple em_image_processing_id objects

Parameters:

depobjs (List[DepositionObject])

Returns:

The updated deposition objects

Return type:

List[DepositionObject]
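
Steps 3 and 4 above amount to a name normalisation. A minimal sketch follows, assuming that program names can be recognised by a case-insensitive prefix. The function name and the prefix-matching rule are assumptions made for the example, not taken from the pipeliner source.

```python
def normalise_program_name(name: str) -> str:
    """Collapse specific program names to the generic name OneDep accepts.

    Assumption: specific relion programs start with 'relion' and any
    ctffind version starts with 'ctffind' (compared case-insensitively).
    """
    lowered = name.lower()
    if lowered.startswith("relion"):
        return "RELION"
    if lowered.startswith("ctffind"):
        return "CTFFIND"
    return name  # all other program names are left untouched
```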

pipeliner.deposition_tools.emdb_deposition.update_and_format_deposition_objects(depobjs: List[DepositionObject]) List[DepositionObject]

Update the .id fields of DepositionObjects, changing UUIDs to int ids

Parameters:

depobjs (List[DepositionObject]) – The deposition objects from the workflow

Returns:

The Deposition objects with the IDs updated

Return type:

List[DepositionObject]
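
The UUID-to-integer conversion can be sketched as a two-pass mapping. DepObj is a hypothetical stand-in for DepositionObject. Numbering from 1 within each category, and treating any data value equal to a known uuid as a reference, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union
from uuid import uuid4

Value = Optional[Union[str, int, float]]


@dataclass
class DepObj:
    """Hypothetical stand-in for DepositionObject, for illustration only."""

    category: str
    data: Dict[str, Value] = field(default_factory=dict)
    uuid: str = field(default_factory=lambda: str(uuid4()))


def uuids_to_int_ids(depobjs: List[DepObj]) -> List[DepObj]:
    """Replace every uuid found in the data dicts with a small integer id."""
    counters: Dict[str, int] = {}
    id_map: Dict[str, int] = {}
    # First pass: number the objects within each category, starting at 1
    for d in depobjs:
        counters[d.category] = counters.get(d.category, 0) + 1
        id_map[d.uuid] = counters[d.category]
    # Second pass: swap any value that is a known uuid for its integer id
    for d in depobjs:
        d.data = {
            key: id_map.get(value, value) if isinstance(value, str) else value
            for key, value in d.data.items()
        }
    return depobjs
```

Building the full map before rewriting anything means cross-references such as image_processing_id resolve correctly regardless of the order of the objects in the list.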

pipeliner.deposition_tools.emdb_deposition.write_deposition_complete_cif(depobjs: List[DepositionObject], emdb_id: str, outfile: str) None

Write a cif file for the deposition

This function writes a complete cif file that is readable by gemmi. This is not the format needed by OneDep, which is written by write_deposition_onedep_cif.

Parameters:
  • depobjs (List[DepositionObject]) – The deposition objects with all the ids updated including the EMDB ID

  • emdb_id (str) – The EMDB deposition ID, provided by the EMDB

  • outfile (str) – Path to write the output file to

pipeliner.deposition_tools.emdb_deposition.write_deposition_onedep_cif(depobjs: List[DepositionObject], outfile: str) None

Writes a file in the reduced mmcif format required for deposition using OneDep

This format lacks headers

Parameters:
  • depobjs (List[DepositionObject]) – The deposition objects with all the ids updated including the EMDB ID

  • outfile (str) – Path to write the output file to

PDB deposition objects

PDB depositions are not currently supported