Pipeliner Jobs
- class pipeliner.pipeliner_job.ExternalProgram(command: str, name: str | None = None, vers_com: List[str] | None = None, vers_lines: List[int] | None = None)
Bases:
object
Class to store info about external programs called by the pipeliner
- class pipeliner.pipeliner_job.JobInfo(display_name: str = 'Pipeliner job', version: str = '0.0', job_author: str | None = None, short_desc: str = 'No short description for this job', long_desc: str = 'No long description for this job', documentation: str = 'No online documentation available', external_programs: List[ExternalProgram] | None = None, references: List[Ref] | None = None)
Bases:
object
Class for storing info about jobs.
This is used to generate documentation for the job within the pipeliner
- display_name
A user-friendly name to describe the job in a GUI. This should not include the software used, because that information is pulled from the job type.
- Type:
str
- programs
A list of third-party software used by the job. The pipeliner uses these to determine whether the job can be run, so the list must include every executable the job might call. If any program on this list cannot be found on the $PATH (as checked with which), the job will be marked as unable to run.
- Type:
List[pipeliner.pipeliner_job.ExternalProgram]
- force_unavailable
This can be set to True if other checks for the job’s availability (besides programs missing from the $PATH) have failed, e.g. a necessary library is missing.
- Type:
bool
- property is_available
Is the job available to run?
True if executables were found for all the job’s programs and force_unavailable has not been set, or False otherwise.
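The availability rule above can be illustrated with a short sketch. SketchJobInfo is a hypothetical stand-in, not pipeliner’s JobInfo. It assumes that each program is identified by the command name that shutil.which would look up on the $PATH.

```python
import shutil
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchJobInfo:
    """Hypothetical stand-in for JobInfo, used only to illustrate is_available."""

    programs: List[str] = field(default_factory=list)  # executable names to look for
    force_unavailable: bool = False  # set when some non-$PATH check has failed

    @property
    def is_available(self) -> bool:
        # A failed non-$PATH check always makes the job unavailable
        if self.force_unavailable:
            return False
        # Otherwise every executable must be found, as `which` would find it
        return all(shutil.which(prog) is not None for prog in self.programs)
```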
- class pipeliner.pipeliner_job.PipelinerCommand(args: Sequence[str | float | int], relion_control: bool = False)
Bases:
object
Holds a command that will be run by the pipeliner
- relion_control
Does the command need the RELION --pipeline_control argument appended before it is run?
- Type:
bool
- class pipeliner.pipeliner_job.PipelinerJob
Bases:
object
Super-class for job objects.
Each job type has its own sub-class.
WARNING: do not instantiate this class directly; use the factory functions in this module.
- working_dir
The working directory to be used when running the job. This should normally be left as None, meaning the job will be run in the project directory. Jobs that need to write files into their working directory should instead set it to a location within the job’s output directory, and take care to adjust the paths of input and output files accordingly.
- Type:
str | None
- OUT_DIR = ''
- PROCESS_NAME = ''
- add_compatibility_joboptions() None
Write additional joboptions for backward compatibility
Some JobOptions are needed by the original program (hey Relion 4) but not by the pipeliner. They are added here so that the files the pipeliner writes remain compatible with the original program.
- add_output_node(file_name: str, node_type: str, keywords: List[str] | None = None) None
Helper function to add a new Node for a file in the job’s output directory.
This is a wrapper around node_factory.create_node which simply adds self.output_dir to the start of the file name before creating the node and adding it to self.output_nodes.
- Parameters:
file_name – The name of the file that the new node will refer to. It is assumed that the file will be written to the job’s output directory. Note that the existence of the file is not checked, because this method will usually be called before the job has run.
node_type – The top-level type for the new node. This should almost always be one of the constants defined in pipeliner.nodes.
keywords – A list of keywords to append to the node type.
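The following is a minimal sketch of the add_output_node pattern, including how a subclass might call it from create_output_nodes. SketchNode and SketchJob are hypothetical stand-ins; the real method delegates to node_factory.create_node. The node type "DensityMap" and keyword "refine3d" are illustrative assumptions only. The sketch uses os.path.join so that it works whether or not output_dir ends with a slash.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a pipeliner Node."""

    name: str
    toplevel_type: str
    keywords: List[str] = field(default_factory=list)


class SketchJob:
    """Hypothetical job showing the add_output_node pattern."""

    def __init__(self, output_dir: str) -> None:
        self.output_dir = output_dir
        self.output_nodes: List[SketchNode] = []

    def add_output_node(
        self, file_name: str, node_type: str, keywords: Optional[List[str]] = None
    ) -> None:
        # Prefix the output directory, then build and register the node.
        # The file's existence is deliberately not checked: it may not exist yet.
        path = os.path.join(self.output_dir, file_name)
        self.output_nodes.append(SketchNode(path, node_type, list(keywords or [])))

    def create_output_nodes(self) -> None:
        # A subclass declares every file it expects to write here
        self.add_output_node("run_class001.mrc", "DensityMap", ["refine3d"])
```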
- additional_joboption_validation() List[JobOptionValidationResult]
Advanced validation of job parameters
This is a placeholder function for additional validation to be done by individual job subtypes, such as comparing JobOption values, e.g. JobOption A must be greater than JobOption B.
Avoid using self.get_string or self.get_number in this function, as they may raise an error if the JobOption is required and has no value. Use self.joboptions["jobopname"].value instead.
- Returns:
A list of JobOptionValidationResult objects
- Return type:
List[JobOptionValidationResult]
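This sketch shows the kind of cross-option check a subclass might perform, reading values through .value as recommended above. The option names low_res and high_res are hypothetical, as are the SketchJobOption and SketchValidationResult classes. A real implementation would return pipeliner’s JobOptionValidationResult objects, whose constructor is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SketchJobOption:
    """Hypothetical stand-in: only the .value attribute is used here."""

    value: Any


@dataclass
class SketchValidationResult:
    """Hypothetical stand-in for JobOptionValidationResult."""

    result_type: str  # e.g. "error" or "warning"
    raised_by: List[str]  # names of the job options involved
    message: str


def additional_joboption_validation(
    joboptions: Dict[str, SketchJobOption],
) -> List[SketchValidationResult]:
    results: List[SketchValidationResult] = []
    # Read .value directly: self.get_number() may raise if a required option is empty
    low = joboptions["low_res"].value  # hypothetical option names
    high = joboptions["high_res"].value
    if low in (None, "") or high in (None, ""):
        return results  # nothing to compare yet
    if float(low) <= float(high):
        results.append(
            SketchValidationResult(
                "error",
                ["low_res", "high_res"],
                "low_res must be greater than high_res",
            )
        )
    return results
```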
- check_joboption_is_now_deactivated(jo: str) bool
Check if a joboption has become deactivated in relation to others
For example, if job option A is False, job option B is now deactivated
- check_joboption_is_now_required(jo: str) list
Check if a joboption has become required in relation to others
For example if job option A is True, job option B is now required
- Parameters:
jo (str) – The name of the joboption to test
- Returns:
pipeliner.job_options.JobOptionValidationResult objects for any errors found
- Return type:
list
- create_input_nodes() None
Automatically add the job’s input nodes to its input node list.
Input nodes are created from each of the job’s job options.
- create_output_nodes() None
Make the job’s output nodes.
This method should be overridden by PipelinerJob subclasses.
The output nodes should be added to the list in the output_nodes attribute. The add_output_node function is helpful for creating and adding a new node in a single call.
If your job doesn’t make any output nodes, or doesn’t know what their names will be until the job has been run, you still need to override this method, but your implementation can simply pass and do nothing. If you need to add output nodes at the end of the job, create them in create_post_run_output_nodes.
Note that this method is called by the job manager (via PipelinerJob.prepare_to_run) before the job is added to the pipeline. The job’s output directory does exist when this method is called, but that could change in future versions of the pipeliner, so jobs should avoid making any file system changes in this method.
- create_post_run_output_nodes()
Placeholder function for post run node creation
Some jobs have output nodes that can only be created after the job has run because their names are not known until after they have been created. They can be added here. This function should ONLY add output nodes; any other work should be done in commands run by the job.
- create_results_display() Sequence[ResultsDisplayObject]
Create results display objects to be displayed by the GUI
This default implementation simply creates the default results display object for each of the job’s output nodes. Subclasses that want customised results should override this method.
- Returns:
A list of ResultsDisplayObject objects
- gather_metadata() Dict[str, Any]
Placeholder function for metadata gathering
Each job class should define this individually
- Returns:
A placeholder “No metadata available” message and the reason why
- Return type:
Dict[str, Any]
- get_additional_reference_info() List[Ref]
A placeholder function for jobs that need to return additional references
This is for references that are not included in the job’s JobInfo, such as ones pulled from the EMDB/PDB in fetch jobs
- get_commands() List[PipelinerCommand]
Get the commands to be run for a specific job.
This method should be overridden by PipelinerJob subclasses.
Jobs are normally run with the project directory as the working directory. If your job needs to run in a different working directory (for example if it calls a program which always writes files into the current directory), set the self.working_dir attribute in this method.
Note that this method should run quickly! Any long-running actions should be done in one of the job’s commands instead. (If necessary, put Python code that needs to be run into a separate script in pipeliner.scripts.job_scripts and then call it as a command.)
- Returns:
The commands as a list of PipelinerCommand objects
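Below is a hypothetical get_commands implementation for a program that always writes into the current directory. SketchCommand mirrors the documented PipelinerCommand(args, relion_control) signature. The program name some_tool and its flags are invented for illustration. The method only assembles commands and does not run anything.

```python
import os
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class SketchCommand:
    """Stand-in mirroring the documented PipelinerCommand(args, relion_control)."""

    args: Sequence[Union[str, float, int]]
    relion_control: bool = False


class SketchTool:
    """Hypothetical job wrapping a program that always writes to the cwd."""

    def __init__(self, output_dir: str, input_file: str) -> None:
        self.output_dir = output_dir
        self.input_file = input_file
        self.working_dir: Optional[str] = None

    def get_commands(self) -> List[SketchCommand]:
        # Run inside the output directory, so input paths given relative to
        # the project directory are rewritten relative to working_dir
        self.working_dir = self.output_dir
        rel_input = os.path.relpath(self.input_file, self.working_dir)
        # Only build the command list here; the real work happens when it runs
        return [SketchCommand(["some_tool", "--in", rel_input, "--threads", 4])]
```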
- get_current_output_nodes() List[Node]
Get the current output nodes if the job was stopped prematurely
For most jobs there will not be any, but for jobs with many iterations the most recent iteration can be used if the job is aborted or has failed and is then later marked as successful
- get_default_params_dict() Dict[str, str]
Get a dict with the job’s parameters and default values
- Returns:
All the job’s parameters {parameter: default value}
- Return type:
Dict[str, str]
- get_final_commands() List[List[str]]
Assemble the commands to be run for a job.
This function is intended to be called by the job runner just before the commands are run. Any setup required before the job starts should be done in prepare_to_run().
- Returns:
The commands, in a list-of-lists format. Each item in the main list is a single command composed of a list of strings (as used by subprocess.run, i.e. [com, arg1, arg2, …])
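This sketch shows how commands might be flattened into the list-of-lists format accepted by subprocess.run. The input is a plain (args, relion_control) pair rather than a PipelinerCommand. The code assumes that --pipeline_control is followed by the job’s output directory; that assumption is labelled in the code, and the pipeliner source should be checked for the exact behaviour.

```python
from typing import List, Sequence, Tuple, Union

Arg = Union[str, float, int]


def final_commands(
    commands: Sequence[Tuple[Sequence[Arg], bool]], output_dir: str
) -> List[List[str]]:
    """Turn (args, relion_control) pairs into lists of strings."""
    final = []
    for args, relion_control in commands:
        # subprocess.run expects every item to be a string
        com = [str(arg) for arg in args]
        if relion_control:
            # Assumption: the flag takes the job's output directory as its value
            com += ["--pipeline_control", output_dir]
        final.append(com)
    return final
```

Each inner list can then be passed directly to subprocess.run(com, check=True).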
- get_joboption_groups() Dict[str, List[str]]
Put the joboptions in groups according to their jobop_group attribute
Assumes that the joboptions have already been put in order of priority by self.set_joboption_order() or were in order to begin with.
Groups are ordered based on the highest priority joboption in that group from the order of the joboptions, except that “Main” is always the first group. Joboptions within the groups are ordered by priority.
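The grouping rule above can be sketched with plain (name, group) pairs in place of JobOption objects and their jobop_group attribute. The option and group names are illustrative only.

```python
from typing import Dict, List, Tuple


def group_joboptions(ordered: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (joboption name, group name) pairs that are already in priority order."""
    groups: Dict[str, List[str]] = {}
    for name, group in ordered:
        # dicts keep insertion order, so each group is positioned by its
        # first (highest-priority) joboption, and members stay in priority order
        groups.setdefault(group, []).append(name)
    if "Main" in groups:
        # "Main" always comes first, whatever its members' priorities
        main = groups.pop("Main")
        groups = {"Main": main, **groups}
    return groups
```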
- get_runtab_options(mpi: bool = False, threads: bool = False, addtl_args: bool = False, mpi_default_min: int = 1, mpi_must_be_odd: bool = False) None
Get the options found in the Run tab of the GUI, which are common to all job types
Adds entries to the joboptions dict for queueing, MPI, threading, and additional arguments. This method should be used when initialising a PipelinerJob subclass.
- Parameters:
mpi (bool) – Should MPI options be included?
threads (bool) – Should multi-threading options be included?
addtl_args (bool) – Should an ‘additional arguments’ option be included?
mpi_default_min (int) – The minimum default number of MPI processes; used if mpi_default_min is greater than the user-defined minimum number of MPI processes
mpi_must_be_odd (bool) – Must the number of MPI processes be odd, as for RELION refine jobs?
- handle_doppio_uploads(dry_run=False) None
Tasks that have to be performed to deal with Doppio file uploads.
- Move files from DoppioUploads to the job dir:
DoppioUploads/tmpdir/file -> JobType/jobNNN/InputFiles/file
- Update the job option values to point to the new file locations, so that when the job’s input nodes are created they refer to the moved files
- Parameters:
dry_run – If True, do not actually try to move any files, just update the job option values. This option is only intended for use in testing.
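This is a sketch of the upload relocation described above. It assumes that job option values are plain path strings and that uploaded files can be identified by a DoppioUploads/ prefix. relocate_upload and relocate_all are hypothetical helpers, not pipeliner functions. As with the documented dry_run, passing dry_run=True only computes the new paths.

```python
import shutil
from pathlib import Path
from typing import Dict

UPLOAD_PREFIX = "DoppioUploads/"  # assumed marker for uploaded files


def relocate_upload(upload: str, job_dir: str, dry_run: bool = False) -> str:
    """Map DoppioUploads/tmpdir/file to <job_dir>/InputFiles/file."""
    src = Path(upload)
    dest = Path(job_dir) / "InputFiles" / src.name
    if not dry_run:
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
    return dest.as_posix()


def relocate_all(
    values: Dict[str, str], job_dir: str, dry_run: bool = False
) -> Dict[str, str]:
    """Return updated job option values that point at the moved files."""
    return {
        name: relocate_upload(value, job_dir, dry_run)
        if value.startswith(UPLOAD_PREFIX)
        else value
        for name, value in values.items()
    }
```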
- load_results_display_files() Sequence[ResultsDisplayObject]
Load the job’s results display objects from files on disk.
This method must be fast because it is used by the GUI to load job results. Therefore, if a display object fails to load properly, no attempt is made to recalculate it and a ResultsDisplayPending object is returned instead.
If there are no results display files yet, an empty list is returned.
- Returns:
A list of ResultsDisplayObject objects
- make_queue_options() None
Get options related to queueing and queue submission, which are common to all job types
- parse_additional_args() List[str]
Parse the additional arguments job option and return a list
- Returns:
A list ready to append to the command. Quoted strings are preserved as quoted strings; all others are split into individual items
- Return type:
List[str]
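One way to get the splitting behaviour described above is the standard library’s shlex.split in non-POSIX mode, which keeps a quoted substring together as one item and retains its quotes. This is a sketch, not pipeliner’s own implementation, which may differ in edge cases.

```python
import shlex
from typing import List


def parse_additional_args(additional_args: str) -> List[str]:
    """Split an 'additional arguments' string into command items.

    Non-POSIX mode keeps quoted substrings together, quotes included;
    everything else is split on whitespace.
    """
    return shlex.split(additional_args, posix=False)
```

If the quotes should be stripped instead, the default posix=True mode of shlex.split returns the quoted text without them.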
- prepare_clean_up_lists(do_harsh: bool = False) Tuple[List[str], List[str]]
Placeholder function for preparation of list of files to clean up
Each job class should define this individually
- prepare_deposition_data(depo_type: str) Sequence[EmpiarRefinedParticles | EmpiarParticles | EmpiarCorrectedMics | EmpiarMovieSet | OneDepData]
Placeholder for function to return deposition data objects
The specific list returned should be defined by each jobtype
- prepare_to_run(ignore_invalid_joboptions=False) None
Prepare the job to run.
This function is intended to be called by the pipeliner before the job file is saved to disk. It does several things, including validating the job options, making the job directory, and moving uploaded Doppio user files into the job directory.
- Parameters:
ignore_invalid_joboptions (bool) – Prepare the job to run anyway even if the job options appear to be invalid
- Raises:
ValueError – If the job options appear to be invalid and ignore_invalid_joboptions is not set
RuntimeError – If the job does not already have an output directory assigned
- read(filename: str) None
Reads parameters from a run.job or job.star file
- Parameters:
filename (str) – The file to read. Can be a run.job or job.star file
- Raises:
ValueError – If the file is a job.star file and a job option from the PipelinerJob is missing from the input file
- save_job_submission_script(commands: list) str
Writes a submission script for jobs submitted to a queue
- Parameters:
commands (list) – The job’s commands, in a list-of-lists format
- Returns:
The name of the submission script that was written
- Return type:
str
- Raises:
ValueError – If no submission script template was specified in the job’s joboptions
ValueError – If the submission script template is not found
RuntimeError – If the output script could not be written
- save_results_display_files() Sequence[ResultsDisplayObject]
Create new results display objects and save them to disk.
This method removes any existing results display files first, and returns the new display objects after they have been created and saved.
- Returns:
The newly-created results display objects.
- set_joboption_order(new_order=typing.List[str]) None
Replace the joboptions dict with an ordered dict
Use this to set the order the joboptions will appear in the GUI. If a joboption is not specified in the list it will be tagged on to the end of the list.
- Parameters:
new_order (list[str]) – A list of joboption keys, in the order they should appear
- Raises:
ValueError – If a nonexistent joboption is specified
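The reordering rule can be sketched as a standalone function. The sketch assumes the joboptions are held in a plain dict, which preserves insertion order in Python 3.7+. It is not pipeliner’s own code.

```python
from typing import Dict, List


def reorder_joboptions(joboptions: Dict[str, object], new_order: List[str]) -> Dict[str, object]:
    """Return a new dict with the keys in new_order first."""
    unknown = [key for key in new_order if key not in joboptions]
    if unknown:
        raise ValueError(f"Nonexistent joboption(s) specified: {unknown}")
    ordered = {key: joboptions[key] for key in new_order}
    # Options missing from new_order keep their relative order at the end
    for key, value in joboptions.items():
        if key not in ordered:
            ordered[key] = value
    return ordered
```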
- set_option(line: str) None
Sets a value in the joboptions dict from a run.job file
- Parameters:
line (str) – A line from a run.job file
- Raises:
RuntimeError – If the line does not contain ‘==’
RuntimeError – If the value of the line does not match any of the joboptions keys
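This is a sketch of the run.job line handling described above. It assumes each line has the form "<label> == <value>" and uses a hypothetical label-to-key mapping to find the matching job option. The exact matching rules in pipeliner may differ.

```python
from typing import Dict, Tuple


def parse_runjob_line(line: str, label_to_key: Dict[str, str]) -> Tuple[str, str]:
    """Return (joboption key, value) for a line like 'Label: == value'."""
    if "==" not in line:
        raise RuntimeError(f"Line does not contain '==': {line!r}")
    label, value = (part.strip() for part in line.split("==", 1))
    if label not in label_to_key:
        raise RuntimeError(f"{label!r} does not match any joboption")
    return label_to_key[label], value
```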
- validate_dynamically_required_joboptions() List[JobOptionValidationResult]
Check all joboptions to see whether they have become required because of if_required
For example, if job option A is True, job option B is now required
- Returns:
pipeliner.job_options.JobOptionValidationResult objects for any errors found
- Return type:
List[JobOptionValidationResult]
- validate_input_files() List[JobOptionValidationResult]
Check that files specified as inputs actually exist
- Returns:
A list of pipeliner.job_options.JobOptionValidationResult objects
- Return type:
List[JobOptionValidationResult]
- validate_joboptions() List[JobOptionValidationResult]
Make sure all the joboptions meet their validation criteria
- Returns:
A list of JobOptionValidationResult objects
- Return type:
List[JobOptionValidationResult]
- write_jobstar(output_dir: str, output_fn: str = 'job.star', is_continue: bool = False)
Write a job.star file.
- Parameters:
output_dir (str) – The output directory.
output_fn (str) – The name of the file to write. Defaults to job.star
is_continue (bool) – Is the file for a continuation of a previously run job? If so, only the parameters that can be changed on continuation are written. Overrides the is_continue attribute of the job
- class pipeliner.pipeliner_job.Ref(authors: str | List[str] | None = None, title: str = '', journal: str = '', year: str = '', volume: str = '', issue: str = '', pages: str = '', doi: str = '', **kwargs)
Bases:
object
Class to hold metadata about a citation or reference, typically a journal article.
Display tools
Use these functions to create the ResultsDisplayObjects used by the pipeliner GUI, Doppio, to create graphical outputs for each job.
- pipeliner.display_tools.create_results_display_object(dobj_type: str, **kwargs) ResultsDisplayObject
Safely create a results display object
Returns a ResultsDisplayPending object if there are any problems. Pass the type of display object as the first argument, followed by the keyword arguments for that specific type of ResultsDisplayObject
- Parameters:
dobj_type (str) – The type of DisplayObject to create
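The "safe" behaviour of create_results_display_object can be illustrated with a small factory that converts any construction error into a placeholder. All classes and the "image" type name below are hypothetical stand-ins for the real ResultsDisplayObject subclasses.

```python
from typing import Any, Dict, Type


class SketchDisplay:
    """Hypothetical base class for display objects."""

    def __init__(self, title: str) -> None:
        self.title = title


class SketchPending(SketchDisplay):
    """Hypothetical placeholder returned when creation fails."""

    def __init__(self, reason: str) -> None:
        super().__init__("Results pending...")
        self.reason = reason


class SketchImage(SketchDisplay):
    def __init__(self, *, title: str, image_path: str) -> None:
        super().__init__(title)
        if not image_path.endswith((".png", ".jpg")):
            raise ValueError(f"Unsupported image: {image_path}")
        self.image_path = image_path


DISPLAY_TYPES: Dict[str, Type[SketchDisplay]] = {"image": SketchImage}


def create_display_object(dobj_type: str, **kwargs: Any) -> SketchDisplay:
    # Any failure (unknown type, bad kwargs, validation error) becomes a
    # placeholder instead of an exception, so the GUI can keep going
    try:
        return DISPLAY_TYPES[dobj_type](**kwargs)
    except Exception as err:
        return SketchPending(reason=f"{type(err).__name__}: {err}")
```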
- pipeliner.display_tools.get_ordered_classes_arrays(model_file: str, ncols: int, boxsize: int, output_dir: str, output_filename: str, parts_file: str | None = None, title: str = '2D class averages', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending
Return a 3D array of class averages from a Relion Class2D model file
- Parameters:
model_file (str) – Name of the model file
ncols (int) – Number of columns desired in the montage
boxsize (int) – Size of the class averages in the final montage
output_dir (str) – The output dir of the pipeliner job creating this object
output_filename (str) – The name for the output montage file
parts_file (str) – Path of the file containing the particles, for counting
title (str) – A title for the DisplayObject
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
An object for the GUI to use to render the montage
- Return type:
ResultsDisplayMontage | ResultsDisplayPending
- pipeliner.display_tools.graph_from_starfile_cols(title: str, starfile: str, block: str, ycols: list, xcols: list | None = None, xrange: list | None = None, yrange: list | None = None, data_series_labels: List[str] | None = None, xlabel: str = '', ylabel: str = '', assoc_data: List[str] | None = None, modes: List[str] | None = None, start_collapsed: bool = False, flag: str = '') ResultsDisplayGraph | ResultsDisplayPending
Automatically generate a ResultsDisplayGraph object from a starfile
Can use one or two columns, plus a third column for labels if desired
- Parameters:
title (str) – The title of the final graph
starfile (str) – Path to the star file to use
block (str) – The block to use in the starfile, use None for a starfile with only a single block
ycols (list) – Column label(s) from the star file to use for the y data series
xcols (list) – Column label(s) from the star file to use for the x data series; if None, a simple count from 1 will be used
xlabel (str) – Label for the x axis. If no x data are specified the label will be ‘Count’; if x data are specified and the xlabel is None, the x axis label will be the name of the starfile column used
xrange (list) – Range for x values to be displayed, full range if None
yrange (list) – Range for y values to be displayed, full range if None
data_series_labels (list) – Names for the data series
ylabel (str) – Label for the y axis, if None the y axis label will be the name of the starfile column used
assoc_data (list) – List of data file(s) associated with this graph
modes (list) – Controls the appearance of each data series; choose from ‘lines’, ‘markers’ or ‘lines+markers’
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
A ResultsDisplayGraph object for the created graph
- Return type:
ResultsDisplayGraph | ResultsDisplayPending
- pipeliner.display_tools.histogram_from_starfile_col(title: str, starfile: str, block: str, data_col: str, xlabel: str = '', ylabel: str = 'Count', assoc_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '') ResultsDisplayHistogram | ResultsDisplayPending
Automatically generate a ResultsDisplayHistogram object from a starfile
- Parameters:
title (str) – The title of the final graph
starfile (str) – Path to the star file to use
block (str) – The block to use in the starfile, use None for a starfile with only a single block
data_col (str) – Column label from the star file to use for the data series
xlabel (str) – Label for the x axis. If no x data are specified the label will be ‘Count’; if x data are specified and the xlabel is None, the x axis label will be the name of the starfile column used
ylabel (str) – Label for the y axis, if None the y axis label will be the name of the starfile column used
assoc_data (list) – List of data file(s) associated with this graph
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- pipeliner.display_tools.make_map_model_thumb_and_display(outputdir: str, maps: List[str] | None = None, maps_opacity: List[float] | None = None, maps_colours: List[str] | None = None, models: List[str] | None = None, models_colours: List[str] | None = None, title: str | None = None, maps_data: str = '', models_data: str = '', assoc_data: List | None = None, start_collapsed: bool = True, flag: str = '') ResultsDisplayMapModel | ResultsDisplayPending
Make a display object for an atomic model overlaid over a map
Makes a binned map and a ResultsDisplayMapModel display object
- Parameters:
outputdir (str) – Name of the job’s output directory
maps (list) – List of map files to use
models (list) – List of model files to use
maps_opacity (list) – List of opacities for the maps, from 0 to 1; if None, 0.5 is used for all
maps_colours (list) – Colours for the maps, if specific ones are desired; otherwise Mol* will assign them
title (str) – The title for the ResultsDisplayMapModel object; if None, the names of the map and model will be used
maps_data (str) – Any additional data to be included about the map
models_data (str) – Any additional data to be included about the model
models_colours (list) – Colours for the models, if specific ones are desired; otherwise Mol* will assign them
assoc_data (list) – List of associated data; if left as None, the files themselves are used
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If the results are considered scientifically dubious, explain in this string
- Returns:
The DisplayObject for the map and model
- Return type:
ResultsDisplayMapModel | ResultsDisplayPending
- pipeliner.display_tools.make_particle_coords_thumb(in_mrc, in_coords, out_dir, thumb_size=640, pad=5, start_collapsed=False, title: str = 'Example picked particles', flag: str = '', markers: bool = False) ResultsDisplayImage | ResultsDisplayPending
Create a thumbnail of picked particle coords on their micrograph
Because the extraction box size is not known, the box sizes are set as a percentage of the total image size.
- Parameters:
in_mrc (str) – Path to the merged micrograph mrc file
in_coords (str) – Path to the .star coordinates file
out_dir (str) – Name of the output directory
thumb_size (int) – Size of the x dimension of the final thumbnail image
pad (int) – Thickness of the particle box borders before binning in px
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
title (str) – What title to use for the displayobj created
flag (str) – If this display object contains scientifically dubious results, display this message
markers (bool) – Instead of making boxes make markers
- pipeliner.display_tools.mini_montage_from_many_files(filelist: List[str], outputdir: str, nimg: int = 5, montagesize: int = 640, title: str = '', ncols: int = 5, associated_data: List[str] | None = None, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending
Make a mini montage from a list of images
Merge and flatten image stacks
- Parameters:
filelist (list) – A list of the files to use
outputdir (str) – The output dir of the pipeliner job
nimg (int) – Number of images to use in the montage
montagesize (int) – Desired size of the final montage image
title (str) – Title for the ResultsDisplay object that will be output
ncols (int) – Number of columns to make in the montage
associated_data (list) – Data files associated with these images; if None, all of the selected images are used
labels (list) – The labels for the items in the montage
cmap (str) – Colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
The DisplayObject for the montage
- Return type:
ResultsDisplayMontage | ResultsDisplayPending
- Raises:
ValueError – If an image that is not in MRC or TIFF format is used
- pipeliner.display_tools.mini_montage_from_stack(stack_file: str, outputdir: str, nimg: int = 40, ncols: int = 10, montagesize: int = 640, title: str = '', labels: List[int | str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending
Make a montage from an MRCS or TIFF file
- Parameters:
stack_file (str) – The path to the stack_file
outputdir (str) – The output dir of the pipeliner job
nimg (int) – Number of images to use in the montage, if < 1 uses all of them
ncols (int) – Number of columns to use
montagesize (int) – Desired size of the final montage image
title (str) – Title for the ResultsDisplay object that will be output
labels (list) – Labels for the images
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
The DisplayObject for the montage
- Return type:
ResultsDisplayMontage | ResultsDisplayPending
- Raises:
ValueError – If an image that is not in MRC or TIFF format is used
- pipeliner.display_tools.mini_montage_from_starfile(starfile: str, block: str, column: str, outputdir: str, title: str = '', nimg: int = 20, montagesize: int = 640, ncols: int = 10, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending
Make a montage from a list of images in a starfile column
Merge and flatten image stacks if they are encountered.
- Parameters:
starfile (str) – The starfile to use
block (str) – The name of the block with the images
column (str) – The name of the column that has the images
outputdir (str) – The output dir of the pipeliner job
title (str) – The title for the object, automatically generated if “”
nimg (int) – Number of images to use in the montage, uses all if < 1
montagesize (int) – Desired size of the final montage image
ncols (int) – number of columns to use
labels (list) – Labels for the images in the montage, in order
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
The DisplayObject for the montage
- Return type:
ResultsDisplayMontage | ResultsDisplayPending
- Raises:
ValueError – If an image that is not in MRC or TIFF format is encountered
ResultsDisplay Objects
These objects generally should not be instantiated directly; they should instead be created using the functions above.
- class pipeliner.results_display_objects.ResultsDisplayGraph(*, xvalues: list, yvalues: list, title: str, associated_data: list, data_series_labels: list, xaxis_label: str = '', xrange: list | None = None, yaxis_label: str = '', yrange: list | None = None, modes: list | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A simple graph for the GUI to display
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- xvalues
(list): List of x coordinate data series; can contain multiple data series
- xrange
Range of x to be displayed; displays the full range if None. If the x axis needs to be reversed, enter the values backwards: [max, min]
- Type:
list | None
- yvalues
(list): List of y coordinate data series; can contain multiple data series
- yrange
Range of y to be displayed; displays the full range if None. If the y axis needs to be reversed, enter the values backwards: [max, min]
- Type:
list | None
- modes
Controls the appearance of each data series; choose from ‘lines’, ‘markers’ or ‘lines+markers’
- Type:
list | None
- class pipeliner.results_display_objects.ResultsDisplayHistogram(*, title: str, associated_data: list, data_to_bin: List[float] | None = None, xlabel: str = '', ylabel: str = '', bins: List[int] | None = None, bin_edges: List[float] | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A class for the GUI to display a histogram
It is best to not instantiate this class directly. Instead, create it using create_results_display_object
- Parameters:
title (str) – The title of the histogram
data_to_bin (list) – The data to bin
xlabel (str) – Label for the x axis
ylabel (str) – Label for the y axis
associated_data (list) – List of data files associated with the histogram
bins (list) – A list of bin counts, if they are known
bin_edges (list) – A list of the bin edges, if they are already known
start_collapsed (bool) – Should the object start out collapsed when displayed in the GUI
- Raises:
ValueError – If no data or bins are specified
ValueError – If an attempt is made to specify bins or bin edges when data to bin are being provided
ValueError – If the associated data is not a list, or not provided
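The argument rules listed above can be sketched as a standalone check. The sketch also shows one way of precomputing bins and bin_edges with equal-width bins. Both functions are hypothetical illustrations. Note that bin_edges has one more entry than bins.

```python
from typing import List, Optional, Tuple


def check_histogram_args(
    associated_data: object,
    data_to_bin: Optional[List[float]] = None,
    bins: Optional[List[int]] = None,
    bin_edges: Optional[List[float]] = None,
) -> None:
    """Mirror the documented ResultsDisplayHistogram argument rules."""
    if data_to_bin is None and bins is None:
        raise ValueError("Provide either data_to_bin or bins")
    if data_to_bin is not None and (bins is not None or bin_edges is not None):
        raise ValueError("Do not give bins or bin_edges together with data_to_bin")
    if not isinstance(associated_data, list):
        raise ValueError("associated_data must be a list")


def equal_width_bins(data: List[float], nbins: int) -> Tuple[List[int], List[float]]:
    """Count values into nbins equal-width bins; returns (counts, edges)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / nbins or 1.0  # avoid zero width when all values are equal
    edges = [lo + i * width for i in range(nbins + 1)]
    counts = [0] * nbins
    for x in data:
        # The last bin is closed on the right so the maximum is counted
        idx = min(int((x - lo) / width), nbins - 1)
        counts[idx] += 1
    return counts, edges
```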
- class pipeliner.results_display_objects.ResultsDisplayHtml(*, title: str, associated_data: list, html_dir: str = '', html_file: str = '', html_str: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display html
It is best to not instantiate this class directly. Instead create it using create_results_display_object
This can be used for general HTML display in Doppio. Provide a directory containing index.html, a specific HTML file, or an HTML string as input.
- class pipeliner.results_display_objects.ResultsDisplayImage(*, title: str, image_path: str, image_desc: str, associated_data: list, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A class for the GUI to display a single image
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayJson(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display JSON files
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayMapModel(title: str, associated_data: list, maps: list | None = None, models: list | None = None, maps_data: str = '', models_data: str = '', maps_opacity: list | None = None, maps_colours: list | None = None, models_colours: list | None = None, start_collapsed: bool = True, flag: str = '')
Bases:
ResultsDisplayObject
An object for overlaying multiple maps and/or models
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- maps_colours
Hex values for colouring the maps specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used
- Type:
list | None
- models_colours
Hex values for colouring the models specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used
- Type:
list | None
- Raises:
ValueError – If no maps or models were specified
ValueError – If the map is not .mrc format
ValueError – If models are not in PDB or mmCIF format
ValueError – If the number of maps and map opacities don’t match
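This is a sketch of the validation rules listed above. The accepted model extensions (.pdb, .cif, .mmcif) and the acceptance of upper-case hex digits are assumptions. The function is a standalone illustration rather than pipeliner’s constructor.

```python
import re
from typing import List, Optional

# "#" followed by six hex digits; accepting upper case is an assumption
HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}")


def check_map_model_args(
    maps: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    maps_opacity: Optional[List[float]] = None,
    maps_colours: Optional[List[str]] = None,
) -> None:
    maps, models = maps or [], models or []
    if not maps and not models:
        raise ValueError("No maps or models were specified")
    if any(not m.endswith(".mrc") for m in maps):
        raise ValueError("Maps must be in .mrc format")
    if any(not m.endswith((".pdb", ".cif", ".mmcif")) for m in models):
        raise ValueError("Models must be in PDB or mmCIF format")
    if maps_opacity is not None and len(maps_opacity) != len(maps):
        raise ValueError("The number of maps and map opacities don't match")
    for colour in maps_colours or []:
        if not HEX_COLOUR.fullmatch(colour):
            raise ValueError(f"Not a '#XXXXXX' hex colour: {colour}")
```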
- class pipeliner.results_display_objects.ResultsDisplayMontage(*, xvalues: list, yvalues: list, img: str, title: str, associated_data: list, labels: list | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object to send to the GUI to make an image montage
This one is an image montage with info about the specific images. It is best not to instantiate this class directly. Instead, create it using create_results_display_object
- xvalues
(list): The x coordinates by image
- yvalues
(list): The y coordinates by image
- class pipeliner.results_display_objects.ResultsDisplayObject(title: str, start_collapsed: bool = False, flag='')
Bases:
object
Abstract super-class for results display objects
- flag
A message that is displayed if the results display object is showing something scientifically dubious.
- Type:
str
- write_displayobj_file(outdir) None
Write a json file from a ResultsDisplayObject object
- Parameters:
outdir (str) – The directory to write the output in
- Raises:
NotImplementedError – If a write attempt is made from the superclass
- class pipeliner.results_display_objects.ResultsDisplayPdfFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display pdf files
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayPending(*, title: str = 'Results pending...', message: str = 'The result not available yet', reason: str = 'unknown', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A placeholder class for when a job is not able to produce results yet
- class pipeliner.results_display_objects.ResultsDisplayPlotlyHistogram(*, data: List[float] | DataFrame | ndarray | dict | None = None, title: str, x: str | list | None = None, y: str | list | None = None, color: str | int | list | None = None, nbins: int | None = None, range_x: list | None = None, range_y: list | None = None, category_orders: dict | None = None, labels: dict | None = None, bin_counts: List[float] | None = None, bin_centres: List[float] | None = None, associated_data: list, start_collapsed: bool = False, flag: str = '', **kwargs)
Bases:
ResultsDisplayObject
A class that generates a plotly.graph_objects.Figure object to display a histogram. It uses the plotly express histogram function: https://plotly.com/python-api-reference/generated/plotly.express.histogram.html. Examples: https://plotly.com/python/histograms/
- data
The data to bin. The following types are allowed: list (a list of values to be binned); array or dict (converted to a pandas dataframe internally); pandas dataframe (ensure column names are added if ‘x’ indicates a column name). More details: https://plotly.com/python/px-arguments/
- plotlyfig
plotly.graph_objects.Figure object generated from input data
- class pipeliner.results_display_objects.ResultsDisplayPlotlyObj(*, data: list | DataFrame | ndarray | dict, plot_type: list | str, title: str, associated_data: list, multi_series: bool = False, subplot: bool = False, make_subplot_args: dict | None = None, subplot_order: str | List[tuple] | None = None, subplot_size: Sequence[int] | None = None, subplot_args: List[dict] | None = None, series_args: List[dict] | None = None, layout_args: dict | None = None, trace_args: dict | None = None, xaxes_args: List[dict] | dict | None = None, yaxes_args: List[dict] | dict | None = None, start_collapsed: bool = False, flag: str = '', **kwargs)
Bases:
ResultsDisplayObject
This uses plotly express to create a plotly.graph_objects.Figure object: https://plotly.com/python/plotly-express/. Use this class to generate plotly Figure objects for custom plots, including facet plots (https://plotly.com/python/facet-plots/), subplots (https://plotly.com/python/subplots/) and multi-series plots (e.g. https://plotly.com/python/creating-and-updating-figures/#adding-traces)
- data
The data to plot. For a single plot, the following types are allowed: list (a list of values); array or dict (converted to a pandas dataframe internally); pandas dataframe (ensure column names are added if ‘x’ indicates a column name). More details: https://plotly.com/python/px-arguments/. For subplots and/or multi_series: list (a list with a dictionary of arguments for each plot/series)
- plot_type
Required. The type of plot. For a single plot, this is the plotly express function to call (https://plotly.com/python-api-reference/plotly.express.html); for subplots and/or multi_series, it is the plotly.graph_objects function to call (https://plotly.com/python-api-reference/plotly.graph_objects.html)
- Type:
list | str
- plotlyfig
plotly.graph_objects.Figure object generated from input data
- check_subplot_arguments(data, subplot_size, subplot_order, plot_type, subplot_args, xaxes_args, yaxes_args) None
- generate_multiseries_plots(plot_type, plot_args) Figure
- generate_subplots(subplot_size, plot_type, subplot_order, plot_args, make_subplot_args) Figure
- class pipeliner.results_display_objects.ResultsDisplayPlotlyScatter(*, data: List[List[float]] | DataFrame | ndarray | dict | None = None, title: str, x: str | List[float] | None = None, y: str | List[float] | None = None, color: str | int | Sequence[str] | None = None, size: str | int | Sequence[str] | None = None, symbol: str | int | Sequence[str] | None = None, hover_name: str | int | Sequence[str] | None = None, range_color: list | None = None, range_x: list | None = None, range_y: list | None = None, category_orders: dict | None = None, labels: dict | None = None, associated_data: list, start_collapsed: bool = False, flag: str = '', **kwargs)
Bases:
ResultsDisplayObject
A class that generates a plotly.graph_objects.Figure object to display a scatter plot. It uses the plotly express scatter function: https://plotly.com/python-api-reference/generated/plotly.express.scatter.html. Examples: https://plotly.com/python/line-and-scatter/
- data
The data to plot. The following types are allowed: list (a list of lists of values); array or dict (converted to a pandas dataframe internally); pandas dataframe (ensure column names are added if ‘x’ indicates a column name). More details: https://plotly.com/python/px-arguments/
- plotlyfig
plotly.graph_objects.Figure object generated from input data
- class pipeliner.results_display_objects.ResultsDisplayRvapi(*, title: str, rvapi_dir: str, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display rvapi objects
It is best to not instantiate this class directly. Instead create it using create_results_display_object
This can be used for general HTML display in Doppio. Create a directory containing an index.html file and it will be shown in the results display tab.
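A minimal sketch of preparing such a directory with the standard library. Only the "directory containing index.html" layout comes from the description above; the helper name, its arguments and the page skeleton are hypothetical.

```python
import tempfile
from pathlib import Path


def make_html_results_dir(parent: str, name: str, body_html: str) -> Path:
    """Create <parent>/<name>/index.html and return the new directory.

    Hypothetical helper: only the "directory containing index.html"
    layout comes from the ResultsDisplayRvapi description.
    """
    out_dir = Path(parent) / name
    out_dir.mkdir(parents=True, exist_ok=True)
    page = "<!DOCTYPE html>\n<html><body>\n" + body_html + "\n</body></html>\n"
    (out_dir / "index.html").write_text(page, encoding="utf-8")
    return out_dir


if __name__ == "__main__":
    # Demo in a throwaway location; a real job would write into its own output directory
    with tempfile.TemporaryDirectory() as tmp:
        html_dir = make_html_results_dir(tmp, "html_report", "<h1>Summary</h1>")
        print(sorted(p.name for p in html_dir.iterdir()))
```

The returned path is the value that would be supplied as rvapi_dir (ResultsDisplayRvapi takes title and rvapi_dir keyword arguments, per its signature), preferably through create_results_display_object rather than by instantiating the class directly.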
- class pipeliner.results_display_objects.ResultsDisplayTable(*, title: str, headers: list, table_data: list, associated_data: list, header_tooltips: list | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display a table
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayText(*, title: str, display_data: str, associated_data: list, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A class to display general text in the GUI results tab
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayTextFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display ASCII text files
It is best to not instantiate this class directly. Instead create it using create_results_display_object
Use this for the default display of files that contain ASCII-encoded text but whose formats are too variable to make a more complex ResultsDisplayFile
Deposition Objects
DepositionObjects are returned by a PipelinerJob’s prepare_deposition_data function and are used to prepare automated depositions to the PDB, EMDB, and EMPIAR.
EMPIAR DepositionObjects
- class pipeliner.deposition_tools.empiar_deposition_objects.EmpiarCorrectedMics(name: str = 'Corrected micrographs', directory: str | NoneType = None, category: str | NoneType = None, header_format: str | NoneType = None, data_format: str | NoneType = None, num_images_or_tilt_series: int | NoneType = None, frames_per_image: int | NoneType = None, voxel_type: str | NoneType = None, pixel_width: float | NoneType = None, pixel_height: float | NoneType = None, details: str | NoneType = None, image_width: int | NoneType = None, image_height: int | NoneType = None, micrographs_file_pattern: str | NoneType = None)
Bases:
object
- class pipeliner.deposition_tools.empiar_deposition_objects.EmpiarData(name: str = '', directory: str | NoneType = None, category: str | NoneType = None, header_format: str | NoneType = None, data_format: str | NoneType = None, num_images_or_tilt_series: int | NoneType = None, frames_per_image: int | NoneType = None, voxel_type: str | NoneType = None, pixel_width: float | NoneType = None, pixel_height: float | NoneType = None, details: str | NoneType = None, image_width: int | NoneType = None, image_height: int | NoneType = None, micrographs_file_pattern: str | NoneType = None, picked_particles_file_pattern: str | NoneType = None)
Bases:
object
- class pipeliner.deposition_tools.empiar_deposition_objects.EmpiarMovieSet(name: str = 'Multiframe micrograph movies', directory: str | NoneType = None, category: str | NoneType = None, header_format: str | NoneType = None, data_format: str | NoneType = None, num_images_or_tilt_series: int | NoneType = None, frames_per_image: int | NoneType = None, voxel_type: str | NoneType = None, pixel_width: float | NoneType = None, pixel_height: float | NoneType = None, details: str | NoneType = None, image_width: int | NoneType = None, image_height: int | NoneType = None, micrographs_file_pattern: str | NoneType = None)
Bases:
object
- class pipeliner.deposition_tools.empiar_deposition_objects.EmpiarParticles(name: str = 'Particle images', directory: str | NoneType = None, category: str | NoneType = None, header_format: str | NoneType = None, data_format: str | NoneType = None, num_images_or_tilt_series: int | NoneType = None, frames_per_image: int | NoneType = None, voxel_type: str | NoneType = None, pixel_width: float | NoneType = None, pixel_height: float | NoneType = None, details: str | NoneType = None, image_width: int | NoneType = None, image_height: int | NoneType = None, micrographs_file_pattern: str | NoneType = None, picked_particles_file_pattern: str | NoneType = None)
Bases:
object
- class pipeliner.deposition_tools.empiar_deposition_objects.EmpiarRefinedParticles(name: str = 'Per-particle motion corrected particle images', directory: str | NoneType = None, category: str | NoneType = None, header_format: str | NoneType = None, data_format: str | NoneType = None, num_images_or_tilt_series: int | NoneType = None, frames_per_image: int | NoneType = None, voxel_type: str | NoneType = None, pixel_width: float | NoneType = None, pixel_height: float | NoneType = None, details: str | NoneType = None, image_width: int | NoneType = None, image_height: int | NoneType = None, micrographs_file_pattern: str | NoneType = None, picked_particles_file_pattern: str | NoneType = None)
Bases:
object
- class pipeliner.deposition_tools.empiar_deposition_objects.Micrograph(file: str, ext: str, n_frames: int, dimx: int, dimy: int, dtype: str, headtype: str, apix: float, voltage: float, spherical_abberation: float)
Bases:
object
- pipeliner.deposition_tools.empiar_deposition_objects.empiar_check(emp_dep_obj: Any, attribute: str, number: int)
Check that an attribute in empiar format is valid
Checks values in the form (“T<n>”, “”), such as the header_format value of an EMPIAR deposition object
- pipeliner.deposition_tools.empiar_deposition_objects.get_imgfile_info(imgfile: str, blockname: str, img_block_col: str) Tuple[Dict[str, Tuple[float, float, float]], List[str]]
Get information from the starfile containing image info
- Parameters:
- Returns:
A tuple of (a dict with info about the optics groups {og_number: (apix, voltage, spherical aberration)}, a list of the full paths, relative to the working directory, of all the images in the file; for movies, the paths are relative to the import directory instead)
- Return type:
Tuple[Dict[str, Tuple[float, float, float]], List[str]]
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics(in_file: str) List[EmpiarCorrectedMics]
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics_parts_data(mpfile: str, is_parts: bool, is_cor_parts: bool) List[EmpiarData]
Prepare the micrographs or particles portion of an EMPIAR deposition
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_parts(in_file: str, is_polished: bool = False) Sequence[EmpiarParticles | EmpiarRefinedParticles]
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_raw_mics(movfile: str) List[EmpiarMovieSet]
Prepare the raw micrographs portion of an EMPIAR deposition
- Parameters:
movfile (str) – Movies star file to operate on
- Returns:
A list of DepositionObjects used to create a deposition
- Return type:
List[EmpiarMovieSet]
PDB/EMDB DepositionObjects
PDB/EMDB DepositionObjects correspond to the schema here: http://ftp.ebi.ac.uk/pub/databases/emdb/doc/XML-schemas/emdb-schemas/v3/current_v3/doc/Untitled.html
- class pipeliner.deposition_tools.onedep_deposition_objects.Em2dProjectionSelection(id: int | List[str | NoneType] | str | NoneType = None, entry_id: str = 'ENTRY_ID', details: str | NoneType = None, method: str | NoneType = None, num_particles: int | NoneType = None, software_name: str | NoneType = None, citation_id: int | List[str | NoneType] | str | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.Em3dReconstruction(id: int | List[str | NoneType] | str | NoneType = None, entry_id: str = 'ENTRY_ID', num_particles: int | NoneType = None, symmetry_type: str | NoneType = None, image_processing_id: int | List[str | NoneType] | str | NoneType = None, actual_pixel_size: float | NoneType = None, algorithm: str | NoneType = None, citation_id: int | List[str | NoneType] | str | NoneType = None, ctf_correction_method: str | NoneType = None, details: str | NoneType = None, euler_angles_details: str | NoneType = None, fsc_type: str | NoneType = None, magnification_calibration: str | NoneType = None, method: str | NoneType = None, nominal_pixel_size: float | NoneType = None, num_class_averages: int | NoneType = None, refinement_type: str | NoneType = None, resolution: float | NoneType = None, resolution_method: str | NoneType = None, software: str | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.EmCtfCorrection(id: int | List[str | NoneType] | str | NoneType = None, image_processing_id: int | List[str | NoneType] | str | NoneType = None, type: Literal['PHASE FLIPPING AND AMPLITUDE CORRECTION'] = 'PHASE FLIPPING AND AMPLITUDE CORRECTION', details: str | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.EmImageProcessing(id: int | List[str | NoneType] | str | NoneType = None, image_recording_id: int | List[str | NoneType] | str | NoneType = None, details: str | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.EmImageRecording(id: int | List[str | NoneType] | str | NoneType = None, imaging_id: int | List[str | NoneType] | str | NoneType = None, avg_electron_dose_per_image: float | NoneType = None, average_exposure_time: float | NoneType = None, details: str | NoneType = None, detector_mode: str | NoneType = None, film_or_detector_model: str | NoneType = None, num_diffraction_images: int | NoneType = None, num_grids_imaged: int | NoneType = None, num_real_images: int | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.EmImaging(id: int | List[str | NoneType] | str | NoneType = None, entry_id: str = 'ENTRY_ID', accelerating_voltage: int | NoneType = None, illumination_mode: str | NoneType = None, electron_source: str | NoneType = None, microscope_model: str | NoneType = None, imaging_mode: str | NoneType = None, sample_support_id: int | List[str | NoneType] | str | NoneType = None, specimen_holder_type: str | NoneType = None, specimen_holder_model: str | NoneType = None, details: str | NoneType = None, date: str | NoneType = None, mode: str | NoneType = None, nominal_cs: float | NoneType = None, nominal_defocus_min: float | NoneType = None, nominal_defocus_max: float | NoneType = None, tilt_angle_min: float | NoneType = None, tilt_angle_max: float | NoneType = None, nominal_magnification: float | NoneType = None, calibrated_magnification: float | NoneType = None, energy_filter: str | NoneType = None, energy_window: str | NoneType = None, temperature: float | NoneType = None, detector_distance: float | NoneType = None, recording_temperature_minimum: float | NoneType = None, recording_temperature_maximum: float | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.EmSampleSupport(id: int | List[str | NoneType] | str | NoneType = None, entry_id: str = 'ENTRY_ID', film_material: str | NoneType = None, grid_type: str | NoneType = None, grid_material: str | NoneType = None, grid_mesh_size: str | NoneType = None, details: str | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.EmSoftware(id: int | List[str | NoneType] | str | NoneType = None, category: Literal['CLASSIFICATION', 'CTF CORRECTION', 'DIFFRACTION INDEXING', 'FINAL EULER ASSIGNMENT', 'IMAGE ACQUISITION', 'INITIAL EULER ASSIGNMENT', 'LAYERLINE INDEXING', 'MASKING', 'MODEL FITTING', 'MODEL REFINEMENT', 'OTHER', 'PARTICLE SELECTION', 'RECONSTRUCTION'] | NoneType = None, details: str | NoneType = None, name: str | NoneType = None, version: str | NoneType = None, image_processing_id: int | List[str | NoneType] | str | NoneType = None, fitting_id: int | List[str | NoneType] | str | NoneType = None, imaging_id: int | List[str | NoneType] | str | NoneType = None)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.EmSpecimen(id: int | List[str | NoneType] | str | NoneType = None, experiment_id: int = 1, concentration: str | NoneType = None, details: str | NoneType = None, embedding_applied: bool = False, shadowing_applied: bool = False, staining_applied: bool = False, vitrification_applied: bool = True)
Bases:
OneDepData
- class pipeliner.deposition_tools.onedep_deposition_objects.OneDepData(id: int | List[str | NoneType] | str | NoneType = None)
Bases:
object
- pipeliner.deposition_tools.onedep_deposition_objects.ddc
alias of Em3dReconstruction
Deposition tools
These functions combine the information from the DepositionObjects returned by a PipelinerJob into a format for automated deposition.
- class pipeliner.deposition_tools.onedep_deposition.DepositionData(field_name: str, data_items: object, merge_strat: str, reverse: bool = False)
Bases:
object
An object that holds data to be deposited via the onedep system
- field_name
The name of the field in the PDB/EMDB. Field names should be drawn from https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Groups/index.html
- Type:
str
- data_items
A dataclass from pipeliner.deposition_tools.onedep_deposition_objects containing the data
- Type:
- dc_type
The subclass of the dataclass used for data_items
- Type:
- merge_strategy
(“multiple”, “combine”, or “overwrite”) How multiple DepositionData objects will be combined. Multiple: each DepositionData object is given its own entry in a loop in the cif file. Overwrite: only the first or last object (depending on self.reverse) is used. Combine: the objects are combined, with each field being overwritten by subsequent data in that field if it is not None.
- Type:
str
- reverse
The order in which the objects are considered when they are combined. False: the latest items take precedence. True: the earliest items take precedence.
- Type:
bool
- add_to_cif(block: Block, did: str)
Add a DepositionData object to a deposition cif file
- Parameters:
block (gemmi.cif.Block) – The block of the cif.Document that is being written
did (str) – The id of the deposition that is being written
- Raises:
ValueError – If the cif type is not in (“loop”, “pairs”)
- static cifformat(val: int | str | float | bool | None, depoid: str)
Format data for writing to cif files in gemmi
All data must be strings. Anything containing spaces is single-quoted, and None is replaced with “?”
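The three formatting rules above can be sketched as a small standalone function. This is an illustration, not the pipeliner's cifformat: how the real method uses its depoid argument, and how it renders booleans, is not documented here, so the sketch omits depoid and simply calls str() on every non-None value.

```python
from typing import Optional, Union


def cif_format_sketch(val: Optional[Union[int, str, float, bool]]) -> str:
    """Apply the documented cif formatting rules (hypothetical helper)."""
    if val is None:
        return "?"  # None becomes the cif "unknown" marker
    text = str(val)  # all data must be strings
    if " " in text:
        return f"'{text}'"  # values containing spaces are single-quoted
    return text


if __name__ == "__main__":
    for value in (None, 300, 1.5, "RELION", "PHASE FLIPPING"):
        print(repr(value), "->", cif_format_sketch(value))
```

Values that themselves contain quote characters are not handled by this sketch; gemmi's cif.quote() function may be a more robust choice when writing through gemmi.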
- class pipeliner.deposition_tools.onedep_deposition.OneDepDeposition(terminal_job)
Bases:
object
Object for making a onedep deposition
Broken down into submethods for testing
- merged_depobjs
The raw_repobjs list, with each subgroup merged according to the merge strategy of that type of deposition object
- Type:
- final_depobjs
The merged depobjs with their UIDs and job references updated to their actual values
- Type:
- int_ids
The integer ids that correspond to the UIDs assigned to each of the raw deposition objects
- Type:
- make_int_ids()
Make a dictionary {UID: integer ID}
- merge_depobjs()
Merge deposition objects according to their merge strategy
- prepare_deposition()
Do all the steps to get ready for a deposition
- update_jobrefs()
Update job references from ‘JOBREF: jobname’ to the UID for that job
- update_uids_to_int_ids()
Update all UIDs in deposition objects to the integer ids
- write_deposition_cif_file(depo_id: str | None)
Write a cif file for deposition through the onedep system
- Parameters:
depo_id – The id of the deposition, assigned by EMDB/PDB
- Raises:
ValueError – If the deposition has not been prepared to write yet
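The ID bookkeeping described by make_int_ids, update_jobrefs and update_uids_to_int_ids can be illustrated on plain dictionaries. This is a hypothetical sketch, not the OneDepDeposition implementation: records are modelled as dicts rather than the OneDepData dataclasses, integer ids are assumed to start at 1 and follow first-seen order, and job references are assumed to be resolved to UIDs before UIDs are converted to integer ids. The job name in the usage comment is illustrative only.

```python
from typing import Dict, List, Mapping

JOBREF_PREFIX = "JOBREF: "  # reference format quoted in update_jobrefs() above


def make_int_ids(uids: List[str]) -> Dict[str, int]:
    """Map each distinct UID to an integer id (1, 2, 3, ...) in first-seen order."""
    int_ids: Dict[str, int] = {}
    for uid in uids:
        int_ids.setdefault(uid, len(int_ids) + 1)
    return int_ids


def resolve_ids(
    record: Mapping[str, object],
    job_uids: Mapping[str, str],
    int_ids: Mapping[str, int],
) -> Dict[str, object]:
    """Resolve 'JOBREF: jobname' values to UIDs, then UIDs to integer ids."""
    resolved: Dict[str, object] = {}
    for key, value in record.items():
        if isinstance(value, str) and value.startswith(JOBREF_PREFIX):
            value = job_uids[value[len(JOBREF_PREFIX):]]  # job name -> UID
        if isinstance(value, str) and value in int_ids:
            value = int_ids[value]  # UID -> integer id
        resolved[key] = value
    return resolved


if __name__ == "__main__":
    ids = make_int_ids(["uid-a", "uid-b", "uid-a"])
    # e.g. a record whose image_processing_id points at a (hypothetical) refinement job
    record = {"id": "uid-b", "image_processing_id": "JOBREF: Refine3D/job010/"}
    print(resolve_ids(record, {"Refine3D/job010/": "uid-a"}, ids))
```

A free-text field whose value happened to equal a UID would also be converted here; a real implementation would restrict the lookup to id fields.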
- pipeliner.deposition_tools.onedep_deposition.gather_onedep_jobs_and_depobjects(terminal_job: str) Tuple[List[List[DepositionData]], Dict[str, List[DepositionData]]]
Prepare a onedep deposition
- Parameters:
terminal_job (str) – The job to use
- Returns:
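A tuple whose first item is the gathered DepositionData objects, where each sublist contains objects of one type, and whose second item is a dict mapping str keys to lists of DepositionData objects
- Return type:
Tuple[List[List[DepositionData]], Dict[str, List[DepositionData]]]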
- Raises:
ValueError – If the terminal job is not found.
- pipeliner.deposition_tools.onedep_deposition.make_deposition_data_object(data_obj, uid: int | List[str | None] | str | None = '') DepositionData
Makes a new DepositionData object of the desired type
- Parameters:
- Returns:
The created DepositionData object
- Return type:
DepositionData
- Raises:
ValueError – If the data dict is not appropriate for the selected dataclass
- pipeliner.deposition_tools.onedep_deposition.merge_depdata_items(new_di: DepositionData, old_di: List[DepositionData]) List[DepositionData]
Merge together DepositionData objects, using their merge strategy
- Parameters:
new_di (DepositionData) – The object to be merged in
old_di (List[DepositionData]) – Object(s) to be merged into. For DepositionData objects with the ‘multiple’ merge strategy, this will be multiple objects; for all others, it will be a single object in the list
- Returns:
The merged object(s): a single item for the “combine” and “overwrite” merge strategies, and multiple items for “multiple”
- Return type:
List[DepositionData]
- Raises:
ValueError – If the objects have different merge strategies
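The interaction between the three merge strategies and the reverse flag described for DepositionData can be sketched with a toy dataclass. This is an illustration of the documented semantics, not the pipeliner's merge_depdata_items: ToyData is a hypothetical stand-in for a OneDepData subclass, and the sketch merges a whole list (oldest first) rather than one new item into existing ones.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass
class ToyData:  # hypothetical stand-in for a OneDepData dataclass
    name: Optional[str] = None
    version: Optional[str] = None


def merge_items(items: List[ToyData], strategy: str, reverse: bool = False) -> List[ToyData]:
    """Merge items (oldest first) following the documented merge strategies."""
    if strategy == "multiple":
        return list(items)  # every object keeps its own loop entry
    if not items:
        return []
    # reverse=False: later items take precedence; reverse=True: earlier ones do
    ordered = list(reversed(items)) if reverse else list(items)
    if strategy == "overwrite":
        return [ordered[-1]]  # only the winning object survives
    if strategy == "combine":
        merged = ordered[0]
        for item in ordered[1:]:
            # copy over only the fields that are not None
            updates = {
                f.name: getattr(item, f.name)
                for f in fields(item)
                if getattr(item, f.name) is not None
            }
            merged = replace(merged, **updates)
        return [merged]
    raise ValueError(f"Unknown merge strategy: {strategy}")
```

With "combine", a later item only overrides fields it actually sets, so partial records from different jobs can be assembled into one; with reverse=True the same loop runs from newest to oldest, which lets the earliest non-None values win.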