Pipeliner Jobs
- class pipeliner.pipeliner_job.ExternalProgram(command: str, name: str | None = None, vers_com: List[str] | None = None, vers_lines: List[int] | None = None, emdb_categories: List[str] | None = None)
Bases: object
Class to store info about external programs called by the pipeliner
- class pipeliner.pipeliner_job.JobInfo(display_name: str = 'Pipeliner job', version: str = '0.0', job_author: str | None = None, short_desc: str = 'No short description for this job', long_desc: str = 'No long description for this job', documentation: str = 'No online documentation available', external_programs: List[ExternalProgram] | None = None, references: List[Ref] | None = None)
Bases: object
Class for storing info about jobs.
This is used to generate documentation for the job within the pipeliner
- display_name
A user-friendly name to describe the job in a GUI. This should not include the software used, because that information is pulled from the job type.
- Type:
str
- programs
A list of 3rd party software used by the job. These are used by the pipeliner to determine whether the job can be run, so they need to include all executables the job might call. If any program on this list cannot be found on the $PATH (using which), the job will be marked as unable to run.
- Type:
List[ExternalProgram]
- alternative_unavailable_reason
This can be set to a string explaining why the job is unavailable when an availability check other than the $PATH check for programs has failed, e.g. a necessary library is missing.
- Type:
- property is_available
Is the job available to run?
True if executables were found for all the job’s programs and alternative_unavailable_reason has not been set, or False otherwise.
Gives the reason the pipeliner has marked the job as unavailable
- Returns:
The reason the job is marked unavailable, or None if it is available
- Return type:
Optional[str]
- class pipeliner.pipeliner_job.PipelinerCommand(cmd: Sequence[str | float | int], *, relion_control: bool = False, allow_failure: bool = False)
Bases: object
Holds a command that will be run by the pipeliner
- relion_control
Whether the Relion ‘--pipeline_control’ argument needs to be appended to the command before it is run
- Type:
bool
- allow_failure
If True, the job will be allowed to continue even if this command fails. Be cautious if you use this option! You will need to consider all other places where the output of this command might be used and make the code suitably robust to handle possible missing files. In general, you should not set this option for any command that is required to produce an output node that might be used as input by another job.
- Type:
bool
- get_final_command(output_dir: str) List[str]
Get the final command to run from this PipelinerCommand object.
This makes a number of changes to the original command in self.cmd:
- All args are converted to strings.
- If the command is “python” or “python3”, the command name is replaced with the correct command to run the current Python executable.
- The command name is replaced by the path from shutil.which if one is found, otherwise it is left as a simple name.
- If self.relion_control is set, the Relion pipeline control arguments are appended to the command.
- If a $ sign is found in an argument, any environment variables in that argument are expanded with os.path.expandvars.
- Parameters:
output_dir – The output dir for the job that runs this command. This value is only used if self.relion_control is set.
- Returns:
The final command and its arguments, in a list ready to pass to subprocess.run.
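The transformation steps listed above can be approximated in a few lines of standard-library Python. This is a hypothetical sketch, not the pipeliner's implementation: the function name sketch_final_command is invented, and the exact form of the Relion pipeline control arguments (assumed here to be "--pipeline_control" followed by the output directory) is an assumption.

```python
import os
import shutil
import sys
from typing import List, Sequence, Union


def sketch_final_command(
    cmd: Sequence[Union[str, float, int]],
    output_dir: str,
    relion_control: bool = False,
) -> List[str]:
    """Hypothetical approximation of PipelinerCommand.get_final_command."""
    # All args are converted to strings
    args = [str(arg) for arg in cmd]
    if args[0] in ("python", "python3"):
        # Run Python with the current interpreter
        args[0] = sys.executable
    else:
        # Use the full path from shutil.which if one is found
        found = shutil.which(args[0])
        if found is not None:
            args[0] = found
    if relion_control:
        # ASSUMPTION: the control arguments are "--pipeline_control <output_dir>"
        args += ["--pipeline_control", output_dir]
    # Expand environment variables only in arguments that contain a $ sign
    return [os.path.expandvars(a) if "$" in a else a for a in args]
```

A command name that is not on the $PATH is left unchanged, which mirrors the "otherwise it is left as a simple name" rule.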
- class pipeliner.pipeliner_job.PipelinerJob
Bases: object
Super-class for job objects.
Each job type has its own sub-class.
WARNING: do not instantiate this class directly; use the factory functions in this module.
- working_dir
The working directory to be used when running the job. This should normally be left as None, meaning the job will be run in the project directory. Jobs that need to write files into their working directory should set it to somewhere within the job’s output directory, and take care to adjust the paths of input and output files accordingly.
- Type:
- CATEGORY_LABEL = ''
- OUT_DIR = ''
- PROCESS_NAME = ''
- add_compatibility_joboptions() None
Write additional joboptions for backwards compatibility
Some JobOptions are needed by the original program (e.g. Relion 4) but not by the pipeliner. They are added here so that the files the pipeliner writes remain backwards compatible with the original program.
- add_emdb_deposition_data(category: str, data_dict: Dict[str, str]) DepositionObject
Create a deposition object specific to this job
- add_main_emdb_processing_data() DepositionObject
Make an em_image_processing DepositionObject for a job
This is the base DepositionObject for all jobs that generate results. Most jobs should generate this along with the associated more specific deposition objects.
This deposition object should ALWAYS be created last, as it will update the cross-references in the associated depobjs when it is created.
- Returns:
The em_image_processing DepositionObject
- Return type:
DepositionObject
- add_onedep_metadata_import()
- add_output_node(file_name: str, node_type: str, keywords: List[str] | None = None) None
Helper function to add a new Node for a file in the job’s output directory.
This is a wrapper around node_factory.create_node which simply adds self.output_dir to the start of the file name before creating the node and adding it to self.output_nodes.
- Parameters:
file_name – The name of the file that the new node will refer to. It is assumed that the file will be written to the job’s output directory. Note that the existence of the file is not checked, because this method will usually be called before the job has run.
node_type – The top-level type for the new node. This should almost always be one of the constants defined in pipeliner.nodes.
keywords – A list of keywords to append to the node type.
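The behaviour described above (prefix the output directory, append to output_nodes, never check that the file exists) can be illustrated with hypothetical stand-in classes. SketchNode and SketchJob are invented for this example and are not pipeliner classes; "DensityMap" is used only as an illustrative node type string.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a pipeliner Node."""
    name: str
    toplevel_type: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class SketchJob:
    """Hypothetical stand-in showing the documented add_output_node contract."""
    output_dir: str
    output_nodes: List[SketchNode] = field(default_factory=list)

    def add_output_node(
        self, file_name: str, node_type: str, keywords: Optional[List[str]] = None
    ) -> None:
        # Prefix the job's output directory; the file itself is NOT checked,
        # because this is normally called before the job has run
        path = os.path.join(self.output_dir, file_name)
        self.output_nodes.append(SketchNode(path, node_type, list(keywords or [])))

    def create_output_nodes(self) -> None:
        # A subclass override would typically just call add_output_node
        self.add_output_node("run_class001.mrc", "DensityMap", ["class3d"])
```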
- additional_joboption_validation() List[JobOptionValidationResult]
Advanced validation of job parameters
This is a placeholder function for additional validation to be done by individual job subtypes, such as comparing JobOption values, e.g. JobOption A must be greater than JobOption B.
Avoid using self.get_string or self.get_number in this function, as they may raise an error if the JobOption is required and has no value. Use self.joboptions[“jobopname”].value instead.
- Returns:
A list of JobOptionValidationResult objects
- Return type:
List[JobOptionValidationResult]
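A minimal sketch of the "A must be greater than B" check described above, reading .value directly rather than calling get_number. SketchJobOption and SketchValidationResult are hypothetical stand-ins for the pipeliner's JobOption and JobOptionValidationResult, and the option names option_a and option_b are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchJobOption:
    """Hypothetical stand-in: only the .value attribute is used here."""
    value: Optional[float]


@dataclass
class SketchValidationResult:
    """Hypothetical stand-in for JobOptionValidationResult."""
    result_type: str
    raised_by: List[str]
    message: str


def sketch_additional_validation(
    joboptions: Dict[str, SketchJobOption],
) -> List[SketchValidationResult]:
    errors: List[SketchValidationResult] = []
    # Read .value directly so an empty required option does not raise here;
    # missing values are left for the standard required-option checks
    a = joboptions["option_a"].value
    b = joboptions["option_b"].value
    if a is not None and b is not None and not a > b:
        errors.append(
            SketchValidationResult(
                "error", ["option_a", "option_b"], "option_a must be greater than option_b"
            )
        )
    return errors
```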
- check_joboption_is_now_deactivated(jo: str) bool
Check if a joboption has become deactivated in relation to others
For example, if job option A is False, job option B is now deactivated
- check_joboption_is_now_required(jo: str) list
Check if a joboption has become required in relation to others
For example if job option A is True, job option B is now required
- Parameters:
jo (str) – The name of the joboption to test
- Returns:
A list of pipeliner.job_options.JobOptionValidationResult objects for any errors found
- Return type:
list
- continuation_joboption_updates() None
Modifications to existing JobOptions for when a job is continued
Set a JobOption’s value for the continuation job.
This function can also do things like change the hard or suggested min/max, limit or expand multiple choice options, or make the regex validation in a StringJobOption more restrictive.
- create_input_nodes(existing_nodes: Dict[str, Node] | None = None) None
Automatically add the job’s input nodes to its input node list.
Input nodes are created from each of the job’s job options.
- Parameters:
existing_nodes – A dict of {node_name: Node object} for the current nodes in the pipeline, if available. This allows jobs to re-use existing nodes rather than creating new ones, which can be necessary if the type of the node is uncertain.
- create_output_nodes() None
Make the job’s output nodes.
This method should be overridden by PipelinerJob subclasses.
The output nodes should be added to the list in the output_nodes attribute. The add_output_node function is helpful to create and add a new node in a single call.
If your job doesn’t make any output nodes, or doesn’t know what their names will be until the job has been run, you still need to override this method but your implementation can simply pass and do nothing. If you need to add output nodes at the end of the job, create them in create_post_run_output_nodes.
Note that this method is called by the job manager (via PipelinerJob.prepare_to_run) before the job is added to the pipeline. The job’s output directory does exist when this method is called, but that could change in future versions of the pipeliner and jobs should avoid making any file system changes in this method.
- create_post_run_output_nodes()
Placeholder function for post run node creation
Some jobs have output nodes that can only be created after the job has run because their names are not known until after they have been created. They can be added here. This function should ONLY add output nodes; any other work should be done in commands run by the job.
- create_results_display() Sequence[ResultsDisplayObject]
Create results display objects to be displayed by the GUI
This default implementation simply creates the default results display object for each of the job’s output nodes. Subclasses that want customised results should override this method.
- Returns:
A list of ResultsDisplayObject
- evaluate_qsub_template(sub_text: str) None
Check that the qsub template is appropriate for the pipeliner
Looks for common differences between Relion-style submission scripts and pipeliner-style ones. Issues warnings if suspected problems are found but still allows the job to run.
- Parameters:
sub_text (str) – The text of the submission script
- gather_metadata() Dict[str, int | float | str | bool | dict | list | None]
Placeholder function for metadata gathering
Each job class should define this individually
- Returns:
A placeholder “No metadata available” and the reason why
- Return type:
Dict[str, int | float | str | bool | dict | list | None]
- get_additional_reference_info() List[Ref]
A placeholder function for jobs that need to return additional references
This is for references that are not included in self.jobinfo, such as ones pulled from the EMDB/PDB in fetch jobs
- get_category_label() str
Get a label for the category that this job belongs to.
If the job defines a CATEGORY_LABEL attribute, its value is simply returned. Otherwise, the second part of the process name is processed to produce a label by replacing underscores with spaces and converting to title case.
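The fallback rule above can be sketched as a small function. This assumes process names are dot-separated (for example "relion.initial_model"); sketch_category_label is a hypothetical helper, not the pipeliner method itself.

```python
def sketch_category_label(process_name: str, category_label: str = "") -> str:
    """Hypothetical approximation of get_category_label."""
    # An explicit CATEGORY_LABEL always wins
    if category_label:
        return category_label
    # ASSUMPTION: the process name is dot-separated, e.g. "relion.initial_model"
    second_part = process_name.split(".")[1]
    # Underscores become spaces, then convert to title case
    return second_part.replace("_", " ").title()
```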
- get_commands() List[PipelinerCommand]
Get the commands to be run for a specific job.
This method should be overridden by PipelinerJob subclasses.
Jobs are normally run with the project directory as the working directory. If your job needs to run in a different working directory (for example if it calls a program which always writes files into the current directory), set the self.working_dir attribute in this method.
Note that this method should run quickly! Any long-running actions should be done in one of the job’s commands instead. (If necessary, put Python code that needs to be run into a separate script in pipeliner.scripts.job_scripts and then call it as a command.)
- Returns:
The commands as a list of PipelinerCommand objects
- get_current_output_nodes() List[Node]
Get the current output nodes if the job was stopped prematurely
For most jobs there will not be any, but for jobs with many iterations the most recent iteration can be used if the job was aborted or failed and is later marked as successful
- get_default_params_dict() Dict[str, str]
Get a dict with the job’s parameters and default values
- Returns:
All the job’s parameters {parameter: default value}
- Return type:
Dict[str, str]
- get_job_subdirs()
Get all the subdirectories contained in the job directory, excluding NodeDisplay
- Returns:
The subdirectories
- Return type:
List[str]
- get_joboption_groups() Dict[str, List[str]]
Put the joboptions in groups according to their jobop_group attribute
Assumes that the joboptions have already been put in order of priority by self.set_joboption_order() or were in order to begin with.
Groups are ordered based on the highest priority joboption in that group from the order of the joboptions, except that “Main” is always the first group. Joboptions within the groups are ordered by priority.
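The grouping rule above ("Main" first, other groups in order of their highest-priority joboption, members in priority order) can be sketched with an insertion-ordered dict. The input format, a list of (joboption name, jobop_group) pairs already in priority order, is an assumption made for this illustration, and dropping an empty "Main" group is a choice of this sketch.

```python
from typing import Dict, List, Tuple


def sketch_joboption_groups(ordered: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Hypothetical grouping; `ordered` holds (name, jobop_group) in priority order."""
    # Pre-seed "Main" so it is always the first group
    groups: Dict[str, List[str]] = {"Main": []}
    for name, group in ordered:
        # dicts preserve insertion order, so each group lands where its
        # highest-priority joboption first appears
        groups.setdefault(group, []).append(name)
    if not groups["Main"]:
        del groups["Main"]
    return groups
```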
- get_runtab_options(mpi: bool = False, threads: bool = False, addtl_args: bool = False, mpi_default_min: int = 1, mpi_must_be_odd: bool = False) None
Get the options found in the Run tab of the GUI, which are common to all job types
Adds entries to the joboptions dict for queueing, MPI, threading, and additional arguments. This method should be used when initialising a PipelinerJob subclass.
- Parameters:
mpi (bool) – Should MPI options be included?
threads (bool) – Should multi-threading options be included?
addtl_args (bool) – Should an ‘additional arguments’ option be included?
mpi_default_min (int) – The minimum for the default number of MPI processes; used if mpi_default_min is greater than the user-defined minimum number of MPI processes
mpi_must_be_odd (bool) – Does the number of MPI processes have to be odd, as for Relion refine jobs?
- handle_doppio_uploads(dry_run=False) None
Tasks that have to be performed to deal with Doppio file uploads.
- Move files from DoppioUploads to the job dir:
DoppioUploads/tmpdir/file -> JobType/jobNNN/InputFiles/file
- Update the job option values to point to the new file locations, so that when the job input nodes are created they refer to the moved files
- Parameters:
dry_run – If True, do not actually try to move any files, just update the job option values. This option is only intended for use in testing.
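The path mapping above (DoppioUploads/tmpdir/file to JobType/jobNNN/InputFiles/file) can be sketched with pathlib. Like dry_run=True, this hypothetical sketch only computes updated job option values and moves nothing; detecting uploads by a "DoppioUploads/" prefix is an assumption.

```python
from pathlib import PurePosixPath
from typing import Dict


def sketch_doppio_destination(upload_path: str, output_dir: str) -> str:
    # DoppioUploads/tmpdir/file -> <output_dir>/InputFiles/file
    file_name = PurePosixPath(upload_path).name
    return str(PurePosixPath(output_dir) / "InputFiles" / file_name)


def sketch_handle_uploads(joboptions: Dict[str, str], output_dir: str) -> Dict[str, str]:
    """Return updated joboption values; no files are moved (like dry_run=True)."""
    updated: Dict[str, str] = {}
    for key, value in joboptions.items():
        # ASSUMPTION: uploaded files are recognised by their DoppioUploads/ prefix
        if value.startswith("DoppioUploads/"):
            updated[key] = sketch_doppio_destination(value, output_dir)
        else:
            updated[key] = value
    return updated
```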
- load_results_display_files() Sequence[ResultsDisplayObject]
Load the job’s results display objects from files on disk.
This method must be fast because it is used by the GUI to load job results. Therefore, if a display object fails to load properly, no attempt is made to recalculate it and a ResultsDisplayPending object is returned instead.
If there are no results display files yet, an empty list is returned.
- Returns:
A list of ResultsDisplayObject
- make_final_emdb_deposition_objects() List[DepositionObject]
Get the deposition objects for a job
Gets the depobjs generated by the job’s specific methods and adds the general ones for its reference(s) and software
- make_queue_options() None
Get options related to queueing and queue submission, which are common to all job types
- parse_additional_args() List[str]
Parse the additional arguments job option and return a list
- Returns:
A list ready to append to the command. Quoted strings are preserved as quoted strings; all others are split into individual items
- Return type:
List[str]
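One way to get the splitting behaviour described above is the standard library's shlex in non-POSIX mode, which keeps a quoted string together as a single item and retains its quote characters. This is a hedged approximation; the pipeliner may parse the additional arguments differently in edge cases such as escaped quotes.

```python
import shlex
from typing import List


def sketch_parse_additional_args(additional_args: str) -> List[str]:
    """Hypothetical approximation of parse_additional_args."""
    # posix=False keeps quoted strings as single items WITH their quotes;
    # everything else is split on whitespace
    return shlex.split(additional_args, posix=False)
```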
- prepare_clean_up_lists(do_harsh: bool = False) Tuple[List[str], List[str]]
Placeholder function for preparation of list of files to clean up
Each job class should define this individually
- prepare_emdb_deposition_data() List[DepositionObject]
This function makes the deposition objects specific to this job
This is the function that should be edited for each specific PipelinerJob. The main em_image_processing, citations, and citation_author objects are created automatically and don’t need to be included in this function.
- Returns:
All the deposition objects, except software and citations
- Return type:
List[DepositionObject]
- prepare_empiar_deposition_data() List[DepositionObject]
This function makes the deposition objects aside from citations and software
This is the function that should be edited for each specific PipelinerJob
- Returns:
All the deposition objects, except software and citations
- Return type:
List[DepositionObject]
- prepare_for_continuation() None
Prepares this job to be continued
This method should be run on an existing job, i.e. one that has a matching Process in the ProjectGraph. This means self.output_dir, self.input_nodes, and self.output_nodes should be assigned.
Updates the parameters and values of each JobOption using continuation_joboption_updates()
- Raises:
NotImplementedError – If the job cannot be continued
- prepare_to_run(ignore_invalid_joboptions: bool = False, existing_nodes: Dict[str, Node] | None = None) None
Prepare the job to run.
This function is intended to be called by the pipeliner before the job file is saved to disk. It does several things, including:
- Validate the job options
- Make the job directory
- Move uploaded Doppio user files into the job directory
- Parameters:
ignore_invalid_joboptions (bool) – Prepare the job to run anyway even if the job options appear to be invalid
existing_nodes – A dict of {node_name: Node object} for the current nodes in the pipeline, if available. This allows jobs to re-use existing nodes rather than creating new ones, which can be necessary if the type of the node is uncertain.
- Raises:
ValueError – If the job options appear to be invalid and ignore_invalid_joboptions is not set
RuntimeError – If the job does not already have an output directory assigned
- read(filename: str) None
Reads parameters from a run.job or job.star file
- Parameters:
filename (str) – The file to read. Can be a run.job or job.star file
- Raises:
ValueError – If the file is a job.star file and a job option from the PipelinerJob is missing from the input file
- relion_joboption_conversion() None
Any updates that need to be made to the JobOptions if the job was converted from Relion
This function should modify job options and not return anything. It may need to be updated at a later time to accept a list of raw options read from the file if it is needed to convert jobs where relion has JobOptions that are not present in the pipeliner version of the job
- save_job_submission_script(commands: list) str
Writes a submission script for jobs submitted to a queue
- Parameters:
commands (list) – The commands to save, in a list of lists format. In Relion this would be the actual job commands, but in the pipeliner it will just be a single command to run the job with the job_runner.
- Returns:
The name of the submission script that was written
- Return type:
str
- Raises:
ValueError – If no submission script template was specified in the job’s joboptions
ValueError – If the submission script template is not found
RuntimeError – If the output script could not be written
- save_results_display_files() Sequence[ResultsDisplayObject]
Create new results display objects and save them to disk.
This method removes any existing results display files first, and returns the new display objects after they have been created and saved.
- Returns:
The newly-created results display objects.
- set_joboption_order(new_order: List[str]) None
Replace the joboptions dict with an ordered dict
Use this to set the order the joboptions will appear in the GUI. If a joboption is not specified in the list it will be tagged on to the end of the list.
- Parameters:
new_order (list[str]) – A list of joboption keys, in the order they should appear
- Raises:
ValueError – If a nonexistent joboption is specified
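The ordering rule above can be sketched with a plain dict, which preserves insertion order in Python 3.7+. This hypothetical helper returns a new dict rather than replacing self.joboptions in place.

```python
from typing import Dict, List, TypeVar

V = TypeVar("V")


def sketch_set_joboption_order(joboptions: Dict[str, V], new_order: List[str]) -> Dict[str, V]:
    """Hypothetical approximation of set_joboption_order."""
    for key in new_order:
        if key not in joboptions:
            raise ValueError(f"Nonexistent joboption specified: {key}")
    ordered = {key: joboptions[key] for key in new_order}
    # Joboptions not named in new_order are tagged on to the end, in their old order
    for key, value in joboptions.items():
        if key not in ordered:
            ordered[key] = value
    return ordered
```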
- set_option(line: str) None
Sets a value in the joboptions dict from a run.job file
- Parameters:
line (str) – A line from a run.job file
- Raises:
RuntimeError – If the line does not contain ‘==’
RuntimeError – If the value of the line does not match any of the joboptions keys
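A minimal sketch of the two documented failure modes, assuming each run.job line has the form "label == value" and that the label is matched directly against the joboptions keys. Both the line format details and the matching rule are assumptions; the real method may match on joboption labels rather than keys.

```python
from typing import Dict


def sketch_set_option(joboptions: Dict[str, str], line: str) -> None:
    """Hypothetical approximation of set_option on a plain dict."""
    if "==" not in line:
        raise RuntimeError(f"Line does not contain '==': {line!r}")
    # Split only on the first '==' in case the value contains another one
    key, value = (part.strip() for part in line.split("==", 1))
    if key not in joboptions:
        raise RuntimeError(f"{key!r} does not match any joboption")
    joboptions[key] = value
```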
- validate_dynamically_required_joboptions() List[JobOptionValidationResult]
Check all joboptions to see whether they have become required because of if_required
For example if job option A is True, job option B is now required
- Returns:
A list of pipeliner.job_options.JobOptionValidationResult objects for any errors found
- Return type:
List[JobOptionValidationResult]
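The "if A is True, B is now required" rule can be sketched as follows. SketchConditionalOption and its required_if list of (other joboption, triggering value) pairs are a hypothetical representation of the conditions, not the pipeliner's data structure, and plain error strings stand in for JobOptionValidationResult objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SketchConditionalOption:
    """Hypothetical joboption with conditions that make it required."""
    value: Optional[str]
    # (other joboption name, value that makes this option required)
    required_if: List[Tuple[str, str]] = field(default_factory=list)


def sketch_dynamically_required_errors(
    joboptions: Dict[str, SketchConditionalOption],
) -> List[str]:
    errors: List[str] = []
    for name, jo in joboptions.items():
        triggered = any(joboptions[other].value == trigger for other, trigger in jo.required_if)
        # An empty or missing value is an error only when a condition is met
        if triggered and not jo.value:
            errors.append(f"{name} is required")
    return errors
```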
- validate_input_files() List[JobOptionValidationResult]
Check that files specified as inputs actually exist
- Returns:
A list of pipeliner.job_options.JobOptionValidationResult objects
- Return type:
List[JobOptionValidationResult]
- validate_joboptions() List[JobOptionValidationResult]
Make sure all the joboptions meet their validation criteria
- Returns:
A list of JobOptionValidationResult objects
- Return type:
List[JobOptionValidationResult]
- validate_runtab_joboptions() List[JobOptionValidationResult]
- write_jobstar(output_dir: str, output_fn: str = 'job.star', is_continue: bool = False)
Write a job.star file.
- Parameters:
output_dir (str) – The output directory.
output_fn (str) – The name of the file to write. Defaults to job.star
is_continue (bool) – Is the file for a continuation of a previously run job? If so, only the parameters that can be changed on continuation are written. Overrides the is_continue attribute of the job
- class pipeliner.pipeliner_job.Ref(authors: str | List[str] | None = None, title: str = '', journal: str = '', year: str = '', volume: str = '', issue: str = '', pages: str = '', doi: str = '', **kwargs)
Bases: object
Class to hold metadata about a citation or reference, typically a journal article.
Display tools
Use these methods to create the ResultsDisplayObjects used by the pipeliner GUI, Doppio, to create graphical outputs for each job.
- pipeliner.display_tools.create_results_display_object(dobj_type: str, **kwargs) ResultsDisplayObject
Safely create a results display object
Returns a ResultsDisplayPending if there are any problems. Give it the type of display object as the first argument followed by the kwargs for that specific type of ResultsDisplayObject
- Parameters:
dobj_type (str) – The type of DisplayObject to create
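The "safe creation" pattern described above (look up the type, pass the kwargs, and fall back to a pending object on any problem) can be sketched with hypothetical stand-in classes. SketchPending, SketchText and the registry dict are invented for this example and do not reflect the pipeliner's real display classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SketchPending:
    """Hypothetical stand-in for ResultsDisplayPending."""
    reason: str


@dataclass
class SketchText:
    """Hypothetical stand-in for one display object type."""
    title: str
    display_data: str


SKETCH_DISPLAY_TYPES: Dict[str, Callable[..., Any]] = {"text": SketchText}


def sketch_create_results_display_object(dobj_type: str, **kwargs: Any) -> Any:
    try:
        return SKETCH_DISPLAY_TYPES[dobj_type](**kwargs)
    except Exception as exc:
        # Unknown types or bad kwargs produce a placeholder instead of raising
        return SketchPending(reason=f"Error creating {dobj_type}: {exc}")
```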
- pipeliner.display_tools.get_ordered_classes_arrays(model_file: str, ncols: int, boxsize: int, output_dir: str, output_filename: str, parts_file: str | None = None, title: str = '2D class averages', start_collapsed: bool = False, flag: str = '', base64_output: bool = False, optimiser_info: dict | None = None) ResultsDisplayObject
Return a 3D array of class averages from a Relion Class2D model file
- Parameters:
model_file (str) – Name of the model file
ncols (int) – number of columns desired in the file montage
boxsize (int) – Size of the class averages in the final montage
output_dir (str) – The output dir of the pipeliner job creating this object
output_filename (str) – The name for the output montage file
parts_file (str) – Path of the file containing the particles, for counting
title (str) – A title for the DisplayObject
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
base64_output (bool) – flag for a JSON file output with a list of objects holding base64 images and class IDs
optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.
- Returns:
An object for the GUI to use to render the graph
- Return type:
ResultsDisplayObject
- pipeliner.display_tools.graph_from_starfile_cols(title: str, starfile: str, block: str, ycols: list, xcols: list | None = None, xrange: list | None = None, yrange: list | None = None, data_series_labels: List[str] | None = None, xlabel: str = '', ylabel: str = '', assoc_data: List[str] | None = None, modes: List[str] | None = None, start_collapsed: bool = False, flag: str = '') ResultsDisplayGraph | ResultsDisplayPending
Automatically generate a ResultsDisplayGraph object from a STAR file
Can use one or two columns, and a third column for labels if desired
- Parameters:
title (str) – The title of the final graph
starfile (str) – Path to the STAR file to use
block (str) – The block to use in the STAR file, use None for a STAR file with only a single block
ycols (list) – Column label(s) from the STAR file to use for the y data series
xcols (list) – Column label(s) from the STAR file to use for the x data series; if None, a simple count from 1 will be used
xlabel (str) – Label for the x axis. If no x data are specified the label will be ‘Count’; if x data are specified and the xlabel is None, the x axis label will be the name of the STAR file column used
xrange (list) – Range for x values to be displayed, full range if None
yrange (list) – Range for y values to be displayed, full range if None
data_series_labels (list) – Names for the data series
ylabel (str) – Label for the y axis, if None the y axis label will be the name of the STAR file column used
assoc_data (list) – List of data file(s) associated with this graph
modes (list) – Controls the appearance of each data series; choose from ‘lines’, ‘markers’ or ‘lines+markers’
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
A ResultsDisplayGraph object for the created graph
- Return type:
ResultsDisplayGraph | ResultsDisplayPending
- pipeliner.display_tools.histogram_from_starfile_col(title: str, starfile: str, block: str, data_col: str, xlabel: str = '', ylabel: str = 'Count', assoc_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '') ResultsDisplayHistogram | ResultsDisplayPending
Automatically generate a ResultsDisplayHistogram object from a STAR file
- Parameters:
title (str) – The title of the final graph
starfile (str) – Path to the STAR file to use
block (str) – The block to use in the STAR file, use None for a STAR file with only a single block
data_col (str) – Column label from the STAR file to use for the data series
xlabel (str) – Label for the x axis. If no x data are specified the label will be ‘Count’; if x data are specified and the xlabel is None, the x axis label will be the name of the STAR file column used
ylabel (str) – Label for the y axis, if None the y axis label will be the name of the STAR file column used
assoc_data (list) – List of data file(s) associated with this graph
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- pipeliner.display_tools.make_image_carousel(starfile: str, block: str, column: str, title: str = '', nimg: int = 500, start_collapsed: bool = False, flag: str = '') ResultsDisplayImageCarousel | ResultsDisplayPending
Make a new image carousel display object from a STAR file.
- Parameters:
starfile (str) – The STAR file to use
block (str) – The name of the block with the images. If the STAR file contains only a single block, a blank string can be given here.
column (str) – The name of the column that has the images
title (str) – The title for the object, automatically generated if “”
nimg (int) – Number of images to use in the carousel, default 500, or all if < 0. Beware that using all images could lead to enormous results display files if the data set is large, so you should usually leave this at its default.
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
The DisplayObject for displaying STAR file images in the interactive carousel.
- Return type:
ResultsDisplayImageCarousel | ResultsDisplayPending
- Raises:
ValueError – If no images are listed in the STAR file.
- pipeliner.display_tools.make_map_model_thumb_and_display(outputdir: str, maps: List[str] | None = None, maps_opacity: List[float] | None = None, maps_colours: List[str] | None = None, models: List[str] | None = None, models_colours: List[str] | None = None, bild_files: List[str] | None = None, title: str | None = None, maps_data: str = '', models_data: str = '', assoc_data: List | None = None, flag: str = '', start_collapsed: bool | None = True) ResultsDisplayMapModel | ResultsDisplayPending
Make a display object for an atomic model overlaid over a map
Makes a binned map and a ResultsDisplayMapModel display object
- Parameters:
outputdir (str) – Name of the job’s output directory
maps (list) – List of map files to use
models (list) – List of model files to use
maps_opacity (list) – List of opacities for the maps, from 0 to 1; if None, 0.5 is used for all
maps_colours (list) – Colours for the maps, if specific ones are desired; otherwise Mol* will assign them
title (str) – The title for the ResultsDisplayMapModel object, if None the name of the map and model will be used
maps_data (str) – Any additional data to be included about the map
models_data (str) – Any additional data to be included about the models
models_colours (list) – Colours for the models, if specific ones are desired; otherwise Mol* will assign them
assoc_data (list) – List of associated data, if left as None then just uses the file itself
flag (str) – If the results are considered scientifically dubious explain in this string
start_collapsed (bool) – Should the display start out collapsed when displayed
- Returns:
The DisplayObject for the map and model
- Return type:
ResultsDisplayMapModel | ResultsDisplayPending
- pipeliner.display_tools.make_maps_slice_montage_and_3d_display(in_maps: Dict[str, str], output_dir: str, combine_montages: bool = True, cmap: str = '', base64_output: bool = False, optimiser_info: dict | None = None, bild_files: List[str] | None = None) List[ResultsDisplayObject]
Make a set of display objects for 3D maps
Returns separate 3D viewer display objects for each map and either a combined slices montage or a slices montage for each.
- Parameters:
in_maps (dict) – {input file: label}. If the label is “”, the filename will be used
output_dir (str) – The job’s output dir where the thumbnails dir will be created if necessary
combine_montages (bool) – Should a single montage be made with slices for all maps, otherwise a separate montage is made for each
cmap (str) – what color map to use for the montage, if any.
base64_output (bool) – Whether to create base64 thumbnails and gallery display object.
optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.
- Returns:
The display objects: the montage and then the 3D viewers if combine_montages is False, otherwise the montage followed by the 3D viewer for each map, in the order they were given.
- Return type:
List
- pipeliner.display_tools.make_mollweide_angular_distribution(starfile_path: str, output_dir: str, title: str = 'Angular distribution of particles', block: str = 'particles', n_phi_bins: int = 72, n_theta_bins: int = 36) ResultsDisplayImage | ResultsDisplayPending
Create a Mollweide projection heatmap of particle orientations
Reads _rlnAngleRot (phi) and _rlnAngleTilt (theta) from a RELION STAR file and produces a 2-D histogram on a Mollweide projection, saved as a PNG. For C1 symmetry, the plot will show the full distribution of particle orientations. For higher symmetries the plot might occupy a subsection of the full projection, depending on the symmetry axes, e.g. a quarter of the plot for D2.
- Parameters:
starfile_path (str) – Path to a STAR file containing particle data with _rlnAngleRot and _rlnAngleTilt columns
output_dir (str) – Job output directory (the PNG is saved under Thumbnails/)
title (str) – Title for the plot and display object
block (str) – Block name in the STAR file
n_phi_bins (int) – Number of bins in the azimuthal direction
n_theta_bins (int) – Number of bins in the polar direction
- Returns:
ResultsDisplayImage, or ResultsDisplayPending on error
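The histogram step that precedes the Mollweide plotting can be sketched in pure Python. With the default 72 phi bins and 36 theta bins each bin spans 5 degrees. This sketch assumes rot values lie in [-180, 180] and tilt values in [0, 180] degrees, and it omits the projection and PNG rendering entirely.

```python
from typing import List, Sequence


def sketch_orientation_histogram(
    rots: Sequence[float],
    tilts: Sequence[float],
    n_phi_bins: int = 72,
    n_theta_bins: int = 36,
) -> List[List[int]]:
    """Count particles in a [theta][phi] grid of angular bins.

    ASSUMPTION: rot (phi) is in [-180, 180] and tilt (theta) in [0, 180] degrees.
    """
    counts = [[0] * n_phi_bins for _ in range(n_theta_bins)]
    for rot, tilt in zip(rots, tilts):
        # min() folds the upper edge (exactly 180 degrees) into the last bin
        i_phi = min(int((rot + 180.0) / 360.0 * n_phi_bins), n_phi_bins - 1)
        i_theta = min(int(tilt / 180.0 * n_theta_bins), n_theta_bins - 1)
        counts[i_theta][i_phi] += 1
    return counts
```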
- pipeliner.display_tools.make_moorhen_display(maps: List[str] | None = None, models: List[str] | None = None, title: str | None = None, maps_data: str = '', models_data: str = '', assoc_data: List | None = None, flag: str = '', session_file: str | None = None) ResultsDisplayMapModel | ResultsDisplayPending
Make a Moorhen display object for maps and/or models
Creates a ResultsDisplayMoorhen display object. If no title is provided, one is generated from the map and model file names.
- Parameters:
maps (list) – List of map files to use
models (list) – List of model files to use
title (str) – The title for the display object. If None, a title is generated from the map and model file names.
maps_data (str) – Any additional data to be included about the maps. If empty, defaults to a comma-separated list of map paths.
models_data (str) – Any additional data to be included about the models. If empty, defaults to a comma-separated list of model paths.
assoc_data (list) – List of associated data. If None, defaults to the combined list of maps and models.
flag (str) – If the results are considered scientifically dubious, explain why in this string
session_file (str) – Path to a Moorhen session file
- Returns:
The Moorhen DisplayObject for the maps and models, or a ResultsDisplayPending if an error occurs
- Return type:
ResultsDisplayMapModel | ResultsDisplayPending
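The documented fallbacks for maps_data, models_data and assoc_data can be sketched as a small standalone function. This is a hypothetical helper (moorhen_defaults is not part of the pipeliner), and the ", " separator and the maps-then-models ordering are assumptions; the real function may join or order the paths differently.

```python
from typing import List, Optional, Tuple


def moorhen_defaults(
    maps: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    maps_data: str = "",
    models_data: str = "",
    assoc_data: Optional[List[str]] = None,
) -> Tuple[str, str, List[str]]:
    """Hypothetical helper reproducing the documented default values."""
    maps = list(maps or [])
    models = list(models or [])
    # Empty maps_data/models_data fall back to the joined file paths
    maps_data = maps_data or ", ".join(maps)
    models_data = models_data or ", ".join(models)
    # assoc_data=None falls back to all maps followed by all models
    if assoc_data is None:
        assoc_data = maps + models
    return maps_data, models_data, assoc_data


print(moorhen_defaults(["map1.mrc", "map2.mrc"], ["model.pdb"]))
# ('map1.mrc, map2.mrc', 'model.pdb', ['map1.mrc', 'map2.mrc', 'model.pdb'])
```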
- pipeliner.display_tools.make_mrcs_central_slices_montage(in_files: Dict[str, str], output_dir: str, cmap: str = '', base64_output: bool = False, optimiser_info: dict | None = None) ResultsDisplayObject
Make a montage of x,y,z central slices of maps
- Parameters:
in_files (Dict[str, str]) – {file name: label if different from file name}
output_dir (str) – Where to make the Thumbnails dir (if necessary) and put the montage image
cmap (str) – What colormap to use, if any
optimiser_info (dict) – {name: str, type: str} Specify an optimiser data node to be included in the associated nodes of a gallery results display object.
- Returns:
ResultsDisplayMontage – The montage ResultsDisplayObject
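Extracting the x, y and z central slices of a map is a simple indexing operation. The sketch below assumes the map is already loaded as a NumPy array indexed (z, y, x), which is the layout mrcfile uses for its data array; the helper names central_slices and normalise are hypothetical and the pipeliner's own montage code may differ.

```python
import numpy as np


def central_slices(volume):
    """Return the central 2-D slice perpendicular to each axis of a map.

    Assumes the array is indexed (z, y, x).
    """
    vol = np.asarray(volume)
    if vol.ndim != 3:
        raise ValueError("expected a 3-D map")
    nz, ny, nx = vol.shape
    return {
        "z": vol[nz // 2, :, :],  # xy plane through the central z
        "y": vol[:, ny // 2, :],  # xz plane through the central y
        "x": vol[:, :, nx // 2],  # yz plane through the central x
    }


def normalise(img):
    """Scale a slice to 0-1 so slices with different ranges look alike."""
    img = img.astype(float)
    span = img.max() - img.min()
    return (img - img.min()) / span if span else np.zeros_like(img)


cube = np.random.default_rng(0).normal(size=(32, 32, 32))
row = np.hstack([normalise(s) for s in central_slices(cube).values()])
print(row.shape)  # (32, 96)
```

The hstack step only works directly for cubic maps; non-cubic maps would need padding or resizing before the slices can be placed side by side.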
- pipeliner.display_tools.make_particle_coords_thumb(in_mrc, in_coords, out_dir, thumb_size=640, pad=5, start_collapsed=False, title: str = 'Example picked particles', flag: str = '', markers: bool = False) ResultsDisplayImage | ResultsDisplayPending
Create a thumbnail of picked particle coords on their micrograph
Because the extraction box size is not known, the boxes are drawn as a fixed percentage of the total image size.
- Parameters:
in_mrc (str) – Path to the merged micrograph mrc file
in_coords (str) – Path to the .star coordinates file
out_dir (str) – Name of the output directory
thumb_size (int) – Size of the x dimension of the final thumbnail image
pad (int) – Thickness of the particle box borders before binning in px
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
title (str) – What title to use for the displayobj created
flag (str) – If this display object contains scientifically dubious results, display this message
markers (bool) – Instead of making boxes make markers
- pipeliner.display_tools.mini_montage_from_many_files(filelist: List[str], outputdir: str, nimg: int = 5, montagesize: int = 640, title: str = '', ncols: int = 5, associated_data: List[str] | None = None, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending
Make a mini montage from a list of images
Merges and flattens image stacks if they are encountered.
- Parameters:
filelist (list) – A list of the files to use
outputdir (str) – The output dir of the pipeliner job
nimg (int) – Number of images to use in the montage
montagesize (int) – Desired size of the final montage image
title (str) – Title for the ResultsDisplay object that will be output
ncols (int) – Number of columns to make in the montage
associated_data (list) – Data files associated with these images; if None, all of the selected images are used
labels (list) – The labels for the items in the montage
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
The DisplayObject for the montage
- Return type:
ResultsDisplayMontage | ResultsDisplayPending
- Raises:
ValueError – If an image that is not in MRC or TIFF format is used
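The nimg/ncols layout can be illustrated by tiling equally sized images row by row into a single array. This is a sketch only: tile_montage is a hypothetical helper, it assumes the images have already been loaded and flattened to 2-D arrays, and it skips the final rescaling to montagesize that the real function performs.

```python
import math

import numpy as np


def tile_montage(images, ncols=5):
    """Tile equally sized 2-D images row by row into one array.

    Cells left over in the last row stay at zero.
    """
    imgs = [np.asarray(im, dtype=float) for im in images]
    if not imgs:
        raise ValueError("no images to tile")
    h, w = imgs[0].shape
    if any(im.shape != (h, w) for im in imgs):
        raise ValueError("all images must have the same shape")
    nrows = math.ceil(len(imgs) / ncols)
    used_cols = min(ncols, len(imgs))  # avoid empty columns for short lists
    montage = np.zeros((nrows * h, used_cols * w))
    for i, im in enumerate(imgs):
        r, c = divmod(i, ncols)  # row-major placement
        montage[r * h:(r + 1) * h, c * w:(c + 1) * w] = im
    return montage


print(tile_montage([np.ones((3, 4))] * 7, ncols=5).shape)  # (6, 20)
```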
- pipeliner.display_tools.mini_montage_from_stack(stack_file: str, outputdir: str, nimg: int = 40, ncols: int = 10, montagesize: int = 640, title: str = '', labels: List[str | int] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending
Make a montage from a mrcs or tiff file
- Parameters:
stack_file (str) – The path to the stack_file
outputdir (str) – The output dir of the pipeliner job
nimg (int) – Number of images to use in the montage, if < 1 uses all of them
ncols (int) – Number of columns to use
montagesize (int) – Desired size of the final montage image
title (str) – Title for the ResultsDisplay object that will be output
labels (list) – Labels for the images
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
The DisplayObject for the montage
- Return type:
ResultsDisplayMontage | ResultsDisplayPending
- Raises:
ValueError – If an image that is not in MRC or TIFF format is used
- pipeliner.display_tools.mini_montage_from_starfile(starfile: str, block: str, column: str, outputdir: str, title: str = '', nimg: int = 20, montagesize: int = 640, ncols: int = 10, labels: List[str] | None = None, cmap: str = '', start_collapsed: bool = False, flag: str = '') ResultsDisplayMontage | ResultsDisplayPending
Make a montage from a list of images in a STAR file column
Merge and flatten image stacks if they are encountered.
- Parameters:
starfile (str) – The STAR file to use
block (str) – The name of the block with the images
column (str) – The name of the column that has the images
outputdir (str) – The output dir of the pipeliner job
title (str) – The title for the object, automatically generated if “”
nimg (int) – Number of images to use in the montage, uses all if < 0
montagesize (int) – Desired size of the final montage image
ncols (int) – number of columns to use
labels (list) – Labels for the images in the montage, in order
cmap (str) – colormap to apply, if any
start_collapsed (bool) – Should the display start out collapsed when displayed in the GUI
flag (str) – If this display object contains scientifically dubious results, display this message
- Returns:
The DisplayObject for the montage
- Return type:
ResultsDisplayMontage | ResultsDisplayPending
- Raises:
ValueError – If an image that is not in MRC or TIFF format is encountered
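STAR file image columns in RELION often name single slices of a stack as "index@path", with a 1-based index, while other entries are plain file paths that may themselves be stacks. The sketch below groups such entries by file, which is one way to find the stacks that need merging and flattening. It is a hypothetical helper, not the pipeliner's code, and it assumes the column values have already been read from the STAR file.

```python
from typing import Dict, List


def group_image_references(image_names: List[str]) -> Dict[str, List[int]]:
    """Group RELION-style image names by the file they point to.

    "000012@parts.mrcs" names image 12 (1-based) of a stack. A bare path
    is recorded with an empty list, meaning "the whole file".
    """
    grouped: Dict[str, List[int]] = {}
    for name in image_names:
        if "@" in name:
            index, path = name.split("@", 1)
            grouped.setdefault(path, []).append(int(index))
        else:
            grouped.setdefault(name, [])
    return grouped


print(group_image_references(["000001@parts.mrcs", "000003@parts.mrcs", "class.mrc"]))
# {'parts.mrcs': [1, 3], 'class.mrc': []}
```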
ResultsDisplay Objects
These objects generally should not be instantiated directly; instead, they should be created using the functions above.
- class pipeliner.results_display_objects.ResultsDisplayGallery(*, title: str, images: str, labels: List[str] | None = None, associated_nodes: List[Dict[str, str]], associated_data: List[str], start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
Display object for Doppio’s interactive image gallery
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- images
Path to a .json file containing a list of image objects (base64 images and class IDs)
- Type:
str
- associated_nodes
A list of {name: str, type: str} dicts giving the nodes associated with this gallery along with their full node types
- class pipeliner.results_display_objects.ResultsDisplayGraph(*, xvalues: List[List[float | int]], yvalues: List[List[float | int]], title: str, associated_data: List[str], data_series_labels: List[str], xaxis_label: str = '', xrange: List[float] | None = None, yaxis_label: str = '', yrange: List[float] | None = None, modes: List[Literal['lines', 'markers', 'lines+markers']] | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A simple graph for the GUI to display
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- xvalues
(list): List of x-coordinate data series; multiple data series are allowed
- xrange
Range of x to be displayed; the full range is displayed if None. If the x axis needs to be reversed, enter the values backwards: [max, min]
- Type:
List[float] | None
- yvalues
(list): List of y-coordinate data series; multiple data series are allowed
- yrange
Range of y to be displayed; the full range is displayed if None. If the y axis needs to be reversed, enter the values backwards: [max, min]
- Type:
List[float] | None
- modes
Controls the appearance of each data series; choose from ‘lines’, ‘markers’, or ‘lines+markers’
- Type:
List[Literal['lines', 'markers', 'lines+markers']] | None
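The data layout expected by the graph (one list per data series, a reversed range given as [max, min]) can be illustrated with a plain argument dict and a pre-flight check. The values below are made up for illustration, the check_graph_series helper is hypothetical, and requiring exactly one mode per series is an assumption based on "the appearance of each data series".

```python
ALLOWED_MODES = {"lines", "markers", "lines+markers"}


def check_graph_series(xvalues, yvalues, data_series_labels, modes=None):
    """Hypothetical pre-flight check for ResultsDisplayGraph-style data."""
    n = len(xvalues)
    if len(yvalues) != n or len(data_series_labels) != n:
        raise ValueError("need one x series, y series and label per data series")
    for xs, ys in zip(xvalues, yvalues):
        if len(xs) != len(ys):
            raise ValueError("each x series must match its y series in length")
    if modes is not None:
        if len(modes) != n:
            raise ValueError("need one mode per data series")
        bad = set(modes) - ALLOWED_MODES
        if bad:
            raise ValueError(f"unknown modes: {sorted(bad)}")


# Made-up values: two series plotted against resolution, with xrange
# given as [max, min] so the x axis runs from low to high resolution
graph_args = dict(
    xvalues=[[10.0, 6.0, 4.0, 3.0], [10.0, 6.0, 4.0, 3.0]],
    yvalues=[[1.0, 0.95, 0.6, 0.1], [1.0, 0.9, 0.4, 0.05]],
    data_series_labels=["unmasked", "masked"],
    xaxis_label="Resolution (Å)",
    xrange=[10.0, 3.0],
    modes=["lines", "lines+markers"],
)
check_graph_series(
    graph_args["xvalues"],
    graph_args["yvalues"],
    graph_args["data_series_labels"],
    graph_args["modes"],
)
```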
- class pipeliner.results_display_objects.ResultsDisplayHistogram(*, title: str, associated_data: List[str], data_to_bin: List[float] | None = None, xlabel: str = '', ylabel: str = '', bins: List[int] | None = None, bin_edges: List[float] | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A class for the GUI to display a histogram
It is best to not instantiate this class directly. Instead, create it using create_results_display_object
- Parameters:
title (str) – The title of the histogram
data_to_bin (list) – The data to bin
xlabel (str) – Label for the x axis
ylabel (str) – Label for the y axis
associated_data (list) – List of data files associated with the histogram
bins (list) – A list of bin counts, if they are known
bin_edges (list) – A list of the bin edges, if they are already known
start_collapsed (bool) – Should the object start out collapsed when displayed in the GUI
- Raises:
ValueError – If no data or bins are specified
ValueError – If an attempt is made to specify bins or bin edges when data to bin are being provided
ValueError – If the associated data is not a list, or not provided
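When the data have already been binned, the bins and bin_edges arguments can be filled from NumPy's histogram function, which returns one more edge than there are bins. The helper precompute_histogram is hypothetical; per the errors listed above, either data_to_bin or the bins/bin_edges pair should be supplied, not both.

```python
import numpy as np


def precompute_histogram(values, nbins=10):
    """Bin data up front and return it in the bins/bin_edges form."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=nbins)
    # np.histogram returns len(counts) + 1 edges
    return {"bins": counts.tolist(), "bin_edges": edges.tolist()}


hist = precompute_histogram([1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8], nbins=3)
print(hist["bins"])  # [2, 2, 3]
```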
- class pipeliner.results_display_objects.ResultsDisplayHtml(*, title: str, associated_data: List[str], html_dir: str = '', html_file: str = '', html_str: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display HTML
It is best to not instantiate this class directly. Instead create it using create_results_display_object
This can be used for general HTML display in Doppio. Provide either a directory containing index.html, an HTML file, or an HTML string as input.
- class pipeliner.results_display_objects.ResultsDisplayImage(*, title: str, image_path: str, image_desc: str, associated_data: List[str], start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A class for the GUI to display a single image
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayImageCarousel(*, title: str, image_names: List[str], image_index_file: str, associated_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A class for Doppio’s interactive image carousel
It is best to not instantiate this class directly. Instead create it using create_results_display_object or display_tools.make_image_carousel
- class pipeliner.results_display_objects.ResultsDisplayJson(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display JSON files
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayMapModel(title: str, associated_data: List[str], maps: List[str] | None = None, models: List[str] | None = None, maps_data: str = '', models_data: str = '', maps_opacity: List[float] | None = None, maps_colours: List[str] | None = None, models_colours: List[str] | None = None, start_collapsed: bool = True, bild_files: List[str] | None = None, flag: str = '')
Bases:
ResultsDisplayObject
An object for overlaying multiple maps and/or models
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- maps_colours
Hex values for colouring the maps specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used
- Type:
List[str] | None
- models_colours
Hex values for colouring the models specific colours, in the form “#XXXXXX” where X is a hex digit (0-9 or a-f). If None, the standard colours will be used
- Type:
List[str] | None
- bild_files
Optional list of .bild file paths to overlay in the 3D viewer (e.g. angular distribution plots)
- Type:
List[str] | None
- Raises:
ValueError – If no maps or models were specified
ValueError – If the map is not .mrc format
ValueError – If models are not in pdb or mmcif format
ValueError – If the number of maps and map opacities don’t match
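Two of the documented constraints, one opacity per map and colours written as "#XXXXXX" hex strings, are easy to check before building the object. The sketch below is a hypothetical helper, not the pipeliner's validation; it also accepts upper-case A-F, which may be looser than the pipeliner's own check.

```python
import re
from typing import List, Optional

# Also accepts upper-case hex digits; whether the pipeliner does is an assumption
HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}")


def check_map_model_styling(
    maps: List[str],
    maps_opacity: Optional[List[float]] = None,
    maps_colours: Optional[List[str]] = None,
    models_colours: Optional[List[str]] = None,
) -> None:
    """Hypothetical check of the documented styling constraints."""
    if maps_opacity is not None and len(maps_opacity) != len(maps):
        raise ValueError("the number of maps and map opacities must match")
    for colour in (maps_colours or []) + (models_colours or []):
        if not HEX_COLOUR.fullmatch(colour):
            raise ValueError(f"{colour!r} is not of the form #XXXXXX")


check_map_model_styling(["run_class001.mrc"], [0.5], ["#1f77b4"], ["#ff0000"])
```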
- class pipeliner.results_display_objects.ResultsDisplayMontage(*, xvalues: List[int], yvalues: List[int], img: str, title: str, associated_data: List[str], labels: List[str] | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object to send to the GUI to make an image montage
This is an image montage with information about the specific images. It is best to not instantiate this class directly. Instead create it using create_results_display_object
- xvalues
(list): The x coordinates by image
- yvalues
(list): The y coordinates by image
- class pipeliner.results_display_objects.ResultsDisplayMoorhen(title: str, associated_data: List[str], maps: List[str] | None = None, models: List[str] | None = None, session_file: str | None = None, maps_data: str = '', models_data: str = '', start_collapsed: bool = True, flag: str = '')
Bases:
ResultsDisplayObject
An object for displaying maps, models, and Moorhen job sessions.
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- session_file
Path to a Moorhen session file. This contains the state of the Moorhen session; if it is provided, the expected maps and models should match those in the session file. If it is not provided, a warning is appended to the flag.
- Type:
str | None
- Raises:
ValueError – If a map is not in .mrc, .map, .ccp4 or .mtz format
ValueError – If a model is not in .pdb, .cif, .mmcif, .pdbx or .ent format
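The accepted file formats listed above can be checked by extension before the object is created. This is a sketch under assumptions: check_moorhen_files is hypothetical, the comparison is case-insensitive (the pipeliner may be case-sensitive), and compressed files such as model.pdb.gz are rejected because their last suffix is .gz.

```python
from pathlib import Path
from typing import Iterable

MAP_EXTS = {".mrc", ".map", ".ccp4", ".mtz"}
MODEL_EXTS = {".pdb", ".cif", ".mmcif", ".pdbx", ".ent"}


def check_moorhen_files(maps: Iterable[str], models: Iterable[str]) -> None:
    """Hypothetical pre-check mirroring the documented ValueErrors."""
    for path in maps:
        if Path(path).suffix.lower() not in MAP_EXTS:
            raise ValueError(f"{path}: maps must be .mrc, .map, .ccp4 or .mtz")
    for path in models:
        if Path(path).suffix.lower() not in MODEL_EXTS:
            raise ValueError(
                f"{path}: models must be .pdb, .cif, .mmcif, .pdbx or .ent"
            )


check_moorhen_files(["run_class001.mrc"], ["model.cif"])
```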
- class pipeliner.results_display_objects.ResultsDisplayObject(title: str, start_collapsed: bool = False, flag='')
Bases:
object
Abstract superclass for results display objects
- flag
A message that is displayed if the results display object is showing something scientifically dubious.
- Type:
str
- write_displayobj_file(outdir) None
Write a json file from a ResultsDisplayObject object
- Parameters:
outdir (str) – The directory to write the output in
- Raises:
NotImplementedError – If a write attempt is made from the superclass
- class pipeliner.results_display_objects.ResultsDisplayPdfFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display PDF files
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayPending(*, title: str = 'Results pending...', message: str = 'The result not available yet', reason: str = 'unknown', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A placeholder class for when a job is not able to produce results yet
- class pipeliner.results_display_objects.ResultsDisplayPlotlyFigure(title: str, plotlyfig: str, associated_data: List[str] | None = None, start_collapsed: bool = False)
Bases:
ResultsDisplayObject
This class displays an existing Plotly Figure object.
Call fig.to_json() on your Figure and then pass the JSON string to the plotlyfig argument when creating this object.
- write_displayobj_file(outdir)
Write a json file from a ResultsDisplayObject object
- Parameters:
outdir (str) – The directory to write the output in
- Raises:
NotImplementedError – If a write attempt is made from the superclass
- class pipeliner.results_display_objects.ResultsDisplayPlotlyHistogram(*, data: List[float] | DataFrame | ndarray | dict | None = None, title: str, x: str | list | None = None, y: str | list | None = None, color: str | int | list | None = None, nbins: int | None = None, range_x: list | None = None, range_y: list | None = None, category_orders: dict | None = None, labels: dict | None = None, bin_counts: List[float] | None = None, bin_centres: List[float] | None = None, associated_data: List[str], start_collapsed: bool = False, flag: str = '', **kwargs)
Bases:
ResultsDisplayObject
A class that generates a plotly.graph_objects.Figure object to display a histogram. It uses the plotly express histogram function: https://plotly.com/python-api-reference/generated/plotly.express.histogram.html
Examples: https://plotly.com/python/histograms/
- data
The data to bin. The following types are allowed:
list – a list of values to be binned
array and dict – converted to a pandas DataFrame internally
pandas DataFrame – ensure column names are set if ‘x’ refers to a column name
More details: https://plotly.com/python/px-arguments/
- plotlyfig
plotly.graph_objects.Figure object generated from input data
- class pipeliner.results_display_objects.ResultsDisplayPlotlyObj(*, data: list | DataFrame | ndarray | dict, plot_type: list | str, title: str, associated_data: List[str], multi_series: bool = False, subplot: bool = False, make_subplot_args: dict | None = None, subplot_order: str | List[tuple] | None = None, subplot_size: Sequence[int] | None = None, subplot_args: List[dict] | None = None, series_args: List[dict] | None = None, layout_args: dict | None = None, trace_args: dict | None = None, xaxes_args: List[dict] | dict | None = None, yaxes_args: List[dict] | dict | None = None, start_collapsed: bool = False, flag: str = '', **kwargs)
Bases:
ResultsDisplayObject
This uses the plotly express module to create a plotly.graph_objects.Figure object (https://plotly.com/python/plotly-express/). Use this class to generate plotly Figure objects for custom plots, including:
facet plots: https://plotly.com/python/facet-plots/
subplots: https://plotly.com/python/subplots/
multi_series: e.g. https://plotly.com/python/creating-and-updating-figures/#adding-traces
- data
The data to plot. For a single plot, the following types are allowed:
list – a list of values to be binned
array and dict – converted to a pandas DataFrame internally
pandas DataFrame – ensure column names are set if ‘x’ refers to a column name
More details: https://plotly.com/python/px-arguments/
For subplots and/or multi_series:
list – a list with a dictionary of arguments for each plot/series
- plot_type
Required; the type of plot. For a single plot, this is the plotly express function to call (https://plotly.com/python-api-reference/plotly.express.html). For subplots and/or multi_series, it is the plotly.graph_objects function to call (https://plotly.com/python-api-reference/plotly.graph_objects.html).
- Type:
list | str
- plotlyfig
plotly.graph_objects.Figure object generated from input data
- check_subplot_arguments(data, subplot_size, subplot_order, plot_type, subplot_args, xaxes_args, yaxes_args) None
- generate_multiseries_plots(plot_type, plot_args) Figure
- generate_subplots(subplot_size, plot_type, subplot_order, plot_args, make_subplot_args) Figure
- class pipeliner.results_display_objects.ResultsDisplayPlotlyScatter(*, data: List[List[float]] | DataFrame | ndarray | dict | None = None, title: str, x: str | List[float] | None = None, y: str | List[float] | None = None, color: str | int | Sequence[str] | None = None, size: str | int | Sequence[str] | None = None, symbol: str | int | Sequence[str] | None = None, hover_name: str | int | Sequence[str] | None = None, range_color: list | None = None, range_x: list | None = None, range_y: list | None = None, category_orders: dict | None = None, labels: dict | None = None, associated_data: List[str], start_collapsed: bool = False, flag: str = '', **kwargs)
Bases:
ResultsDisplayObject
A class that generates a plotly.graph_objects.Figure object to display a scatter plot. It uses the plotly express scatter function: https://plotly.com/python-api-reference/generated/plotly.express.scatter.html
Examples: https://plotly.com/python/line-and-scatter/
- data
The data to plot. The following types are allowed:
list – a list of values
array and dict – converted to a pandas DataFrame internally
pandas DataFrame – ensure column names are set if ‘x’ refers to a column name
More details: https://plotly.com/python/px-arguments/
- plotlyfig
plotly.graph_objects.Figure object generated from input data
- class pipeliner.results_display_objects.ResultsDisplayRvapi(*, title: str, rvapi_dir: str, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display rvapi objects
It is best to not instantiate this class directly. Instead create it using create_results_display_object
This can be used for general HTML display in Doppio. Create a directory with index.html and it will be shown in the results display tab
- class pipeliner.results_display_objects.ResultsDisplayTable(*, title: str, headers: List[str], table_data: List[List[str]], associated_data: List[str], header_tooltips: List[str] | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display a table
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayText(*, title: str, display_data: str, associated_data: List[str] | None = None, start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
A class to display general text in the GUI results tab
It is best to not instantiate this class directly. Instead create it using create_results_display_object
- class pipeliner.results_display_objects.ResultsDisplayTextFile(*, file_path: str, title: str = '', start_collapsed: bool = False, flag: str = '')
Bases:
ResultsDisplayObject
An object for the GUI to display ASCII text files
It is best to not instantiate this class directly. Instead create it using create_results_display_object
This can be used for the default display of files that contain ASCII-encoded text but whose formats are too variable to make a more complex ResultsDisplayFile
Deposition Objects
Lists of DepositionObject are returned by a PipelinerJob.
The prepare_emdb_deposition_data and prepare_empiar_deposition_data functions
are used to prepare automated depositions to the EMDB and EMPIAR.
- class pipeliner.deposition_tools.deposition_tools.DepositionObject(category: str = '', parent_job: str | None = None, alt_dict: Dict[str, Dict[str, Dict[str, str | Sequence[str]]]] | None = None)
Bases:
object
This object contains data that will be used in an EMPIAR, EMDB, or PDB deposition
- category
The category of deposition object. This should match the name of the dictionary key for the field in the database schema
- Type:
str
- alt_dict
The dict that contains the information about the deposition objects for the database being deposited to if the EMDB schema are not being used. See pipeliner.deposition_tools.emdb_cats for the format
- data
Contains the actual deposition data. Keys must exactly match the field names in the database schema
- pipeliner.deposition_tools.deposition_tools.parse_onedep_cif(input_file: str, jobname: str | None = None) List[DepositionObject]
Parse a cif in the onedep pdb/emdb format and return deposition objects
If a depobj cannot be created, skip it and raise a warning
- Parameters:
- Returns:
A dep obj for each entry in the cif file
- Return type:
Sequence[DepositionObject]
Functions that support these methods are:
EMPIAR DepositionObjects
- class pipeliner.deposition_tools.empiar_deposition_objects.Micrograph(file: str, ext: str, n_frames: int, dimx: int, dimy: int, dtype: str, headtype: str, apix: float, voltage: float, spherical_aberration: float)
Bases:
object
- pipeliner.deposition_tools.empiar_deposition_objects.get_imgfile_info(imgfile: str, blockname: str, img_block_col: str) Tuple[Dict[str, Tuple[float, float, float]], List[List[str]]]
Get information from the STAR file containing image info
- Parameters:
- Returns:
(A dict with info about the optics groups {og_number: (apix, voltage, spherical aberration)}, and a list of full paths (relative to the working dir) for all the images in the file; for movies, the paths are relative to the import dir instead)
- Return type:
Tuple[Dict[str, Tuple[float, float, float]], List[List[str]]]
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_corrparts(in_file: str, job: str | None = None) List[DepositionObject]
Prepare the particles deposition objects for an empiar deposition
- Parameters:
- Returns:
The DepositionObjects
- Return type:
List[DepositionObject]
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics(in_file: str, job: str | None = None)
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_mics_parts_data(mpfile: str, is_parts: bool, is_cor_parts: bool, job: str | None = None) List[DepositionObject]
Prepare the micrographs or particles portion of an EMPIAR deposition
- Parameters:
- Returns:
A list of deposition objects
- Return type:
List[DepositionObject]
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_parts(in_file: str, job: str | None = None) List[DepositionObject]
Prepare the particles deposition objects for an empiar deposition
- Parameters:
- Returns:
The DepositionObjects
- Return type:
List[DepositionObject]
- pipeliner.deposition_tools.empiar_deposition_objects.prepare_empiar_raw_mics(movfile: str, job: str | None = None) List[DepositionObject]
Prepare the raw micrographs portion of an EMPIAR deposition
- Parameters:
- Returns:
The DepositionObjects used to create a deposition
- Return type:
List[DepositionObject]
- pipeliner.deposition_tools.empiar_deposition.get_citation_data(joboptions: Dict[str, str]) List[Dict[str, object]]
Gets the data for an EMPIAR citation
- pipeliner.deposition_tools.empiar_deposition.get_deposition_objects_empiar(terminal_job: str, do_parts: bool = True, do_rparts: bool = True, do_movs: bool = True, do_mics: bool = True) List[DepositionObject]
- pipeliner.deposition_tools.empiar_deposition.merge_empiar_dep_objs(in_depobjs: List[DepositionObject]) List[DepositionObject]
Merges together the list of DepositionObjects for an empiar job
For movies all deposition objects are kept. This is because multiple movie sets may be imported and combined. For corrected mics, particles, and corrected particles only the newest ones (the ones that contributed to the final job) are retained. This is to prevent duplications.
- Parameters:
in_depobjs (List[DepositionObject]) – The depobjs from the chain of jobs
- Returns:
The merged deposition objects
- Return type:
List[DepositionObject]
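The merge rule described above (keep every movie deposition object, keep only the newest of everything else) can be sketched on a simplified stand-in for DepositionObject. DepObj, merge_newest and the "movies"/"particles" category names are hypothetical, and the sketch assumes the input list is ordered from the oldest job in the chain to the newest.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DepObj:  # hypothetical stand-in for DepositionObject
    category: str
    job: str


def merge_newest(
    depobjs: List[DepObj], keep_all: Sequence[str] = ("movies",)
) -> List[DepObj]:
    """Keep every object in `keep_all`; for other categories keep only the
    objects from the last job that produced that category.

    Assumes `depobjs` is ordered from the oldest job to the newest.
    """
    newest_job: Dict[str, str] = {}
    for obj in depobjs:
        newest_job[obj.category] = obj.job  # later jobs overwrite earlier ones
    return [
        obj
        for obj in depobjs
        if obj.category in keep_all or obj.job == newest_job[obj.category]
    ]
```

Keeping all movie objects reflects that several imported movie sets may feed the final result, while an older particle set would duplicate the one that actually contributed.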
- pipeliner.deposition_tools.empiar_deposition.parse_empiar_jobstar(jobstar_file: str) Dict[str, object]
Parse a job.star file from an EMPIAR deposition job and format it for deposition
- pipeliner.deposition_tools.empiar_deposition.prepare_empiar_deposition(terminal_job: str, jobstar_file: str | None = None, do_parts: bool = True, do_rparts: bool = True, do_movs: bool = True, do_mics: bool = True) str
Prepare a deposition for empiar
- Returns:
The name of the deposition file
- Return type:
str
EMDB DepositionObjects
EMDB Deposition objects correspond to the schema here: http://ftp.ebi.ac.uk/pub/databases/emdb/doc/XML-schemas/emdb-schemas/v3/current_v3/doc/Untitled.html
- pipeliner.deposition_tools.emdb_deposition_objects.make_em_software_depobj(the_prog: ExternalProgram, emdb_software_class, details: str = '', jobname: str = '') DepositionObject
Make a deposition object for a specific piece of software
This function generally doesn’t need to be explicitly called because the PipelinerJob will do it for each piece of software during creation of the deposition.
- Parameters:
the_prog (ExternalProgram) – The program to get the depobj for
jobname (str) – The job that the depobj is being created for
details (str) – Details about what the software was doing
- Returns:
The depobj for the program
- Return type:
DepositionObject
- pipeliner.deposition_tools.emdb_deposition.clean_dobj_cross_references(dobjs: List[DepositionObject]) List[DepositionObject]
Remove non-em_image_processing deposition objects that are not referenced by others
- Parameters:
dobjs (List[DepositionObject]) – The DepositionObjects to operate on
- Returns:
Contains the most recent of each em_image_processing depobj and all others
- Return type:
List[DepositionObject]
- pipeliner.deposition_tools.emdb_deposition.get_deposition_objs_emdb(terminal_job: str) List[DepositionObject]
Get the deposition objects for each job in a workflow
- Parameters:
terminal_job (str) – The job to use
- Returns:
The gathered DepositionObject objects.
- Return type:
List[DepositionObject]
- Raises:
ValueError – If the terminal job is not found.
- pipeliner.deposition_tools.emdb_deposition.get_most_recent_processing_depobjs(depobjs: List[DepositionObject]) List[DepositionObject]
Keep only the latest em_image_processing depobj from each type
The type is defined by the details parameter, which is the same as the job description. Non-em_image_processing depobjs are dealt with later using cross-reference checking.
- Parameters:
depobjs (List[DepositionObject]) – The DepositionObjects to operate on
- Returns:
Contains the most recent of each em_image_processing depobj and all others
- Return type:
List[DepositionObject]
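The "keep the latest em_image_processing object per type" rule can be sketched on a simplified stand-in class. DepObj and keep_latest_processing are hypothetical, and the sketch assumes the list runs from the oldest job to the newest; non-em_image_processing objects pass through unchanged, as described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DepObj:  # hypothetical stand-in for DepositionObject
    category: str
    details: str = ""


def keep_latest_processing(depobjs: List[DepObj]) -> List[DepObj]:
    """Keep the last em_image_processing object for each `details` value.

    Everything else passes through untouched. Assumes oldest-to-newest order.
    """
    latest: Dict[str, int] = {}
    for i, obj in enumerate(depobjs):
        if obj.category == "em_image_processing":
            latest[obj.details] = i  # a later index replaces an earlier one
    keep = set(latest.values())
    return [
        obj
        for i, obj in enumerate(depobjs)
        if obj.category != "em_image_processing" or i in keep
    ]
```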
- pipeliner.deposition_tools.emdb_deposition.prepare_emdb_deposition(terminal_job: str, emdb_id: str = '', outfile: str = '', verbose: bool = False, do_temp_ondep_update: bool = True) str
Prepare an EMDB deposition cif file
- Go over the image_processing deposition objects and decide which are to be kept. This should be the one from the most recent job for each main processing category.
- If verbose, don’t throw away anything. This will lead to multiple image_processing entries for intermediate reconstructions, classifications, etc.
- Throw away any other deposition objects that cross-reference obsolete ones, first doing image_processing, then references, and finally ref authors.
- pipeliner.deposition_tools.emdb_deposition.sort_depobjs_by_type(depobjs: List[DepositionObject]) Dict[str, List[DepositionObject]]
Sort a list of deposition objects by type
- Parameters:
depobjs (List[DepositionObject]) – The deposition objects with all the ids updated including the EMDB ID
- Returns:
A dict of types, with a list of deposition objects for each
- Return type:
Dict[str, List[DepositionObject]]
- pipeliner.deposition_tools.emdb_deposition.temp_EMDB_deposition_object_alterations(depobjs: List[DepositionObject]) List[DepositionObject]
Modify the list of deposition objects for current limitations of the OneDep system
- Add a single em_image_processing object and remove the others
- Update all other DepositionObjects to point their image_processing_id to the new one
- Change all specific RELION program names to RELION
- Change the ctffind4 program name to CTFFIND
These changes will be rolled back when the OneDep system is updated to accept multiple em_image_processing_id objects
- Parameters:
depobjs (List[DepositionObject])
- Returns:
The updated deposition objects
- Return type:
List[DepositionObject]
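The program-name changes listed above amount to a small mapping. The sketch assumes that RELION programs can be recognised by a "relion" prefix (for example relion_refine_mpi); that matching rule and the function name onedep_program_name are assumptions, not the pipeliner's code.

```python
def onedep_program_name(name: str) -> str:
    """Map program names to the generic names OneDep currently expects.

    Hypothetical helper; the "relion" prefix rule is an assumption.
    """
    if name.lower().startswith("relion"):
        return "RELION"  # e.g. relion_refine_mpi -> RELION
    if name.lower() == "ctffind4":
        return "CTFFIND"
    return name


print([onedep_program_name(n) for n in ["relion_refine_mpi", "ctffind4", "gctf"]])
# ['RELION', 'CTFFIND', 'gctf']
```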
- pipeliner.deposition_tools.emdb_deposition.update_and_format_deposition_objects(depobjs: List[DepositionObject]) List[DepositionObject]
Update the .id fields of DepositionObjects, changing UUIDs to int ids
- Parameters:
depobjs (List[DepositionObject]) – The deposition objects from the workflow
- Returns:
The Deposition objects with the IDs updated
- Return type:
List[DepositionObject]
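Replacing UUIDs with integer ids has to rewrite cross-references too, or fields such as image_processing_id would point at ids that no longer exist. The sketch below works on plain dicts rather than DepositionObjects; renumber_ids is hypothetical, and numbering all records in one sequence (rather than per category) is an assumption.

```python
import uuid
from typing import Dict, List, Sequence


def renumber_ids(records: List[dict], ref_keys: Sequence[str] = ()) -> List[dict]:
    """Replace UUID ids with 1-based ints in first-seen order.

    Fields named in `ref_keys` that hold a known UUID are rewritten too,
    so cross-references stay consistent. Input dicts are not modified.
    """
    mapping: Dict[str, int] = {}
    for rec in records:
        mapping.setdefault(rec["id"], len(mapping) + 1)
    renumbered = []
    for rec in records:
        new = dict(rec, id=mapping[rec["id"]])
        for key in ref_keys:
            if new.get(key) in mapping:
                new[key] = mapping[new[key]]
        renumbered.append(new)
    return renumbered


proc_id = str(uuid.uuid4())
records = [
    {"id": proc_id},
    {"id": str(uuid.uuid4()), "image_processing_id": proc_id},
]
print(renumber_ids(records, ref_keys=["image_processing_id"]))
# [{'id': 1}, {'id': 2, 'image_processing_id': 1}]
```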
- pipeliner.deposition_tools.emdb_deposition.write_deposition_complete_cif(depobjs: List[DepositionObject], emdb_id: str, outfile: str) None
Write a cif file for the deposition
This function writes a complete cif file that is readable by gemmi; this is not the format needed by OneDep, which is written by write_deposition_onedep_cif.
- Parameters:
depobjs (List[DepositionObject]) – The deposition objects with all the ids updated including the EMDB ID
emdb_id (str) – The EMDB deposition ID, provided by the EMDB
outfile (str) – Path to write the output file to
- pipeliner.deposition_tools.emdb_deposition.write_deposition_onedep_cif(depobjs: List[DepositionObject], outfile: str) None
Writes a file in the reduced mmcif format required for deposition using OneDep
This format lacks headers
- Parameters:
depobjs (List[DepositionObject]) – The deposition objects with all the ids updated including the EMDB ID
outfile (str) – Path to write the output file to
PDB deposition objects
PDB depositions are not currently supported