General Utilities

These utilities are used by the pipeliner for basic tasks such as formatting on-screen display, checking file names, and getting directory and file names.

class pipeliner.utils.DirectoryBasedLock(dirname: str | os.PathLike[str] = '.relion_lock', timeout=60.0)

Bases: object

A lock based on the creation and existence of a directory on the file system.

The interface is almost the same as Python’s standard multiprocessing.Lock, except for some changes related to timeout behaviour:

  • There is a default timeout of 60 seconds when acquiring the lock (rather than the default None value, with corresponding infinite timeout, that is used by multiprocessing.Lock). This is for compatibility with previous RELION locking timeout behaviour.

  • A timeout for use when entering a context manager can be set when the lock object is created. Note that this value is ignored if the acquire() method is called directly. If there is a timeout waiting to acquire the lock when entering a context manager, a TimeoutError is raised.

The principle of this lock is that directory creation is an atomic operation provided by the file system, even in (most, modern) networked file systems. If several processes try to create the same directory at the same time, only one will succeed and the rest will get an error. Therefore, we can use this as a locking primitive, acquiring the lock if we successfully create the directory and releasing it by deleting the directory afterwards.

The lock directory name can be set if required. For compatibility with RELION, the default directory name is “.relion_lock”.
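The principle described above can be sketched using only the standard library. This is a simplified illustration and not the pipeliner implementation: the class name SketchDirLock and the poll_interval parameter are assumptions made for the example, and the context manager interface is left out.

```python
import os
import tempfile
import time


class SketchDirLock:
    """Hypothetical, simplified directory lock (not the pipeliner class)."""

    def __init__(self, dirname=".relion_lock", poll_interval=0.05):
        self.dirname = dirname
        self.poll_interval = poll_interval  # assumed value, for illustration only

    def acquire(self, block=True, timeout=60.0):
        # None means wait forever; negative timeouts behave like zero
        deadline = None if timeout is None else time.monotonic() + max(timeout, 0.0)
        while True:
            try:
                os.mkdir(self.dirname)  # atomic: only one process can succeed
                return True
            except FileExistsError:
                if not block:
                    return False
                if deadline is not None and time.monotonic() >= deadline:
                    return False
                time.sleep(self.poll_interval)

    def release(self):
        try:
            os.rmdir(self.dirname)  # deleting the directory frees the lock
        except FileNotFoundError:
            raise RuntimeError("Cannot release an unlocked lock") from None


with tempfile.TemporaryDirectory() as tmp:
    lock = SketchDirLock(os.path.join(tmp, ".relion_lock"))
    first = lock.acquire()              # creates the directory
    second = lock.acquire(block=False)  # already locked, returns immediately
    lock.release()
    third = lock.acquire(timeout=0.0)   # free again after the release
    lock.release()
```

Polling with a short sleep is one simple way to wait for the directory to disappear; the real class may wait differently.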

acquire(block=True, timeout=60.0)

Acquire a lock, blocking or non-blocking.

With the block argument set to True (the default), the method call will block until the lock is in an unlocked state, then set it to locked and return True.

With the block argument set to False, the method call does not block. If the lock is currently in a locked state, return False; otherwise set the lock to a locked state and return True.

When invoked with a positive, floating-point value for timeout, block for at most the number of seconds specified by timeout as long as the lock cannot be acquired. The default is 60.0 seconds; note that this is different from the default timeout in multiprocessing.Lock.acquire().

Invocations with a negative value for timeout are equivalent to a timeout of zero. Invocations with a timeout value of None set the timeout period to infinite. The timeout argument has no practical implications if the block argument is set to False and is thus ignored.

Returns:

True if the lock has been acquired or False if the timeout period has elapsed.

Raises:

Various possible errors from os.mkdir(), including FileNotFoundError or PermissionError.

release()

Release the lock.

This can be called from any thread, not only the thread which has acquired the lock.

When the lock is locked, reset it to unlocked, and return. If any other threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed.

When invoked on an unlocked lock, a RuntimeError is raised.

There is no return value.

pipeliner.utils.check_for_illegal_symbols(check_string: str, string_name: str = 'input', exclude: str = '') -> str | None

Check a text string doesn’t have any of the disallowed symbols.

Illegal symbols are !*?()^/#<>&%{}$.”’ and @.

Parameters:
  • check_string (str) – The string to be checked

  • string_name (str) – The name of the string being checked; for more informative error messages

  • exclude (str) – Any symbols that are normally in the illegal symbols list but should be allowed.

Returns:

An error message if any illegal symbols are present, otherwise None

Return type:

str | None
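A minimal sketch of this kind of check, assuming the symbol list given above with the two quote characters written as plain ASCII quotes. The function name sketch_check_symbols and the wording of the error message are hypothetical; the real message may differ.

```python
from typing import Optional

# Symbol list taken from the description above; quotes written as ASCII (assumption)
ILLEGAL_SYMBOLS = "!*?()^/#<>&%{}$.\"'@"


def sketch_check_symbols(
    check_string: str, string_name: str = "input", exclude: str = ""
) -> Optional[str]:
    """Return an error message if a disallowed symbol is found, else None."""
    disallowed = [sym for sym in ILLEGAL_SYMBOLS if sym not in exclude]
    found = [sym for sym in disallowed if sym in check_string]
    if found:
        # Hypothetical message format
        return f"{string_name} contains illegal symbols: {' '.join(found)}"
    return None
```

Returning the message rather than raising lets the caller decide whether an illegal symbol is fatal.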

pipeliner.utils.clean_job_dirname(dirname: str) -> str

Makes sure a pipeline job_dir name is valid and in the right format

Parameters:

dirname (str) – The dirname to check

Returns:

The correctly formatted dirname

Return type:

str

Raises:

ValueError – If the dir name cannot be formatted correctly

pipeliner.utils.clean_jobname(jobname: str) -> str

Makes sure job names are in the correct format

Job names must have a trailing slash, cannot begin with a slash, and must contain no illegal characters

Parameters:

jobname (str) – The job name to be checked

Returns:

The job name, with corrections if necessary

Return type:

str

pipeliner.utils.compare_nested_lists(a_list: list, e_list: list, tolerance: float = 0.0)

Compare two nested lists, allowing for a tolerance for float values

Parameters:
  • a_list (list) – the actual list

  • e_list (list) – the expected list

  • tolerance (float) – The tolerance for float values, 0 if they must match exactly

Returns:

True if the lists match within the tolerance, otherwise False

Return type:

bool
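A recursive comparison of this kind might look like the sketch below. The function name sketch_compare_nested is hypothetical, and the handling of mixed types (for example an int compared with a float) is an assumption, since it is not described above.

```python
def sketch_compare_nested(a_list: list, e_list: list, tolerance: float = 0.0) -> bool:
    """Compare nested lists, allowing float values to differ by up to tolerance."""
    if len(a_list) != len(e_list):
        return False
    for actual, expected in zip(a_list, e_list):
        if isinstance(actual, list) and isinstance(expected, list):
            # Recurse into sub-lists
            if not sketch_compare_nested(actual, expected, tolerance):
                return False
        elif isinstance(actual, float) and isinstance(expected, float):
            # Floats may differ by at most the tolerance
            if abs(actual - expected) > tolerance:
                return False
        elif actual != expected:
            # Everything else must match exactly
            return False
    return True
```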

pipeliner.utils.convert_relative_filename(filename: str) -> str

Convert a filename that is relative to the project to just its name

For example:

../../my_dir/my_file.txt -> my_dir/my_file

/my_dir/my_file.txt -> my_dir/my_file

~/my_dir/my_file.txt -> my_dir/my_file

Parameters:

filename (str) – The file name to convert

Returns:

The part of the file path that is not relative to the project

Return type:

str

pipeliner.utils.count_file_lines(filename: str) -> int

Fast and efficient count of number of lines in a file

Parameters:

filename (str) – Name of the file to count the lines in

Returns:

Number of lines

Return type:

int
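One common way to count lines without loading a whole file into memory is to read it in binary chunks and count newline bytes. The sketch below shows that technique; it is an assumption that the pipeliner function works similarly, and the names sketch_count_lines and chunk_size are hypothetical.

```python
import os
import tempfile


def sketch_count_lines(filename: str, chunk_size: int = 1024 * 1024) -> int:
    """Count newline characters by reading the file in binary chunks."""
    count = 0
    with open(filename, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.star")
    with open(path, "w") as handle:
        handle.write("data_\nloop_\n_rlnExample\n")
    n_lines = sketch_count_lines(path)
```

Note that this counts newline characters, so a final line without a trailing newline is not included.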

pipeliner.utils.date_time_tag(compact: bool = False) -> str

Get a current date and time tag

It can return a compact version or one that is easier to read

Parameters:

compact (bool) – Should the returned tag be in the compact form

Returns:

The datetime tag

compact format is: YYYYMMDDHHMMSS

verbose form is: YYYY-MM-DD HH:MM:SS.MS

Return type:

str
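The two formats can be illustrated with datetime.strftime. This sketch assumes that the "MS" field corresponds to the %f directive (six-digit microseconds); the precision of the real function is not stated above. The function name sketch_date_time_tag is hypothetical.

```python
from datetime import datetime


def sketch_date_time_tag(compact: bool = False) -> str:
    """Return the current date and time as a compact or readable tag."""
    now = datetime.now()
    if compact:
        return now.strftime("%Y%m%d%H%M%S")  # YYYYMMDDHHMMSS
    # Assumption: fractional seconds rendered with %f (microseconds)
    return now.strftime("%Y-%m-%d %H:%M:%S.%f")
```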

pipeliner.utils.decompose_pipeline_filename(fn_in: str) -> Tuple[str, int, str]

Breaks a job name into usable pieces

Returns everything before the job number, the job number as an int, and everything after the job number. Paths up to 20 directories deep are supported. The 20 directory limit comes from the RELION code and is not really necessary anymore.

Parameters:

fn_in (str) – The job or file name to be broken down in the format: <jobtype>/jobxxx/<filename>

Returns:

The decomposed file name: (str, int, str)

[0] Everything before ‘job’ in the file name

[1] The job number

[2] Everything after the job number

Return type:

tuple

Raises:

ValueError – If the input file name is more than 20 directories deep
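As an illustration of the decomposition, the sketch below splits a name of the form <jobtype>/jobxxx/<filename> with a regular expression. The function name sketch_decompose is hypothetical, the exact placement of slashes in the returned pieces is an assumption, and the 20 directory limit is not reproduced.

```python
import re
from typing import Tuple


def sketch_decompose(fn_in: str) -> Tuple[str, int, str]:
    """Split '<jobtype>/jobNNN/<filename>' into (prefix, job number, rest)."""
    match = re.fullmatch(r"(.*?/)?job(\d+)/?(.*)", fn_in)
    if match is None:
        raise ValueError(f"No job number found in: {fn_in}")
    prefix = match.group(1) or ""  # everything before 'job'
    number = int(match.group(2))   # leading zeros are dropped by int()
    rest = match.group(3)          # everything after the job number
    return prefix, number, rest
```

For example, sketch_decompose("Import/job001/movies.star") gives ("Import/", 1, "movies.star").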

pipeliner.utils.file_in_project(filename: str) -> bool

Check that a file is part of the project

Not done with os.path.abspath(file).startswith(project_dir) because this causes errors during testing

pipeliner.utils.find_common_string(input_strings: List[str]) -> str

Find the common part of a list of strings starting from the beginning

Parameters:

input_strings (list) – List of strings to compare

Returns:

The common portion of the strings

Return type:

str

Raises:

ValueError – If input_strings contains fewer than 2 strings
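A common prefix of this kind can be found with os.path.commonprefix, which compares strings character by character. The sketch below uses it as a stand-in; whether the pipeliner function is implemented this way is an assumption, and the name sketch_find_common is hypothetical.

```python
import os
from typing import List


def sketch_find_common(input_strings: List[str]) -> str:
    """Return the leading portion shared by all of the input strings."""
    if len(input_strings) < 2:
        raise ValueError("At least two strings are needed for a comparison")
    # commonprefix works per character, not per path component
    return os.path.commonprefix(input_strings)
```

For example, sketch_find_common(["Import/job001/a", "Import/job002/b"]) gives "Import/job00".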

pipeliner.utils.format_string_to_type_objs(in_str: str) -> str | int | float | bool | None

Converts strings to int, float, bool, or None objects

  • Any number with a decimal point, in scientific notation, or ‘NaN’ returns a float

  • Any other number returns an int

  • ‘False’ or ‘false’ returns False

  • ‘True’ or ‘true’ returns True

  • ‘None’ returns None

Parameters:

in_str (str) – The input string

Returns:

The appropriate object

Return type:

Optional[Union[str, int, float, bool]]
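The conversion rules can be sketched by trying each interpretation in turn and falling back to the original string. This is an approximation under stated assumptions: the name sketch_format is hypothetical, and edge cases such as 'inf' or strings with surrounding whitespace may be handled differently by the real function.

```python
def sketch_format(in_str: str):
    """Convert a string to bool, None, int, or float where possible."""
    if in_str in ("True", "true"):
        return True
    if in_str in ("False", "false"):
        return False
    if in_str == "None":
        return None
    try:
        return int(in_str)    # plain integers
    except ValueError:
        pass
    try:
        return float(in_str)  # decimal point, scientific notation, or NaN
    except ValueError:
        return in_str         # not a recognised value, keep the string
```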

pipeliner.utils.get_file_size_mb(file: str | Path) -> float

Get the size of a file in MB, rounded to 2 decimal places

Parameters:

file (str | Path) – The file to check

pipeliner.utils.get_job_number(job_name)

Get the job number from a pipeliner job as an int

Parameters:

job_name (str) – The job name in the pipeliner format

Returns:

The job number

Return type:

int

pipeliner.utils.get_job_runner_command() -> List[str]

Get the full command to run the job_runner.py script.

pipeliner.utils.get_job_script(name: str) -> str

Get the full path to a job script file.

Returns:

The job script file, if it exists.

Raises:

FileNotFoundError – if the named job script cannot be found.

pipeliner.utils.get_pipeliner_root() -> Path

Get the directory of the main pipeliner module

Returns:

The path of the pipeliner

Return type:

Path

pipeliner.utils.get_python_command() -> List[str]

Get the command to launch the current Python interpreter.

Note that the command is returned as a list and might include some arguments as well as the command itself.

pipeliner.utils.get_regenerate_results_command() -> List[str]

Get the full command to run the regenerate_results.py script.

pipeliner.utils.increment_file_basenames(files: List[str]) -> Dict[str, str]

Increment the base names of files if there are duplicates

e.g. Import/job001/myfile, Import/job002/myfile

Parameters:

files (List[str]) – The files to operate on

Returns:

The file name and its incremented basename

Return type:

Dict[str, str]

pipeliner.utils.is_uuid4(in_str: str) -> bool

Check that a string is a UUID4

Parameters:

in_str (str) – The string to test

Returns:

True if the string is a valid UUID4, otherwise False

Return type:

bool
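One way to perform such a check with the standard library is to parse the string with uuid.UUID and confirm that forcing version 4 leaves it unchanged. This is a sketch only: the name sketch_is_uuid4 is hypothetical, and requiring the canonical hyphenated form is an assumption about how strict the real check is.

```python
import uuid


def sketch_is_uuid4(in_str: str) -> bool:
    """Return True if in_str is a canonical, hyphenated version 4 UUID."""
    try:
        # version=4 overwrites the version and variant bits while parsing
        parsed = uuid.UUID(in_str, version=4)
    except (ValueError, AttributeError, TypeError):
        return False
    # If the bits had to change, the input was not a version 4 UUID
    return str(parsed) == in_str.lower()
```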

pipeliner.utils.launch_detached_process(command: List[str]) -> None

Run the given command as a detached process.

The process is started in a new session and with all file handles set to null, to ensure it keeps running in the background after the parent Python process exits.

Parameters:

command (List[str]) – The commands to execute
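The behaviour described above corresponds to standard subprocess.Popen options, as in the sketch below. It is illustrative only: the name sketch_launch_detached is hypothetical, and unlike the documented function it returns the Popen object so the example can be checked. start_new_session takes effect on POSIX systems only.

```python
import subprocess
import sys
from typing import List


def sketch_launch_detached(command: List[str]) -> subprocess.Popen:
    """Start a command in a new session with all standard streams set to null."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the parent's session (POSIX)
    )


# Launch a trivial child process using the current interpreter
proc = sketch_launch_detached([sys.executable, "-c", "pass"])
exit_code = proc.wait(timeout=60)
```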

pipeliner.utils.make_pretty_header(text: str, char: str = '-=', top: bool = True, bottom: bool = True)

Make nice looking headers for on-screen display

Parameters:
  • text (str) – The text to put in the header

  • char (str) – What characters to use for the header

Returns:

A nice looking header

Return type:

str

pipeliner.utils.print_nice_columns(list_in: List[str], err_msg: str = 'ERROR: No items in input list')

Takes a list of items and makes three columns for nicer on-screen display

Parameters:
  • list_in (List[str]) – The list to display in columns

  • err_msg (str) – The message to display if the list is empty

pipeliner.utils.run_subprocess(*args, **kwargs) -> CompletedProcess

pipeliner.utils.smart_strip_quotes(in_string: str) -> str

Strip the quotes from a string in an intelligent manner

Removes leading and trailing ' and " characters but leaves any inside the string untouched

Parameters:

in_string (str) – The input string

Returns:

the string with leading and ending quotes removed

Return type:

str
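The simplest version of this behaviour is str.strip with both quote characters, shown in the sketch below. The name sketch_strip_quotes is hypothetical, and it is an assumption that the real function does nothing more than this (it may, for example, only remove matched pairs).

```python
def sketch_strip_quotes(in_string: str) -> str:
    """Remove quote characters from both ends, keeping internal ones."""
    return in_string.strip("\"'")
```

For example, the input "it's here" wrapped in double quotes comes back as it's here, with the apostrophe intact.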

pipeliner.utils.str_is_hex_colour(in_string, allow_0x: bool = False) -> bool

Test that a string is a hexadecimal colour code

Valid codes consist of a # symbol (or ‘0x’, if allow_0x is True) followed by exactly six hexadecimal digits (0-9 or a-f, lower or upper case).

Parameters:
  • in_string (str) – The string to test

  • allow_0x (bool) – Also allow ‘0x’ style codes

Returns:

is it a valid colour code?

Return type:

bool
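The rule above maps directly onto a regular expression. The sketch below is one possible formulation, not necessarily how the pipeliner implements it; the name sketch_is_hex_colour is hypothetical.

```python
import re


def sketch_is_hex_colour(in_string: str, allow_0x: bool = False) -> bool:
    """Check for '#' (or optionally '0x') followed by exactly six hex digits."""
    prefix = r"(?:#|0x)" if allow_0x else r"#"
    return re.fullmatch(prefix + r"[0-9a-fA-F]{6}", in_string) is not None
```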

pipeliner.utils.subprocess_popen(*args, **kwargs) -> Popen

pipeliner.utils.touch(filename: str)

Create an empty file

Parameters:

filename (str) – The name for the file to create

pipeliner.utils.truncate_number(number: float, maxlength: int) -> str

Return a number with no more than maxlength decimal places and no trailing 0s

This is used to format numbers in exactly the same way that RELION does. For example, with maxlength 3: 1.2000 -> 1.2, 1.0 -> 1, 1.23 -> 1.23. RELION commands accept numbers with any number of decimal places or trailing 0s, so this function exists only to maintain continuity between RELION and pipeliner commands.

Parameters:
  • number (float) – The number to be truncated

  • maxlength (int) – The maximum number of decimal places
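The formatting can be approximated with fixed-point formatting followed by stripping trailing zeros. Note the assumption: this sketch rounds to maxlength decimal places, and whether the real function rounds or truncates is not stated above. The name sketch_truncate_number is hypothetical.

```python
def sketch_truncate_number(number: float, maxlength: int) -> str:
    """Format with at most maxlength decimals and no trailing zeros."""
    text = f"{number:.{maxlength}f}"  # rounds to maxlength decimal places
    if "." in text:
        # Only strip zeros after a decimal point, so 100 stays 100
        text = text.rstrip("0").rstrip(".")
    return text
```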

pipeliner.utils.wrap_text(text_string: str)

Produces <= 55 character wide wrapped text for on-screen display

Parameters:

text_string (str) – The text to be displayed
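Wrapping to a fixed width is available in the standard library through textwrap. The sketch below returns the wrapped lines as a list; whether the pipeliner function returns or prints them is not stated above, and the name sketch_wrap_text is hypothetical.

```python
import textwrap
from typing import List


def sketch_wrap_text(text_string: str, width: int = 55) -> List[str]:
    """Split text into lines that are at most width characters long."""
    return textwrap.wrap(text_string, width=width)


wrapped = sketch_wrap_text("word " * 40)
```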