General Utilities

These utilities are used by the pipeliner for basic tasks such as nice-looking on-screen display, checking file names, and getting directory and file names.

class pipeliner.utils.DirectoryBasedLock(dirname: str | os.PathLike[str] = '.relion_lock', timeout=60.0)

Bases: object

A lock based on the creation and existence of a directory on the file system.

The interface is almost the same as Python’s standard multiprocessing.Lock, except for some changes related to timeout behaviour:

There is a default timeout of 60 seconds when acquiring the lock (rather than the default None value, with corresponding infinite timeout, that is used by multiprocessing.Lock). This is for compatibility with previous RELION locking timeout behaviour.
A timeout for use when entering a context manager can be set when the lock object is created. Note that this value is ignored if the acquire() method is called directly. If there is a timeout waiting to acquire the lock when entering a context manager, a TimeoutError is raised.

The principle of this lock is that directory creation is an atomic operation provided by the file system, even in (most, modern) networked file systems. If several processes try to create the same directory at the same time, only one will succeed and the rest will get an error. Therefore, we can use this as a locking primitive, acquiring the lock if we successfully create the directory and releasing it by deleting the directory afterwards.

The lock directory name can be set if required. For compatibility with RELION, the default directory name is “.relion_lock”.
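
The principle described above can be illustrated with a minimal sketch. This is not the DirectoryBasedLock implementation; the helper names try_acquire and release_lock, the polling interval and the temporary project directory are all assumptions made for the example. It only shows that os.mkdir() succeeds for exactly one caller while the directory exists.

```python
import os
import tempfile
import time


def try_acquire(lock_dir: str, timeout: float = 1.0, poll: float = 0.05) -> bool:
    """Hypothetical helper: create lock_dir, retrying until timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)  # atomic: fails if the directory already exists
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # somebody else still holds the lock
            time.sleep(poll)


def release_lock(lock_dir: str) -> None:
    """Hypothetical helper: release the lock by deleting the directory."""
    os.rmdir(lock_dir)


with tempfile.TemporaryDirectory() as project:
    lock_dir = os.path.join(project, ".relion_lock")
    first = try_acquire(lock_dir)        # lock is free, so this succeeds
    second = try_acquire(lock_dir, 0.1)  # lock is held, so this times out
    release_lock(lock_dir)
    third = try_acquire(lock_dir)        # free again after the release
    release_lock(lock_dir)
```

Any other OSError from os.mkdir(), such as FileNotFoundError or PermissionError, is deliberately left to propagate in this sketch.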

acquire(block=True, timeout=60.0)

Acquire a lock, blocking or non-blocking.

With the block argument set to True (the default), the method call will block until the lock is in an unlocked state, then set it to locked and return True.

With the block argument set to False, the method call does not block. If the lock is currently in a locked state, return False; otherwise set the lock to a locked state and return True.

When invoked with a positive, floating-point value for timeout, block for at most the number of seconds specified by timeout as long as the lock can not be acquired. The default is 60.0 seconds; note that this is different from the default timeout in multiprocessing.Lock.acquire().

Invocations with a negative value for timeout are equivalent to a timeout of zero. Invocations with a timeout value of None set the timeout period to infinite. The timeout argument has no practical implications if the block argument is set to False and is thus ignored.

Returns:: True if the lock has been acquired or False if the timeout period has elapsed.

Raises:: Various possible errors from os.mkdir(), including FileNotFoundError or PermissionError.

release()

Release the lock.

This can be called from any thread, not only the thread which has acquired the lock.

When the lock is locked, reset it to unlocked, and return. If any other threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed.

When invoked on an unlocked lock, a RuntimeError is raised.

There is no return value.

pipeliner.utils.check_for_illegal_symbols(check_string: str, string_name: str = 'input', exclude: str = '') → str | None

Check a text string doesn’t have any of the disallowed symbols.

Illegal symbols are !*?()^/#<>&%{}$.”’ and @.

Parameters:

check_string (str) – The string to be checked
string_name (str) – The name of the string being checked; for more informative error messages
exclude (str) – Any symbols that are normally in the illegal symbols list but should be allowed.

Returns:

An error message if any illegal symbols are present, otherwise None

Return type:

str | None

pipeliner.utils.clean_job_dirname(dirname: str) → str

Makes sure a pipeline job_dir name is valid and in the right format

Parameters:: dirname (str) – The dirname to check
Returns:: The correctly formatted dirname
Return type:: str
Raises:: ValueError – If the dir name cannot be formatted correctly

pipeliner.utils.clean_jobname(jobname: str) → str

Makes sure job names are in the correct format

Job names must have a trailing slash, cannot begin with a slash, and have no illegal characters

Parameters:: jobname (str) – The job name to be checked
Returns:: The job name, with corrections if necessary
Return type:: str

pipeliner.utils.compare_nested_lists(a_list: list, e_list: list, tolerance: float = 0.0)

Compare two nested lists, allowing for a tolerance for float values

Parameters:

a_list (list) – the actual list
e_list (list) – the expected list
tolerance (float) – The tolerance for float values, 0 if they must match exactly

Returns:

Whether the lists match within the tolerance

Return type:

bool
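
A comparison of this kind can be sketched recursively. The function below is an illustration only, not the pipeliner code: the name nested_lists_match is hypothetical, and it assumes the tolerance is an absolute difference applied only when both values are floats.

```python
from typing import Any


def nested_lists_match(actual: Any, expected: Any, tolerance: float = 0.0) -> bool:
    """Hypothetical sketch: compare nested lists, floats within a tolerance."""
    if isinstance(actual, list) and isinstance(expected, list):
        # Lists must have the same length and match element by element
        return len(actual) == len(expected) and all(
            nested_lists_match(a, e, tolerance) for a, e in zip(actual, expected)
        )
    if isinstance(actual, float) and isinstance(expected, float):
        # Assumption: tolerance is an absolute difference
        return abs(actual - expected) <= tolerance
    # Everything else (ints, strings, mixed types) must match exactly
    return actual == expected


close_enough = nested_lists_match([1, [2.0, "a"]], [1, [2.05, "a"]], tolerance=0.1)
exact_only = nested_lists_match([1, [2.0, "a"]], [1, [2.05, "a"]])
```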

pipeliner.utils.convert_relative_filename(filename: str) → str

Convert a filename that is relative to the project to just its name

For example:

../../my_dir/my_file.txt -> my_dir/my_file
/my_dir/my_file.txt -> my_dir/my_file
~/my_dir/my_file.txt -> my_dir/my_file

Parameters:: filename (str) – The file name to convert
Returns:: The part of the file path that is not relative to the project
Return type:: str

pipeliner.utils.count_file_lines(filename: str) → int

Fast and efficient count of number of lines in a file

Parameters:: filename (str) – Name of the file to count the lines in
Returns:: Number of lines
Return type:: int
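
One common way to count lines quickly is to read the file in binary chunks and count newline bytes. The sketch below shows that technique; it is not necessarily how count_file_lines is implemented, and the name count_newlines and the chunk size are assumptions.

```python
import os
import tempfile


def count_newlines(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Hypothetical sketch: count newline characters without decoding the file."""
    count = 0
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held in memory at once
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count


with tempfile.TemporaryDirectory() as tmp:
    example = os.path.join(tmp, "example.star")
    with open(example, "w") as handle:
        handle.write("line 1\nline 2\nline 3\n")
    n_lines = count_newlines(example)
```

Note that this approach counts newline characters, so a final line without a trailing newline would not be counted.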

pipeliner.utils.date_time_tag(compact: bool = False) → str

Get a current date and time tag

It can return a compact version or one that is easier to read

Parameters:

compact (bool) – Should the returned tag be in the compact form

Returns:

The datetime tag

compact format is: YYYYMMDDHHMMSS

verbose form is: YYYY-MM-DD HH:MM:SS.MS

Return type:

str
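
The two formats can be reproduced with datetime.strftime, as in the sketch below. The name make_tag is hypothetical, and the use of %f (six-digit microseconds) for the fractional part is an assumption, since the number of digits in the "MS" field is not specified here.

```python
from datetime import datetime
from typing import Optional


def make_tag(moment: Optional[datetime] = None, compact: bool = False) -> str:
    """Hypothetical sketch: format a timestamp in compact or verbose form."""
    moment = moment or datetime.now()
    if compact:
        return moment.strftime("%Y%m%d%H%M%S")
    # Assumption: the fractional part is written as microseconds
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f")


fixed = datetime(2024, 3, 5, 14, 7, 9, 123456)
compact_tag = make_tag(fixed, compact=True)  # "20240305140709"
verbose_tag = make_tag(fixed)                # "2024-03-05 14:07:09.123456"
```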

pipeliner.utils.decompose_pipeline_filename(fn_in: str) → Tuple[str, int, str]

Breaks a job name into usable pieces

Returns everything before the job number, the job number as an int, and everything after the job number. It is set up for paths up to 20 directories deep; the 20-directory limit comes from the RELION code but is not really necessary anymore.

Parameters:

fn_in (str) – The job or file name to be broken down in the format: <jobtype>/jobxxx/<filename>

Returns:

The decomposed file name: (str, int, str)

[0] Everything before ‘job’ in the file name

[1] The job number

[2] Everything after the job number

Return type:

tuple

Raises:

ValueError – If the input file name is more than 20 directories deep
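
The decomposition of a <jobtype>/jobxxx/<filename> name can be sketched with a regular expression. This is an illustration only: the name split_job_path is hypothetical, the sketch does not apply the 20-directory limit, and whether the real function keeps the trailing slash on the first element is an assumption.

```python
import re
from typing import Tuple

# Shortest prefix ending in "/", then "job" + digits + "/", then the remainder
_JOB_PATTERN = re.compile(r"^(.*?/)job(\d+)/(.*)$")


def split_job_path(name: str) -> Tuple[str, int, str]:
    """Hypothetical sketch: split a name into (prefix, job number, remainder)."""
    match = _JOB_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} is not in <jobtype>/jobxxx/<filename> format")
    prefix, number, rest = match.groups()
    return prefix, int(number), rest


file_parts = split_job_path("Import/job001/movies.star")
job_parts = split_job_path("Class2D/job012/")
```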

pipeliner.utils.file_in_project(filename: str) → bool

Check that a file is part of the project

Not done with os.path.abspath(file).startswith(project_dir) because this causes errors during testing

pipeliner.utils.find_common_string(input_strings: List[str]) → str

Find the common part of a list of strings starting from the beginning

Parameters:: input_strings (list) – List of strings to compare
Returns:: The common portion of the strings
Return type:: str
Raises:: ValueError – If input_strings contains fewer than 2 strings
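
The standard library already provides a character-wise common prefix, so the behaviour described here can be approximated as below. The wrapper name common_prefix is hypothetical, and the real function may differ in details.

```python
import os
from typing import List


def common_prefix(strings: List[str]) -> str:
    """Hypothetical sketch: common leading part of at least two strings."""
    if len(strings) < 2:
        raise ValueError("Need at least 2 strings to compare")
    # os.path.commonprefix compares character by character, not path by path
    return os.path.commonprefix(strings)


shared = common_prefix(["Import/job001/a.star", "Import/job002/b.star"])
```

In this example the shared part is "Import/job00", because the comparison stops at the first differing character rather than at a directory boundary.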

pipeliner.utils.format_string_to_type_objs(in_str: str) → str | int | float | bool | None

Returns Int, Float, Bool, and None Objects from strings

Any number with a decimal point, in scientific notation, or ‘NaN’ will return a float
Any other number will return an int
‘False’ or ‘false’ returns False
‘True’ or ‘true’ returns True
‘None’ returns a NoneType object

Parameters:: in_str (str) – The input string
Returns:: The appropriate object
Return type:: Optional[Union[str, int, float, bool]]
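
The conversion rules above can be sketched as a chain of checks. The function name parse_value is hypothetical and this is not the pipeliner implementation; in particular, Python's int() and float() accept some extra spellings (for example ‘inf’ or surrounding whitespace) that the real function may treat differently.

```python
from typing import Optional, Union


def parse_value(text: str) -> Optional[Union[str, int, float, bool]]:
    """Hypothetical sketch: convert a string to bool, None, int, float or str."""
    if text in ("True", "true"):
        return True
    if text in ("False", "false"):
        return False
    if text == "None":
        return None
    try:
        return int(text)  # plain integers such as "12" or "-3"
    except ValueError:
        pass
    try:
        return float(text)  # decimals, scientific notation and "NaN"
    except ValueError:
        return text  # anything else stays a string


parsed = [parse_value(s) for s in ("12", "1.5e3", "0.25", "true", "None", "abc")]
```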

pipeliner.utils.get_file_size_mb(file: str | Path) → float

Get the size of a file in MB, rounded to 2 decimal places

Parameters:: file (str | Path) – The file to check

pipeliner.utils.get_job_number(job_name)

Get the job number from a pipeliner job as an int

Parameters:: job_name (str) – The job name in the pipeliner format
Returns:: The job number
Return type:: int

pipeliner.utils.get_job_runner_command() → List[str]

Get the full command to run the job_runner.py script.

pipeliner.utils.get_job_script(name: str) → str

Get the full path to a job script file.

Returns:: The job script file, if it exists.
Raises:: FileNotFoundError – if the named job script cannot be found.

pipeliner.utils.get_pipeliner_root() → Path

Get the directory of the main pipeliner module

Returns:: The path of the pipeliner
Return type:: Path

pipeliner.utils.get_python_command() → List[str]

Get the command to launch the current Python interpreter.

Note that the command is returned as a list and might include some arguments as well as the command itself.

pipeliner.utils.get_regenerate_results_command() → List[str]

Get the full command to run the regenerate_results.py script.

pipeliner.utils.increment_file_basenames(files: List[str]) → Dict[str, str]

Increment the base names of files if there are duplicates

e.g. Import/job001/myfile, Import/job002/myfile

Parameters:: files (List[str]) – The files to operate on
Returns:: The file name and its incremented basename
Return type:: Dict[str, str]

pipeliner.utils.is_uuid4(in_str: str) → bool

Check that a string is a UUID4

Parameters:: in_str (str) – The string to test
Returns:: Is the string a valid UUID4
Return type:: bool
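
A check of this kind can be written with the standard uuid module, as sketched below. The name looks_like_uuid4 is hypothetical, and the requirement that the string be in canonical hyphenated form is an assumption of this sketch rather than documented behaviour.

```python
import uuid


def looks_like_uuid4(text: str) -> bool:
    """Hypothetical sketch: True if text is a canonical version 4 UUID string."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False  # not parseable as a UUID at all
    # Assumption: only the canonical hyphenated spelling is accepted
    return parsed.version == 4 and str(parsed) == text.lower()


random_id_ok = looks_like_uuid4(str(uuid.uuid4()))
time_based_ok = looks_like_uuid4(str(uuid.uuid1()))  # version 1, so rejected
garbage_ok = looks_like_uuid4("not-a-uuid")
```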

pipeliner.utils.launch_detached_process(command: List[str]) → None

Run the given command as a detached process.

The process is started in a new session and with all file handles set to null, to ensure it keeps running in the background after the parent Python process exits.

Parameters:: command (List[str]) – The commands to execute
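
The behaviour described above (new session, all file handles set to null) corresponds to standard subprocess.Popen options. The sketch below illustrates those options; it is not the pipeliner function, the name launch_detached is hypothetical, and unlike the documented function it returns the Popen object so the example can wait for the child to finish.

```python
import subprocess
import sys
from typing import List


def launch_detached(command: List[str]) -> subprocess.Popen:
    """Hypothetical sketch: start a command detached from the parent process."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,   # no input from the parent's terminal
        stdout=subprocess.DEVNULL,  # discard output
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # POSIX only: run the child in its own session
    )


# Launch a trivial child process using the current interpreter
child = launch_detached([sys.executable, "-c", "pass"])
exit_code = child.wait(timeout=30)
```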

pipeliner.utils.make_pretty_header(text: str, char: str = '-=', top: bool = True, bottom: bool = True)

Make nice looking headers for on-screen display

Parameters:

text (str) – The text to put in the header
char (str) – What characters to use for the header

Returns:

A nice looking header

Return type:

str

pipeliner.utils.print_nice_columns(list_in: List[str], err_msg: str = 'ERROR: No items in input list')

Takes a list of items and makes three columns for nicer on-screen display

Parameters:

list_in (List[str]) – The list to display in columns
err_msg (str) – The message to display if the list is empty

pipeliner.utils.run_subprocess(*args, **kwargs) → CompletedProcess

pipeliner.utils.smart_strip_quotes(in_string: str) → str

Strip the quotes from a string in an intelligent manner

Remove leading and ending ‘ and “ but don’t remove them internally

Parameters:: in_string (str) – The input string
Returns:: the string with leading and ending quotes removed
Return type:: str
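
The simplest version of this behaviour is str.strip() with both quote characters, shown below. The name strip_outer_quotes is hypothetical; whether the real function requires the leading and ending quotes to match each other is not stated here, and this sketch does not check that.

```python
def strip_outer_quotes(text: str) -> str:
    """Hypothetical sketch: drop leading and ending quotes, keep internal ones."""
    # strip() only removes characters from the two ends of the string
    return text.strip("'\"")


stripped = strip_outer_quotes("\"it's here\"")  # internal apostrophe survives
```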

pipeliner.utils.str_is_hex_colour(in_string, allow_0x: bool = False) → bool

Test that a string is a hexadecimal colour code

Valid codes consist of a # symbol or ‘0x’ followed by exactly six hexadecimal digits (0-9 or a-f, lower or upper case).

Parameters:

in_string (str) – The string to test
allow_0x (bool) – Also allow ‘0x’ style codes

Returns:

is it a valid colour code?

Return type:

bool
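
The rule above translates directly into a regular expression. The sketch below is an illustration with a hypothetical name, is_hex_colour, and is not necessarily how str_is_hex_colour is implemented.

```python
import re

_HASH_ONLY = re.compile(r"#[0-9a-fA-F]{6}")
_HASH_OR_0X = re.compile(r"(?:#|0x)[0-9a-fA-F]{6}")


def is_hex_colour(text: str, allow_0x: bool = False) -> bool:
    """Hypothetical sketch: '#' (or optionally '0x') plus exactly six hex digits."""
    pattern = _HASH_OR_0X if allow_0x else _HASH_ONLY
    # fullmatch rejects strings with anything before or after the code
    return pattern.fullmatch(text) is not None


hash_ok = is_hex_colour("#1a2B3c")
ox_default = is_hex_colour("0x1a2b3c")
ox_allowed = is_hex_colour("0x1a2b3c", allow_0x=True)
```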

pipeliner.utils.subprocess_popen(*args, **kwargs) → Popen

pipeliner.utils.touch(filename: str)

Create an empty file

Parameters:: filename (str) – The name for the file to create

pipeliner.utils.truncate_number(number: float, maxlength: int) → str

Return a number with no more than maxlength decimal places and no trailing 0s

This is used to format numbers in exactly the same way that RELION does. For example, with maxlength 3: 1.2000 -> 1.2, 1.0 -> 1, 1.23 -> 1.23. RELION commands are happy to accept numbers with any number of decimal places or trailing 0s; this function just maintains continuity between RELION and pipeliner commands.

Parameters:

number (float) – The number to be truncated
maxlength (int) – The maximum number of decimal places
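
The formatting described above can be approximated with fixed-point formatting followed by stripping trailing zeros. The name trim_number is hypothetical, and note one assumption: this sketch rounds to maxlength decimal places, whereas the real function may truncate instead.

```python
def trim_number(number: float, maxlength: int) -> str:
    """Hypothetical sketch: at most maxlength decimals, no trailing zeros."""
    text = f"{number:.{maxlength}f}"  # rounds, e.g. 1.2 -> "1.200"
    if "." in text:
        # Only strip zeros after a decimal point, so "100" stays "100"
        text = text.rstrip("0").rstrip(".")
    return text


examples = [trim_number(1.2000, 3), trim_number(1.0, 3), trim_number(1.23, 3)]
```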

pipeliner.utils.wrap_text(text_string: str)

Produces <= 55 character wide wrapped text for on-screen display

Parameters:: text_string (str) – The text to be displayed
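
Wrapping to a fixed width is available in the standard library through textwrap. The sketch below shows the 55-character limit with a hypothetical helper, wrap_to_55; the real function may handle indentation or long words differently.

```python
import textwrap


def wrap_to_55(text: str) -> str:
    """Hypothetical sketch: wrap text so no line exceeds 55 characters."""
    return textwrap.fill(text, width=55)


message = (
    "These utilities are used by the pipeliner for basic tasks such as "
    "on-screen display, checking file names, and getting directory names."
)
wrapped = wrap_to_55(message)
line_lengths = [len(line) for line in wrapped.splitlines()]
```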