dcnum.feat.feat_background

Feature computation: background image data from image data

Classes

`Background`	Base class for background computation
`BackgroundCopy`	Copy the input background data to the output file
`BackgroundRollMed`	Rolling median RT-DC background image computation
`BackgroundSparseMed`	Sparse median background correction with cleansing

Functions

`get_available_background_methods()`	Return dictionary of background computation methods

Package Contents

class dcnum.feat.feat_background.Background(input_data, output_path, compress=True, num_cpus=None, **kwargs)[source]

Bases: abc.ABC

Base class for background computation

Parameters:

input_data (array-like or pathlib.Path) – The input data can be either a path to an HDF5 file with the “events/image” dataset or an array-like object that behaves like an image stack (first axis enumerates events).
output_path (pathlib.Path) – Path to the output file. If input_data is a path, you can set output_path to the same path to write directly to the input file. The data are written in the “events/image_bg” dataset in the output file.
compress (bool) – Whether to compress background data. Set this to False for faster processing.
num_cpus (int) – Number of CPUs to use for median computation. Defaults to dcnum.common.cpu_count().
kwargs – Additional keyword arguments passed to the subclass.

logger

output_path

kwargs: background keyword arguments

num_cpus = None: number of CPUs used

image_proc: fraction of images that have been processed

hdin: dcnum.read.HDF5Data | None = None: HDF5Data instance for input data

h5in: dcnum.common.h5py.File | None = None: input h5py.File

h5out: dcnum.common.h5py.File | None = None: output h5py.File

paths_ref = []: reference paths for logging to the output .rtdc file

image_shape: shape of event images

image_count: number of images in the input data

writer

__enter__()[source]

__exit__(type, value, tb)[source]

abstractmethod static check_user_kwargs()[source]: Implement this to check the kwargs during init

get_ppid()[source]

Return a unique background pipeline identifier

The pipeline identifier is universally applicable and must be backwards-compatible (future versions of dcnum will correctly acknowledge the ID).

The background pipeline ID is defined as:

KEY:KW_BACKGROUND

Where KEY is e.g. “sparsemed” or “rollmed”, and KW_BACKGROUND is a list of keyword arguments for check_user_kwargs, e.g.:

kernel_size=100^batch_size=10000

which may be abbreviated to:

k=100^b=10000
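
The KEY:KW_BACKGROUND layout above can be illustrated with a small sketch. The helpers build_ppid and parse_ppid are hypothetical names introduced here for illustration; they are not dcnum functions, and the sketch keeps all parsed values as strings instead of converting them back to their original types. For real pipelines, get_ppid_from_ppkw and get_ppkw_from_ppid are the documented entry points.

```python
def build_ppid(key: str, kwargs: dict) -> str:
    """Join a key and keyword arguments into a "KEY:KW" string (illustrative only)."""
    # Keyword arguments are written as name=value and joined with "^".
    kw_part = "^".join(f"{name}={value}" for name, value in kwargs.items())
    return f"{key}:{kw_part}"


def parse_ppid(ppid: str) -> tuple[str, dict]:
    """Split a "KEY:KW" string into the key and a dict of raw string values."""
    key, _, kw_part = ppid.partition(":")
    kwargs = {}
    for item in kw_part.split("^"):
        if item:
            name, _, value = item.partition("=")
            kwargs[name] = value  # values stay strings in this sketch
    return key, kwargs


ppid = build_ppid("rollmed", {"k": 100, "b": 10000})
key, kwargs = parse_ppid(ppid)
print(ppid)
print(key, kwargs)
```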

classmethod get_ppid_code()[source]

classmethod get_ppid_from_ppkw(kwargs)[source]: Return the PPID based on given keyword arguments for a subclass

static get_ppkw_from_ppid(bg_ppid)[source]: Return keyword arguments for any subclass from a PPID string

get_progress()[source]: Return progress of background computation, float in [0,1]

process()[source]

Perform the background computation

This irreversibly removes or overwrites any “image_bg” and “bg_off” features defined in the output file self.h5out.

abstractmethod process_approach()[source]: The actual background computation approach

dcnum.feat.feat_background.get_available_background_methods()[source]: Return dictionary of background computation methods

class dcnum.feat.feat_background.BackgroundCopy(*args, **kwargs)[source]

Bases: dcnum.feat.feat_background.base.Background

Copy the input background data to the output file

static check_user_kwargs()[source]: Implement this to check the kwargs during init

process()[source]: Copy input data to output dataset

process_approach()[source]: The actual background computation approach

class dcnum.feat.feat_background.BackgroundRollMed(input_data, output_path, kernel_size=100, batch_size=10000, compress=True, num_cpus=None)[source]

Bases: dcnum.feat.feat_background.base.Background

Rolling median RT-DC background image computation

There is one big shared array shared_input that contains the image data for each batch.
The user specifies the batch size (default 10000) and the kernel size (default 100).
There is a second shared array shared_output that contains the median values corresponding to the data in shared_input.
Background computation starts by copying the input images from the file into the shared input array.
The input array is split into slices, and workers compute the rolling median for each point in shared_input.
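
The per-pixel rolling median described in the steps above can be sketched in plain Python. This is a single-process illustration only: the function name rolling_median, the flattened-image layout, and the window alignment (output entry i covers events i to i + kernel_size - 1) are assumptions of this sketch. The class itself distributes the work across worker processes and shared arrays, and its edge handling may differ.

```python
from statistics import median


def rolling_median(stack, kernel_size):
    """Per-pixel median over `kernel_size` consecutive events.

    `stack` is a list of flattened images (one list of pixel values per
    event). Output entry i is the median of events i .. i + kernel_size - 1
    (assumed alignment), so the output is shorter than the input.
    """
    n_out = len(stack) - kernel_size + 1
    n_pixels = len(stack[0])
    output = []
    for i in range(n_out):
        window = stack[i:i + kernel_size]
        # Median along the event axis, computed separately for every pixel.
        output.append(
            [median(event[p] for event in window) for p in range(n_pixels)]
        )
    return output


# Five events with two pixels each; 50 and 90 mimic an object passing by.
stack = [[10, 20], [12, 20], [50, 21], [11, 22], [10, 90]]
background = rolling_median(stack, kernel_size=3)
print(background)
```

The outliers 50 and 90 do not appear in the result, which is why a median over neighboring events is a useful background estimate.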

Parameters:

input_data (array-like or pathlib.Path) – The input data can be either a path to an HDF5 file with the “events/image” dataset or an array-like object that behaves like an image stack (first axis enumerates events).
output_path (pathlib.Path) – Path to the output file. If input_data is a path, you can set output_path to the same path to write directly to the input file. The data are written in the “events/image_bg” dataset in the output file.
kernel_size (int) – Kernel size for median computation. This is the number of events that are used to compute the median for each pixel.
batch_size (int) – Number of events to process at the same time. Increasing this number beyond roughly two orders of magnitude above kernel_size does not increase computation speed. Larger values lead to higher memory consumption.
compress (bool) – Whether to compress background data. Set this to False for faster processing.
num_cpus (int) – Number of CPUs to use for median computation. Defaults to dcnum.common.cpu_count().
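
A minimal usage sketch based on the signature and methods listed on this page. It assumes that dcnum is installed, that __enter__ returns the instance itself, and that the input file contains the “events/image” dataset. The wrapper function compute_rolling_background and both path arguments are hypothetical.

```python
import pathlib


def compute_rolling_background(path_in, path_out):
    """Write a rolling-median background for `path_in` to `path_out`."""
    # Imported lazily so this sketch can be loaded without dcnum installed.
    from dcnum.feat.feat_background import BackgroundRollMed

    with BackgroundRollMed(
        input_data=pathlib.Path(path_in),
        output_path=pathlib.Path(path_out),
        kernel_size=100,     # events per median window (documented default)
        batch_size=10000,    # events processed at once (documented default)
        compress=True,
    ) as bg:
        # Removes or overwrites existing "image_bg" data in the output file.
        bg.process()
        return bg.get_progress()
```

Because process() overwrites existing background data irreversibly, writing to a separate output file first is the more cautious choice.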

kernel_size = 100: kernel size used for median filtering

batch_size = 10000: number of events processed at once

shared_input_raw: mp.RawArray for temporary batch input data

shared_output_raw: mp.RawArray for temporary batch output data

shared_input: numpy array reshaped view on self.shared_input_raw. The first axis enumerating the events

shared_output: numpy array reshaped view on self.shared_output_raw. The first axis enumerating the events

current_batch = 0: current batch index (see self.process and process_next_batch)

worker_counter: counter tracking process of workers

queue: queue for median computation jobs

workers: list of workers (processes)

__enter__()[source]

__exit__(type, value, tb)[source]

static check_user_kwargs(*, kernel_size: int = 100, batch_size: int = 10000)[source]

Check user-defined properties of this class

This method primarily exists so that the CLI knows which keyword arguments can be passed to this class.

Parameters:

kernel_size (int) – Kernel size for median computation. This is the number of events that are used to compute the median for each pixel.
batch_size (int) – Number of events to process at the same time. Increasing this number beyond roughly two orders of magnitude above kernel_size does not increase computation speed. Larger values lead to higher memory consumption.

get_slices_for_batch(batch_index=0)[source]

Returns slices for getting the input and writing to output

The input slice is self.kernel_size longer than the output slice.
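
The relation between the two slices can be sketched as follows. The function slices_for_batch is a hypothetical stand-alone helper, not the method itself, and the clamping at the end of the data is an assumption. The sketch does not handle inputs shorter than kernel_size.

```python
def slices_for_batch(batch_index, batch_size, kernel_size, image_count):
    """Return (input_slice, output_slice) for one batch (illustrative only)."""
    start = batch_index * batch_size
    # Each output event needs `kernel_size` input events, so the last
    # possible output index is image_count - kernel_size (assumed clamping).
    stop_out = min((batch_index + 1) * batch_size, image_count - kernel_size)
    stop_in = stop_out + kernel_size
    return slice(start, stop_in), slice(start, stop_out)


first_in, first_out = slices_for_batch(0, 10000, 100, image_count=25000)
last_in, last_out = slices_for_batch(2, 10000, 100, image_count=25000)
print(first_in, first_out)
print(last_in, last_out)
```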

map_iterator()[source]: Iterates over arguments for compute_median_for_slice

process_approach()[source]: Perform median computation on entire input data

process_next_batch()[source]: Process one batch of input data

class dcnum.feat.feat_background.BackgroundSparseMed(input_data, output_path, kernel_size=200, split_time=1.0, thresh_cleansing=0, frac_cleansing=0.8, offset_correction=True, compress=True, num_cpus=None)[source]

Bases: dcnum.feat.feat_background.base.Background

Sparse median background correction with cleansing

In contrast to the rolling median background correction, this algorithm only computes the background image every split_time seconds, but with a larger window (default kernel size is 200 frames instead of 100 frames).

At time stamps every split_time seconds, a background image is computed, resulting in a background series.
Cleansing: The background series is checked for images that contain event data, using a lengthy algorithm that is documented in the source code. In short, this removes background images that contain streaks of red blood cells (RBCs).
Each frame gets the background image closest to it based on time from the background series.
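
The last step, assigning each frame the background image closest in time, can be sketched with a binary search over the sorted time stamps of the background series. The function nearest_background_index and the sample time stamps are hypothetical, and resolving ties toward the earlier background image is an assumption of this sketch.

```python
from bisect import bisect_left


def nearest_background_index(frame_time, bg_times):
    """Return the index of the entry in sorted `bg_times` closest to `frame_time`."""
    pos = bisect_left(bg_times, frame_time)
    if pos == 0:
        return 0
    if pos == len(bg_times):
        return len(bg_times) - 1
    before, after = bg_times[pos - 1], bg_times[pos]
    # Ties go to the earlier background image (assumption).
    return pos if (after - frame_time) < (frame_time - before) else pos - 1


bg_times = [0.0, 1.0, 2.0, 3.0]           # one background image per second
frame_times = [0.1, 0.6, 1.4, 2.5, 3.7]   # time stamps of individual frames
assignment = [nearest_background_index(t, bg_times) for t in frame_times]
print(assignment)
```

Since many frames map to the same index, only the short background series and the index per frame need to be stored.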

Parameters:

input_data (array-like or pathlib.Path) – The input data can be either a path to an HDF5 file with the “events/image” dataset or an array-like object that behaves like an image stack (first axis enumerates events).
output_path (pathlib.Path) – Path to the output file. If input_data is a path, you can set output_path to the same path to write directly to the input file. The data are written in the “basin_events/image_bg” dataset in the output file.
kernel_size (int) – Kernel size for median computation. This is the number of events that are used to compute the median for each pixel.
split_time (float) – Time between background images in the background series
thresh_cleansing (float) – A positive floating point value for scaling the thresholding operation when excluding background images from the series. Larger values mean more background images are excluded. Set to zero to enforce a fixed fraction via frac_cleansing.
frac_cleansing (float) – Fraction between 0 and 1 indicating how many background images must still be present after cleansing (in case the cleansing factor is too large). Set to 1 to disable cleansing altogether.
offset_correction (bool) – The sparse median background correction produces one median image for multiple input frames (BTW this also leads to very efficient data storage with internal HDF5 basins). In case the input frames are subject to frame-by-frame brightness variations (e.g. flickering of the illumination source), it is useful to have an offset value per frame that can then be used in a later step to perform a more accurate background correction. This offset is computed here by taking a 20px wide slice from each frame (where the channel wall is located) and computing the median therein relative to the computed background image. The data are written to the “bg_off” feature in the output file alongside “image_bg”. To obtain the corrected background image, add “image_bg” and “bg_off”. Set this to False if you don’t need the “bg_off” feature.
compress (bool) – Whether to compress background data. Set this to False for faster processing.
num_cpus (int) – Number of CPUs to use for median computation. Defaults to dcnum.common.cpu_count().

Changed in version 0.23.5: The background image data are stored as an internal mapped basin to reduce the output file size.

kernel_size = 200: kernel size used for median filtering

split_time = 1.0: time between background images in the background series

thresh_cleansing = 0: cleansing threshold factor

frac_cleansing = 0.8: keep at least this fraction of background images from the series

offset_correction = True: offset/flickering correction

time = None

duration: duration of the measurement

step_times

bg_images: array containing all background images

shared_input_raw: mp.RawArray for temporary batch input data

shared_output_raw: mp.RawArray for the median background image

shared_input: numpy array reshaped view on self.shared_input_raw. The First axis enumerating the events

shared_output: numpy array reshaped view on self.shared_output_raw. The First axis enumerating the events

worker_counter: counter tracking the progress of the workers

queue: queue for median computation jobs

workers: list of workers (processes)

__enter__()[source]

__exit__(type, value, tb)[source]

static check_user_kwargs(*, kernel_size: int = 200, split_time: float = 1.0, thresh_cleansing: float = 0, frac_cleansing: float = 0.8, offset_correction: bool = True)[source]

Check user-defined properties of this class

This method primarily exists so that the CLI knows which keyword arguments can be passed to this class.

Parameters:

kernel_size (int) – Kernel size for median computation. This is the number of events that are used to compute the median for each pixel.
split_time (float) – Time between background images in the background series
thresh_cleansing (float) – A positive floating point value for scaling the thresholding operation when excluding background images from the series. Larger values mean more background images are excluded. Set to 0 (default) to enforce a fixed fraction frac_cleansing.
frac_cleansing (float) – Fraction between 0 and 1 indicating how many background images must still be present after cleansing (in case the cleansing factor is too large). Set to 1 to disable cleansing altogether.
offset_correction (bool) – The sparse median background correction produces one median image for multiple input frames (BTW this also leads to very efficient data storage with internal HDF5 basins). In case the input frames are subject to frame-by-frame brightness variations (e.g. flickering of the illumination source), it is useful to have an offset value per frame that can then be used in a later step to perform a more accurate background correction. This offset is computed here by taking a 20px wide slice from each frame (where the channel wall is located) and computing the median therein relative to the computed background image. The data are written to the “bg_off” feature in the output file alongside “image_bg”. To obtain the corrected background image, add “image_bg” and “bg_off”. Set this to False if you don’t need the “bg_off” feature.
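
The offset idea described for offset_correction can be sketched in plain Python. The function frame_offset is hypothetical. Taking the strip from the first rows of the image and computing the median of the pixel-wise differences are both assumptions of this sketch; the description above only states that a 20px wide slice at the channel wall is compared to the background image.

```python
from statistics import median


def frame_offset(frame, background, strip_width=20):
    """Median brightness difference between `frame` and `background` in a strip.

    Both images are lists of rows. The strip is taken from the first
    `strip_width` rows here; where the channel wall sits in real data is
    an assumption of this sketch.
    """
    diffs = [
        f_px - b_px
        for f_row, b_row in zip(frame[:strip_width], background[:strip_width])
        for f_px, b_px in zip(f_row, b_row)
    ]
    return median(diffs)


background = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
# The frame is about 3 units brighter (flicker); the 60 mimics an object pixel
# outside the strip.
frame = [[103, 103, 103], [103, 102, 104], [60, 103, 103]]
offset = frame_offset(frame, background, strip_width=2)
# As stated above: corrected background = "image_bg" + "bg_off".
corrected = [[px + offset for px in row] for row in background]
print(offset, corrected[0])
```

Restricting the median to a strip without events keeps passing objects, such as the dark pixel in the last row, from biasing the offset.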

process_approach()[source]: Perform median computation on entire input data

process_second(ii: int, second: float | int)[source]