dcnum.feat.feat_background.bg_roll_median

Attributes

ndi

Classes

`BackgroundRollMed`	Rolling median RT-DC background image computation
`WorkerRollMed`	Worker process for median computation

Functions

compute_median_for_slice(shared_input, shared_output, ...)

Compute the rolling median for a slice of a shared array

Module Contents

dcnum.feat.feat_background.bg_roll_median.ndi

class dcnum.feat.feat_background.bg_roll_median.BackgroundRollMed(input_data, output_path, kernel_size=100, batch_size=10000, compress=True, num_cpus=None)

Bases: dcnum.feat.feat_background.base.Background

Rolling median RT-DC background image computation

There is one big shared array, shared_input, that contains the image data for each batch.
The user specifies the batch size (default is 10000) and the kernel size (default is 100).
A second shared array, shared_output, contains the median values corresponding to the data in shared_input.
Background computation is done by copying the input images from the input data into shared_input.
The input array is split into slices along the pixel axis, and the workers compute the rolling median for each pixel in shared_input.
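The idea of a rolling median background can be illustrated with NumPy on a small in-memory image stack. This is a single-process sketch, not dcnum's implementation: it assumes that the background for event i is the per-pixel median of events i to i + kernel_size - 1 (a forward-looking window, consistent with each input batch being longer than the output batch), and the helper name rolling_median_background is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_median_background(images, kernel_size):
    """Per-pixel rolling median along the event axis (hypothetical helper).

    images: array of shape (n_events, height, width), first axis = events.
    Returns shape (n_events - kernel_size + 1, height, width); entry i is
    the per-pixel median of events i .. i + kernel_size - 1.
    """
    # sliding_window_view appends the window axis last:
    # (n_events - kernel_size + 1, height, width, kernel_size)
    windows = sliding_window_view(images, kernel_size, axis=0)
    # median over the window axis, cast back to the input dtype
    # (this truncates half-integer medians for even kernel sizes)
    return np.median(windows, axis=-1).astype(images.dtype)


# 20 tiny 2x3 frames: constant background of 10 plus one bright "cell" pixel
stack = np.full((20, 2, 3), 10, dtype=np.uint8)
stack[5, 0, 1] = 200
bg = rolling_median_background(stack, kernel_size=5)
print(bg.shape)  # (16, 2, 3)
print(bg.max())  # 10 (the single outlier never reaches the median)
```

Because a moving object occupies a given pixel for only a few events, it rarely makes up half of a window, so the median recovers the static background.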

Parameters:

input_data (array-like or pathlib.Path) – The input data can be either a path to an HDF5 file with the “events/image” dataset or an array-like object that behaves like an image stack (first axis enumerates events).
output_path (pathlib.Path) – Path to the output file. If input_data is a path, you can set output_path to the same path to write directly to the input file. The data are written to the “events/image_bg” dataset in the output file.
kernel_size (int) – Kernel size for median computation. This is the number of events that are used to compute the median for each pixel.
batch_size (int) – Number of events to process at the same time. Increasing this number far beyond two orders of magnitude above kernel_size will not increase computation speed. Larger values lead to higher memory consumption.
compress (bool) – Whether to compress background data. Set this to False for faster processing.
num_cpus (int) – Number of CPUs to use for median computation. Defaults to dcnum.common.cpu_count().
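To get a feel for how batch_size drives memory consumption, the size of the two shared arrays can be estimated. This sketch assumes 8-bit (uint8) images, an input buffer holding batch_size + kernel_size events and an output buffer holding batch_size events; the helper shared_memory_bytes and the 80 x 250 pixel image size are hypothetical.

```python
def shared_memory_bytes(image_shape, batch_size=10000, kernel_size=100,
                        bytes_per_pixel=1):
    """Estimate the sizes of the shared input and output buffers in bytes.

    Assumes one byte per pixel (uint8 images). The input buffer holds
    batch_size + kernel_size events, the output buffer batch_size events.
    """
    height, width = image_shape
    pixels = height * width
    input_bytes = (batch_size + kernel_size) * pixels * bytes_per_pixel
    output_bytes = batch_size * pixels * bytes_per_pixel
    return input_bytes, output_bytes


# Hypothetical 80 x 250 pixel images with the default parameters
inp, out = shared_memory_bytes((80, 250))
print(inp, out)  # 202000000 200000000
```

Under these assumptions the default settings need roughly 0.4 GB for the two buffers, and the total grows linearly with batch_size.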

kernel_size = 100: kernel size used for median filtering

batch_size = 10000: number of events processed at once

shared_input_raw: mp.RawArray for temporary batch input data

shared_output_raw: mp.RawArray for temporary batch output data

shared_input: numpy array reshaped view on self.shared_input_raw. The first axis enumerating the events

shared_output: numpy array reshaped view on self.shared_output_raw. The first axis enumerating the events

current_batch = 0: current batch index (see self.process and process_next_batch)

worker_counter: counter tracking process of workers

queue: queue for median computation jobs

workers: list of workers (processes)

__enter__()

__exit__(type, value, tb)

static check_user_kwargs(*, kernel_size: int = 100, batch_size: int = 10000)

Check user-defined properties of this class

This method primarily exists so that the CLI knows which keyword arguments can be passed to this class.

Parameters:

kernel_size (int) – Kernel size for median computation. This is the number of events that are used to compute the median for each pixel.
batch_size (int) – Number of events to process at the same time. Increasing this number far beyond two orders of magnitude above kernel_size will not increase computation speed. Larger values lead to higher memory consumption.
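One way a command-line interface could discover the accepted keyword arguments and their defaults from a keyword-only static method like this is inspect.signature. This is a hypothetical sketch with a stand-in class (ExampleBackground); it does not show how dcnum's CLI actually uses check_user_kwargs, and the validation inside the stand-in is illustrative only.

```python
import inspect


class ExampleBackground:
    """Stand-in class mirroring the documented signature (hypothetical)."""

    @staticmethod
    def check_user_kwargs(*, kernel_size: int = 100, batch_size: int = 10000):
        # illustrative validation only; the real checks are not documented here
        if kernel_size <= 0 or batch_size <= 0:
            raise ValueError("kernel_size and batch_size must be positive")


def user_kwargs_with_defaults(cls):
    """Map each keyword-only parameter of check_user_kwargs to its default."""
    sig = inspect.signature(cls.check_user_kwargs)
    return {name: param.default
            for name, param in sig.parameters.items()
            if param.kind is inspect.Parameter.KEYWORD_ONLY}


print(user_kwargs_with_defaults(ExampleBackground))
# {'kernel_size': 100, 'batch_size': 10000}
```

Keyword-only parameters make the accepted options and their defaults explicit in the signature, which is what makes this kind of introspection possible.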

get_slices_for_batch(batch_index=0)

Return the slices for reading the input and writing the output

The input slice is self.kernel_size events longer than the output slice.
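A possible way to compute these slices is sketched below. It assumes that batch b covers the output events from b * batch_size up to (b + 1) * batch_size, that the input slice starts at the same event and extends kernel_size events further, and that both slices are clipped to the total number of events. The function batch_slices and its explicit arguments are hypothetical.

```python
def batch_slices(batch_index, batch_size, kernel_size, num_events):
    """Return (input_slice, output_slice) for one batch (hypothetical sketch).

    The input slice is kernel_size events longer than the output slice,
    except where it is clipped at the end of the data.
    """
    start = batch_index * batch_size
    stop_out = min(start + batch_size, num_events)
    stop_in = min(start + batch_size + kernel_size, num_events)
    return slice(start, stop_in), slice(start, stop_out)


# 25 events, batches of 10, kernel of 3
for b in range(3):
    print(b, batch_slices(b, 10, 3, 25))
# 0 (slice(0, 13, None), slice(0, 10, None))
# 1 (slice(10, 23, None), slice(10, 20, None))
# 2 (slice(20, 25, None), slice(20, 25, None))
```

The overlapping kernel_size events at the start of each input slice provide the window for the last output events of a batch, so consecutive batches join without gaps.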

map_iterator(): Iterates over arguments for compute_median_for_slice

process_approach(): Perform median computation on the entire input data

process_next_batch(): Process one batch of input data
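A generator that yields arguments for compute_median_for_slice could split the pixel axis into contiguous job slices of roughly equal size, one argument tuple per slice. The name iter_median_jobs, the num_pixels and num_jobs arguments, and the chunking strategy are assumptions for illustration, not dcnum's documented behavior.

```python
def iter_median_jobs(shared_input, shared_output, kernel_size, output_size,
                     num_pixels, num_jobs):
    """Yield argument tuples (hypothetical) matching compute_median_for_slice.

    The pixel axis [0, num_pixels) is split into at most num_jobs contiguous
    slices whose lengths differ by at most one.
    """
    num_jobs = max(1, min(num_jobs, num_pixels))
    base, extra = divmod(num_pixels, num_jobs)
    start = 0
    for job in range(num_jobs):
        # the first `extra` jobs get one additional pixel
        stop = start + base + (1 if job < extra else 0)
        yield (shared_input, shared_output, kernel_size, output_size,
               slice(start, stop))
        start = stop


jobs = list(iter_median_jobs(None, None, 100, 10000, num_pixels=10, num_jobs=3))
print([job[-1] for job in jobs])
# [slice(0, 4, None), slice(4, 7, None), slice(7, 10, None)]
```

Since the slices are disjoint, each worker writes to a separate region of shared_output and no locking is needed for the data itself.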

class dcnum.feat.feat_background.bg_roll_median.WorkerRollMed(job_queue, counter, shared_input, shared_output, batch_size, kernel_size, *args, **kwargs)

Bases: dcnum.feat.feat_background.base.mp_spawn.Process

Worker process for median computation

queue

counter

shared_input_raw

shared_output_raw

batch_size

kernel_size

run(): Main loop of the worker process (breaks when self.counter < 0)

start(): Start the child process
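The job-queue pattern behind such a worker loop can be sketched with threads and a lock-protected counter from the standard library. This is a simplified stand-in, not dcnum's code: dcnum uses processes and shared-memory arrays, and the convention that each worker increments the counter after finishing a job and exits once the counter is negative is an assumption based on the run() description.

```python
import queue
import threading
import time


class Counter:
    """Lock-protected integer (a thread-based stand-in for a shared mp.Value)."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()


def worker_loop(job_queue, counter, results):
    """Process jobs until the counter becomes negative (stop signal)."""
    while True:
        with counter.lock:
            if counter.value < 0:
                break  # the main thread asked all workers to stop
        try:
            job = job_queue.get(timeout=0.05)
        except queue.Empty:
            continue  # no job available; re-check the stop signal
        result = sum(job)  # placeholder for the actual median computation
        with counter.lock:
            results.append(result)
            counter.value += 1  # report one finished job


job_queue, counter, results = queue.Queue(), Counter(), []
workers = [threading.Thread(target=worker_loop,
                            args=(job_queue, counter, results))
           for _ in range(2)]
for w in workers:
    w.start()

jobs = [(1, 2), (3, 4), (5, 6)]
for job in jobs:
    job_queue.put(job)

# Wait until every job has been reported, then send the stop signal
while True:
    with counter.lock:
        if counter.value == len(jobs):
            counter.value = -1
            break
    time.sleep(0.01)

for w in workers:
    w.join()
print(sorted(results))  # [3, 7, 11]
```

With processes instead of threads, the counter and the data have to live in shared memory (for example multiprocessing.Value and RawArray), because child processes do not share ordinary Python objects.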

dcnum.feat.feat_background.bg_roll_median.compute_median_for_slice(shared_input, shared_output, kernel_size, output_size, job_slice)

Compute the rolling median for a slice of a shared array

Parameters:

shared_input (multiprocessing.RawArray) – Input data for which to compute the median. For each pixel in the original image, batch_size + kernel_size events are stored in this array one after another. The total size of this array is (batch_size + kernel_size) * number_of_pixels_in_the_image.
shared_output (multiprocessing.RawArray) – Used for storing the result. Note that the last kernel_size elements for each pixel in this output array are junk data (because it is a rolling median).
kernel_size (int) – Kernel size for median computation. This is the number of events that are used to compute the median for each pixel.
output_size (int) – The partial batch size, i.e. the number of events for which to compute the rolling median. Note that output_size + kernel_size events are taken from shared_input.
job_slice (slice) – The part of the data this call works on. Because shared_input and shared_output are accessed from multiple processes at the same time, each process is given its own slice, and only this slice of shared_output is modified. The slice defines which (and how many) pixels are processed.
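A NumPy-only sketch of computing the rolling median for one job slice is shown below. It assumes the flat buffers are viewed as 2D arrays with events along the first axis and pixels along the second, that output event i is the median of input events i to i + kernel_size - 1, and that only the columns selected by job_slice are written. The function median_for_pixel_slice and its extra num_pixels argument are hypothetical, plain NumPy arrays stand in for the multiprocessing.RawArray buffers, and the median filter used here may differ from the library's own implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_for_pixel_slice(flat_input, flat_output, kernel_size, output_size,
                           job_slice, num_pixels):
    """Rolling median along the event axis for the pixels in job_slice.

    flat_input / flat_output: 1D arrays (stand-ins for the shared buffers),
    viewed as (events, pixels). Only the job_slice columns of the output
    change.
    """
    inp = flat_input.reshape(-1, num_pixels)   # (events_in, pixels) view
    out = flat_output.reshape(-1, num_pixels)  # (events_out, pixels) view
    # take output_size + kernel_size events for the selected pixels
    block = inp[:output_size + kernel_size, job_slice]
    # windows: (n_windows, n_slice_pixels, kernel_size); keep output_size
    windows = sliding_window_view(block, kernel_size, axis=0)[:output_size]
    # assignment casts the float medians to the output dtype
    out[:output_size, job_slice] = np.median(windows, axis=-1)


num_pixels, kernel_size, output_size = 4, 3, 5
rng = np.random.default_rng(0)
flat_in = rng.integers(0, 255, size=(output_size + kernel_size) * num_pixels,
                       dtype=np.uint8)
flat_out = np.zeros(output_size * num_pixels, dtype=np.uint8)

# two "workers" handle disjoint pixel slices of the same output buffer
median_for_pixel_slice(flat_in, flat_out, kernel_size, output_size,
                       slice(0, 2), num_pixels)
median_for_pixel_slice(flat_in, flat_out, kernel_size, output_size,
                       slice(2, 4), num_pixels)
```

Because each call only touches its own columns, calls for disjoint slices can run in parallel on the same output buffer.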