dcnum.read

Submodules

Attributes

PROTECTED_FEATURES

Frame-defined scalar features.

Exceptions

BasinIdentifierMismatchError


Classes

`HDF5Data`
`HDF5ImageCache`	An HDF5 image cache

Functions

`md5sum`(path[, blocksize, count])	Compute (partial) MD5 sum of a file
`detect_flickering`(image_data[, roi_height, ...])	Determine whether an image series experiences flickering
`get_measurement_identifier`(h5) → str | None	Return the measurement identifier for the given H5File object
`concatenated_hdf5_data`(paths[, path_out, ...])	Return a virtual dataset concatenating all the input paths
`get_mapping_indices`(index_mapping)	Return integer numpy array with mapping indices for a range
`get_mapped_object`(obj[, index_mapping])

Package Contents

dcnum.read.md5sum(path, blocksize=65536, count=0)[source]

Compute (partial) MD5 sum of a file

Parameters:

path (str or pathlib.Path) – path to the file
blocksize (int) – block size in bytes read from the file (set to 0 to hash the entire file)
count (int) – number of blocks read from the file
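The sketch below shows the partial-hashing idea using the standard library's hashlib: it reads the file in blocks of blocksize bytes and stops after count blocks. It is not dcnum's implementation. partial_md5 is a hypothetical name, treating count=0 as "read to the end of the file" is an assumption of this sketch, and blocksize must be positive here.

```python
import hashlib
import pathlib


def partial_md5(path, blocksize=65536, count=0):
    """Hash at most `count` blocks of `blocksize` bytes (hypothetical sketch).

    Assumption of this sketch: count=0 means "no block limit". The
    blocksize=0 behaviour documented for md5sum is not reproduced here.
    """
    hasher = hashlib.md5()
    with pathlib.Path(path).open("rb") as fd:
        nblocks = 0
        # Read block by block until EOF or until `count` blocks were hashed
        while buf := fd.read(blocksize):
            hasher.update(buf)
            nblocks += 1
            if count and nblocks == count:
                break
    return hasher.hexdigest()
```

With the default block size of 65536 bytes, count=80 covers 80 × 65536 = 5242880 bytes, i.e. the first 5 MiB of a file.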

dcnum.read.PROTECTED_FEATURES = ['bg_off', 'flow_rate', 'frame', 'g_force', 'pressure', 'temp', 'temp_amb', 'time']: Frame-defined scalar features, i.e. scalar features that apply to all events in a frame and are not computed for individual events.
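As an illustration of how this list can be used, the hypothetical helper below (not part of dcnum) separates a dataset's feature names into frame-defined features from PROTECTED_FEATURES and per-event features. The feature names in the example call are only examples.

```python
# Copied from the PROTECTED_FEATURES list documented above
PROTECTED_FEATURES = ["bg_off", "flow_rate", "frame", "g_force",
                      "pressure", "temp", "temp_amb", "time"]


def split_features(feature_names):
    """Split feature names into frame-defined and per-event lists.

    Hypothetical helper for illustration; not part of dcnum.
    """
    frame_defined = [f for f in feature_names if f in PROTECTED_FEATURES]
    event_defined = [f for f in feature_names if f not in PROTECTED_FEATURES]
    return frame_defined, event_defined


frame_feats, event_feats = split_features(["area_um", "deform", "frame", "time"])
```

Here frame_feats holds "frame" and "time", which describe whole frames, while "area_um" and "deform" end up in event_feats.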

dcnum.read.detect_flickering(image_data: numpy.ndarray | dcnum.read.hdf5_data.HDF5Data, roi_height: int = 10, brightness_threshold: float = 2.5, count_threshold: int = 5, max_frames: int = 1000)[source]

Determine whether an image series experiences flickering

Flickering is an unwelcome phenomenon caused by a faulty data acquisition device. For instance, if there is random voltage noise in the electronics managing the LED power, the brightness of the LED will vary randomly whenever the noise signal overlaps with the flash triggering signal.

If flickering is detected, you should use the “sparsemed” background computation with offset_correction set to True.

Parameters:

image_data – sliceable object (e.g. numpy array or HDF5Data) containing image data.
roi_height (int) – height of the ROI in pixels for which to search for flickering; the entire width of the image is used
brightness_threshold (float) – brightness difference between an individual ROI median and the median of all ROI medians at which a frame counts as a flickering event
count_threshold (int) – minimum number of flickering events that would lead to a positive flickering decision
max_frames (int) – maximum number of frames to include in the flickering analysis
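The parameters above suggest a median-based test. The NumPy sketch below is one plausible reading of them, not dcnum's code: it assumes the ROI consists of the top roi_height rows of each image, and that a frame counts as a flickering event when its ROI median deviates from the median of all ROI medians by at least brightness_threshold. detect_flickering_sketch is a hypothetical name.

```python
import numpy as np


def detect_flickering_sketch(image_data, roi_height=10,
                             brightness_threshold=2.5, count_threshold=5,
                             max_frames=1000):
    """Return True if too many frames show an ROI brightness jump.

    Hypothetical sketch: the ROI placement (top rows) and the comparison
    (absolute deviation >= threshold) are assumptions.
    """
    # Restrict the analysis to at most `max_frames` frames and the ROI rows
    roi = np.asarray(image_data[:max_frames, :roi_height, :], dtype=float)
    # One median brightness value per frame
    roi_medians = np.median(roi, axis=(1, 2))
    # Deviation of each frame's ROI median from the median of all medians
    deviations = np.abs(roi_medians - np.median(roi_medians))
    num_events = int(np.sum(deviations >= brightness_threshold))
    return num_events >= count_threshold


rng = np.random.default_rng(42)
images = rng.normal(100, 0.5, size=(200, 40, 100))  # steady illumination
print(detect_flickering_sketch(images))  # False
images[::20] += 10  # every 20th frame (10 frames) is 10 units brighter
print(detect_flickering_sketch(images))  # True
```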

class dcnum.read.HDF5Data

Parameters:

path – path to data file
pixel_size – pixel size in µm
md5_5m – MD5 sum of the first 5 MiB; computed if not provided
meta – metadata dictionary; extracted from HDF5 attributes if not provided
basins – list of basin dictionaries; extracted from HDF5 attributes if not provided
logs – dictionary of logs; extracted from HDF5 attributes if not provided
tables – dictionary of tables; extracted from HDF5 attributes if not provided
image_cache_size – size of the image cache to use when accessing image data
image_chunk_size – maximum number of images in each image cache chunk
index_mapping – select only a subset of input events, transparently reducing the size of the dataset. Possible data types are:
- int N: use the first N events
- slice: use the events defined by a slice
- list: list of integers specifying the event indices to use
Numpy indexing rules apply, e.g. to process only the first 100 events, set this to 100 or slice(0, 100).
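The index_mapping options follow NumPy indexing. The hypothetical select_events helper below (an illustration, not part of dcnum) applies such a value to a per-event feature array; note that 100 and slice(0, 100) select the same events, even when the dataset holds fewer than 100 events.

```python
import numpy as np


def select_events(feature_data, index_mapping=None):
    """Apply an index_mapping value to per-event data (illustration only).

    An integer N selects the first N events; slices and integer lists
    are passed to NumPy indexing unchanged.
    """
    data = np.asarray(feature_data)
    if index_mapping is None:
        return data
    if isinstance(index_mapping, (int, np.integer)):
        return data[:index_mapping]
    return data[index_mapping]


events = np.arange(10)
print(select_events(events, 3))
print(select_events(events, [0, 5, 9]))
```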

__contains__(item)[source]

__enter__()[source]

__exit__(exc_type, exc_val, exc_tb)[source]

__getitem__(feat)[source]

__getstate__()[source]

__setstate__(state)[source]

__len__()[source]

property h5: dcnum.common.h5py.File

property image: dcnum.read.cache.HDF5ImageCache | None

property image_bg: dcnum.read.cache.HDF5ImageCache | None

property image_corr: dcnum.read.cache.ImageCorrCache | None

property image_num_chunks: Number of image chunks given self.image_chunk_size

property mask

property meta_nest

Return self.meta as a nested dictionary

The result closely resembles the dclab config property of datasets.
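Metadata keys in these files combine a section and a key name, as in "experiment:date" or "setup:identifier". A minimal sketch of nesting such a flat dictionary could look like this; it assumes every key contains a ":" and splits at the first one. nest_meta is a hypothetical name and the values are made up.

```python
def nest_meta(meta):
    """Convert flat "section:key" metadata into a nested dictionary.

    Hypothetical sketch: keys are split at the first ":" only, so values
    containing colons (e.g. times) are left untouched.
    """
    nested = {}
    for flat_key, value in meta.items():
        section, key = flat_key.split(":", 1)
        nested.setdefault(section, {})[key] = value
    return nested


meta = {"experiment:date": "2024-01-01",
        "experiment:time": "12:00:00",
        "setup:identifier": "demo-setup"}
print(nest_meta(meta))
```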

property pixel_size

static extract_basin_dicts(h5, check=True)[source]: Return list of basin dictionaries

property features_scalar_frame

Scalar features that apply to all events in a frame

This convenience property is used for copying scalar features over to new processed datasets. It returns a list of all features that describe a frame (e.g. temperature or time).

close()[source]: Close the underlying HDF5 file

get_ppid()[source]

classmethod get_ppid_code()[source]

classmethod get_ppid_from_ppkw(kwargs)[source]

static get_ppid_index_mapping(index_mapping)[source]: Return the pipeline identifier part for index mapping

static get_ppkw_from_ppid(dat_ppid)[source]

get_basin_data(index: int) → tuple[dcnum.common.h5py.Group, list, int | slice | list | numpy.ndarray][source]

Return HDF5Data info for a basin index in self.basins

Parameters:

index (int) – index of the basin from which to get data

Returns:

group (h5py.Group) – HDF5 group containing HDF5 Datasets with the names listed in features
features (list of str) – list of features made available by this basin
index_mapping – the mapping (see the index_mapping parameter of __init__) from the basin dataset to the referring dataset

_get_basin_data_file(bn_dict)[source]

_get_basin_data_internal(bn_dict)[source]

get_image_cache(feat)[source]

Create an HDF5ImageCache object for the current dataset

This method also tries to find image data in self.basins.

keys()[source]

class dcnum.read.HDF5ImageCache(h5ds: dcnum.common.h5py.Dataset | dcnum.read.mapped.MappedHDF5Dataset, chunk_size: int = 1000, cache_size: int = 2, boolean: bool = False)[source]

Bases: BaseImageChunkCache

An HDF5 image cache

Deformability cytometry data files commonly contain image stacks that are chunked in various ways. Loading just a single image can be time-consuming, because an entire HDF5 chunk has to be loaded and decompressed before one image can be extracted from it. The HDF5ImageCache class caches the chunks from the HDF5 files in memory, making single-image access very fast.
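The chunk-caching idea can be sketched as a small least-recently-used (LRU) cache: an image index is mapped to its chunk by integer division by chunk_size, the whole chunk is read in one slicing operation on a cache miss, and the least recently used chunk is evicted once more than cache_size chunks are held. ChunkCacheSketch is hypothetical and does not reproduce HDF5ImageCache or its eviction policy; its defaults merely mirror the chunk_size and cache_size defaults in the signature above. Any sliceable array (for example an h5py.Dataset) can act as the backing store.

```python
from collections import OrderedDict

import numpy as np


class ChunkCacheSketch:
    """Keep at most `cache_size` chunks of `chunk_size` images in memory.

    Illustrative LRU chunk cache; not dcnum's HDF5ImageCache.
    """

    def __init__(self, dataset, chunk_size=1000, cache_size=2):
        self.dataset = dataset
        self.chunk_size = chunk_size
        self.cache_size = cache_size
        self._chunks = OrderedDict()  # chunk index -> numpy array
        self.loads = 0  # number of chunk reads from `dataset`

    def _get_chunk(self, chunk_index):
        if chunk_index in self._chunks:
            # Cache hit: mark the chunk as most recently used
            self._chunks.move_to_end(chunk_index)
        else:
            # Cache miss: read the whole chunk in one slicing operation
            start = chunk_index * self.chunk_size
            stop = start + self.chunk_size
            self._chunks[chunk_index] = np.asarray(self.dataset[start:stop])
            self.loads += 1
            if len(self._chunks) > self.cache_size:
                # Evict the least recently used chunk
                self._chunks.popitem(last=False)
        return self._chunks[chunk_index]

    def __getitem__(self, index):
        chunk = self._get_chunk(index // self.chunk_size)
        return chunk[index % self.chunk_size]
```

Sequential access, as in a processing pipeline, then costs one read per chunk instead of one read per image.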

h5ds

boolean = False

_get_chunk_data(chunk_slice)[source]: Implemented in subclass to obtain actual data

dcnum.read.get_measurement_identifier(h5: dcnum.common.h5py.Group) → str | None[source]

Return the measurement identifier for the given H5File object

The measurement identifier is taken from the “experiment:run identifier” HDF5 attribute. If that attribute is not set, the identifier is computed from the HDF5 attributes “experiment:time”, “experiment:date”, and “setup:identifier”.

If the measurement identifier cannot be found or computed, return None.
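The lookup-then-fallback logic can be sketched on a plain dictionary of attributes (h5py's attrs behave like a mapping). How dcnum combines “experiment:time”, “experiment:date”, and “setup:identifier” is not stated here; hashing the underscore-joined values with MD5 is purely an assumption of this sketch, and measurement_identifier_sketch is a hypothetical name.

```python
import hashlib


def measurement_identifier_sketch(attrs):
    """Return a measurement identifier from a mapping of attributes.

    Hypothetical sketch: the MD5-based fallback is an assumption and not
    necessarily how dcnum computes the identifier.
    """
    run_id = attrs.get("experiment:run identifier")
    if run_id:
        return run_id
    keys = ("experiment:time", "experiment:date", "setup:identifier")
    if not all(key in attrs for key in keys):
        # The identifier can neither be found nor computed
        return None
    joined = "_".join(str(attrs[key]) for key in keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```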

exception dcnum.read.BasinIdentifierMismatchError[source]

Bases: BaseException


dcnum.read.concatenated_hdf5_data(paths: list[pathlib.Path], path_out: bool | pathlib.Path | None = True, compute_frame: bool = True, features: list[str] | None = None)[source]

Return a virtual dataset concatenating all the input paths

Parameters:

paths – Paths of the input HDF5 files that will be concatenated along the feature axis. The metadata will be taken from the first file.
path_out – If None, the dataset is created in memory. If True (default), a file is created on disk. If a pathlib.Path is specified, the dataset is written to that file. Note that in-memory datasets are likely not picklable, so do not use them for multiprocessing.
compute_frame – Whether to compute the “events/frame” feature, taking the frame data from the input files and properly incrementing them along the file index.
features – List of features to take from the input files.

Notes

If one of the input files does not contain a feature that is present in the first input path, a ValueError is raised. Use the features argument to specify the features you need instead.
Basins are not considered.
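To illustrate the feature check and compute_frame, the sketch below works on a list of dictionaries that stand in for the input files (feature name to 1D array). concatenate_features is a hypothetical helper that only mimics two documented behaviours: the ValueError for missing features and taking the feature list from the first input by default. The frame offset rule, which shifts each input's frame numbers past the last frame number of the previous input, is an assumption.

```python
import numpy as np


def concatenate_features(files, features=None, compute_frame=True):
    """Concatenate per-input feature arrays along the event axis (sketch)."""
    if features is None:
        # Default: take the feature list from the first input
        features = list(files[0])
    required = list(features)
    if compute_frame and "frame" not in required:
        required.append("frame")
    for ii, data in enumerate(files):
        missing = [f for f in required if f not in data]
        if missing:
            raise ValueError(f"Input {ii} lacks features {missing}")
    result = {f: np.concatenate([np.asarray(d[f]) for d in files])
              for f in features}
    if compute_frame:
        # Shift each input's frame numbers past the last frame of the
        # previous input so that frame numbers keep increasing
        offset = 0
        frames = []
        for data in files:
            shifted = np.asarray(data["frame"]) + offset
            frames.append(shifted)
            offset = int(shifted.max()) + 1
        result["frame"] = np.concatenate(frames)
    return result
```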

dcnum.read.get_mapping_indices(index_mapping: numbers.Integral | slice | list | numpy.ndarray)[source]

Return integer numpy array with mapping indices for a range

Parameters:

index_mapping (numbers.Integral | slice | list | np.ndarray) – one of the following:
- integer: results in np.arange(integer)
- slice: results in np.arange(slice.start, slice.stop, slice.step)
- list or np.ndarray: the input is returned as a uint32 array
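A minimal re-implementation of the three cases listed above, assuming that slices carry an explicit stop value (a missing start or step is treated as 0 or 1). mapping_indices_sketch is a hypothetical name; dcnum's function may handle edge cases such as negative slice bounds differently.

```python
import numbers

import numpy as np


def mapping_indices_sketch(index_mapping):
    """Expand an index mapping into an integer array (hypothetical sketch)."""
    if isinstance(index_mapping, numbers.Integral):
        return np.arange(index_mapping)
    if isinstance(index_mapping, slice):
        start = 0 if index_mapping.start is None else index_mapping.start
        step = 1 if index_mapping.step is None else index_mapping.step
        return np.arange(start, index_mapping.stop, step)
    if isinstance(index_mapping, (list, np.ndarray)):
        return np.asarray(index_mapping, dtype=np.uint32)
    raise TypeError(f"Unsupported index mapping: {type(index_mapping)}")
```

NumPy integer scalars such as np.int64 are registered as numbers.Integral, so they take the first branch as well.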

dcnum.read.get_mapped_object(obj, index_mapping=None)[source]