dcnum.read

Submodules

Attributes

PROTECTED_FEATURES

Frame-defined scalar features.

Exceptions

BasinIdentifierMismatchError

Classes

HDF5Data

HDF5ImageCache

An HDF5 image cache

Functions

md5sum(path[, blocksize, count])

Compute (partial) MD5 sum of a file

detect_flickering(image_data[, roi_height, ...])

Determine whether an image series experiences flickering

get_measurement_identifier(→ str | None)

Return the measurement identifier for the given H5File object

concatenated_hdf5_data(paths[, path_out, ...])

Return a virtual dataset concatenating all the input paths

get_mapping_indices(index_mapping)

Return integer numpy array with mapping indices for a range

get_mapped_object(obj[, index_mapping])

Package Contents

dcnum.read.md5sum(path, blocksize=65536, count=0)

Compute (partial) MD5 sum of a file

Parameters:
  • path (str or pathlib.Path) – path to the file

  • blocksize (int) – block size in bytes read from the file

  • count (int) – number of blocks read from the file (set to 0 to hash the entire file)
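A minimal sketch of block-wise partial hashing with the standard library, to illustrate how blocksize and count interact. The helper name partial_md5 is hypothetical and this is not dcnum's implementation; it assumes that a count of 0 places no limit on the number of blocks, which is a guess based on the default value.

```python
import hashlib
import pathlib
import tempfile


def partial_md5(path, blocksize=65536, count=0):
    """Hash up to `count` blocks of `blocksize` bytes (hypothetical helper).

    Assumption: `count=0` means "keep reading until the end of the file".
    """
    hasher = hashlib.md5()
    with pathlib.Path(path).open("rb") as fd:
        blocks_read = 0
        while True:
            buf = fd.read(blocksize)
            if not buf:
                # End of file reached
                break
            hasher.update(buf)
            blocks_read += 1
            if count and blocks_read == count:
                # Requested number of blocks has been hashed
                break
    return hasher.hexdigest()


# Usage: hash only the first 3 blocks of 100 bytes, then the whole file.
payload = bytes(range(256)) * 10
with tempfile.TemporaryDirectory() as tmp:
    demo_path = pathlib.Path(tmp) / "demo.bin"
    demo_path.write_bytes(payload)
    partial = partial_md5(demo_path, blocksize=100, count=3)
    full = partial_md5(demo_path)
```

With these arguments, the partial digest covers only the first 300 bytes, so it stays cheap to compute even for very large files.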

dcnum.read.PROTECTED_FEATURES = ['bg_off', 'flow_rate', 'frame', 'g_force', 'pressure', 'temp', 'temp_amb', 'time']

Frame-defined scalar features. Scalar features that apply to all events in a frame and which are not computed for individual events

dcnum.read.detect_flickering(image_data: numpy.ndarray | dcnum.read.hdf5_data.HDF5Data, roi_height: int = 10, brightness_threshold: float = 2.5, count_threshold: int = 5, max_frames: int = 1000)

Determine whether an image series experiences flickering

Flickering is an unwelcome phenomenon due to a faulty data acquisition device. For instance, if there is random voltage noise in the electronics managing the LED power, then the brightness of the LED will vary randomly when the noise signal overlaps with the flash triggering signal.

If flickering is detected, you should use the “sparsemed” background computation with offset_correction set to True.

Parameters:
  • image_data – sliceable object (e.g. numpy array or HDF5Data) containing image data.

  • roi_height (int) – height of the ROI in pixels for which to search for flickering; the entire width of the image is used

  • brightness_threshold (float) – brightness difference between an individual ROI's median and the median of all ROI medians that counts as a flickering event

  • count_threshold (int) – minimum number of flickering events that would lead to a positive flickering decision

  • max_frames (int) – maximum number of frames to include in the flickering analysis
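A minimal sketch of the decision rule that the parameters above describe, written with NumPy. It is not dcnum's implementation: the function name looks_flickering is hypothetical, and two details are assumptions, namely that the ROI consists of the top roi_height rows of each frame and that the absolute brightness difference is compared against the threshold.

```python
import numpy as np


def looks_flickering(image_data, roi_height=10, brightness_threshold=2.5,
                     count_threshold=5, max_frames=1000):
    """Return True if enough frames deviate in ROI brightness (sketch)."""
    # Assumption: the ROI is the top `roi_height` rows, full image width.
    roi = np.asarray(image_data[:max_frames])[:, :roi_height, :]
    # One median brightness value per frame
    roi_medians = np.median(roi, axis=(1, 2))
    # Reference brightness: the median of all per-frame ROI medians
    reference = np.median(roi_medians)
    # A frame counts as a flickering event if it deviates too much
    events = np.abs(roi_medians - reference) > brightness_threshold
    return bool(np.sum(events) >= count_threshold)


# Usage with synthetic data: 50 frames of 20x30 pixels.
steady = np.full((50, 20, 30), 100.0)
flicker = steady.copy()
flicker[::5] += 10  # every fifth frame is noticeably brighter

steady_result = looks_flickering(steady)
flicker_result = looks_flickering(flicker)
```

In the synthetic series, ten frames deviate by 10 brightness units from the reference, which exceeds both the brightness and the count thresholds.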

class dcnum.read.HDF5Data(path: pathlib.Path | dcnum.common.h5py.File | BinaryIO, pixel_size: float | None = None, md5_5m: str | None = None, meta: dict | None = None, basins: list[dict[str, list[str] | str]] | None = None, logs: dict[str, list[str]] | None = None, tables: dict[str, numpy.ndarray] | None = None, image_cache_size: int = 2, image_chunk_size: int = 1000, index_mapping: int | slice | list | numpy.ndarray | None = None)
Parameters:
  • path – path to data file

  • pixel_size – pixel size in µm

  • md5_5m – MD5 sum of the first 5 MiB; computed if not provided

  • meta – metadata dictionary; extracted from HDF5 attributes if not provided

  • basins – list of basin dictionaries; extracted from HDF5 attributes if not provided

  • logs – dictionary of logs; extracted from HDF5 attributes if not provided

  • tables – dictionary of tables; extracted from HDF5 attributes if not provided

  • image_cache_size – size of the image cache to use when accessing image data

  • image_chunk_size – maximum number of images in each image cache chunk

  • index_mapping – select only a subset of input events, transparently reducing the size of the dataset. Possible data types are:

      - int N: use the first N events
      - slice: use the events defined by a slice
      - list: list of integers specifying the event indices to use

    Numpy indexing rules apply. E.g. to only process the first 100 events, set this to 100 or slice(0, 100).

__contains__(item)
__enter__()
__exit__(exc_type, exc_val, exc_tb)
__getitem__(feat)
__getstate__()
__setstate__(state)
__len__()
property h5
property image: dcnum.read.cache.HDF5ImageCache | None
property image_bg: dcnum.read.cache.HDF5ImageCache | None
property image_corr: dcnum.read.cache.ImageCorrCache | None
property image_num_chunks

Number of image chunks given self.image_chunk_size

property mask
property meta_nest

Return self.meta as a nested dictionary

This gets very close to the dclab config property of datasets.

property pixel_size
static extract_basin_dicts(h5, check=True)

Return list of basin dictionaries

property features_scalar_frame

Scalar features that apply to all events in a frame

This is a convenience function for copying scalar features over to new processed datasets. Return a list of all features that describe a frame (e.g. temperature or time).

close()

Close the underlying HDF5 file

get_ppid()
classmethod get_ppid_code()
classmethod get_ppid_from_ppkw(kwargs)
static get_ppid_index_mapping(index_mapping)

Return the pipeline identifier part for index mapping

static get_ppkw_from_ppid(dat_ppid)
get_basin_data(index: int) → tuple[dcnum.common.h5py.Group, list, int | slice | list | numpy.ndarray]

Return HDF5Data info for a basin index in self.basins

Parameters:

index (int) – index of the basin from which to get data

Returns:

  • group (h5py.Group) – HDF5 group containing HDF5 Datasets with the names listed in features

  • features (list of str) – list of features made available by this basin

  • index_mapping – a mapping (see __init__) that defines mapping from the basin dataset to the referring dataset

_get_basin_data_file(bn_dict)
_get_basin_data_internal(bn_dict)
get_image_cache(feat)

Create an HDF5ImageCache object for the current dataset

This method also tries to find image data in self.basins.

keys()
class dcnum.read.HDF5ImageCache(h5ds: dcnum.common.h5py.Dataset | dcnum.read.mapped.MappedHDF5Dataset, chunk_size: int = 1000, cache_size: int = 2, boolean: bool = False)

Bases: BaseImageChunkCache

An HDF5 image cache

Deformability cytometry data files commonly contain image stacks that are chunked in various ways. Loading just a single image can be time-consuming, because an entire HDF5 chunk has to be loaded, decompressed and from that one image extracted. The HDF5ImageCache class caches the chunks from the HDF5 files into memory, making single-image-access very fast.
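The caching idea described above can be sketched in a few lines of pure Python. The class name ChunkCache, its loads counter, and the least-recently-used eviction policy are all assumptions made for illustration; this is not the HDF5ImageCache implementation, and a plain list stands in for the HDF5 dataset.

```python
from collections import OrderedDict


class ChunkCache:
    """Hypothetical LRU cache over fixed-size chunks of a sliceable object."""

    def __init__(self, data, chunk_size=1000, cache_size=2):
        self.data = data
        self.chunk_size = chunk_size
        self.cache_size = cache_size
        self._chunks = OrderedDict()
        self.loads = 0  # number of chunk reads from the backing `data`

    def __getitem__(self, index):
        chunk_index, offset = divmod(index, self.chunk_size)
        if chunk_index in self._chunks:
            # Cache hit: mark this chunk as most recently used
            self._chunks.move_to_end(chunk_index)
        else:
            # Cache miss: read the entire chunk once (the expensive step)
            start = chunk_index * self.chunk_size
            self._chunks[chunk_index] = self.data[start:start + self.chunk_size]
            self.loads += 1
            if len(self._chunks) > self.cache_size:
                # Evict the least recently used chunk
                self._chunks.popitem(last=False)
        return self._chunks[chunk_index][offset]


# Usage: a list of integers stands in for an image stack.
frames = list(range(10))
cache = ChunkCache(frames, chunk_size=4, cache_size=2)
first = [cache[0], cache[1], cache[3]]  # all served by a single chunk load
loads_after_first_chunk = cache.loads
last = cache[9]  # lives in a different chunk, triggers a second load
```

Repeated access to neighboring images only pays the chunk-loading cost once, which is the effect the class description refers to.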

h5ds
boolean = False
_get_chunk_data(chunk_slice)

Implemented in subclass to obtain actual data

dcnum.read.get_measurement_identifier(h5: dcnum.common.h5py.Group) → str | None

Return the measurement identifier for the given H5File object

The basin identifier is taken from the HDF5 attributes. If the “experiment:run identifier” attribute is not set, it is computed from the HDF5 attributes “experiment:time”, “experiment:date”, and “setup:identifier”.

If the measurement identifier cannot be found or computed, return None.

exception dcnum.read.BasinIdentifierMismatchError

Bases: BaseException

dcnum.read.concatenated_hdf5_data(paths: list[pathlib.Path], path_out: bool | pathlib.Path | None = True, compute_frame: bool = True, features: list[str] | None = None)

Return a virtual dataset concatenating all the input paths

Parameters:
  • paths – Paths of the input HDF5 files that will be concatenated along the feature axis. The metadata will be taken from the first file.

  • path_out – If None, then the dataset is created in memory. If True (default), create a file on disk. If a pathlib.Path is specified, the dataset is written to that file. Note that datasets in memory are likely not picklable (so don’t use them for multiprocessing).

  • compute_frame – Whether to compute the “events/frame” feature, taking the frame data from the input files and properly incrementing them along the file index.

  • features – List of features to take from the input files.

Notes

  • If one of the input files does not contain a feature that is present in the first input path, then a ValueError is raised. Use the features argument to specify which features you need instead.

  • Basins are not considered.

dcnum.read.get_mapping_indices(index_mapping: numbers.Integral | slice | list | numpy.ndarray)

Return integer numpy array with mapping indices for a range

Parameters:

index_mapping (numbers.Integral | slice | list | np.ndarray) – The following options are available:

  - integer: results in np.arange(integer)
  - slice: results in np.arange(slice.start, slice.stop, slice.step)
  - list or np.ndarray: returns the input as a uint32 array
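A rough sketch of the three cases listed above, using NumPy. The function name mapping_indices is hypothetical and this is not the dcnum source; in particular, falling back to 0 and 1 for a missing slice start or step is an assumption, and slices without a stop value are not handled.

```python
import numbers

import numpy as np


def mapping_indices(index_mapping):
    """Convert an index mapping to an integer array (illustrative sketch)."""
    if isinstance(index_mapping, numbers.Integral):
        # Integer N: the first N events
        return np.arange(index_mapping)
    if isinstance(index_mapping, slice):
        # Assumption: a missing start or step defaults to 0 or 1
        start = index_mapping.start or 0
        step = index_mapping.step or 1
        return np.arange(start, index_mapping.stop, step)
    if isinstance(index_mapping, (list, np.ndarray)):
        # Explicit event indices, stored as unsigned 32-bit integers
        return np.array(index_mapping, dtype=np.uint32)
    raise ValueError(f"Unsupported index mapping type: {type(index_mapping)}")


# Usage: one call per supported input type.
from_int = mapping_indices(3)
from_slice = mapping_indices(slice(2, 8, 3))
from_list = mapping_indices([4, 1])
```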

dcnum.read.get_mapped_object(obj, index_mapping=None)