dcnum.read
Submodules
Attributes
PROTECTED_FEATURES | Frame-defined scalar features.
Exceptions
BasinIdentifierMismatchError | Initialize self. See help(type(self)) for accurate signature.
Classes
HDF5ImageCache | An HDF5 image cache
Functions
md5sum | Compute (partial) MD5 sum of a file
detect_flickering | Determine whether an image series experiences flickering
get_measurement_identifier | Return the measurement identifier for the given H5File object
concatenated_hdf5_data | Return a virtual dataset concatenating all the input paths
get_mapping_indices | Return integer numpy array with mapping indices for a range
Package Contents
- dcnum.read.md5sum(path, blocksize=65536, count=0)
Compute (partial) MD5 sum of a file
- Parameters:
path (str or pathlib.Path) – path to the file
blocksize (int) – block size in bytes read from the file (set to 0 to hash the entire file)
count (int) – number of blocks read from the file
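The idea of a partial checksum can be illustrated with the standard library alone. The following is a minimal sketch, not necessarily how dcnum implements md5sum: the helper name partial_md5 is hypothetical, and treating count=0 as "read until the end of the file" is an assumption.

```python
import hashlib
import pathlib


def partial_md5(path, blocksize=65536, count=0):
    """Hash up to `count` blocks of `blocksize` bytes (hypothetical helper)."""
    hasher = hashlib.md5()
    with pathlib.Path(path).open("rb") as fd:
        if blocksize == 0:
            # Documented behaviour: a block size of 0 hashes the entire file.
            hasher.update(fd.read())
            return hasher.hexdigest()
        blocks_read = 0
        while chunk := fd.read(blocksize):
            hasher.update(chunk)
            blocks_read += 1
            # Assumption: count=0 means "no limit", so read until EOF.
            if count and blocks_read == count:
                break
    return hasher.hexdigest()
```

With the default block size of 65536 bytes, 80 blocks correspond to exactly 5 MiB, which is the amount of data that the md5_5m parameter of HDF5Data refers to.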
- dcnum.read.PROTECTED_FEATURES = ['bg_off', 'flow_rate', 'frame', 'g_force', 'pressure', 'temp', 'temp_amb', 'time']
Frame-defined scalar features. Scalar features that apply to all events in a frame and which are not computed for individual events
- dcnum.read.detect_flickering(image_data: numpy.ndarray | dcnum.read.hdf5_data.HDF5Data, roi_height: int = 10, brightness_threshold: float = 2.5, count_threshold: int = 5, max_frames: int = 1000)
Determine whether an image series experiences flickering
Flickering is an unwelcome phenomenon due to a faulty data acquisition device. For instance, if there is random voltage noise in the electronics managing the LED power, then the brightness of the LED will vary randomly when the noise signal overlaps with the flash triggering signal.
If flickering is detected, you should use the “sparsemed” background computation with offset_correction set to True.
- Parameters:
image_data – sliceable object (e.g. numpy array or HDF5Data) containing image data.
roi_height (int) – height of the ROI in pixels for which to search for flickering; the entire width of the image is used
brightness_threshold (float) – brightness difference between an individual ROI's median and the median of all ROI medians that counts as a flickering event
count_threshold (int) – minimum number of flickering events that would lead to a positive flickering decision
max_frames (int) – maximum number of frames to include in the flickering analysis
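The way the parameters above interact can be sketched with NumPy. This is an illustration under stated assumptions, not the library's code: the function name flickering_sketch is hypothetical, the ROI is assumed to be the top roi_height rows of each image, and a deviation equal to the threshold is assumed to count as an event.

```python
import numpy as np


def flickering_sketch(image_data, roi_height=10, brightness_threshold=2.5,
                      count_threshold=5, max_frames=1000):
    """Return True if enough frames deviate in ROI brightness (sketch)."""
    # Slice along the frame axis first, then take the ROI rows
    # (assumption: the ROI is the top `roi_height` rows, full width).
    roi = np.asarray(image_data[:max_frames])[:, :roi_height, :]
    # One median brightness value per frame
    roi_medians = np.median(roi, axis=(1, 2))
    # Deviation of each frame from the median of all ROI medians
    deviation = np.abs(roi_medians - np.median(roi_medians))
    # Assumption: a deviation at or above the threshold is one event
    num_events = int(np.sum(deviation >= brightness_threshold))
    return num_events >= count_threshold
```

Using the median rather than the mean keeps the reference brightness stable even when a few frames are much brighter or darker than the rest.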
- class dcnum.read.HDF5Data(path: pathlib.Path | dcnum.common.h5py.File | BinaryIO, pixel_size: float | None = None, md5_5m: str | None = None, meta: dict | None = None, basins: list[dict[str, list[str] | str]] | None = None, logs: dict[str, list[str]] | None = None, tables: dict[str, numpy.ndarray] | None = None, image_cache_size: int = 2, image_chunk_size: int = 1000, index_mapping: int | slice | list | numpy.ndarray | None = None)
- Parameters:
path – path to data file
pixel_size – pixel size in µm
md5_5m – MD5 sum of the first 5 MiB; computed if not provided
meta – metadata dictionary; extracted from HDF5 attributes if not provided
basins – list of basin dictionaries; extracted from HDF5 attributes if not provided
logs – dictionary of logs; extracted from HDF5 attributes if not provided
tables – dictionary of tables; extracted from HDF5 attributes if not provided
image_cache_size – size of the image cache to use when accessing image data
image_chunk_size – maximum number of images in each image cache chunk
index_mapping – select only a subset of input events, transparently reducing the size of the dataset. Possible data types are:
  - int N: use the first N events
  - slice: use the events defined by a slice
  - list: list of integers specifying the event indices to use
Numpy indexing rules apply. E.g. to only process the first 100 events, set this to 100 or slice(0, 100).
- property h5
- property image: dcnum.read.cache.HDF5ImageCache | None
- property image_bg: dcnum.read.cache.HDF5ImageCache | None
- property image_corr: dcnum.read.cache.ImageCorrCache | None
- property image_num_chunks
Number of image chunks given self.image_chunk_size
- property mask
- property meta_nest
Return self.meta as nested dictionary
This gets very close to the dclab config property of datasets.
- property pixel_size
- property features_scalar_frame
Scalar features that apply to all events in a frame
This is a convenience function for copying scalar features over to new processed datasets. Return a list of all features that describe a frame (e.g. temperature or time).
- static get_ppid_index_mapping(index_mapping)
Return the pipeline identifier part for index mapping
- get_basin_data(index: int) -> tuple[dcnum.common.h5py.Group, list, int | slice | list | numpy.ndarray]
Return HDF5Data info for a basin index in self.basins
- Parameters:
index (int) – index of the basin from which to get data
- Returns:
group (h5py.Group) – HDF5 group containing HDF5 Datasets with the names listed in features
features (list of str) – list of features made available by this basin
index_mapping – a mapping (see __init__) that defines mapping from the basin dataset to the referring dataset
- class dcnum.read.HDF5ImageCache(h5ds: dcnum.common.h5py.Dataset | dcnum.read.mapped.MappedHDF5Dataset, chunk_size: int = 1000, cache_size: int = 2, boolean: bool = False)
Bases: BaseImageChunkCache
An HDF5 image cache
Deformability cytometry data files commonly contain image stacks that are chunked in various ways. Loading just a single image can be time-consuming, because an entire HDF5 chunk has to be loaded and decompressed before that one image can be extracted. The HDF5ImageCache class caches the chunks from the HDF5 files in memory, making single-image access very fast.
- h5ds
- boolean = False
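The caching idea described for HDF5ImageCache can be sketched as follows. This is a simplified illustration, not the class itself: ChunkCacheSketch is a hypothetical name, a NumPy array stands in for the HDF5 dataset, only non-negative integer indices are supported, and evicting the least recently used chunk is an assumption about the eviction policy.

```python
from collections import OrderedDict

import numpy as np


class ChunkCacheSketch:
    """Cache whole chunks of an image stack for fast single-image access."""

    def __init__(self, data, chunk_size=1000, cache_size=2):
        self.data = data              # sliceable image stack (frames first)
        self.chunk_size = chunk_size  # number of images per cached chunk
        self.cache_size = cache_size  # maximum number of chunks in memory
        self.chunks = OrderedDict()   # chunk index -> array, oldest first
        self.loads = 0                # counts reads from the backing store

    def __len__(self):
        return len(self.data)

    def get_chunk(self, chunk_index):
        if chunk_index in self.chunks:
            # Cache hit: mark the chunk as most recently used
            self.chunks.move_to_end(chunk_index)
        else:
            # Cache miss: read the whole chunk from the backing store
            start = chunk_index * self.chunk_size
            stop = start + self.chunk_size
            self.chunks[chunk_index] = np.asarray(self.data[start:stop])
            self.loads += 1
            if len(self.chunks) > self.cache_size:
                # Assumption: evict the least recently used chunk
                self.chunks.popitem(last=False)
        return self.chunks[chunk_index]

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(f"image index {index} out of range")
        chunk_index, offset = divmod(index, self.chunk_size)
        return self.get_chunk(chunk_index)[offset]
```

Sequential access benefits most from this layout: after the first image of a chunk has been requested, the remaining images of that chunk are served from memory.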
- dcnum.read.get_measurement_identifier(h5: dcnum.common.h5py.Group) -> str | None
Return the measurement identifier for the given H5File object
The measurement identifier is taken from the HDF5 attributes. If the “experiment:run identifier” attribute is not set, it is computed from the HDF5 attributes “experiment:time”, “experiment:date”, and “setup:identifier”.
If the measurement identifier cannot be found or computed, return None.
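The lookup order described above can be sketched with a plain dictionary standing in for the HDF5 attributes. The function name measurement_identifier_sketch is hypothetical, and how the three fallback attributes are combined is not specified here; joining and hashing them is purely an assumption for illustration.

```python
import hashlib


def measurement_identifier_sketch(attrs):
    """Return an identifier from an attribute mapping, or None (sketch)."""
    # Preferred: the explicit run identifier attribute
    identifier = attrs.get("experiment:run identifier")
    if identifier is not None:
        return identifier
    # Fallback: derive an identifier from three other attributes
    keys = ("experiment:time", "experiment:date", "setup:identifier")
    parts = [attrs.get(key) for key in keys]
    if any(part is None for part in parts):
        # Not enough information to compute an identifier
        return None
    # Assumption: join the values and hash them; the real scheme may differ
    joined = "_".join(str(part) for part in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

Returning None instead of raising lets the caller decide how to handle files that lack the required metadata.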
- exception dcnum.read.BasinIdentifierMismatchError
Bases: BaseException
Initialize self. See help(type(self)) for accurate signature.
- dcnum.read.concatenated_hdf5_data(paths: list[pathlib.Path], path_out: bool | pathlib.Path | None = True, compute_frame: bool = True, features: list[str] | None = None)
Return a virtual dataset concatenating all the input paths
- Parameters:
paths – Paths of the input HDF5 files that will be concatenated along the feature axis. The metadata will be taken from the first file.
path_out – If None, then the dataset is created in memory. If True (default), create a file on disk. If a pathlib.Path is specified, the dataset is written to that file. Note that datasets in memory are likely not picklable (so don’t use them for multiprocessing).
compute_frame – Whether to compute the “events/frame” feature, taking the frame data from the input files and properly incrementing them along the file index.
features – List of features to take from the input files.
Notes
If one of the input files does not contain a feature that is present in the first input path, then a ValueError is raised. Use the features argument to specify which features you need instead.
Basins are not considered.
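What incrementing frame numbers along the file index could look like is sketched below with NumPy arrays in place of the “events/frame” datasets. The function name concatenate_frames_sketch is hypothetical, and the exact offset rule is an assumption: each later file is shifted so that its first frame comes one frame after the last frame of the previous file. Input arrays are assumed to be non-empty.

```python
import numpy as np


def concatenate_frames_sketch(frame_arrays):
    """Concatenate per-file frame numbers into one increasing series."""
    shifted = []
    next_start = None
    for frames in frame_arrays:
        frames = np.asarray(frames, dtype=np.int64)
        if next_start is None:
            # The first file keeps its frame numbers as they are
            offset = 0
        else:
            # Assumption: each later file starts one frame after the
            # last frame of the previous file
            offset = next_start - frames[0]
        shifted.append(frames + offset)
        next_start = shifted[-1][-1] + 1
    return np.concatenate(shifted)
```

Events that share a frame within one file still share a frame after the shift, because the same offset is applied to every event of that file.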
- dcnum.read.get_mapping_indices(index_mapping: numbers.Integral | slice | list | numpy.ndarray)
Return integer numpy array with mapping indices for a range
- Parameters:
index_mapping (numbers.Integral | slice | list | np.ndarray) – There are several options:
  - integer: results in np.arange(integer)
  - slice: results in np.arange(slice.start, slice.stop, slice.step)
  - list or np.ndarray: returns the input as a uint32 array
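The three cases can be written out as a short sketch. The function name mapping_indices_sketch is hypothetical; filling in a missing slice start or step with 0 and 1, and rejecting slices without a stop, are assumptions made so that the example is well defined.

```python
import numbers

import numpy as np


def mapping_indices_sketch(index_mapping):
    """Convert an index mapping to an integer array of indices (sketch)."""
    if isinstance(index_mapping, numbers.Integral):
        # Integer N: the first N events
        return np.arange(index_mapping)
    if isinstance(index_mapping, slice):
        if index_mapping.stop is None:
            # Assumption: without the dataset length, a stop is required
            raise ValueError("slice needs an explicit stop")
        start = 0 if index_mapping.start is None else index_mapping.start
        step = 1 if index_mapping.step is None else index_mapping.step
        return np.arange(start, index_mapping.stop, step)
    if isinstance(index_mapping, (list, np.ndarray)):
        # Explicit indices are returned as an unsigned 32-bit array
        return np.asarray(index_mapping, dtype=np.uint32)
    raise TypeError(f"unsupported index mapping: {type(index_mapping)}")
```

The resulting array can be used directly for NumPy fancy indexing, e.g. feature_data[mapping_indices_sketch(slice(0, 100))] for an array feature_data.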