dcnum.read.hdf5_data

Exceptions

BasinIdentifierMismatchError

Initialize self. See help(type(self)) for accurate signature.

Classes

HDF5Data

Functions

concatenated_hdf5_data(*args, **kwargs)

get_measurement_identifier(→ str | None)

Return the measurement identifier for the given H5File object

Module Contents

exception dcnum.read.hdf5_data.BasinIdentifierMismatchError[source]

Bases: BaseException

Initialize self. See help(type(self)) for accurate signature.

class dcnum.read.hdf5_data.HDF5Data(path: pathlib.Path | dcnum.common.h5py.File | BinaryIO, pixel_size: float | None = None, md5_5m: str | None = None, meta: dict | None = None, basins: list[dict[str, list[str] | str]] | None = None, logs: dict[str, list[str]] | None = None, tables: dict[str, numpy.ndarray] | None = None, image_cache_size: int = 2, image_chunk_size: int = 1000, index_mapping: int | slice | list | numpy.ndarray | None = None)[source]
Parameters:
  • path – path to data file

  • pixel_size – pixel size in µm

  • md5_5m – MD5 sum of the first 5 MiB; computed if not provided

  • meta – metadata dictionary; extracted from HDF5 attributes if not provided

  • basins – list of basin dictionaries; extracted from HDF5 attributes if not provided

  • logs – dictionary of logs; extracted from HDF5 attributes if not provided

  • tables – dictionary of tables; extracted from HDF5 attributes if not provided

  • image_cache_size – size of the image cache to use when accessing image data

  • image_chunk_size – maximum number of images in each image cache chunk

  • index_mapping – select only a subset of input events, transparently reducing the size of the dataset. Possible data types are:
      - int N: use the first N events
      - slice: use the events defined by a slice
      - list: list of integers specifying the event indices to use
    Numpy indexing rules apply. E.g. to only process the first 100 events, set this to 100 or slice(0, 100).
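The accepted index_mapping types can be illustrated with a small sketch. The helper resolve_index_mapping below is hypothetical and not part of dcnum; it only mimics, in plain Python, how an integer, a slice, or a list of indices would select events from a dataset with n_events events.

```python
def resolve_index_mapping(index_mapping, n_events):
    """Return the event indices selected by `index_mapping`.

    Hypothetical helper for illustration only; dcnum's internal
    handling may differ.
    """
    if index_mapping is None:
        # No mapping: use all events
        return list(range(n_events))
    if isinstance(index_mapping, int):
        # int N: use the first N events
        return list(range(min(index_mapping, n_events)))
    if isinstance(index_mapping, slice):
        # slice: standard start/stop/step semantics
        return list(range(n_events))[index_mapping]
    # list: explicit event indices, kept in the given order
    return [int(i) for i in index_mapping]


# Both of these select the first 100 events of a 1000-event dataset
first_a = resolve_index_mapping(100, n_events=1000)
first_b = resolve_index_mapping(slice(0, 100), n_events=1000)
# Every second event among the first ten
every_other = resolve_index_mapping(slice(0, 10, 2), n_events=1000)
# Explicit event indices
picked = resolve_index_mapping([5, 2, 7], n_events=1000)
```

With the real class, the same values would be passed as the index_mapping argument of HDF5Data.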

__contains__(item)[source]
__enter__()[source]
__exit__(exc_type, exc_val, exc_tb)[source]
__getitem__(feat)[source]
__getstate__()[source]
__setstate__(state)[source]
__len__()[source]
property h5
property image: dcnum.read.cache.HDF5ImageCache | None
property image_bg: dcnum.read.cache.HDF5ImageCache | None
property image_corr: dcnum.read.cache.ImageCorrCache | None
property image_num_chunks

Number of image chunks given self.image_chunk_size
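A minimal sketch of how such a chunk count could be derived, assuming it is the number of images divided by the chunk size and rounded up so that a final, partially filled chunk is counted. The function name num_chunks is hypothetical and not part of dcnum.

```python
import math


def num_chunks(num_images, chunk_size=1000):
    # Round up so that a final, partially filled chunk is counted
    # (assumption; see the lead-in)
    return math.ceil(num_images / chunk_size)


chunks_exact = num_chunks(2000)    # two full chunks
chunks_partial = num_chunks(2500)  # two full chunks plus a partial one
```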

property mask
property meta_nest

Return self.meta as a nested dictionary

This gets very close to the dclab config property of datasets.
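The nesting can be sketched as follows, assuming that self.meta is a flat dictionary whose keys follow the "section:key" pattern seen in the HDF5 attributes (e.g. "experiment:time"). The function nest_meta and the example values are hypothetical and only illustrate the idea.

```python
def nest_meta(meta):
    """Convert {"section:key": value} into {"section": {"key": value}}."""
    nested = {}
    for flat_key, value in meta.items():
        # Split only at the first colon; the key part may contain spaces
        section, key = flat_key.split(":", 1)
        nested.setdefault(section, {})[key] = value
    return nested


flat = {
    "experiment:run identifier": "example-run",  # made-up value
    "experiment:time": "12:00:00",               # made-up value
    "setup:identifier": "example-setup",         # made-up value
}
nested = nest_meta(flat)
```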

property pixel_size
static extract_basin_dicts(h5, check=True)[source]

Return list of basin dictionaries

property features_scalar_frame

Scalar features that apply to all events in a frame

This is a convenience property for copying scalar features over to new processed datasets. It returns a list of all features that describe a frame (e.g. temperature or time).

close()[source]

Close the underlying HDF5 file

get_ppid()[source]
classmethod get_ppid_code()[source]
classmethod get_ppid_from_ppkw(kwargs)[source]
static get_ppid_index_mapping(index_mapping)[source]

Return the pipeline identifier part for index mapping

static get_ppkw_from_ppid(dat_ppid)[source]
get_basin_data(index: int) → tuple[dcnum.common.h5py.Group, list, int | slice | list | numpy.ndarray][source]

Return HDF5Data info for a basin index in self.basins

Parameters:

index (int) – index of the basin from which to get data

Returns:

  • group (h5py.Group) – HDF5 group containing HDF5 Datasets with the names listed in features

  • features (list of str) – list of features made available by this basin

  • index_mapping – an index mapping (see the index_mapping parameter of __init__) that defines the mapping from the basin dataset to the referring dataset

_get_basin_data_file(bn_dict)[source]
_get_basin_data_internal(bn_dict)[source]
get_image_cache(feat)[source]

Create an HDF5ImageCache object for the current dataset

This method also tries to find image data in self.basins.

keys()[source]
dcnum.read.hdf5_data.concatenated_hdf5_data(*args, **kwargs)[source]
dcnum.read.hdf5_data.get_measurement_identifier(h5: dcnum.common.h5py.Group) → str | None[source]

Return the measurement identifier for the given H5File object

The measurement identifier is taken from the HDF5 attribute “experiment:run identifier”. If that attribute is not set, the identifier is computed from the HDF5 attributes “experiment:time”, “experiment:date”, and “setup:identifier”.

If the measurement identifier cannot be found or computed, return None.
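The lookup order described above can be sketched with a plain dictionary standing in for the HDF5 attributes. The function sketch_measurement_identifier is hypothetical and is not dcnum's implementation. Two details are assumptions: that all three fallback attributes must be present, and the way they are combined, which is not specified here (the sketch simply joins them with underscores as a placeholder).

```python
def sketch_measurement_identifier(attrs):
    """Illustrate the lookup order; not dcnum's implementation."""
    # Preferred: the explicitly stored run identifier
    mid = attrs.get("experiment:run identifier")
    if mid:
        return mid
    # Fallback: assume all three attributes must be available
    keys = ("experiment:time", "experiment:date", "setup:identifier")
    parts = [attrs.get(key) for key in keys]
    if all(parts):
        # Placeholder combination (assumption, see the lead-in)
        return "_".join(str(part) for part in parts)
    # Nothing stored and nothing to compute from
    return None


# Made-up attribute values for illustration
stored = sketch_measurement_identifier(
    {"experiment:run identifier": "example-run"})
computed = sketch_measurement_identifier({
    "experiment:time": "12:00:00",
    "experiment:date": "2024-01-01",
    "setup:identifier": "example-setup",
})
missing = sketch_measurement_identifier({"experiment:time": "12:00:00"})
```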