dcnum.read
==========

.. py:module:: dcnum.read


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/dcnum/read/cache/index
   /autoapi/dcnum/read/const/index
   /autoapi/dcnum/read/detect_flicker/index
   /autoapi/dcnum/read/hdf5_concat/index
   /autoapi/dcnum/read/hdf5_data/index
   /autoapi/dcnum/read/mapped/index


Attributes
----------

.. autoapisummary::

   dcnum.read.PROTECTED_FEATURES


Exceptions
----------

.. autoapisummary::

   dcnum.read.BasinIdentifierMismatchError


Classes
-------

.. autoapisummary::

   dcnum.read.HDF5Data
   dcnum.read.HDF5ImageCache


Functions
---------

.. autoapisummary::

   dcnum.read.md5sum
   dcnum.read.detect_flickering
   dcnum.read.get_measurement_identifier
   dcnum.read.concatenated_hdf5_data
   dcnum.read.get_mapping_indices
   dcnum.read.get_mapped_object


Package Contents
----------------

.. py:function:: md5sum(path, blocksize=65536, count=0)

   Compute (partial) MD5 sum of a file

   :param path: path to the file
   :type path: str or pathlib.Path
   :param blocksize: block size in bytes read from the file
                     (set to `0` to hash the entire file)
   :type blocksize: int
   :param count: number of blocks read from the file
   :type count: int


.. py:data:: PROTECTED_FEATURES
   :value: ['bg_off', 'flow_rate', 'frame', 'g_force', 'pressure', 'temp', 'temp_amb', 'time']

   Frame-defined scalar features.
   Scalar features that apply to all events in a frame and which are not
   computed for individual events


.. py:function:: detect_flickering(image_data: numpy.ndarray | dcnum.read.hdf5_data.HDF5Data, roi_height: int = 10, brightness_threshold: float = 2.5, count_threshold: int = 5, max_frames: int = 1000)

   Determine whether an image series experiences flickering

   Flickering is an unwelcome phenomenon due to a faulty data
   acquisition device. For instance, if there is random voltage noise in
   the electronics managing the LED power, then the brightness of the
   LED will vary randomly when the noise signal overlaps with the flash
   triggering signal.

   If flickering is detected, you should use the "sparsemed" background
   computation with ``offset_correction`` set to True.

   :param image_data: sliceable object (e.g. numpy array or HDF5Data)
                      containing image data.
   :param roi_height: height of the ROI in pixels for which to search for
                      flickering; the entire width of the image is used
   :type roi_height: int
   :param brightness_threshold: brightness difference between an individual
                                ROI's median and the median of all ROI
                                medians that counts as a flickering event
   :type brightness_threshold: float
   :param count_threshold: minimum number of flickering events that would
                           lead to a positive flickering decision
   :type count_threshold: int
   :param max_frames: maximum number of frames to include in the
                      flickering analysis
   :type max_frames: int


.. py:class:: HDF5Data(path: pathlib.Path | dcnum.common.h5py.File | BinaryIO, pixel_size: float | None = None, md5_5m: str | None = None, meta: dict | None = None, basins: list[dict[str, list[str] | str]] | None = None, logs: dict[str, list[str]] | None = None, tables: dict[str, numpy.ndarray] | None = None, image_cache_size: int = 2, image_chunk_size: int = 1000, index_mapping: int | slice | list | numpy.ndarray | None = None)

   :param path: path to data file
   :param pixel_size: pixel size in µm
   :param md5_5m: MD5 sum of the first 5 MiB; computed if not provided
   :param meta: metadata dictionary; extracted from HDF5 attributes
                if not provided
   :param basins: list of basin dictionaries; extracted from HDF5 attributes
                  if not provided
   :param logs: dictionary of logs; extracted from HDF5 attributes
                if not provided
   :param tables: dictionary of tables; extracted from HDF5 attributes
                  if not provided
   :param image_cache_size: size of the image cache to use when accessing
                            image data
   :param image_chunk_size: maximum number of images in each image cache chunk
   :param index_mapping: select only a subset of input events, transparently
                         reducing the size of the dataset. Possible data
                         types are:

                         - int `N`: use the first `N` events
                         - slice: use the events defined by a slice
                         - list: list of integers specifying the event
                           indices to use

                         Numpy indexing rules apply. E.g. to only process the
                         first 100 events, set this to `100` or
                         `slice(0, 100)`.


   .. py:method:: __contains__(item)


   .. py:method:: __enter__()


   .. py:method:: __exit__(exc_type, exc_val, exc_tb)


   .. py:method:: __getitem__(feat)


   .. py:method:: __getstate__()


   .. py:method:: __setstate__(state)


   .. py:method:: __len__()


   .. py:property:: h5


   .. py:property:: image
      :type: dcnum.read.cache.HDF5ImageCache | None


   .. py:property:: image_bg
      :type: dcnum.read.cache.HDF5ImageCache | None


   .. py:property:: image_corr
      :type: dcnum.read.cache.ImageCorrCache | None


   .. py:property:: image_num_chunks

      Number of image chunks given `self.image_chunk_size`


   .. py:property:: mask


   .. py:property:: meta_nest

      Return `self.meta` as a nested dictionary

      This gets very close to the dclab `config` property of datasets.


   .. py:property:: pixel_size


   .. py:method:: extract_basin_dicts(h5, check=True)
      :staticmethod:

      Return list of basin dictionaries


   .. py:property:: features_scalar_frame

      Scalar features that apply to all events in a frame

      This is a convenience function for copying scalar features
      over to new processed datasets. Return a list of all features
      that describe a frame (e.g. temperature or time).


   .. py:method:: close()

      Close the underlying HDF5 file


   .. py:method:: get_ppid()


   .. py:method:: get_ppid_code()
      :classmethod:


   .. py:method:: get_ppid_from_ppkw(kwargs)
      :classmethod:


   .. py:method:: get_ppid_index_mapping(index_mapping)
      :staticmethod:

      Return the pipeline identifier part for index mapping


   .. py:method:: get_ppkw_from_ppid(dat_ppid)
      :staticmethod:


   .. py:method:: get_basin_data(index: int) -> tuple[dcnum.common.h5py.Group, list, int | slice | list | numpy.ndarray]

      Return HDF5Data info for a basin index in `self.basins`

      :param index: index of the basin from which to get data
      :type index: int

      :returns: * **group** (*h5py.Group*) -- HDF5 group containing HDF5
                  Datasets with the names listed in `features`
                * **features** (*list of str*) -- list of features made
                  available by this basin
                * *index_mapping* -- a mapping (see `__init__`) that defines
                  mapping from the basin dataset to the referring dataset


   .. py:method:: _get_basin_data_file(bn_dict)


   .. py:method:: _get_basin_data_internal(bn_dict)


   .. py:method:: get_image_cache(feat)

      Create an HDF5ImageCache object for the current dataset

      This method also tries to find image data in `self.basins`.


   .. py:method:: keys()


.. py:class:: HDF5ImageCache(h5ds: dcnum.common.h5py.Dataset | dcnum.read.mapped.MappedHDF5Dataset, chunk_size: int = 1000, cache_size: int = 2, boolean: bool = False)

   Bases: :py:obj:`BaseImageChunkCache`

   An HDF5 image cache

   Deformability cytometry data files commonly contain image stacks
   that are chunked in various ways. Loading just a single image
   can be time-consuming, because an entire HDF5 chunk has to be
   loaded and decompressed before one image can be extracted from it.
   The `HDF5ImageCache` class caches the chunks from the HDF5 files
   in memory, making single-image access very fast.


   .. py:attribute:: h5ds


   .. py:attribute:: boolean
      :value: False


   .. py:method:: _get_chunk_data(chunk_slice)

      Implemented in subclass to obtain actual data


.. py:function:: get_measurement_identifier(h5: dcnum.common.h5py.Group) -> str | None

   Return the measurement identifier for the given H5File object

   The basin identifier is taken from the HDF5 attributes. If the
   "experiment:run identifier" attribute is not set, it is
   computed from the HDF5 attributes "experiment:time", "experiment:date",
   and "setup:identifier".

   If the measurement identifier cannot be found or computed,
   return None.


.. py:exception:: BasinIdentifierMismatchError

   Bases: :py:obj:`BaseException`

   Initialize self.  See help(type(self)) for accurate signature.


.. py:function:: concatenated_hdf5_data(paths: list[pathlib.Path], path_out: bool | pathlib.Path | None = True, compute_frame: bool = True, features: list[str] | None = None)

   Return a virtual dataset concatenating all the input paths

   :param paths: Paths of the input HDF5 files that will be concatenated
                 along the feature axis. The metadata will be taken from
                 the first file.
   :param path_out: If `None`, then the dataset is created in memory. If
                    `True` (default), create a file on disk. If a
                    pathlib.Path is specified, the dataset is written to
                    that file. Note that datasets in memory are likely not
                    picklable (so don't use them for multiprocessing).
   :param compute_frame: Whether to compute the "events/frame" feature,
                         taking the frame data from the input files and
                         properly incrementing them along the file index.
   :param features: List of features to take from the input files.

   .. rubric:: Notes

   - If one of the input files does not contain a feature from the first
     input `paths`, then a `ValueError` is raised. Use the `features`
     argument to specify which features you need instead.
   - Basins are not considered.


.. py:function:: get_mapping_indices(index_mapping: numbers.Integral | slice | list | numpy.ndarray)

   Return integer numpy array with mapping indices for a range

   :param index_mapping: There are several options:

                         - integer: results in np.arange(integer)
                         - slice: results in np.arange(slice.start,
                           slice.stop, slice.step)
                         - list or np.ndarray: returns the input as a
                           uint32 array
   :type index_mapping: numbers.Integral | slice | list | np.ndarray


.. py:function:: get_mapped_object(obj, index_mapping=None)
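
The ``index_mapping`` conventions documented for ``get_mapping_indices`` (and
accepted by ``HDF5Data``) can be illustrated with plain NumPy. The following is
a minimal sketch and not the dcnum implementation: the function name
``sketch_mapping_indices`` is hypothetical, and the handling of ``None`` values
in a slice (default start ``0``, default step ``1``, and an error for an
open-ended stop) is an assumption made only for this example.

.. code-block:: python

   import numbers

   import numpy as np


   def sketch_mapping_indices(index_mapping):
       """Hypothetical stand-in mirroring the documented conventions."""
       if isinstance(index_mapping, numbers.Integral):
           # integer N: the first N events
           return np.arange(index_mapping)
       if isinstance(index_mapping, slice):
           # Assumption: fill in defaults for missing start/step and
           # reject open-ended slices, since no dataset length is known.
           if index_mapping.stop is None:
               raise ValueError("This sketch needs an explicit slice stop")
           start = 0 if index_mapping.start is None else index_mapping.start
           step = 1 if index_mapping.step is None else index_mapping.step
           return np.arange(start, index_mapping.stop, step)
       if isinstance(index_mapping, (list, np.ndarray)):
           # explicit event indices, returned as a uint32 array
           return np.array(index_mapping, dtype=np.uint32)
       raise ValueError(f"Unsupported index mapping: {index_mapping!r}")


   first_three = sketch_mapping_indices(3)
   every_third = sketch_mapping_indices(slice(2, 10, 3))
   explicit = sketch_mapping_indices([4, 1, 7])

Under these assumptions, an integer and the equivalent slice select the same
events, which matches the note that ``100`` and ``slice(0, 100)`` are
interchangeable for ``index_mapping``.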