dcnum.read
==========

.. py:module:: dcnum.read


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/dcnum/read/cache/index
   /autoapi/dcnum/read/const/index
   /autoapi/dcnum/read/detect_flicker/index
   /autoapi/dcnum/read/hdf5_concat/index
   /autoapi/dcnum/read/hdf5_data/index
   /autoapi/dcnum/read/mapped/index


Attributes
----------

.. autoapisummary::

   dcnum.read.PROTECTED_FEATURES


Exceptions
----------

.. autoapisummary::

   dcnum.read.BasinIdentifierMismatchError


Classes
-------

.. autoapisummary::

   dcnum.read.HDF5Data
   dcnum.read.HDF5ImageCache


Functions
---------

.. autoapisummary::

   dcnum.read.md5sum
   dcnum.read.detect_flickering
   dcnum.read.get_measurement_identifier
   dcnum.read.concatenated_hdf5_data
   dcnum.read.get_mapping_indices
   dcnum.read.get_mapped_object


Package Contents
----------------

.. py:function:: md5sum(path, blocksize=65536, count=0)

   Compute (partial) MD5 sum of a file

   :param path: path to the file
   :type path: str or pathlib.Path
   :param blocksize: block size in bytes read from the file
                     (set to `0` to hash the entire file)
   :type blocksize: int
   :param count: number of blocks read from the file
   :type count: int
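
   The block-wise reading controlled by ``blocksize`` and ``count`` can be
   sketched with the standard library alone. This is a minimal illustration,
   not the actual dcnum implementation: the name ``partial_md5`` is
   hypothetical, and treating ``count=0`` as "no block limit" is an assumption.

   .. code-block:: python

      import hashlib
      import pathlib
      import tempfile


      def partial_md5(path, blocksize=65536, count=0):
          """Hash at most `count` blocks of `blocksize` bytes (sketch)."""
          hasher = hashlib.md5()
          with pathlib.Path(path).open("rb") as fd:
              blocks_read = 0
              # Read block by block until the file ends or the limit is hit.
              while buf := fd.read(blocksize):
                  hasher.update(buf)
                  blocks_read += 1
                  # Assumption: count == 0 means "read until end of file".
                  if count and blocks_read == count:
                      break
          return hasher.hexdigest()


      # Usage: hash only the first 2 blocks of 4 bytes, then the whole file.
      with tempfile.TemporaryDirectory() as tmp:
          demo = pathlib.Path(tmp) / "demo.bin"
          demo.write_bytes(b"abcdefghijkl")
          partial = partial_md5(demo, blocksize=4, count=2)
          full = partial_md5(demo, blocksize=4)

   Hashing only the leading blocks keeps the cost independent of the file
   size, which fits the ``md5_5m`` argument of ``HDF5Data`` that covers the
   first 5 MiB of a file.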


.. py:data:: PROTECTED_FEATURES
   :value: ['bg_off', 'flow_rate', 'frame', 'g_force', 'pressure', 'temp', 'temp_amb', 'time']


   Frame-defined scalar features.
   These are scalar features that apply to all events in a frame and
   that are not computed for individual events.
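
   As a rough illustration, the list can be used to pick the frame-defined
   features out of an arbitrary set of feature names. The ``available`` list
   below is made up for the example, and the ``PROTECTED_FEATURES`` value is
   copied from above so that the snippet runs without dcnum installed.

   .. code-block:: python

      PROTECTED_FEATURES = ["bg_off", "flow_rate", "frame", "g_force",
                            "pressure", "temp", "temp_amb", "time"]

      # Hypothetical feature names found in some dataset.
      available = ["area_um", "deform", "frame", "temp", "time"]

      # Keep only the features that describe a whole frame.
      frame_features = [f for f in available if f in PROTECTED_FEATURES]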

.. py:function:: detect_flickering(image_data: numpy.ndarray | dcnum.read.hdf5_data.HDF5Data, roi_height: int = 10, brightness_threshold: float = 2.5, count_threshold: int = 5, max_frames: int = 1000)

   Determine whether an image series experiences flickering

   Flickering is an unwelcome phenomenon due to a faulty data
   acquisition device. For instance, if there is random voltage noise in
   the electronics managing the LED power, then the brightness of the
   LED will vary randomly when the noise signal overlaps with the flash
   triggering signal.

   If flickering is detected, you should use the "sparsemed" background
   computation with ``offset_correction`` set to True.

   :param image_data: sliceable object (e.g. numpy array or HDF5Data) containing
                      image data.
   :param roi_height: height of the ROI in pixels for which to search for flickering;
                      the entire width of the image is used
   :type roi_height: int
   :param brightness_threshold: brightness difference between an individual ROI's median and the
                                median of all ROI medians that counts as a flickering event
   :type brightness_threshold: float
   :param count_threshold: minimum number of flickering events that would lead to a positive
                           flickering decision
   :type count_threshold: int
   :param max_frames: maximum number of frames to include in the flickering analysis
   :type max_frames: int
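
   The decision rule described by these parameters can be sketched in a few
   lines of numpy. This is an illustration under stated assumptions, not
   the dcnum implementation: the name ``flicker_sketch`` is hypothetical, the
   ROI is assumed to be the top ``roi_height`` rows of each image, and the
   comparisons are assumed to be inclusive.

   .. code-block:: python

      import numpy as np


      def flicker_sketch(image_data, roi_height=10, brightness_threshold=2.5,
                         count_threshold=5, max_frames=1000):
          """Return True if enough frames deviate in ROI brightness."""
          # Limit the analysis to the first `max_frames` frames.
          frames = np.asarray(image_data[:max_frames])
          # Assumption: the ROI is the top `roi_height` rows at full width.
          roi = frames[:, :roi_height, :]
          # One median brightness value per frame.
          roi_median = np.median(roi, axis=(1, 2))
          # Deviation of each frame from the median of all ROI medians.
          offset = np.abs(roi_median - np.median(roi_median))
          events = int(np.sum(offset >= brightness_threshold))
          return events >= count_threshold


      # Usage: 100 constant frames, then brighten six of them by 10 gray values.
      steady = np.full((100, 40, 60), 120, dtype=np.uint8)
      flicker = steady.copy()
      flicker[[3, 17, 29, 42, 58, 77]] += 10

      steady_result = flicker_sketch(steady)
      flicker_result = flicker_sketch(flicker)

   Comparing against the median of all ROI medians makes the check robust
   against the outlier frames it is trying to find.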


.. py:class:: HDF5Data(path: pathlib.Path | dcnum.common.h5py.File | BinaryIO, pixel_size: float | None = None, md5_5m: str | None = None, meta: dict | None = None, basins: list[dict[str, list[str] | str]] | None = None, logs: dict[str, list[str]] | None = None, tables: dict[str, numpy.ndarray] | None = None, image_cache_size: int = 2, image_chunk_size: int = 1000, index_mapping: int | slice | list | numpy.ndarray | None = None)

   :param path: path to data file
   :param pixel_size: pixel size in µm
   :param md5_5m: MD5 sum of the first 5 MiB; computed if not provided
   :param meta: metadata dictionary; extracted from HDF5 attributes
                if not provided
   :param basins: list of basin dictionaries; extracted from HDF5 attributes
                  if not provided
   :param logs: dictionary of logs; extracted from HDF5 attributes
                if not provided
   :param tables: dictionary of tables; extracted from HDF5 attributes
                  if not provided
   :param image_cache_size: size of the image cache to use when accessing image data
   :param image_chunk_size: maximum number of images in each image cache chunk
   :param index_mapping: select only a subset of input events, transparently reducing the
                         size of the dataset. Possible data types are:

                         - int `N`: use the first `N` events
                         - slice: use the events defined by a slice
                         - list: list of integers specifying the event indices to use

                         Numpy indexing rules apply. E.g. to only process the first
                         100 events, set this to `100` or `slice(0, 100)`.


   .. py:method:: __contains__(item)


   .. py:method:: __enter__()


   .. py:method:: __exit__(exc_type, exc_val, exc_tb)


   .. py:method:: __getitem__(feat)


   .. py:method:: __getstate__()


   .. py:method:: __setstate__(state)


   .. py:method:: __len__()


   .. py:property:: h5


   .. py:property:: image
      :type: dcnum.read.cache.HDF5ImageCache | None


   .. py:property:: image_bg
      :type: dcnum.read.cache.HDF5ImageCache | None


   .. py:property:: image_corr
      :type: dcnum.read.cache.ImageCorrCache | None


   .. py:property:: image_num_chunks

      Number of image chunks given `self.image_chunk_size`
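
      A plausible reading is a ceiling division of the number of images by
      the chunk size, so that a final, partially filled chunk is counted as
      well. The helper name ``num_chunks`` is hypothetical, and rounding up
      is an assumption that is not stated above.

      .. code-block:: python

         import math


         def num_chunks(num_images, chunk_size=1000):
             """Number of chunks needed to hold `num_images` images."""
             return math.ceil(num_images / chunk_size)


         chunks_exact = num_chunks(3000)    # three full chunks
         chunks_partial = num_chunks(3001)  # one extra, partially filled chunk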


   .. py:property:: mask


   .. py:property:: meta_nest

      Return `self.meta` as a nested dictionary

      This gets very close to the dclab `config` property of datasets.


   .. py:property:: pixel_size


   .. py:method:: extract_basin_dicts(h5, check=True)
      :staticmethod:


      Return list of basin dictionaries


   .. py:property:: features_scalar_frame

      Scalar features that apply to all events in a frame

      This is a convenience function for copying scalar features
      over to new processed datasets. Return a list of all features
      that describe a frame (e.g. temperature or time).


   .. py:method:: close()

      Close the underlying HDF5 file


   .. py:method:: get_ppid()


   .. py:method:: get_ppid_code()
      :classmethod:


   .. py:method:: get_ppid_from_ppkw(kwargs)
      :classmethod:


   .. py:method:: get_ppid_index_mapping(index_mapping)
      :staticmethod:


      Return the pipeline identifier part for index mapping


   .. py:method:: get_ppkw_from_ppid(dat_ppid)
      :staticmethod:


   .. py:method:: get_basin_data(index: int) -> tuple[dcnum.common.h5py.Group, list, int | slice | list | numpy.ndarray]

      Return HDF5Data info for a basin index in `self.basins`

      :param index: index of the basin from which to get data
      :type index: int

      :returns: * **group** (*h5py.Group*) -- HDF5 group containing HDF5 Datasets with the names
                  listed in `features`
                * **features** (*list of str*) -- list of features made available by this basin
                * *index_mapping* -- a mapping (see `__init__`) that defines mapping from
                  the basin dataset to the referring dataset


   .. py:method:: _get_basin_data_file(bn_dict)


   .. py:method:: _get_basin_data_internal(bn_dict)


   .. py:method:: get_image_cache(feat)

      Create an HDF5ImageCache object for the current dataset

      This method also tries to find image data in `self.basins`.


   .. py:method:: keys()


.. py:class:: HDF5ImageCache(h5ds: dcnum.common.h5py.Dataset | dcnum.read.mapped.MappedHDF5Dataset, chunk_size: int = 1000, cache_size: int = 2, boolean: bool = False)

   Bases: :py:obj:`BaseImageChunkCache`


   An HDF5 image cache

   Deformability cytometry data files commonly contain image stacks
   that are chunked in various ways. Loading just a single image
   can be time-consuming, because an entire HDF5 chunk has to be
   loaded and decompressed before that one image can be extracted.
   The `HDF5ImageCache` class caches the chunks from the HDF5 files
   in memory, making single-image access very fast.
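
   The caching idea can be sketched with the standard library. The class
   below is a simplified stand-in, not the dcnum implementation: the name
   ``ChunkCacheSketch`` is hypothetical, a plain list replaces the HDF5
   dataset, and evicting the least recently used chunk is an assumption.

   .. code-block:: python

      from collections import OrderedDict


      class ChunkCacheSketch:
          """Keep the most recently used chunks of a sliceable stack."""

          def __init__(self, data, chunk_size=1000, cache_size=2):
              self.data = data
              self.chunk_size = chunk_size
              self.cache_size = cache_size
              self.cache = OrderedDict()
              self.loads = 0  # counts how often a chunk had to be read

          def get_chunk(self, chunk_index):
              if chunk_index in self.cache:
                  # Cache hit: mark the chunk as most recently used.
                  self.cache.move_to_end(chunk_index)
              else:
                  # Cache miss: read the whole chunk with one slice.
                  start = chunk_index * self.chunk_size
                  stop = start + self.chunk_size
                  self.cache[chunk_index] = self.data[start:stop]
                  self.loads += 1
                  # Evict the least recently used chunk if needed.
                  if len(self.cache) > self.cache_size:
                      self.cache.popitem(last=False)
              return self.cache[chunk_index]

          def __getitem__(self, index):
              chunk_index, offset = divmod(index, self.chunk_size)
              return self.get_chunk(chunk_index)[offset]


      # Usage: a plain list stands in for an HDF5 dataset.
      stack = list(range(10))
      cache = ChunkCacheSketch(stack, chunk_size=4, cache_size=2)
      first = [cache[0], cache[1], cache[2]]  # one chunk load serves all three
      last = cache[9]                         # triggers a second chunk load

   Sequential access then costs one read per chunk instead of one read per
   image.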


   .. py:attribute:: h5ds


   .. py:attribute:: boolean
      :value: False


   .. py:method:: _get_chunk_data(chunk_slice)

      Implemented in subclass to obtain actual data


.. py:function:: get_measurement_identifier(h5: dcnum.common.h5py.Group) -> str | None

   Return the measurement identifier for the given H5File object

   The measurement identifier is taken from the HDF5 attributes. If
   the "experiment:run identifier" attribute is not set, it is
   computed from the HDF5 attributes "experiment:time",
   "experiment:date", and "setup:identifier".

   If the measurement identifier cannot be found or computed,
   return None.
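
   The fallback order can be sketched with a plain dictionary in place of
   the HDF5 attributes. The function name ``measurement_identifier_sketch``
   is hypothetical, and the way the three attributes are combined (an MD5
   digest of the joined strings) is an assumption, since the text above only
   names the attributes that are used.

   .. code-block:: python

      import hashlib


      def measurement_identifier_sketch(attrs):
          """Return an identifier from a dict of attributes, or None."""
          run_id = attrs.get("experiment:run identifier")
          if run_id:
              return run_id
          keys = ("experiment:time", "experiment:date", "setup:identifier")
          parts = [attrs.get(key) for key in keys]
          if not all(parts):
              # At least one attribute is missing: nothing can be computed.
              return None
          # Assumption: any stable digest of the three values would do.
          return hashlib.md5("_".join(parts).encode("utf-8")).hexdigest()


      explicit = measurement_identifier_sketch(
          {"experiment:run identifier": "run-1"})
      computed = measurement_identifier_sketch({
          "experiment:time": "12:00:00",
          "experiment:date": "2024-01-01",
          "setup:identifier": "setup-A",
      })
      missing = measurement_identifier_sketch(
          {"experiment:date": "2024-01-01"})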


.. py:exception:: BasinIdentifierMismatchError

   Bases: :py:obj:`BaseException`




.. py:function:: concatenated_hdf5_data(paths: list[pathlib.Path], path_out: bool | pathlib.Path | None = True, compute_frame: bool = True, features: list[str] | None = None)

   Return a virtual dataset concatenating all the input paths

   :param paths: Paths of the input HDF5 files that will be concatenated along
                 the feature axis. The metadata will be taken from the first
                 file.
   :param path_out: If `None`, then the dataset is created in memory. If `True`
                    (default), create a file on disk. If a pathlib.Path is specified,
                    the dataset is written to that file. Note that datasets in memory
                    are likely not picklable (so don't use them for multiprocessing).
   :param compute_frame: Whether to compute the "events/frame" feature, taking the frame
                         data from the input files and properly incrementing them along
                         the file index.
   :param features: List of features to take from the input files.

   .. rubric:: Notes

   - If one of the input files does not contain a feature from the first
     input `paths`, then a `ValueError` is raised. Use the `features`
     argument to specify which features you need instead.
   - Basins are not considered.
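
   The feature check from the first note can be sketched with plain lists
   of feature names. The helper ``select_features`` is hypothetical and only
   mirrors the rule described above; it does not read HDF5 files or
   reproduce the dcnum implementation.

   .. code-block:: python

      def select_features(file_features, features=None):
          """Return the features to concatenate, checking every file.

          `file_features` holds one list of feature names per input file.
          """
          # Default to the features of the first file.
          if features is None:
              wanted = list(file_features[0])
          else:
              wanted = list(features)
          for index, available in enumerate(file_features):
              missing = [feat for feat in wanted if feat not in available]
              if missing:
                  raise ValueError(
                      f"Input file {index} lacks features {missing}")
          return wanted


      inputs = [["deform", "area_um", "frame"], ["deform", "frame"]]

      # Restricting the features avoids the error for the second file.
      common = select_features(inputs, features=["deform", "frame"])

      try:
          select_features(inputs)
      except ValueError as exc:
          error = str(exc)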


.. py:function:: get_mapping_indices(index_mapping: numbers.Integral | slice | list | numpy.ndarray)

   Return integer numpy array with mapping indices for a range

   :param index_mapping: The following options are supported:

                         - integer: results in `np.arange(integer)`
                         - slice: results in `np.arange(slice.start, slice.stop, slice.step)`
                         - list or np.ndarray: returns the input as a uint32 array
   :type index_mapping: numbers.Integral | slice | list | np.ndarray
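
   The three cases can be sketched as follows. The name
   ``mapping_indices_sketch`` is hypothetical, and filling in defaults for
   an open slice start or step is an assumption made to keep the example
   runnable; an open slice stop is not handled.

   .. code-block:: python

      import numbers

      import numpy as np


      def mapping_indices_sketch(index_mapping):
          """Convert an index mapping into an integer numpy array."""
          if isinstance(index_mapping, numbers.Integral):
              # The first N events.
              return np.arange(index_mapping)
          if isinstance(index_mapping, slice):
              # Assumption: an open start or step falls back to 0 and 1.
              start = 0 if index_mapping.start is None else index_mapping.start
              step = 1 if index_mapping.step is None else index_mapping.step
              return np.arange(start, index_mapping.stop, step)
          if isinstance(index_mapping, (list, np.ndarray)):
              # Explicit event indices, in the given order.
              return np.array(index_mapping, dtype=np.uint32)
          raise ValueError(f"Unsupported index mapping: {index_mapping!r}")


      from_int = mapping_indices_sketch(3)
      from_slice = mapping_indices_sketch(slice(2, 10, 4))
      from_list = mapping_indices_sketch([5, 1, 7])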


.. py:function:: get_mapped_object(obj, index_mapping=None)