dcnum.logic.job
===============

.. py:module:: dcnum.logic.job


Attributes
----------

.. autoapisummary::

   dcnum.logic.job.hdf5plugin


Classes
-------

.. autoapisummary::

   dcnum.logic.job.DCNumPipelineJob


Module Contents
---------------

.. py:data:: hdf5plugin

.. py:class:: DCNumPipelineJob(path_in: pathlib.Path | str, path_out: pathlib.Path | str | None = None, data_code: str = 'hdf', data_kwargs: dict | None = None, background_code: str = 'sparsemed', background_kwargs: dict | None = None, segmenter_code: str = 'thresh', segmenter_kwargs: dict | None = None, feature_code: str = 'legacy', feature_kwargs: dict | None = None, gate_code: str = 'norm', gate_kwargs: dict | None = None, basin_strategy: Literal['drain', 'tap'] = 'drain', compression: str = 'zstd-5', num_procs: int | None = None, log_level: int = logging.INFO, debug: bool = False)

   Pipeline job recipe

   :param path_in: input data path
   :type path_in: pathlib.Path | str
   :param path_out: output data path
   :type path_out: pathlib.Path | str
   :param data_code: identification code of input data reader to use
   :type data_code: str
   :param data_kwargs: keyword arguments for data reader
   :type data_kwargs: dict
   :param background_code: identification code of background data computation method
   :type background_code: str
   :param background_kwargs: keyword arguments for background data computation method
   :type background_kwargs: dict
   :param segmenter_code: identification code of segmenter to use
   :type segmenter_code: str
   :param segmenter_kwargs: keyword arguments for segmenter
   :type segmenter_kwargs: dict
   :param feature_code: identification code of feature extractor
   :type feature_code: str
   :param feature_kwargs: keyword arguments for feature extractor
   :type feature_kwargs: dict
   :param gate_code: identification code for gating/event filtering class
   :type gate_code: str
   :param gate_kwargs: keyword arguments for gating/event filtering class
   :type gate_kwargs: dict
   :param basin_strategy: strategy for handling event data.
       In principle, not all events have to be stored in the output
       file if basins are defined, linking back to the original file.

       - You can "drain" all basins, which means that the output
         file will contain all features, but will also be very big.
       - You can "tap" the basins, including the input file,
         which means that the output file will be comparatively small.
   :type basin_strategy: str
   :param compression: compression algorithm to use.
       Set this to "none" to disable compression. Currently, only the
       Zstandard compression algorithm may be used, with the least
       compression "zstd-1" and the best compression "zstd-9". The
       default "zstd-5" is a trade-off. Set the compression to a higher
       number if the bottleneck is disk I/O. Set the compression to a
       lower number if the bottleneck is the CPU. Note that "zstd-5" is
       the accepted minimum compression setting for long-term data
       storage in the DC universe (enforced e.g. by DCOR-Aid).
   :type compression: str
   :param num_procs: number of processes to use
   :type num_procs: int
   :param log_level: logging level to use
   :type log_level: int
   :param debug: whether to set logging level to "DEBUG" and use
       threads instead of processes
   :type debug: bool


   .. py:attribute:: kwargs

      Keyword arguments used to initialize this job


   .. py:method:: __getitem__(item)


   .. py:method:: __getstate__()


   .. py:method:: __setstate__(state)


   .. py:method:: assert_pp_codes()

      Sanity check of ``self.kwargs``


   .. py:method:: get_hdf5_dataset_kwargs() -> dict

      Validate and return output HDF5 Dataset keyword arguments


   .. py:method:: get_ppid(ret_hash=False, ret_dict=False)


   .. py:method:: get_segmenter_class()

      Return the class of the segmenter associated with this job


   .. py:method:: validate()

      Make sure the pipeline will run given the job kwargs

      :returns: for testing convenience
      :rtype: True

      :raises dcnum.segm.SegmenterNotApplicableError: the segmenter is incompatible with the input path
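
The rules for ``compression`` and ``basin_strategy`` described above can be
sketched as a small pre-flight check. This is a minimal illustration, not the
validation that dcnum itself performs: the helpers ``parse_compression`` and
``make_job_kwargs`` are hypothetical names introduced here, and the accepted
values are taken only from the parameter descriptions above.

```python
from __future__ import annotations

import re

# Accepted Zstandard settings per the parameter description: "zstd-1" .. "zstd-9"
ZSTD_PATTERN = re.compile(r"zstd-([1-9])")


def parse_compression(compression: str) -> int | None:
    """Return the Zstandard level, or None if compression is disabled.

    Hypothetical helper, not part of dcnum.
    """
    if compression == "none":
        return None
    match = ZSTD_PATTERN.fullmatch(compression)
    if match is None:
        raise ValueError(
            f"Unsupported compression {compression!r}; "
            "expected 'none' or 'zstd-1' to 'zstd-9'")
    return int(match.group(1))


def make_job_kwargs(path_in, basin_strategy="drain", compression="zstd-5"):
    """Assemble a few keyword arguments for a job after basic checks.

    Hypothetical helper, not part of dcnum. The defaults mirror the
    defaults in the class signature above.
    """
    if basin_strategy not in ("drain", "tap"):
        raise ValueError(
            f"Unsupported basin strategy {basin_strategy!r}; "
            "expected 'drain' or 'tap'")
    # Raises ValueError if the compression string is not recognized
    parse_compression(compression)
    return {
        "path_in": path_in,
        "basin_strategy": basin_strategy,
        "compression": compression,
    }


# "tap" keeps the output file comparatively small
kwargs = make_job_kwargs("data.rtdc", basin_strategy="tap")
print(kwargs)
```

If dcnum is installed, a dictionary like this could be passed on as
``DCNumPipelineJob(**kwargs)``, followed by a call to ``validate()`` on the
resulting job; the file name ``data.rtdc`` is only a placeholder.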