dcnum.logic.job

Attributes

hdf5plugin

Classes

DCNumPipelineJob

Pipeline job recipe

Module Contents

dcnum.logic.job.hdf5plugin
class dcnum.logic.job.DCNumPipelineJob(path_in: pathlib.Path | str, path_out: pathlib.Path | str | None = None, data_code: str = 'hdf', data_kwargs: dict | None = None, background_code: str = 'sparsemed', background_kwargs: dict | None = None, segmenter_code: str = 'thresh', segmenter_kwargs: dict | None = None, feature_code: str = 'legacy', feature_kwargs: dict | None = None, gate_code: str = 'norm', gate_kwargs: dict | None = None, basin_strategy: Literal['drain', 'tap'] = 'drain', compression: str = 'zstd-5', num_procs: int | None = None, log_level: int = logging.INFO, debug: bool = False)

Pipeline job recipe

Parameters:
  • path_in (pathlib.Path | str) – input data path

  • path_out (pathlib.Path | str) – output data path

  • data_code (str) – identification code of input data reader to use

  • data_kwargs (dict) – keyword arguments for data reader

  • background_code (str) – identification code of background data computation method

  • background_kwargs (dict) – keyword arguments for background data computation method

  • segmenter_code (str) – identification code of segmenter to use

  • segmenter_kwargs (dict) – keyword arguments for segmenter

  • feature_code (str) – identification code of feature extractor

  • feature_kwargs (dict) – keyword arguments for feature extractor

  • gate_code (str) – identification code for gating/event filtering class

  • gate_kwargs (dict) – keyword arguments for gating/event filtering class

  • basin_strategy (str) –

    Strategy for handling event data. In principle, not all events have to be stored in the output file if basins are defined that link back to the original file.

    • You can “drain” all basins, which means that the output file will contain all features but will also be very big.

    • You can “tap” the basins, including the input file, which means that the output file will be comparatively small.

  • compression (str) – compression algorithm to use. Set this to “none” to disable compression. Currently, only the Zstandard compression algorithm may be used, ranging from the least compression (“zstd-1”) to the best compression (“zstd-9”). The default “zstd-5” is a trade-off. Set the compression to a higher number if the bottleneck is disk I/O. Set the compression to a lower number if the bottleneck is the CPU. Note that “zstd-5” is the accepted minimum compression setting for long-term data storage in the DC universe (enforced e.g. by DCOR-Aid).

  • num_procs (int) – number of processes to use

  • log_level (int) – logging level to use

  • debug (bool) – whether to set logging level to “DEBUG” and use threads instead of processes
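The compression rule described above (“none”, or “zstd-1” through “zstd-9”) can be checked before a job is created. The following is a minimal sketch under that reading of the parameter description, not dcnum's own validation; `parse_compression` and `ok_for_long_term_storage` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Optional


def parse_compression(compression: str) -> Optional[int]:
    """Return the Zstandard level encoded in a compression string.

    Hypothetical helper (not part of dcnum): returns None for "none"
    and an integer from 1 to 9 for "zstd-1" through "zstd-9".
    """
    if compression == "none":
        return None
    match = re.fullmatch(r"zstd-([1-9])", compression)
    if match is None:
        raise ValueError(f"Unsupported compression: {compression!r}")
    return int(match.group(1))


def ok_for_long_term_storage(compression: str) -> bool:
    """Check a setting against the documented "zstd-5" minimum."""
    level = parse_compression(compression)
    return level is not None and level >= 5
```

A check like this could run on user input before the value is passed as `compression`; the job itself may apply stricter or different rules.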

kwargs

initialize keyword arguments for this job
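As a rough illustration, the keyword arguments for a job could be assembled as below. This sketch uses only parameter names from the class signature; the file names are hypothetical, and `make_job` is an illustrative wrapper (not part of dcnum) that imports the library lazily so the snippet loads without it.

```python
from pathlib import Path

# Hypothetical input file; replace with a real measurement path.
path_in = Path("measurement.rtdc")

# Only parameter names documented in the signature are used here.
job_kwargs = {
    "path_in": path_in,
    "path_out": path_in.with_name("measurement_dcn.rtdc"),  # hypothetical name
    "basin_strategy": "tap",   # comparatively small output linked to the input
    "compression": "zstd-5",   # documented default
    "debug": False,
}


def make_job(**kwargs):
    """Create a pipeline job (illustrative wrapper, requires dcnum)."""
    from dcnum.logic.job import DCNumPipelineJob
    return DCNumPipelineJob(**kwargs)


# job = make_job(**job_kwargs)  # needs dcnum installed and real input data
```

Choosing “tap” here trades a self-contained output file for a smaller one; “drain” would be the choice when the output must contain all features.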

__getitem__(item)
__getstate__()
__setstate__(state)
assert_pp_codes()

Sanity check of self.kwargs

get_hdf5_dataset_kwargs() → dict

Validate and return output HDF5 Dataset keyword arguments

get_ppid(ret_hash=False, ret_dict=False)
get_segmenter_class()

Return the class of the segmenter associated with this job

validate()

Make sure the pipeline will run given the job kwargs

Returns:

True, for testing convenience

Return type:

bool

Raises:

dcnum.segm.SegmenterNotApplicableError – the segmenter is incompatible with the input path
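Because `validate()` either returns True or raises, a caller may want to turn the incompatible-segmenter case into a plain boolean. The sketch below shows one way to do that; `StubJob` and `StubNotApplicableError` are stand-ins defined here so the example runs without dcnum, and `can_run` is a hypothetical helper.

```python
class StubNotApplicableError(Exception):
    """Stand-in for dcnum.segm.SegmenterNotApplicableError."""


class StubJob:
    """Stand-in that exposes only a validate() method."""

    def __init__(self, compatible: bool):
        self.compatible = compatible

    def validate(self) -> bool:
        # Mirrors the documented contract: True on success, raise otherwise.
        if not self.compatible:
            raise StubNotApplicableError("segmenter incompatible with input")
        return True


def can_run(job, not_applicable_error=StubNotApplicableError) -> bool:
    """Return True if validation passes, False if the segmenter does not apply."""
    try:
        return job.validate()
    except not_applicable_error:
        return False
```

With dcnum installed, a real job and `dcnum.segm.SegmenterNotApplicableError` would take the place of the stubs, for example `can_run(job, SegmenterNotApplicableError)`. Other exceptions are deliberately left to propagate.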