dcnum.logic.job

Attributes

hdf5plugin

Classes

DCNumPipelineJob

Pipeline job recipe

Module Contents

dcnum.logic.job.hdf5plugin

class dcnum.logic.job.DCNumPipelineJob(path_in: pathlib.Path | str, path_out: pathlib.Path | str | None = None, data_code: str = 'hdf', data_kwargs: dict | None = None, background_code: str = 'sparsemed', background_kwargs: dict | None = None, segmenter_code: str = 'thresh', segmenter_kwargs: dict | None = None, feature_code: str = 'legacy', feature_kwargs: dict | None = None, gate_code: str = 'norm', gate_kwargs: dict | None = None, basin_strategy: Literal['drain', 'tap'] = 'drain', compression: str = 'zstd-5', num_procs: int | None = None, log_level: int = logging.INFO, debug: bool = False)

Pipeline job recipe

Parameters:

path_in (pathlib.Path | str) – input data path
path_out (pathlib.Path | str | None) – output data path
data_code (str) – identification code of input data reader to use
data_kwargs (dict) – keyword arguments for data reader
background_code (str) – identification code of background data computation method
background_kwargs (dict) – keyword arguments for background data computation method
segmenter_code (str) – identification code of segmenter to use
segmenter_kwargs (dict) – keyword arguments for segmenter
feature_code (str) – identification code of feature extractor
feature_kwargs (dict) – keyword arguments for feature extractor
gate_code (str) – identification code for gating/event filtering class
gate_kwargs (dict) – keyword arguments for gating/event filtering class
basin_strategy (str) – strategy for handling event data. In principle, not all events have to be stored in the output file if basins are defined that link back to the original file.
- “drain”: drain all basins. The output file contains all features, but is also very large.
- “tap”: tap the basins, including the input file. The output file is comparatively small.
compression (str) – compression algorithm to use. Set this to “none” to disable compression. Currently, only the Zstandard compression algorithm is supported, ranging from the lowest compression “zstd-1” to the highest compression “zstd-9”. The default “zstd-5” is a trade-off. Choose a higher level if disk I/O is the bottleneck and a lower level if the CPU is the bottleneck. Note that “zstd-5” is the accepted minimum compression setting for long-term data storage in the DC universe (enforced e.g. by DCOR-Aid).
num_procs (int) – number of processes to use
log_level (int) – logging level to use
debug (bool) – whether to set the logging level to “DEBUG” and use threads instead of processes
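The compression codes described above follow a simple “zstd-N” pattern. The following is a minimal sketch of how such a code could be checked before a job is created. The helper `parse_compression_code` is hypothetical and not part of dcnum (which performs its own validation); the accepted range of 1 to 9 is taken from the parameter description above.

```python
from __future__ import annotations


def parse_compression_code(code: str) -> int | None:
    """Return the Zstandard level for a code such as "zstd-5".

    Returns None for "none" (compression disabled).
    Hypothetical helper for illustration only; not part of dcnum.
    """
    if code == "none":
        return None
    # Split "zstd-5" into the algorithm name and the level
    name, sep, level = code.partition("-")
    if name != "zstd" or not sep or not level.isdecimal():
        raise ValueError(f"Unsupported compression code: {code!r}")
    level_int = int(level)
    # Range assumed from the documented codes "zstd-1" to "zstd-9"
    if not 1 <= level_int <= 9:
        raise ValueError(f"Zstandard level must be 1..9, got {level_int}")
    return level_int


print(parse_compression_code("zstd-5"))
print(parse_compression_code("none"))
```

Checking the code up front gives an early, readable error instead of a failure while the output file is being written.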

kwargs: initialize keyword arguments for this job

__getitem__(item)

__getstate__()

__setstate__(state)

assert_pp_codes(): Sanity check of self.kwargs

get_hdf5_dataset_kwargs() → dict: Validate and return output HDF5 Dataset keyword arguments

get_ppid(ret_hash=False, ret_dict=False)

get_segmenter_class(): Return the class of the segmenter associated with this job

validate()

Make sure the pipeline will run given the job kwargs

Returns: True, for testing convenience
Return type: bool
Raises: dcnum.segm.SegmenterNotApplicableError – the segmenter is incompatible with the input path
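Because validate() either returns True or raises, a caller may want to turn the exception into a plain boolean. The sketch below shows one way to do this. The function `can_run` and the stub classes are hypothetical and exist only so that the example runs without dcnum.

```python
from __future__ import annotations


def can_run(job, not_applicable_error: type[Exception]) -> bool:
    """Return True if job.validate() succeeds, False if the segmenter
    is reported as not applicable (hypothetical helper)."""
    try:
        return bool(job.validate())
    except not_applicable_error as exc:
        print(f"Segmenter not applicable: {exc}")
        return False


class StubNotApplicableError(Exception):
    """Stand-in for dcnum.segm.SegmenterNotApplicableError."""


class StubJob:
    """Stand-in for a pipeline job; only validate() is mimicked."""

    def __init__(self, compatible: bool):
        self.compatible = compatible

    def validate(self):
        if not self.compatible:
            raise StubNotApplicableError("incompatible input path")
        return True


print(can_run(StubJob(True), StubNotApplicableError))
print(can_run(StubJob(False), StubNotApplicableError))
```

With dcnum installed, the same function could be called with a DCNumPipelineJob instance and `dcnum.segm.SegmenterNotApplicableError` as the second argument.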