dcnum.logic.job
Attributes
hdf5plugin
Classes
DCNumPipelineJob – Pipeline job recipe
Module Contents
- dcnum.logic.job.hdf5plugin
- class dcnum.logic.job.DCNumPipelineJob(path_in: pathlib.Path | str, path_out: pathlib.Path | str | None = None, data_code: str = 'hdf', data_kwargs: dict | None = None, background_code: str = 'sparsemed', background_kwargs: dict | None = None, segmenter_code: str = 'thresh', segmenter_kwargs: dict | None = None, feature_code: str = 'legacy', feature_kwargs: dict | None = None, gate_code: str = 'norm', gate_kwargs: dict | None = None, basin_strategy: Literal['drain', 'tap'] = 'drain', compression: str = 'zstd-5', num_procs: int | None = None, log_level: int = logging.INFO, debug: bool = False)
Pipeline job recipe
- Parameters:
path_in (pathlib.Path | str) – input data path
path_out (pathlib.Path | str | None) – output data path
data_code (str) – identification code of input data reader to use
data_kwargs (dict) – keyword arguments for data reader
background_code (str) – identification code of background data computation method
background_kwargs (dict) – keyword arguments for background data computation method
segmenter_code (str) – identification code of segmenter to use
segmenter_kwargs (dict) – keyword arguments for segmenter
feature_code (str) – identification code of feature extractor
feature_kwargs (dict) – keyword arguments for feature extractor
gate_code (str) – identification code for gating/event filtering class
gate_kwargs (dict) – keyword arguments for gating/event filtering class
basin_strategy (str) –
Strategy for handling event data. In principle, not all events have to be stored in the output file if basins are defined that link back to the original file.
You can “drain” all basins, which means that the output file will contain all features but will also be very large.
You can “tap” the basins, including the input file, which means that the output file will be comparatively small.
compression (str) – compression algorithm to use. Set this to “none” to disable compression. Currently, only the Zstandard compression algorithm may be used, ranging from the weakest compression “zstd-1” to the strongest compression “zstd-9”. The default “zstd-5” is a trade-off. Use a higher number if the bottleneck is disk I/O and a lower number if the bottleneck is the CPU. Note that “zstd-5” is the accepted minimum compression setting for long-term data storage in the DC universe (enforced e.g. by DCOR-Aid).
num_procs (int) – number of processes to use
log_level (int) – logging level to use
debug (bool) – whether to set the logging level to “DEBUG” and use threads instead of processes
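The compression rules described above can be checked before a job is created. The following is a minimal sketch and not part of dcnum: the helper names `parse_compression` and `is_archival_grade` are hypothetical, and the sketch assumes that the valid levels are exactly the integers 1 to 9, as suggested by the “zstd-1” and “zstd-9” bounds.

```python
import re
from typing import Optional

# Assumption: valid levels are the integers 1..9, inferred from the
# documented bounds "zstd-1" and "zstd-9".
_ZSTD_PATTERN = re.compile(r"zstd-([1-9])")

# Documented minimum level for long-term data storage.
ARCHIVAL_MIN_LEVEL = 5


def parse_compression(compression: str) -> Optional[int]:
    """Return the Zstandard level, or None if compression is disabled.

    Hypothetical helper; raises ValueError for anything other than
    "none" or "zstd-1" ... "zstd-9".
    """
    if compression == "none":
        return None
    match = _ZSTD_PATTERN.fullmatch(compression)
    if match is None:
        raise ValueError(f"Unsupported compression: {compression!r}")
    return int(match.group(1))


def is_archival_grade(compression: str) -> bool:
    """Check whether a setting meets the documented "zstd-5" minimum."""
    level = parse_compression(compression)
    return level is not None and level >= ARCHIVAL_MIN_LEVEL
```

A check like this could run before a long pipeline starts, so that a mistyped value or a setting below the storage minimum is caught early.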
- kwargs
Initialization keyword arguments for this job
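As an illustration of how the constructor arguments fit together, the sketch below assembles a subset of the keyword arguments listed above and passes them to `DCNumPipelineJob`. The helpers `build_job_kwargs` and `make_job` are hypothetical names introduced for this example, and the file name in the usage note is a placeholder. Only the parameter names, defaults and the two basin strategies are taken from the signature above.

```python
import logging
import pathlib

# The two values permitted by the signature: Literal['drain', 'tap'].
VALID_BASIN_STRATEGIES = ("drain", "tap")


def build_job_kwargs(path_in, path_out=None, basin_strategy="drain",
                     compression="zstd-5", num_procs=None, debug=False):
    """Assemble keyword arguments for DCNumPipelineJob (hypothetical helper).

    Arguments not listed here (data_code, segmenter_code, ...) are left
    at the defaults given in the class signature.
    """
    if basin_strategy not in VALID_BASIN_STRATEGIES:
        raise ValueError(
            f"basin_strategy must be one of {VALID_BASIN_STRATEGIES}, "
            f"got {basin_strategy!r}")
    return {
        "path_in": pathlib.Path(path_in),
        "path_out": None if path_out is None else pathlib.Path(path_out),
        "basin_strategy": basin_strategy,
        "compression": compression,
        "num_procs": num_procs,
        "log_level": logging.INFO,
        "debug": debug,
    }


def make_job(path_in, **options):
    """Create a pipeline job from the assembled keyword arguments."""
    # Imported lazily so this sketch can be loaded without dcnum installed.
    from dcnum.logic.job import DCNumPipelineJob
    return DCNumPipelineJob(**build_job_kwargs(path_in, **options))
```

With dcnum installed, a call such as `make_job("measurement.rtdc", basin_strategy="tap")` would request a comparatively small output file that taps the basins instead of copying all features.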