DPS4HPC

NEC Data Provenance System for HPC

Run jobs as usual. Know everything that happened.

The Challenge

Ensuring the reliability and reproducibility of HPC simulation results has become a critical requirement in both academic and industrial research. Yet in practice, the process that produced a result is often difficult to reconstruct — manual recording is time-consuming, incomplete, and prone to human error.

| The Challenge | What DPS4HPC Delivers |
|---|---|
| Information required to reproduce computational results is often not preserved; input data and processing steps cannot be traced after execution | Automatically captures the full relationships between input data, processing steps, and outputs, giving complete traceability with zero additional effort |
| Computational workflows become black boxes over time, and data management relies heavily on manual, error-prone effort | Automates provenance recording: DPS4HPC captures provenance automatically as jobs run, reducing human error and administrative burden |
| Reproducing past experiments or sharing computational processes with reviewers and collaborators is difficult and time-consuming | Enables accurate reproduction of past workflows using recorded provenance data, supporting peer review, validation, and seamless knowledge transfer |

Use Cases

DPS4HPC addresses the most common reproducibility and data management challenges in computational research environments.

| Use Case | Research Setting | What DPS4HPC Enables |
|---|---|---|
| Identifying Data to Preserve | Research generating large numbers of intermediate files across multiple experimental runs | Provenance graph shows exactly which inputs contributed to final results, so you preserve only what matters |
| Reproducing Experiments | Peer review, validation, and knowledge transfer when team members change | Past experiments can be accurately reproduced using automatically recorded job-level provenance data |
| Audit & Compliance | Research subject to funding body requirements, institutional governance, or international standards | Complete, tamper-evident record of how results were produced, ready for review at any time |

How It Works

DPS4HPC fits into your existing HPC workflow without any changes. Users simply log in and submit jobs as usual — the system captures provenance automatically in the background.

Features

DPS4HPC captures, stores, and visualizes provenance information automatically — without requiring any modification to how researchers run their jobs.

Zero-Effort Provenance Capture

  • Users log in and run jobs as usual — no changes to job scripts or applications required
  • The system automatically captures job-level information and file relationships in the background


Comprehensive Job-Level Recording

  • Integrates with major job schedulers including SLURM, PBS Pro, and NQS V to capture job metadata alongside system-level provenance
  • Consistently records the relationships between input data, computational processes, and output data

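As a rough illustration of what capturing job metadata can involve, the sketch below reads identifiers that Slurm and PBS export into a job's environment (`SLURM_JOB_ID`, `PBS_JOBID`, and so on). The `JobRecord` class and the `capture_job_metadata` function are hypothetical names chosen for this example; they are not part of DPS4HPC, which requires no such code in user job scripts. NQS V is left out of the sketch.

```python
import os
from dataclasses import dataclass, field

# Environment variables that the schedulers export inside a running job.
# Which of these DPS4HPC actually reads is not stated in this page.
SCHEDULER_ENV = {
    "slurm": {"job_id": "SLURM_JOB_ID", "job_name": "SLURM_JOB_NAME",
              "workdir": "SLURM_SUBMIT_DIR"},
    "pbs": {"job_id": "PBS_JOBID", "job_name": "PBS_JOBNAME",
            "workdir": "PBS_O_WORKDIR"},
}


@dataclass
class JobRecord:
    """Hypothetical job-level record: metadata plus file relationships."""
    scheduler: str
    job_id: str
    job_name: str
    workdir: str
    inputs: list = field(default_factory=list)   # files the job read
    outputs: list = field(default_factory=list)  # files the job wrote


def capture_job_metadata(env=None):
    """Return a JobRecord for the current job, or None outside a job."""
    env = os.environ if env is None else env
    for scheduler, names in SCHEDULER_ENV.items():
        if names["job_id"] in env:
            return JobRecord(
                scheduler=scheduler,
                job_id=env[names["job_id"]],
                job_name=env.get(names["job_name"], ""),
                workdir=env.get(names["workdir"], ""),
            )
    return None
```

Calling `capture_job_metadata()` with no argument inspects the real environment; passing a plain dictionary makes the function easy to try outside a scheduler.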

Search and Visualize Provenance Data

  • Web-based management GUI enables easy access, visualization, and sharing of provenance information
  • Search provenance records by job ID, filename, user ID, execution time, and more
  • Relationships visualized as an interactive provenance graph

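To make the idea of searching records and following a provenance graph concrete, here is a minimal sketch over in-memory data. The record layout, the sample values, and the functions `search` and `upstream_files` are assumptions made for illustration; they do not describe the DPS4HPC database schema or the GUI's query interface.

```python
from collections import deque

# Hypothetical provenance records: each job reads some files and writes others.
RECORDS = [
    {"job_id": "101", "user": "alice",
     "inputs": ["mesh.dat"], "outputs": ["step1.out"]},
    {"job_id": "102", "user": "alice",
     "inputs": ["step1.out", "params.yaml"], "outputs": ["result.h5"]},
    {"job_id": "103", "user": "bob",
     "inputs": ["other.dat"], "outputs": ["scratch.tmp"]},
]


def search(records, **criteria):
    """Return records matching every criterion (job_id, user, filename)."""
    def matches(rec):
        for key, value in criteria.items():
            if key == "filename":
                # A filename matches if the job read or wrote that file.
                if value not in rec["inputs"] + rec["outputs"]:
                    return False
            elif rec.get(key) != value:
                return False
        return True
    return [rec for rec in records if matches(rec)]


def upstream_files(records, target):
    """Walk the graph backwards and collect every file that fed into target."""
    producers = {out: rec for rec in records for out in rec["outputs"]}
    seen, queue = set(), deque([target])
    while queue:
        rec = producers.get(queue.popleft())
        if rec is None:
            continue  # a raw input: no recorded job produced it
        for path in rec["inputs"]:
            if path not in seen:
                seen.add(path)
                queue.append(path)
    return seen
```

With the sample data, `upstream_files(RECORDS, "result.h5")` collects `step1.out`, `params.yaml`, and `mesh.dat`, while `scratch.tmp` and `other.dat` are never reached. That is the kind of distinction a provenance graph makes visible when deciding which files are worth preserving.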

Reproducibility and Collaboration

  • Provenance information can be shared with reviewers and collaborators via the GUI
  • Supports open science and FAIR data principles — making research Findable, Accessible, Interoperable, and Reusable


* FAIR (Findable, Accessible, Interoperable, Reusable) is a set of guiding principles for research data management, widely adopted by funding bodies and research institutions worldwide.

Architecture

DPS4HPC consists of the following components, working together with your job scheduler to capture, store, and visualize provenance information.

| Component | Role |
|---|---|
| Job Scheduler | Captures job-level metadata via a wrapper script integrated with the scheduler |
| Tracer | Captures process execution details and file I/O activity on compute nodes with minimal performance impact |
| Aggregator | Aggregates and correlates file I/O activity, process execution metadata, and job-level metadata to generate structured provenance records |
| Provenance Database | Stores provenance data and relationships for search, retrieval, and long-term access |
| Management GUI | Provides a web-based interface for search, graph visualization, and sharing of provenance information |
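
The correlation step performed by the Aggregator can be pictured with a small sketch. It assumes a simplified event format (one dictionary per file access, tagged with a job ID and a `read` or `write` mode) and a hypothetical `aggregate` function; the actual tracer output and record format used by DPS4HPC are not described on this page.

```python
def aggregate(job_meta, io_events):
    """Merge job metadata with file I/O events into one record per job.

    job_meta:  {job_id: {"user": ..., "name": ...}}  (hypothetical layout)
    io_events: iterable of {"job_id", "path", "mode"} dicts, in time order
    """
    records = {
        job_id: {**meta, "inputs": [], "outputs": []}
        for job_id, meta in job_meta.items()
    }
    for event in io_events:
        rec = records.get(event["job_id"])
        if rec is None:
            continue  # activity that belongs to no known job is ignored
        path = event["path"]
        if event["mode"] == "write":
            if path not in rec["outputs"]:
                rec["outputs"].append(path)
        elif event["mode"] == "read":
            # A file the job wrote earlier is an intermediate, not an input.
            if path not in rec["outputs"] and path not in rec["inputs"]:
                rec["inputs"].append(path)
    return records


# Example with made-up job metadata and events.
meta = {"201": {"user": "alice", "name": "cfd-run"}}
events = [
    {"job_id": "201", "path": "mesh.dat", "mode": "read"},
    {"job_id": "201", "path": "tmp.bin", "mode": "write"},
    {"job_id": "201", "path": "tmp.bin", "mode": "read"},
    {"job_id": "201", "path": "result.h5", "mode": "write"},
    {"job_id": "999", "path": "stray.log", "mode": "write"},
]
provenance = aggregate(meta, events)
```

In this sketch a file counts as an input only if the job read it without having written it first, which keeps temporary files such as `tmp.bin` out of the input list. A production system would also need to handle concurrent processes and repeated opens, which are outside the scope of this example.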

Contact

To learn more about DPS4HPC or discuss how NEC can support reproducibility and data governance at your institution, please get in touch.