Graph-based Data Management System for High Performance Computing

The increasingly large, complex, and heterogeneous datasets that HPC and projected exascale systems have to deal with present fundamental challenges of efficiently managing data, both for domain scientists who want to understand and discover new knowledge from their simulations and experimental results, and for system administrators who need to monitor, archive, and secure the data. These needs and challenges demand a fundamental paradigm shift in storage systems and I/O.

We envision a data management system, beyond the traditional file system, that can store adequate metadata, maintain sufficient relationships among data producers and consumers, and provide a powerful new model beyond the aging POSIX. However, existing HPC systems still use traditional POSIX file systems to capture the basic and simple metadata like filename, permission, length, creation time, and fixed types of relationships between files and directories, files and users, users and groups, etc. Those limited metadata, limited relationships among producers and consumers, and the correspondingly limited access model are insufficient for many important storage system and data management functionalities, such as data organization, data audit, validation and reproducibility support. With increasing data demands for scientific applications, there is an imperative need of rethinking HPC storage systems and I/O with rich metadata functionalities and a shift away from the limited POSIX metadata model to a powerful new model to support scientific discovery and innovation highly efficiently.

To enable this vision and concept, we investigate a graph-based model that unifies rich metadata (which can even include user-defined information about a wide range of entities, such as users, applications, files, processes, jobs, etc.), manages complex relationships among entities (e.g., producers and consumers), and provides powerful new access modalities to scientists and administrators. Based on this model, We develop methodologies to store, access, and use such metadata graph highly efficiently. Such a graph-based model and metadata management can be combined with traditional file system functionality of a persistent store to deliver a new graph-based data management solution for HPC systems.

Initial Graph-based Model

We propose to develop a property graph based representation of metadata as a graph-based model that unifies rich metadata, manages complex relationships, and provides powerful new access modalities. A property graph typically looks like following figure.

Initial Mapping Strategy

A very initial mapping strategy:

  • Entity ⇒ Vertex
    • Data Object: represents the basic data unit in storage
    • Executions: represents applications including Jobs, Processes, Threads
    • User: represents real end user of a system
    • Users are allowed to define their own entities
  • Relationship ⇒ Edge
    • Relationships between different entities are mapped as edges
    • Users run Executions. An edge with type ‘Run’ is created between them
    • Reversed relationships are also defined
    • Users are allowed to define their own relationships
  • Attributes ⇒ Property
    • On both Entity and Relationship
    • Stored as Key-Value pairs attached on vertices and edges

Exemplar Mapping and Graph

To help understand the attributes of a rich metadata graph in an HPC context, we exploit Darshan trace logs as a source of rich metadata for prototyping purpose.

Each unique user ID indicates a User entity, each Darshan log file represents a Job, all the ranks inside a job correspond to the Processes, and both the executables and data files are abstracted as Data Object entities. Darshan does not capture directory structure, since it stores only the hashed value of file paths. We therefore synthetically create the simplest directory structures: data files visited by each execution are considered under the same directory, and all these directories accessed by one user are placed under one directory for each user (this structure is possible only in the graph model). Based on this mapping, we were able to import a year’s Darshan traces (2013) from the Intrepid machine into an example graph. This collection of Darshan logs characterizes the I/O activity of approximately 42% of all corehours consumed on Intrepid over the course of a year.

Initial Graph Characterization

Here are some statistics from this initial evaluation.

This table shows the graph size of our example metadata graph in terms of vertices and edges for different levels of detail. The first column considers the job as the Execution entity and eliminates all process (rank) information. All I/O behavior is associated with the Job entity. The second column (With I/O Ranks) records the ranks that have I/O operations. In many cases, this indicates the rank 0 process or the aggregators in two-phase I/O. The third column (With Full Ranks) records all the ranks as Processes entities whether they performed I/O or not. The last column shows the graph size with the complete synthetic directory structures and full ranks.

This figure shows the node degree distributions of four basic entities: User, Job, Process, and File (Data Object) in the metadata graphs. In these figures, the x-axis denotes the degree, and the y-axis shows the number of vertices that have that degree. Both the x-axis and y-axis are log10 values. All four entity types have the common attribute that most entities have very small degrees and a small number of entities have much larger degrees. Take User node as an example. A total of 177 users submitted jobs in 2013. Among them, around 10% of the users actually submitted more than 80% of all the jobs. We observe a similar phenomenon in the other three entities. They are described by using the skewed power-law degree distribution, which means that most vertices have relatively few neighbors, while a few vertices have many more neighbors. We use the function P(d) ~ d􀀀 ^(-a) to describe the probability that a vertex has degree d in such graphs. Here, d is a constant parameter of the distribution known as the exponent or scaling parameter.