NSF Compute On Data Path Project All-Hands Meeting


2016 Meeting

The 2016 NSF Compute On Data Path Project All-Hands Meeting was held on Sep. 30th, 2016.

Agenda:

Time Arrangement
8:30 - 9:10AM Title : Increasing Computational Asynchrony in OpenSHMEM with Active Messages
Speaker : Siddhartha Jana & Dounia Khaldi (Stony Brook University)
Abstract :
Recent reports on challenges of programming models at extreme
scale suggest a shift from traditional bulk-synchronous execution
models to those that support more asynchronous behavior. The
OpenSHMEM programming model enables HPC programmers to exploit
underlying network capabilities while designing asynchronous communication
patterns. The strength of its communication model is fully realized
when these patterns are characterized with small low-latency data transfers.
However, for cases with large data payloads coupled with insufficient
computation overlap, OpenSHMEM programs suffer from underutilized
CPU cycles.
In order to tackle the above challenges, this paper explores the feasibility
of introducing Active Messages in the OpenSHMEM model. Active
Messages is a well established programming paradigm that enables a
process to trigger execution of computation units on remote processes.
Using empirical analyses, we show that this approach of moving computation
closer to data provides a mechanism for OpenSHMEM applications
to avoid the latency costs associated with bulk data transfers. In
addition, this programming pattern helps reduce the need for unwanted
synchronization among processes, thereby exploiting more asynchrony
within an algorithm. As part of this preliminary work, we propose an
API that supports the use of Active Messages within the OpenSHMEM
execution model. We present a microbenchmark-based performance evaluation
of our prototype implementation. We also compare the execution
of a Traveling-Salesman Problem designed with and without Active Messages.
Our experiments indicate promising benefits at scale.
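Background sketch (not part of the talk): the C program below shows the
bulk-transfer baseline that Active Messages aim to avoid, using only
standard OpenSHMEM calls. PE 0 pulls a remote array with shmem_getmem and
reduces it locally; an Active Message would instead run the reduction on
the PE that owns the data. The AM API proposed in the talk is not
reproduced here.

    /* Bulk-transfer baseline in standard OpenSHMEM (C): PE 0 pulls PE 1's
     * array and reduces it locally. An Active Message would instead ship
     * the reduction to the data-owning PE, avoiding the bulk get. */
    #include <shmem.h>
    #include <stdio.h>

    #define N 1024

    int main(void) {
        shmem_init();
        int me = shmem_my_pe();

        /* Symmetric array: every PE allocates the same remotely accessible buffer. */
        long *data = (long *)shmem_malloc(N * sizeof(long));
        for (int i = 0; i < N; i++)
            data[i] = me;                      /* each PE fills its own copy */
        shmem_barrier_all();                   /* make remote data visible */

        if (me == 0 && shmem_n_pes() > 1) {
            long buf[N], sum = 0;
            /* Bulk transfer, then local compute: this is the latency and
             * CPU-underutilization cost that Active Messages try to avoid. */
            shmem_getmem(buf, data, N * sizeof(long), 1);
            for (int i = 0; i < N; i++)
                sum += buf[i];
            printf("sum of PE 1's data = %ld\n", sum);
        }

        shmem_barrier_all();
        shmem_free(data);
        shmem_finalize();
        return 0;
    }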
9:10 - 9:30AM Q&A and Break
9:30 - 10:10AM Title : Visualization of Data Layout and Access of Parallel Program for Productive
Performance Analysis and Tuning
Speaker : Yonghong Yan and Aditi Pati (Oakland University)
Abstract :
Current performance tools such as TAU, Vampir, Paraver, Jumpshot,
Scalasca, Peekperf, the EXPERT performance-analysis environment, Cilkview,
and HPCToolkit provide measurement and visualization of the performance
and scalability of parallel program execution to help users with
performance analysis and tuning. They do not, however, provide sufficient
or intuitive insight into how data are laid out and accessed during
parallel execution, and thus rely on users’ expertise to manually
diagnose issues related to memory access, such as shared cache
contention, false sharing, and memory bandwidth optimization.
We propose a visualization tool that displays the data layout and data
accesses of a parallel program: a clear picture of array and computation
distribution, the mapping of program data to physical NUMA memory
regions, memory access patterns, and contention on memory bandwidth and
shared cache (bandwidth or size). Visualization of data layout will also
make users aware of peak stack or heap memory usage and of peak
read/write memory bandwidth contention. The location of memory
allocations is a critical factor, as NUMA and cache coherence effects can
greatly affect the performance of a computation, and visualizing memory
locations will help users identify such bottlenecks. To get started, we
plan pictorial presentations of 1) the program data access graph and
2) the data layout and accesses in memory for different programs, to give
clear insight into memory usage and make these problems easier to solve.
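Background sketch (not part of the talk): the short OpenMP/C program below
exhibits false sharing, one of the memory-access issues the proposed
visualization is meant to expose. Each thread increments only its own
counter, but the packed counters share a cache line, so the writes
contend; padding each counter to its own line removes the contention.

    /* False-sharing illustration (compile with e.g. gcc -O2 -fopenmp).
     * "volatile" keeps the compiler from collapsing the increment loops. */
    #include <omp.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define ITERS    10000000L

    volatile long counters_packed[NTHREADS];   /* neighbors share a cache line */
    struct { volatile long v; char pad[64 - sizeof(long)]; }
        counters_padded[NTHREADS];             /* one 64-byte line per counter */

    int main(void) {
        double t0 = omp_get_wtime();
        #pragma omp parallel num_threads(NTHREADS)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                counters_packed[id]++;         /* false sharing: line ping-pongs */
        }
        double t1 = omp_get_wtime();
        #pragma omp parallel num_threads(NTHREADS)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                counters_padded[id].v++;       /* private line: no contention */
        }
        double t2 = omp_get_wtime();
        printf("packed: %.3fs  padded: %.3fs\n", t1 - t0, t2 - t1);
        return 0;
    }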
10:10 - 10:30AM Q&A and Break
10:30 - 11:30AM Programming Model Task and Runtime System Task Discussion
11:30 - 1:00PM Lunch & Break
1:00 - 1:40PM Title : From File Systems to Services: Changing the Data Management Model in HPC
Speaker : Robert Ross (Argonne National Laboratory)
Abstract :
HPC applications are composed from software components that provide only
the communication, concurrency, and synchronization needed for the task
at hand. In contrast, parallel file systems are kernel-resident, fully
consistent services with semantic obligations developed on single-core
machines 50 years ago; parallel file systems are old-fashioned system
services forced to scale as fast as the HPC system. Rather than the
monolithic storage services seen today, we envision an ecosystem of
services being composed to meet the specific needs of science activities
at extreme scale. In fact, a nascent ecosystem of services is present
today. In this talk we will discuss drivers leading to this development,
some examples in existence today, and work we are undertaking to
accelerate the rate at which these services are developed and mature to
meet application needs.
1:40 - 2:00PM Q&A and Break
2:00 - 2:40PM Title : Exploiting Locality in Scientific Workflow Systems
Speaker : Dong Dai (Texas Tech University)
Abstract :
Scientific applications running in HPC environments are becoming more
complex and more data-intensive. Recently, we introduced the concept of
Compute-on-Data-Path to allow tasks and data to be bound more efficiently
and reduce data movement costs. Meanwhile, workflow systems are typically
used to manage the complexity of scientific applications. Traditionally,
these scientific workflow systems work with parallel file systems, such
as Lustre, PVFS, or Ceph, or other forms of remote shared storage. As
such, the data (including the intermediate data generated during workflow
execution) need to be transferred back and forth between compute nodes
and storage systems, which introduces a significant I/O performance
bottleneck. One promising solution to this challenge is to exploit data
locality in the HPC storage hierarchy: if datasets are stored on compute
nodes, near the workflow tasks, then tasks can access them directly, with
better performance and fewer network transfers. In this research, we
argue that providing a compute-node-side storage system alone is not
sufficient to fully exploit data locality; a cross-layer solution
spanning the storage system, compiler, and runtime is necessary. We take
Swift/T [3], a workflow system for data-intensive applications, as a
prototype platform to demonstrate such a cross-layer solution.
2:40 - 3:00PM Q&A and Break
3:00 - 4:00PM Storage System Task and Data Model Task Discussion
4:00 - 4:30PM Project Management, Logistics, and Open Discussion; Close of meeting

2015 Meeting

The 2015 NSF Compute On Data Path Project All-Hands Meeting was held on Oct. 8 - 9, 2015 at Texas Tech University, Lubbock, Texas.

Agenda:

First Day: 2015 Oct. 8th, Thursday
Location: TTU computer science department conference room, engineering center 206 (a map) (unless noted otherwise)

Time Arrangement
Noon - 1:30pm Lunch (Texas Tech Club)
1:30 - 1:40pm Opening Remarks (Yong Chen from TTU)
1:40 - 2:20pm Title : Graph-based Rich Metadata Management in HPC Environment
Speaker : Dong Dai (Texas Tech University)
Abstract: In the approaching Exascale era, high performance computing (HPC) systems will
face critical challenges in managing rich metadata. The challenges come not only
from the exploding size of metadata, but also from the
fact that rich metadata, including provenance, data lineage, and arbitrary
user-defined attributes, are becoming necessary to support more advanced
management functionality in future HPC systems. In this research, we propose
unifying heterogeneous metadata entities from HPC systems into a graph-based
abstraction. We identify the challenges for the underlying infrastructure and
the limitations of existing solutions. We introduce GraphMeta, a graph-based
rich metadata management prototype for HPC systems. It provides flexible APIs
for different types of metadata, delivers scalable read/write performance
comparable to that of existing cutting-edge metadata systems, and
includes an asynchronous graph traversal engine to support advanced metadata
management functionality. We evaluate GraphMeta under typical HPC use cases,
compare it with other approaches, and demonstrate its advantages in both
efficiency and usability for metadata management in next-generation parallel systems.
2:20 - 3:00pm Title : Exploring Data Locality Challenges in HPC Programming Models
Speaker: Dounia Khaldi (University of Houston)
Abstract: The emergence of manycore processors with shared caches, deep memory hierarchies,
and non-uniform memory access traits makes thread and memory affinities
key indicators of system performance. Proper handling of such affinities is
critical for performance, given the irregular and unpredictable data access
patterns present in parallel applications. In fact, demand is increasing for
optimizing the use of resources
in order to improve memory bandwidth and decrease memory contention and data
access latency. In this presentation, we study different memory affinity libraries
and different data placement strategies in order to ultimately extend OpenMP,
the de facto standard among the available multithreaded languages
for programming NUMA architectures, with support for data and task affinities. We
also touch on OpenSHMEM, a library interface standard that follows the Partitioned
Global Address Space (PGAS) paradigm, and on how these HPC models can interact with the
big data world.
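Background sketch (not part of the talk): standard OpenMP 4.0+ already
provides thread affinity through the proc_bind clause and the OMP_PLACES /
OMP_PROC_BIND environment variables, which can be combined with
first-touch page placement on NUMA systems, as in the C example below. The
data and task affinity extensions discussed in this presentation go beyond
this and are not shown here.

    /* Thread affinity plus first-touch data placement with standard OpenMP.
     * Run e.g.: OMP_PLACES=cores OMP_PROC_BIND=spread ./a.out */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1L << 24)

    int main(void) {
        double *a = malloc(N * sizeof(double));
        if (!a) return 1;

        /* First touch: each thread initializes the chunk it will use later,
         * so the OS places those pages on that thread's NUMA node. */
        #pragma omp parallel for schedule(static) proc_bind(spread)
        for (long i = 0; i < N; i++)
            a[i] = 1.0;

        double sum = 0.0;
        /* Same static schedule and binding => threads read mostly node-local memory. */
        #pragma omp parallel for schedule(static) proc_bind(spread) reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.0f (max threads = %d)\n", sum, omp_get_max_threads());
        free(a);
        return 0;
    }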
3:00 - 3:30pm Tour of the HPCC at TTU and the DISCL@TTU HPC research infrastructure
3:30 - 3:50pm Coffee Break
3:50 - 4:30pm Title : The Damaris In Situ Data Management System, and The IOlogy I/O Mining Framework
Speaker: Matthieu Dorier (Argonne National Laboratory)
Abstract: This presentation will focus on two pieces of software potentially relevant to
the NSF Compute on Data Path project. Developed at Inria (France) since 2011,
Damaris enables the use of dedicated cores and dedicated nodes to offload I/O
and couple HPC simulations with analysis and visualization tools. It is at
the core of several collaborations between Inria and Argonne. Damaris has
been evaluated on several leadership machines, including Intrepid (ANL),
Kraken (NICS), and Blue Waters (NCSA), and benefitted codes such as the CM1
atmospheric simulation. IOlogy, on the other hand, is a framework developed at
ANL that aims at mining the behavior of HPC applications in order to optimize
their I/O. It provides tracing, modeling, prediction and extrapolation capabilities.
IOlogy is based on the Omnisc'IO approach, which relies on formal grammars and
stack walking to model the behavior of applications at run time.
4:30 - 5:10pm Title : Hierarchical Place Trees: Abstract Machine Model for Deep Memory Hierarchy and Heterogeneous Systems
Speaker: Yonghong Yan (Oakland University)
Abstract: Parallel computing systems have become far more complex than those of the recent past.
Parallelism has increased dramatically and become more specialized, and memory systems
have become deeper in hierarchy (on- and off-chip, on- and off-system) and more diversified
in size, type, and speed at each level. Programmers face more challenges
than before in making effective and efficient use of the computational capabilities
of those systems.
Abstract machine models (AMMs) that represent the underlying platform and
capture the memory hierarchy (both inter-node and intra-node) and
architectural heterogeneity are an important means of dialogue between software
and hardware, application developers and tools, and compilers and runtime
systems. In this talk, we will present the Hierarchical Place Trees (HPT) model.
The HPT model was originally developed for managing affinity
between asynchronous tasks and data. The model will be accessible in a
portable manner by the programming model, compiler, and runtime system. We will
discuss the use of HPTs as a portable representation of diverse computing
systems, including manycore architectures with deep memory hierarchies,
heterogeneous computing systems, and their combinations. We will also present
our recent efforts on extending OpenMP for multiple accelerators and
highlight how those extensions could be used with the HPT model.
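Background sketch (not part of the talk, and not the actual HPT
implementation): a hierarchical place tree can be pictured as a tree whose
nodes model memory and compute places at each level of the machine, as in
the hypothetical C structure below; a runtime could walk such a tree to
attach a task to the place nearest its data.

    /* Hypothetical place-tree structure, for illustration only. */
    #include <stdio.h>

    typedef enum { SYSTEM, NUMA_NODE, L2_CACHE, CORE, GPU } place_kind_t;

    typedef struct place {
        place_kind_t   kind;
        int            id;
        struct place  *parent;           /* NULL for the root (whole system) */
        struct place **children;         /* sub-places (NUMA nodes, cores, ...) */
        int            num_children;
    } place_t;

    /* Walk from a leaf place toward the root; a runtime could use such a
     * walk to find the nearest place satisfying a task's affinity request. */
    static void print_ancestry(const place_t *p) {
        for (; p != NULL; p = p->parent)
            printf("kind=%d id=%d\n", p->kind, p->id);
    }

    int main(void) {
        place_t root  = { SYSTEM,    0, NULL,   NULL, 0 };
        place_t numa0 = { NUMA_NODE, 0, &root,  NULL, 0 };
        place_t core0 = { CORE,      0, &numa0, NULL, 0 };
        place_t *root_kids[] = { &numa0 };
        place_t *numa_kids[] = { &core0 };
        root.children  = root_kids;  root.num_children  = 1;
        numa0.children = numa_kids;  numa0.num_children = 1;

        print_ancestry(&core0);      /* prints CORE -> NUMA_NODE -> SYSTEM */
        return 0;
    }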
5:10pm - 5:30pm Open discussion
5:30pm Dinner (Texas Roadhouse)

Second Day: 2015 Oct. 9th, Friday
Location: TTU computer science department conference room, engineering center 206 (a map) (unless noted otherwise)

Time Arrangement
8:30 - 9:00 Project Overview (Yong Chen from TTU)
9:00 - 9:30 Data Model Task Discussion (Team)
9:30 - 10:00 Programming Model Task Discussion (Team)
10:00 - 10:30 Coffee Break
10:30 - 11:00 Runtime System Task Discussion (Team)
11:00 - 11:30 Storage System Task Discussion (Team)
11:30 - 12:30 Project Management, Logistics, and Open Discussion
12:30 Close of meeting (Boxed lunches provided)

Gallery:

More photos can be found here.


2015 Nov. Meeting

NSF Compute On Data Path Project Meeting @ SC'15, on November 17, 2015