Time and Location: 3/20/2019, Wed., 10:30 a.m., CS Conference Room 206.
Speaker: Mr. Misha Ahmadian
Title: Lightweight Job-Level I/O Monitoring Service for Lustre File System

Abstract: In recent years, significant growth in data processing demands and the emergence of more sophisticated scientific applications on HPC platforms have increased file-level I/O operations on both local and shared file systems. As a result, HPC users and system administrators have become highly interested in collecting and analyzing I/O statistics of file operations at different levels of granularity, such as job, application, user, or system. System administrators use the collected I/O statistics to optimize HPC resource usage, given the high cost of capital, power, maintenance, and manpower. They also use these data to improve the performance of users’ applications by finding their I/O bottlenecks and inefficiencies. Likewise, collected I/O statistics can help users better understand which access patterns are common in their applications, how their applications interact with storage, and how they behave with respect to file I/O operations.

Although many I/O monitoring and provenance tools have been developed to satisfy this requirement at different levels of granularity, most of them are not suitable for HPC environments. I/O monitoring and data provenance should have a low impact on compute nodes during data collection, yet almost all of these tools are designed to run as an agent on each server or node and collect the desired statistics with varying amounts of overhead. In this talk, we will introduce a new possible design and approach for collecting I/O statistics on Lustre file systems with a negligible (almost zero) impact on HPC compute nodes. We will also discuss how this method could be enabled at different granularities, such as the user, application, or job level.