Data-Intensive Scalable Computing Laboratory (DISCL)

Research Projects

OpenSoC HPC: Open Source, Extensible High Performance Computing Platform

High performance computing platforms have become increasingly based upon commodity processor core architectures whose designs are driven by major market requirements. The latest series of accelerators are likewise largely based upon instruction set architectures and core designs originally built to serve orthogonal markets such as gaming. This lack of specialization for high performance computing has led to a decrease in operational efficiency for medium and large-scale system architectures.

The goal of the OpenSoC HPC project is to architect and demonstrate a core system architecture whose requirements are driven by traditional high performance computing applications. We use requirements such as low-latency, high-bandwidth on-chip message passing and hardware support for partitioned global address space programming models to drive what will become the basis for future HPC-centric system architecture designs. At the core of the architecture is a RISC-V Rocket core coupled to a Hybrid Memory Cube stacked memory device. This project has the potential to demonstrate that high performance computing platforms can be constructed with performance and efficiency characteristics well beyond current commodity-based high performance computing instruments.
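
The paragraph above mentions hardware support for partitioned global address space (PGAS) programming models. As a hedged illustration of that programming style (not code from the OpenSoC HPC project), the following OpenSHMEM sketch performs a one-sided put into a neighboring processing element's symmetric memory; the neighbor-exchange pattern and variable names are our own illustrative choices.

    /* Illustrative PGAS-style one-sided communication using OpenSHMEM.
     * This only sketches the programming model that OpenSoC HPC's on-chip
     * message passing and global addressing aim to accelerate; it is not
     * project code. Build with an OpenSHMEM wrapper, e.g. oshcc. */
    #include <stdio.h>
    #include <shmem.h>

    int main(void) {
        shmem_init();
        int me = shmem_my_pe();
        int npes = shmem_n_pes();

        /* Symmetric (remotely addressable) variable: one copy per PE. */
        static long received = -1;

        /* Each PE writes its rank directly into the next PE's memory.
         * On PGAS-friendly hardware this maps to a remote store rather
         * than a two-sided message exchange. */
        long value = me;
        shmem_long_put(&received, &value, 1, (me + 1) % npes);

        shmem_barrier_all();   /* make remote writes visible everywhere */
        printf("PE %d of %d received %ld\n", me, npes, received);

        shmem_finalize();
        return 0;
    }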

PARTNER ORGANIZATIONS: Lawrence Berkeley National Laboratory.

ACKNOWLEDGMENT: We are grateful to the Department of Defense for the sponsorship of this project.

Compute on Data Path: Combating Data Movement in High Performance Computing

High performance computing enabled simulation has been widely considered a third pillar of science along with theory and experimentation, and is a strategic tool in many aspects of scientific discovery and innovation. In recent years, however, high performance computing simulations have become highly data intensive: data acquisition and generation have become much cheaper, new high-resolution, multi-model scientific discovery both produces and requires more data, and the insight that useful knowledge can be mined from large volumes of data has grown substantially.

This project combats the increasingly critical data movement challenge in high performance computing. It studies the feasibility of a new Compute on Data Path methodology that is expected to improve the performance and energy efficiency of high performance computing. The new methodology models both computations and data as objects, with a data model that encapsulates and binds them; it fuses data motion and computation by leveraging the programming model and compiler, and it develops an object-based store and runtime to enable computations along the data path pipeline. In recent years, a proliferation of advanced high performance computing architectures has emerged, including multi- and many-core systems, co-processors and accelerators, and heterogeneous computing platforms. Software solutions that address the critical data movement challenge, however, have significantly lagged behind. This project has the potential to advance both the understanding and the software solutions, further unleashing the power of high performance computing enabled simulation.
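
As a minimal sketch of the object model described above, in which computation and data are encapsulated and bound together so the operation can be applied along the data path, the following C fragment binds a kernel to a data buffer and lets a generic data-path stage invoke it. The struct, field, and function names are hypothetical illustrations, not the project's actual data model or API.

    /* Hypothetical sketch of a "compute on data path" object: the data and
     * the computation bound to it travel together, so a runtime can apply
     * the operation at any stage of the data path (node memory, I/O
     * forwarder, storage server). Names are illustrative only. */
    #include <stdio.h>
    #include <stddef.h>

    typedef struct {
        double *data;                      /* payload */
        size_t  count;
        void  (*kernel)(double *, size_t); /* computation bound to the data */
    } cdp_object;

    /* Example computation: in-place scaling. */
    static void scale_by_two(double *d, size_t n) {
        for (size_t i = 0; i < n; i++) d[i] *= 2.0;
    }

    /* A data-path stage simply invokes the bound kernel before forwarding. */
    static void data_path_stage(cdp_object *obj) {
        obj->kernel(obj->data, obj->count);
    }

    int main(void) {
        double buf[4] = {1.0, 2.0, 3.0, 4.0};
        cdp_object obj = { buf, 4, scale_by_two };

        data_path_stage(&obj);   /* computation fused into the data movement */

        for (size_t i = 0; i < 4; i++) printf("%.1f ", buf[i]);
        printf("\n");
        return 0;
    }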

This project is funded by the National Science Foundation under grant CCF-1409946.

Project website: http://discl.cs.ttu.edu/cdp

ACKNOWLEDGMENT: We are grateful to the National Science Foundation for the sponsorship of this project.

Development of a Data-Intensive Scalable Computing Instrument (DISCI)

This project develops DISCI, an all-around computing instrument that compensates for the limitations of existing computing-centric HPC instruments for data-intensive applications, and supports five large research projects in HPC system design, computational chemistry, biotechnology, and atmospheric science. Based on research introducing the application-aware and decoupled-execution paradigm concepts, the project addresses the large gap between research prototypes and engineering solutions. The instrument is expected to influence the design of future applications, algorithms, and instruments, since it could open up new research areas in supporting data-intensive sciences and possibly reshape the HPC instruments adopted by national computing facilities and other institutions.

In addition to conventional HPC compute nodes, DISCI has a set of specially designed data nodes. The data nodes offer in-situ data processing to reduce data movement and data-access delay, and can be dynamically provisioned as 'fat' compute nodes when necessary, while the compute nodes function the same as in conventional instrumentation. The data nodes work in concert with the compute nodes, and together they provide optimal system performance for data-intensive HPC. Following a hardware-software co-development principle, the instrument consists of two components: the DISCI system architecture and the DISCI runtime software. The system architecture builds an HPC instrument with a data-centric view; the runtime software extends the MPI (Message Passing Interface) and MPI-IO libraries to support data nodes and their associated in-situ processing. The instrument will enable and foster research activities in chemical dynamics simulation, simulations of turbulent flows, atmospheric data assimilation and weather forecasting, computational biology, and computer systems conducted by the PIs and senior personnel.
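
The following is a hedged sketch of the compute-node/data-node division of labor described above, using only standard MPI calls: one rank stands in for a data node and performs an in-situ reduction of the compute ranks' output before anything reaches storage. The rank assignment and the reduction are illustrative assumptions, not the DISCI runtime or its extended MPI-IO interfaces.

    /* Sketch of the compute/data node split using plain MPI. The last rank
     * plays the role of a data node and reduces the compute ranks' results
     * in situ, instead of every rank writing raw data to the file system.
     * This is not DISCI code, only an illustration of the idea. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Assume the last rank acts as a data node. The split illustrates
         * forming separate compute and data groups; the collective below
         * still runs over MPI_COMM_WORLD for simplicity. */
        int is_data_node = (rank == size - 1);
        MPI_Comm role_comm;
        MPI_Comm_split(MPI_COMM_WORLD, is_data_node, rank, &role_comm);

        /* Stand-in for simulation output; the data node contributes nothing. */
        double local = is_data_node ? 0.0 : (double)rank;
        double reduced = 0.0;

        /* Compute ranks send results toward the data node, which holds the
         * in-situ reduced value before anything touches storage. */
        MPI_Reduce(&local, &reduced, 1, MPI_DOUBLE, MPI_SUM,
                   size - 1, MPI_COMM_WORLD);

        if (is_data_node)
            printf("data node %d: in-situ reduced value = %f\n", rank, reduced);

        MPI_Comm_free(&role_comm);
        MPI_Finalize();
        return 0;
    }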

Project website: http://discl.cs.ttu.edu/mri-disci

This project is funded by the National Science Foundation under grant CNS-1338078.

ACKNOWLEDGMENT: We are grateful to the National Science Foundation for the sponsorship of this project.

GC64: Purpose-Built Chip Multitasking System and Software Architecture for Data Intensive Computing

High performance and technical computing architectures have traditionally focused on optimizing dense algorithmic structures, leaving a significant gap for those seeking to execute sparse, data-intensive, or memory-intensive algorithms and applications. The Goblin-Core64 (GC64) project seeks to develop an open source system architecture explicitly designed to efficiently host applications that historically perform poorly on classic architectures. The project encompasses a chip multitasking processor architecture, a high-bandwidth memory architecture (based upon Hybrid Memory Cube stacked devices), intelligent global addressing, and low-latency task context switching in order to provide a base platform for highly efficient runtime systems and numerical techniques. The base hardware infrastructure and ISA are currently based upon the RISC-V ISA developed at UC Berkeley. In addition to hardware logic, the project also provides a complete software tool chain, including a native Linux kernel port, an LLVM-based optimizing compiler infrastructure, simulation infrastructure, and associated runtime libraries and tools. The hardware logic and software tools are provided under a BSD-style license.
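
GC64 implements low-latency task context switching in hardware. The fragment below is only a rough software analogue, written with POSIX ucontext on a conventional processor rather than the GC64 tool chain: two tasks yield to each other at the points where a real GC64 task would switch contexts to hide a long-latency memory operation. The hardware switch is far cheaper than this software version; the sketch only shows the control flow.

    /* Software analogue of multitasked latency hiding: two tasks yield to
     * each other instead of stalling. Uses POSIX ucontext (Linux); this is
     * an illustration, not GC64 code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    #define STACK_SIZE (64 * 1024)

    static ucontext_t main_ctx, task_ctx[2];

    /* Each task does a little work, then yields to the other task,
     * mimicking a hardware thread switching on a long-latency load. */
    static void task(int id) {
        for (int i = 0; i < 3; i++) {
            printf("task %d: step %d\n", id, i);
            swapcontext(&task_ctx[id], &task_ctx[1 - id]);
        }
    }

    int main(void) {
        for (int i = 0; i < 2; i++) {
            getcontext(&task_ctx[i]);
            task_ctx[i].uc_stack.ss_sp = malloc(STACK_SIZE);
            task_ctx[i].uc_stack.ss_size = STACK_SIZE;
            task_ctx[i].uc_link = &main_ctx;          /* resume main on exit */
            makecontext(&task_ctx[i], (void (*)(void))task, 1, i);
        }
        swapcontext(&main_ctx, &task_ctx[0]);  /* run until task 0 completes */
        swapcontext(&main_ctx, &task_ctx[1]);  /* let task 1 finish its loop */
        printf("main: both tasks done\n");
        free(task_ctx[0].uc_stack.ss_sp);
        free(task_ctx[1].uc_stack.ss_sp);
        return 0;
    }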

Project Website: http://gc64.org

Project Code Repository: http://discl.cs.ttu.edu/gitlab/groups/gc64

  • X. Wang, J. Leidel, and Y. Chen. Concurrent Dynamic Memory Coalescing on GoblinCore-64 Architecture. In the Proceedings of the International Symposium on Memory Systems (MemSys), October 2016.
  • J. Leidel and Y. Chen. Exploring Tag-Bit Memory Operations in Hybrid Memory Cubes. In the Proceedings of the International Symposium on Memory Systems (MemSys), October 2016.
  • J. Leidel and Y. Chen. HMC-Sim 2.0: A Simulation Platform for Exploring Custom Memory Cube Operations. In the Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2016.
  • J. Leidel and Y. Chen. HMC-Sim: A Simulation Framework for Hybrid Memory Cube Devices. Accepted to appear in Parallel Processing Letters, December 2014.
  • J. Leidel and Y. Chen. HMC-Sim: A Simulation Framework for Hybrid Memory Cube Devices. In the Proceedings of the 2014 Workshop on Large-Scale Parallel Processing (LSPP), in conjunction with the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS'14).

Unistore: A Unified Storage Architecture for Cloud Computing

Emerging large-scale applications on Cloud computing platforms, such as information retrieval, data mining, online business, and social networks, are data- rather than computation-intensive. The storage system is one of the most critical components of Cloud computing. Traditional hard disk drives (HDDs) are currently the dominant storage devices in Clouds, but they are notorious for long access latency and proneness to failure. Emerging storage class memory (SCM), such as solid state drives (SSDs), provides a promising storage solution with high bandwidth, low latency, and no mechanical components, but with inherent limitations of small capacity, limited lifetime, and high cost. The objective of this project is to build an innovative unified storage architecture (Unistore) with the co-existence and efficient integration of heterogeneous HDD and SCM devices for Cloud storage systems.
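
The consistent-hashing publications listed under this project explore heterogeneity-aware data distribution. The sketch below is a minimal weighted consistent-hashing example in that spirit: an SSD is given more virtual nodes than an HDD, so it absorbs a proportionally larger share of objects. The weights, the FNV-1a hash, and the device names are illustrative assumptions, not the algorithms from the papers.

    /* Minimal weighted consistent hashing over heterogeneous devices.
     * Heavier weight => more virtual nodes => larger share of the key
     * space. Everything here is an illustrative sketch. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define MAX_VNODES 256

    typedef struct { uint32_t hash; int device; } vnode;

    static vnode ring[MAX_VNODES];
    static int ring_size = 0;

    /* Simple FNV-1a hash; a real system would use a stronger hash. */
    static uint32_t fnv1a(const char *s) {
        uint32_t h = 2166136261u;
        for (; *s; s++) { h ^= (uint8_t)*s; h *= 16777619u; }
        return h;
    }

    static int cmp(const void *a, const void *b) {
        uint32_t x = ((const vnode *)a)->hash, y = ((const vnode *)b)->hash;
        return (x > y) - (x < y);
    }

    static void add_device(int device, const char *name, int weight) {
        char label[64];
        for (int i = 0; i < weight; i++) {
            snprintf(label, sizeof label, "%s-%d", name, i);
            ring[ring_size].hash = fnv1a(label);
            ring[ring_size].device = device;
            ring_size++;
        }
        qsort(ring, ring_size, sizeof(vnode), cmp);
    }

    /* Walk clockwise to the first virtual node at or after the key's hash. */
    static int locate(const char *key) {
        uint32_t h = fnv1a(key);
        for (int i = 0; i < ring_size; i++)
            if (ring[i].hash >= h) return ring[i].device;
        return ring[0].device;   /* wrap around the ring */
    }

    int main(void) {
        add_device(0, "hdd0", 4);    /* HDD: fewer virtual nodes   */
        add_device(1, "ssd0", 12);   /* SSD: weighted more heavily */

        const char *keys[] = { "objA", "objB", "objC", "objD" };
        for (int i = 0; i < 4; i++)
            printf("%s -> device %d\n", keys[i], locate(keys[i]));
        return 0;
    }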

This project is funded by Nimboxx and the Cloud and Autonomic Computing site at Texas Tech University.

Project website: http://discl.cs.ttu.edu/unistore

ACKNOWLEDGMENT: We are grateful to Nimboxx and the Cloud and Autonomic Computing site at Texas Tech University for the sponsorship of this project.

  • W. Xie and Y. Chen. Elastic Consistent Hashing for Distributed Storage Systems. Accepted to appear in the Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS'17), 2017. (acceptance rate: 23%)
  • J. Zhou, W. Xie, Q. Gu, and Y. Chen. Hierarchical Consistent Hashing for Heterogeneous Object-based Storage. In the Proceedings of the 14th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA'16), 2016.
  • J. Zhou, W. Xie, J. Noble, K. Echo, and Y. Chen. SUORA: A Scalable and Uniform Data Distribution Algorithm for Heterogeneous Storage Systems. In the Proceedings of the 11th IEEE International Conference on Networking, Architecture, and Storage (NAS'16), 2016.
  • W. Xie, J. Zhou, M. Reyes, J. Noble, and Y. Chen. Two-Mode Data Distribution Scheme for Heterogeneous Storage in Data Centers (short paper). In the Proceedings of the 2015 IEEE International Conference on Big Data (BigData'15), 2015. (acceptance rate: 17% full papers and 18% short papers accepted out of 363 complete submissions)

Active Object Storage for Big Data Applications in High Performance Computing

High Performance Computing (HPC) remains a critical strategic tool for scientific discoveries and innovations. New advances in microprocessor design, such as multicore/manycore and GPGPU (General-Purpose Graphics Processing Unit) processors, provide ever increasing computational speed for HPC, whereas input/output (I/O) speed lags far behind in performance improvement. Many HPC applications in critical areas of science and technology, such as astrophysics, climate sciences, computational chemistry, computational biology, and high-energy physics, have become more data intensive than ever before. These applications contain a large number of I/O accesses, where large amounts of data are written to and retrieved from storage. Existing HPC systems, nevertheless, are designed and optimized for compute-intensive applications. With the increasing importance of data-intensive or big data applications, there is an imperative need to rethink HPC system support for data-intensive scientific discoveries and innovations.
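
The project name suggests "active" storage objects: computations that execute where an object is stored, so that only results, rather than raw data, cross the interconnect. The sketch below is a hypothetical illustration of that idea in plain C; the types, the method signature, and the storage_invoke interface are our own assumptions and not the project's design.

    /* Hypothetical sketch of an "active object": the storage side runs a
     * registered method next to the data, so only a small result travels
     * back to the compute side. Illustrative names and interfaces only. */
    #include <stdio.h>

    typedef struct {
        char   name[32];
        double values[8];
        size_t count;
    } stored_object;

    /* A method executed by the storage server against a local object. */
    typedef double (*object_method)(const stored_object *);

    static double method_sum(const stored_object *o) {
        double s = 0.0;
        for (size_t i = 0; i < o->count; i++) s += o->values[i];
        return s;
    }

    /* "Server side": run the requested method where the object lives and
     * return only the scalar result instead of shipping the whole object. */
    static double storage_invoke(const stored_object *o, object_method m) {
        return m(o);
    }

    int main(void) {
        stored_object o = { "temps", {1.5, 2.5, 3.0, 4.0}, 4 };
        double result = storage_invoke(&o, method_sum); /* result, not data, moves */
        printf("sum(%s) = %.1f\n", o.name, result);
        return 0;
    }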

This project is funded by the Department of Energy/Argonne National Laboratory.

Project website: http://discl.cs.ttu.edu/aos

ACKNOWLEDGMENT: We are grateful to the Department of Energy/Argonne National Laboratory for the sponsorship of this project.

Decoupled Execution Paradigm for Data-Intensive High-End Computing

I/O on High-End Computing (HEC) machines is increasingly becoming the dominant performance bottleneck. However, conventional execution paradigms for HEC are computing-centric and have inherent limitations in addressing the critical I/O issues of data-intensive applications. There is a great need for unconventional execution paradigms that meet the growing I/O demand of high-end computing.

In this project, in collaboration with the Illinois Institute of Technology and the University of Illinois at Urbana-Champaign, we propose a decoupled execution paradigm (DEP) to address I/O bottleneck issues. DEP is the first paradigm that enables users to identify and handle data-intensive operations separately. The objective of this project is threefold: 1) understanding the execution paradigm requirements from a data-centric point of view, 2) studying the feasibility of the proposed decoupled execution paradigm, and 3) providing a partially implemented prototype of DEP and its associated system design to support the first two objectives. This project is funded by the National Science Foundation Computer Systems Research (CSR) program under grant CNS-1162488.

Project website: http://discl.cs.ttu.edu/dep

ACKNOWLEDGMENT: We are grateful to the National Science Foundation for the sponsorship of this project.

Scalable I/O Architectures for Data-Intensive Computing

High-performance computing (HPC) has entered the post-petascale era and is quickly approaching the exaflop range. Many scientific computing applications and engineering simulations in critical areas of research, such as nanotechnology, astrophysics, climate modeling and weather forecasting, drug discovery, petroleum engineering, and high-energy physics, are becoming more and more data intensive. I/O has become a crucial performance bottleneck of high-performance computing, especially for data-intensive applications. There is a critical and widening gap between applications' I/O demand and the HPC I/O system capability, which can lead to severe overall performance degradation. New mechanisms and new I/O architectures need to be developed to solve this 'I/O wall' problem. In this research, we investigate new solutions to build a next-generation I/O architecture that scales well and meets applications' growing I/O demand. We explore new storage devices, including flash-based solid state drives (SSDs), general storage-class memory (SCM), and hybrid storage systems, to build the hardware component of the new I/O architecture, and we explore new designs in parallel file systems, parallel I/O middleware, and parallel programming models to build the software component. The objective of this research is to provide a new I/O architecture that fundamentally addresses the I/O bottleneck for data-intensive high-performance computing.
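
The software side of this work targets parallel I/O middleware such as MPI-IO. The sketch below shows the standard collective write path that layout-aware strategies such as LACIO (listed below) optimize underneath: each rank writes its block of a shared file through a collective call, giving the library the opportunity to aggregate and reorder requests. This is ordinary MPI-IO usage, not project code, and the file name and sizes are illustrative.

    /* Collective MPI-IO write: every rank writes a contiguous block of a
     * shared file through a collective call, which is the path that
     * layout-aware collective I/O strategies optimize. Standard MPI-IO
     * only; not code from the project. */
    #include <stdio.h>
    #include <mpi.h>

    #define N 1024   /* doubles per rank */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double buf[N];
        for (int i = 0; i < N; i++) buf[i] = rank + i * 1e-6;

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Collective write: the MPI-IO layer may rearrange these requests
         * to match the file system's data layout. */
        MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }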

  • Y. Chen. Toward Scalable I/O Architecture for Exascale Systems. Accepted to appear in the 4th Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS), co-located with the ACM/IEEE Supercomputing Conference (SC'11), 2011.
  • A. Tambi and Y. Chen. A Comprehensive Benchmark Suite for Emerging Solid State Drives (poster presentation). In the 23rd ACM/IEEE Supercomputing Conference (SC'11), 2011.
  • Y. Guvvala, Y. Chen, and Y. Zhuang. Rethinking RAID for SSD-based HPC Systems (poster presentation). In the 23rd ACM/IEEE Supercomputing Conference (SC'11), 2011.
  • Y. Chen, X.-H. Sun, R. Thakur, P. C. Roth, and W. Gropp. LACIO: A New Layout-Aware Collective I/O Strategy for Parallel I/O Systems. In the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'11), 2011. (acceptance rate: 112/571 = 19.6%)
  • H. Song, X.-H. Sun, and Y. Chen. A Hybrid Shared-nothing/Shared-data Storage Scheme for Large-scale Data Processing. In the Proceedings of the 9th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA'11), 2011. (Best Paper Award)
  • Y. Chen and P. C. Roth. Collective Prefetching for Parallel I/O Systems. In the Proceedings of the 5th Petascale Data Storage Workshop (PDSW'10), in conjunction with ACM/IEEE Supercomputing (SC'10), 2010.
  • H. Jin, X.-H. Sun, Y. Chen, and T. Ke. REMEM: REmote MEMory as Checkpointing Storage. In the Proceedings of the IEEE International Conference on Cloud Computing Technology and Science (CloudCom'10), 2010. (acceptance rate: <25%)
  • Y. Chen, X.-H. Sun, R. Thakur, H. Song, and H. Jin. Improving Parallel I/O Performance with Data Layout Awareness. In the Proceedings of the IEEE International Conference on Cluster Computing 2010 (Cluster'10), 2010. (acceptance rate: 33/107 = 30.8%)
  • X.-H. Sun, Y. Chen, and Y. Yin. Data Layout Optimization for Petascale File Systems. In the Proceedings of the 4th Petascale Data Storage Workshop (PDSW'09), in conjunction with ACM/IEEE SC'09, 2009.

ACKNOWLEDGMENT: We are grateful to Oak Ridge Associated Universities (ORAU) for sponsoring this project in part through the “Coordinated I/O Architecture for Exascale High-Performance Computing Systems” effort.

Multicore Architectures and Data-Access Optimizations

The advent of multicore processors has completely changed the landscape of computing, bringing parallel processing into a single processor at the task level. On the one hand, it further enlarges the performance gap between data processing and data access. On the other hand, it calls for a rethinking of system design to utilize the potential of multicore architectures. We believe the key to utilizing multicore processors is reducing data-access delay, and this research focuses on reducing data-access delay for multicore architectures. Our approach is threefold: special hardware for swift data access, core-aware and context-aware scheduling and prefetching, and integrated cache management. We have introduced the data access history cache architecture to support dynamic hardware prefetching and smart data management, developed core-aware memory access scheduling, and integrated cache management with prefetching and scheduling. Many issues remain open, however, and we continue exploring data-access optimization techniques for multicore architectures in this project.
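
As a simplified software model of the context-based prefetching studied in this project, the fragment below keeps a small table mapping an observed address to the address that followed it and emits a prefetch candidate on a table hit. The table size, indexing, and single-order policy are illustrative assumptions; the actual data access history cache is a hardware structure, and the published designs use multi-order context analysis.

    /* Toy single-order context-based prefetcher model: learn "address B
     * followed address A" and predict B the next time A is seen. This is
     * an illustrative sketch, not the hardware design from the project. */
    #include <stdio.h>
    #include <stdint.h>

    #define TABLE_SIZE 256

    typedef struct { uint64_t context; uint64_t next; int valid; } history_entry;

    static history_entry table_[TABLE_SIZE];
    static uint64_t last_addr = 0;
    static int have_last = 0;

    /* Called on every memory access; returns a predicted next address,
     * or 0 when no prediction is available. */
    static uint64_t observe(uint64_t addr) {
        uint64_t prediction = 0;
        if (have_last) {
            /* Learn: the address that followed last_addr was addr. */
            unsigned i = (unsigned)(last_addr % TABLE_SIZE);
            table_[i].context = last_addr;
            table_[i].next = addr;
            table_[i].valid = 1;
        }
        unsigned idx = (unsigned)(addr % TABLE_SIZE);
        if (table_[idx].valid && table_[idx].context == addr)
            prediction = table_[idx].next;   /* prefetch candidate */
        last_addr = addr;
        have_last = 1;
        return prediction;
    }

    int main(void) {
        /* A repeating access pattern the prefetcher can learn. */
        uint64_t trace[] = {0x100, 0x340, 0x7c0, 0x100, 0x340, 0x7c0};
        for (int i = 0; i < 6; i++) {
            uint64_t p = observe(trace[i]);
            if (p)
                printf("access 0x%llx -> prefetch 0x%llx\n",
                       (unsigned long long)trace[i], (unsigned long long)p);
            else
                printf("access 0x%llx -> no prediction yet\n",
                       (unsigned long long)trace[i]);
        }
        return 0;
    }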

  • Y. Chen, H. Zhu, P. C. Roth, H. Jin, and X.-H. Sun. Global-aware and Multi-order Context-based Prefetching for High-Performance Processors. Accepted to appear in the International Journal of High Performance Computing Applications (IJHPCA), 2011.
  • K. Zhang, Z. Wang, Y. Chen, H. Zhu, and X.-H. Sun. PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers. In the Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'11), 2011. (acceptance rate: 55/189 = 29.1%)
  • X.-H. Sun and Y. Chen. Reevaluating Amdahl's Law in the Multicore Era. Journal of Parallel and Distributed Computing (JPDC), Vol. 70, No. 2, pp. 183-188, 2010.
  • Y. Chen, H. Zhu, H. Jin, and X.-H. Sun. Improving the Effectiveness of Context-based Prefetching with Multi-order Analysis. In the Proceedings of the 3rd International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), 2010.
  • Y. Chen, H. Zhu, and X.-H. Sun. An Adaptive Data Prefetcher for High-Performance Processors. In the Proceedings of the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'10), 2010. (acceptance rate: 51/219 = 23.3%)
  • H. Zhu, Y. Chen, and X.-H. Sun. Timing Local Streams: Improving Timeliness in Data Prefetching. In the Proceedings of the 24th ACM International Conference on Supercomputing (ICS'10), 2010. (acceptance rate: 32/180 = 17.8%)
  • S. Byna, Y. Chen, and X.-H. Sun. Taxonomy of Data Prefetching for Multicore Processors. Journal of Computer Science and Technology (JCST), Vol. 24, No. 3, pp. 405-417, May 2009.
  • Z. Fang, X.-H. Sun, Y. Chen, and S. Byna. Core-Aware Memory Access Scheduling Schemes. In the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'09), 2009. (acceptance rate: 100/440 = 22.7%)
  • X.-H. Sun, Y. Chen, and S. Byna. Scalable Computing in Multicore Era. In the Proceedings of the International Symposium on Parallel Algorithms, Architectures and Programming (PAAP'08), 2008.
  • Y. Chen, S. Byna, and X.-H. Sun. Data Access History Cache and Associated Data Prefetching Mechanisms. In the Proceedings of the ACM/IEEE Supercomputing Conference (SC'07), Reno, Nevada, USA, November 2007. (acceptance rate: 54/268 = 20.1%)

Research Center

Cloud and Autonomic Computing Center (CAC@TTU)

The Cloud and Autonomic Computing Center (CAC@TTU) concentrates on topics in advanced distributed computing as part of the National Science Foundation Industry/University Cooperative Research Centers program.

Please visit the CAC@TTU website for more details.

This project is funded by the National Science Foundation via the NSF I/UCRC program under grants IIP-1362134 and IIP-1238338.

ACKNOWLEDGMENT: We are grateful to the National Science Foundation for the sponsorship of this project.

Education and Outreach Projects

REU-site (Research Experiences for Undergraduates) Project

This research project is led by Dr. Susan Urban and supported by the National Science Foundation through a grant to the Department of Industrial Engineering and the Department of Computer Science at Texas Tech University in Lubbock, Texas. The site is co-funded by the Department of Defense in partnership with the NSF REU program. The project consists of a 10-week program in which students work collaboratively on cybersecurity, robotics, and software engineering research problems. The selected participants work closely with faculty members on their current research projects, make short progress presentations to their peers during program meetings, attend presentation skills workshops, make a formal poster presentation of their research at the end of the program, submit a final written report describing the results of their research, and work with faculty and graduate students to publish research results. In addition to research activities, students are provided professional development opportunities covering literature search tools, ethics and professionalism, presentation skills, technical writing, and the graduate school application process, along with numerous opportunities for social activities. The research experience is expected to instill in students the methods and the desire to continue with graduate research, through mentoring that guides them toward becoming independent researchers in a welcoming and enriching environment.

Please visit the project website for more information: http://www.depts.ttu.edu/cs/research/reu/. Please note the current 2014 application deadline of February 28, 2014; we cordially welcome your application.

This project is funded by the National Science Foundation and the Department of Defense in partnership with the NSF REU program under grant CNS-1263183.

ACKNOWLEDGMENT: We are grateful to the National Science Foundation and the Department of Defense for the sponsorship of this project.

Student Cluster Competition Project at Texas Tech University

A group of computer science, computer engineering, mathematics, and chemistry undergraduate students formed a team under the supervision of Dr. Yong Chen of the Computer Science Department and Dr. James Abbott of the High Performance Computing Center. The team was selected as one of the finalist teams and invited to compete in the Student Cluster Competition (SCC) at the 2012 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, a.k.a. the Supercomputing Conference (SC12). This project received generous sponsorship from Dell Inc. More information can be found at: http://discl.cs.ttu.edu/scc/.

Several articles and videos covered the competition and the team.

The team received an Exemplary Spirit Special Recognition.

ACKNOWLEDGMENT: We are grateful to Dell Inc., NVIDIA, the NSF/TCPP PDC Curriculum Committee, the High Performance Computing Center at Texas Tech University, and the NSF REU program for the generous sponsorship of this project.

Early Adoption of NSF/TCPP PDC Curriculum at Texas Tech University

The Parallel and Distributed Computing (PDC) Curriculum developed by the National Science Foundation (NSF) and the IEEE Computer Society Technical Committee on Parallel Processing (TCPP) provides informative and insightful guidance for strengthening parallel and distributed computing education in computer science and computer engineering. In this project, Drs. Yong Chen, Yu Zhuang, and Noé López-Benitez have initiated an early adoption effort to integrate the PDC Curriculum into the computer science undergraduate program at Texas Tech University and have been sponsored with the NSF-TCPP Early Adopter Status Award. Posters presenting our efforts can be found below:

Y. Chen, Y. Zhuang, and N. Lopez-Benitez. Fall-11: Early Adoption of NSF/TCPP PDC Curriculum at Texas Tech University and Beyond. In the 3rd NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar-13), in conjunction with the 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS'13), Boston, May 20, 2013.

Y. Chen, Y. Zhuang, and N. Lopez-Benitez. Early Adoption of NSF/TCPP PDC Curriculum at Texas Tech University. Presented in the 2nd NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar-12), in conjunction with the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS'12), Shanghai, May 21, 2012.

Courses enhanced/introduced for PDC education through this effort:

  • CS3375 Computer Architecture Syllabus
  • CS4379 Parallel and Concurrent Programming Syllabus
  • CS2350 Computer Organization and Assembly Language Programming Syllabus
  • CS4331/MATH4000 High Performance Computing Syllabus (supported in part by the NSF Decoupled Execution Paradigm for Data-Intensive High-End Computing Project as well)

ACKNOWLEDGMENT: We are grateful to the National Science Foundation and the IEEE Computer Society Technical Committee on Parallel Processing for the sponsorship of this project, and for the support of the NSF-TCPP Early Adopter Status Award made to Texas Tech University.

Past Projects