Abstract

In the post-exascale era storage systems, a fundamental challenge faced by the research community is the efficient and scalable access to the stored information while meeting the high-performance requirements of big data applications. In this dissertation, we studied the limitations in the existing state-of-the-art architectures and proposed a system to address the challenges of scalability and high performance. Our proposed solution, called MITRA, supports several scientific formats, i.e., Hierarchical Data Format (HDF), network Common Data Form (netCDF), and Comma-Separated Values (CSV), and is composed of several software components that work together to provide high I/O throughput to user applications. The key novelty of MITRA lies in supporting a variety of file formats, generation and indexing of metadata for scientific datasets, and optimizing data lookup time while providing scalability of storage subsystem with the increasing amount of data. MITRA generates and manages indices using a relational database which can be effectively accessed using conventional application programming interfaces (APIs). We evaluated the performance of MITRA and compare it with the traditional approaches for its ingestion speed, content processing, lookup time, and scalability for the generated indices. Our evaluation reveals that the rich metadata indices of MITRA improve system lookup by reducing the search space for the metadata that is not present in indices. Moreover, MITRA outperforms the existing approach in terms of scalability as indices grow in size by balancing the load between available hardware resources.

Publication Date

8-2021

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Mustafa Rafique

Advisor/Committee Member

Michael Mior

Advisor/Committee Member

Ifeoma Nwogu

Campus

RIT – Main Campus

Share

COinS