Abstract

In the post-exascale era storage systems, a fundamental challenge faced by the research community is the efficient and scalable access to the stored information while meeting the high-performance requirements of big data applications. In this dissertation, we studied the limitations in the existing state-of-the-art architectures and proposed a system to address the challenges of scalability and high performance. Our proposed solution, called MITRA, supports several scientific formats, i.e., Hierarchical Data Format (HDF), network Common Data Form (netCDF), and Comma-Separated Values (CSV), and is composed of several software components that work together to provide high I/O throughput to user applications. The key novelty of MITRA lies in supporting a variety of file formats, generation and indexing of metadata for scientific datasets, and optimizing data lookup time while providing scalability of storage subsystem with the increasing amount of data. MITRA generates and manages indices using a relational database which can be effectively accessed using conventional application programming interfaces (APIs). We evaluated the performance of MITRA and compare it with the traditional approaches for its ingestion speed, content processing, lookup time, and scalability for the generated indices. Our evaluation reveals that the rich metadata indices of MITRA improve system lookup by reducing the search space for the metadata that is not present in indices. Moreover, MITRA outperforms the existing approach in terms of scalability as indices grow in size by balancing the load between available hardware resources.

Library of Congress Subject Headings

Metadata--Management; Research--Abstracting and indexing; Big data

Publication Date

8-2021

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Mustafa Rafique

Advisor/Committee Member

Michael Mior

Advisor/Committee Member

Ifeoma Nwogu

Recommended Citation

Thakkar, Sarthak, "MITRA: Robust Architecture for Distributed Metadata Indexing" (2021). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10917

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS

Download

COinS

Theses

MITRA: Robust Architecture for Distributed Metadata Indexing

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Recommended Citation

Campus

Plan Codes

Search

Browse

Author Corner

RIT Links

Theses

MITRA: Robust Architecture for Distributed Metadata Indexing

Author

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Recommended Citation

Campus

Plan Codes

Share

Search

Browse

Author Corner

RIT Links