Abstract

One of the biggest problems many companies face today is handling the huge volumes of data they generate daily. In a data-driven world, data must be stored, organized, and analyzed to extract the information that helps management make the right decisions about the company's next steps. Big Data and Business Intelligence (BI) have become popular terms in business: Big Data refers to the tools used to manage huge volumes of data, one of which is the Data Warehouse (DW), used to manipulate massive amounts of data, while BI focuses on how to analyze these volumes of data for information that supports companies in decision making.

In this thesis, we compare the implementation of DW concepts on a Relational Database Management System (RDBMS), specifically SQL Server, with their implementation on the Hadoop system, and then analyze the resource (CPU and RAM) consumption of each.

We show that using the Hadoop system speeds up the manipulation of these huge volumes of data at very low cost, because Hadoop is designed to store and process massive amounts of structured, semi-structured, unstructured, or raw data efficiently.

Library of Congress Subject Headings

Data warehousing; Apache Hadoop; Relational databases--Management; Non-relational databases

Publication Date

11-4-2020

Document Type

Thesis

Student Type

Graduate

Degree Name

Information Sciences and Technologies (MS)

Department, Program, or Center

Information Sciences and Technologies (GCCIS)

Advisor

Edward Holden

Advisor/Committee Member

Qi Yu

Advisor/Committee Member

Michael McQuaid

Campus

RIT – Main Campus

Plan Codes

INFOST-MS
