Abstract

Embedding knowledge graphs is a common method to encode the information of a graph into a low-dimensional vector space. We identify two shortcomings in the field of knowledge graph embeddings for link prediction. The first shortcoming is that, as far as we know, current software libraries that compute knowledge graph embeddings often differ from the original papers proposing these embeddings: certain implementations are faithful to the original papers, while others deviate, ranging from minute differences to significant variations. These implementation variations make it difficult to compare the same algorithm across libraries and also affect our ability to reproduce results. In this report, we describe a new framework, AugmentedKGE (aKGE), to embed knowledge graphs. The library features multiple knowledge graph embedding algorithms and a rank-based evaluator, and is developed completely in Python and PyTorch. The second shortcoming concerns the evaluation process of link prediction, whose goal is to rank, based on scores, a positive triple over a (typically large) number of negative triples. The accuracy metrics used in this evaluation are aggregations of the ranks of the positive triples under evaluation and typically do not provide enough detail as to why some negative triples are ranked higher than their positive counterparts. Providing explanations for these triples aids in understanding the results of link predictions based on knowledge graph embeddings. Current approaches mainly focus on explaining embeddings rather than predictions, and on single predictions rather than all the link predictions made by the embeddings of a given knowledge graph. In this report, we present an approach to explain all these predictions by providing two metrics that quantify and compare the explainability of different embeddings. From the evaluation of aKGE, we observe that its accuracy metrics are better than those obtained from the standard implementation of OpenKE. From the explainability results, we observe that the Horn rules obtained explain more than 50% of all the generated negative triples.
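
To illustrate the rank-based evaluation described above, the following is a minimal sketch in Python and PyTorch of how the rank of a positive triple among its negatives can be computed and aggregated into common accuracy metrics (mean rank, mean reciprocal rank, Hits@k). This is not the aKGE API; all function names and parameters here are hypothetical, and the scoring convention (higher score means more plausible) is an assumption.

import torch

def rank_of_positive(pos_score: torch.Tensor, neg_scores: torch.Tensor) -> int:
    """Rank of a positive triple among its negatives (1 = best)."""
    # Count how many negatives score at least as high as the positive.
    return int((neg_scores >= pos_score).sum().item()) + 1

def aggregate_metrics(ranks: list, k: int = 10) -> dict:
    """Aggregate per-triple ranks into the usual accuracy metrics."""
    r = torch.tensor(ranks, dtype=torch.float)
    return {
        "mean_rank": r.mean().item(),
        "mrr": (1.0 / r).mean().item(),           # mean reciprocal rank
        "hits@%d" % k: (r <= k).float().mean().item(),
    }

# Toy usage: one positive triple scored against five negatives.
pos = torch.tensor(0.8)
neg = torch.tensor([0.9, 0.3, 0.1, 0.7, 0.2])
ranks = [rank_of_positive(pos, neg)]
print(aggregate_metrics(ranks))  # {'mean_rank': 2.0, 'mrr': 0.5, 'hits@10': 1.0}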

Library of Congress Subject Headings

Graph theory; Data structures (Computer science); Machine learning; Neural networks (Computer science)

Publication Date

7-29-2021

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Carlos R. Rivero

Advisor/Committee Member

Zack Butler

Advisor/Committee Member

Michael Mior

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS
