Abstract

Knowledge graphs support many applications, such as product recommendation and web search. However, knowledge graphs are typically incomplete. Fact-prediction algorithms aim to expand them by predicting missing facts. These algorithms train models on positive facts, which are present in the knowledge graph at hand, and on negative facts, which are not present and are obtained by corrupting information in the positive facts. Although it is generally assumed that negative facts drive the accuracy of fact-prediction algorithms, this assumption has not yet been thoroughly examined. In this work, we investigate whether negative facts indeed drive fact-prediction accuracy by employing different negative fact generation strategies in translation-based algorithms, a popular branch of fact-prediction algorithms. We propose a new negative fact generation strategy that uses knowledge from immediate neighbors to corrupt a fact. Our extensive experiments on well-known benchmark datasets show that negative facts indeed drive the accuracy of fact-prediction models, and that this accuracy changes dramatically depending on the negative fact generation strategy used for training and testing models. Assuming that the strategies generate negative facts with different levels of semantic plausibility, we observe that models trained using certain strategies are unable to distinguish missing facts from nonsensical or semantically related facts. Additionally, our results show that the accuracy of models trained using the local closed-world assumption, the most common negative fact generation strategy, can be achieved with a combination of neighborhood-based and nonsensical strategies. This implies that fact-prediction algorithms can be trained using individual subgraphs instead of the whole knowledge graph, which opens new research avenues.
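The idea of corrupting a positive fact to obtain a negative one can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function names `corrupt_fact` and `neighbors`, the representation of a fact as a `(head, relation, tail)` tuple, and the choice to draw replacements uniformly at random are all assumptions made for illustration. Restricting the candidate entities to the neighbors of the fact's head and tail gives one possible reading of a neighborhood-based strategy.

```python
import random


def corrupt_fact(fact, candidates, known_facts, rng, max_tries=100):
    """Return a negative fact built by replacing the head or tail of `fact`.

    Assumption: a fact is a (head, relation, tail) tuple and a corrupted
    fact counts as negative only if it is absent from `known_facts`.
    Returns None if no negative is found within `max_tries` attempts.
    """
    if not candidates:
        return None
    head, relation, tail = fact
    for _ in range(max_tries):
        replacement = rng.choice(candidates)
        # Corrupt either side with equal probability; the relation is kept.
        if rng.random() < 0.5:
            candidate = (replacement, relation, tail)
        else:
            candidate = (head, relation, replacement)
        if candidate not in known_facts:
            return candidate
    return None


def neighbors(entity, known_facts):
    """Entities that share at least one fact with `entity` (hypothetical helper)."""
    found = set()
    for head, _, tail in known_facts:
        if head == entity:
            found.add(tail)
        elif tail == entity:
            found.add(head)
    found.discard(entity)
    return sorted(found)


# Toy knowledge graph, invented for this example.
facts = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "works_at", "acme"),
}
entities = sorted({e for h, _, t in facts for e in (h, t)})
positive = ("alice", "knows", "bob")
rng = random.Random(0)

# Corruption with any entity in the graph.
random_negative = corrupt_fact(positive, entities, facts, rng)

# Corruption restricted to immediate neighbors of the head and tail.
local_candidates = neighbors("alice", facts) + neighbors("bob", facts)
neighbor_negative = corrupt_fact(positive, local_candidates, facts, rng)
```

In this sketch the neighborhood-restricted variant only needs the facts surrounding the positive fact, which is in the spirit of the abstract's remark about training on individual subgraphs.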

Library of Congress Subject Headings

Data mining; Statistics; Control theory

Publication Date

11-2020

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Carlos R. Rivero

Advisor/Committee Member

Zack Butler

Advisor/Committee Member

Ifeoma Nwogu

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS
