Abstract

This project explored the workings of Pathway Studio, one of the scientific community’s leading text mining tools, which streamlines the tedious task of sorting and gathering information in a clinical setting. To do this, Pathway Studio implements a sentiment scoring algorithm that interprets literature in a manner similar to natural human understanding, decides which sections of the literature are most relevant to a given search request, and returns the corresponding peer-reviewed work. Novel statistics derived from its performance were used to establish an average form of a statement that reflects a standard level of human understanding of text. This was done by determining the mean number of words found between a given source and target and, of those words, how many are verbs. Within this project’s dataset, this was found to be 10 words to 1 verb on average. In addition, this project lays the foundation for, and proposes, a new readability scoring formula that adds insight into the structure of a citation and how it may be interpreted relative to a researcher’s average level of understanding of the scientific community’s peer-reviewed literature. The structures observed were scored from 0 to 10, where 0 represented an unreadable citation, 1 represented a citation containing the mean number of words and verbs between its source and target, and 10 represented a citation that was verb-rich regardless of the total number of words between its source and target. According to this readability scoring formula, the variation of the observed citations used for this project from the determined mean citation structure was 70.5%.
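The word and verb counts described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation and does not reproduce the proposed readability formula, which the abstract does not state. The function names `words_between` and `span_stats`, the toy sentences, and the small `VERBS` lexicon are all assumptions made for illustration; a real pipeline would most likely rely on a part-of-speech tagger rather than a fixed word list.

```python
import re
from statistics import mean

# Hypothetical verb lexicon, for illustration only.
# A real analysis would use a part-of-speech tagger instead.
VERBS = {"activates", "inhibits", "inhibit", "binds", "regulates", "was", "shown"}


def words_between(sentence, source, target):
    """Return the lowercase tokens strictly between source and target.

    Returns None if either term is absent from the sentence.
    """
    tokens = re.findall(r"[a-z0-9-]+", sentence.lower())
    try:
        i = tokens.index(source.lower())
        j = tokens.index(target.lower())
    except ValueError:
        return None
    lo, hi = sorted((i, j))
    return tokens[lo + 1:hi]


def span_stats(citations):
    """Return (mean word count, mean verb count) over all usable citations."""
    spans = [words_between(s, src, tgt) for s, src, tgt in citations]
    spans = [sp for sp in spans if sp is not None]
    if not spans:
        return None
    word_counts = [len(sp) for sp in spans]
    verb_counts = [sum(w in VERBS for w in sp) for sp in spans]
    return mean(word_counts), mean(verb_counts)


# Toy sentences written for this sketch, not drawn from the thesis dataset.
citations = [
    ("TNF strongly activates NFKB1 in macrophages.", "TNF", "NFKB1"),
    ("Aspirin was shown to inhibit PTGS2.", "aspirin", "PTGS2"),
]
mean_words, mean_verbs = span_stats(citations)
print(mean_words, mean_verbs)
```

On a real dataset, the two means returned by `span_stats` would correspond to the kind of average citation structure the abstract reports.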

Library of Congress Subject Headings

Scientific literature--Evaluation; Natural language processing (Computer science); Readability (Literary style)

Publication Date

7-27-2021

Document Type

Thesis

Student Type

Graduate

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)

Advisor

Gary R. Skuse

Advisor/Committee Member

Gordon Broderick

Advisor/Committee Member

Matthew Morris

Campus

RIT – Main Campus

Plan Codes

BIOINFO-MS
