Abstract

In this dissertation we present multiple state-of-the-art deep learning methods for computer vision that use multi-scale approaches for two main tasks: pose estimation and semantic segmentation. For pose estimation, we introduce a complete framework that expands the fields-of-view of the network through a multi-scale approach, significantly increasing the effectiveness of conventional backbone architectures on several pose estimation tasks without requiring a larger network or postprocessing. Our multi-scale pose estimation framework contributes to research on single-person pose estimation in both 2D and 3D scenarios, pose estimation in videos, and multi-person pose estimation in a single image with both top-down and bottom-up approaches. In addition to the enhanced multi-person pose estimation capability provided by our multi-scale approach, our framework also demonstrates a superior capacity to extend to the more detailed and heavier task of full-body pose estimation, with up to 133 joints per person. For segmentation, we present a new efficient architecture for semantic segmentation, based on a “Waterfall” Atrous Spatial Pooling architecture, that achieves a considerable accuracy increase while decreasing the number of network parameters and the memory footprint. The proposed Waterfall architecture leverages the efficiency of progressive filtering in the cascade architecture while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method does not rely on a postprocessing stage with conditional random fields, which further reduces complexity and required training time.
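As a rough illustration of the fields-of-view claim in the abstract, the sketch below compares the field-of-view of parallel atrous branches (spatial pyramid style) with that of a cascade whose intermediate outputs are also tapped (waterfall style). It is only receptive-field arithmetic, not the dissertation's implementation. The dilation rates, the 3x3 kernel size, the stride of 1, and all function names are assumptions made for this example.

```python
# Receptive-field arithmetic for atrous (dilated) convolutions.
# Assumptions for illustration only: 3x3 kernels, stride 1, and
# dilation rates 6, 12, 18, 24. None of these values come from the abstract.

def atrous_kernel_extent(kernel_size, dilation):
    """Spatial extent covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def parallel_fields_of_view(rates, kernel_size=3):
    """Each branch sees the input directly (spatial pyramid style)."""
    return [atrous_kernel_extent(kernel_size, r) for r in rates]


def cascade_fields_of_view(rates, kernel_size=3):
    """Each stage filters the previous stage's output (cascade style).

    With stride 1, every stage widens the field-of-view by (extent - 1).
    Tapping the output of every stage yields one field-of-view per stage.
    """
    fov = 1
    taps = []
    for r in rates:
        fov += atrous_kernel_extent(kernel_size, r) - 1
        taps.append(fov)
    return taps


RATES = [6, 12, 18, 24]  # hypothetical dilation rates
parallel = parallel_fields_of_view(RATES)
waterfall = cascade_fields_of_view(RATES)

print("parallel branches:", parallel)
print("cascade taps:     ", waterfall)
```

Under these assumptions, the tapped cascade covers a set of scales at least as wide as the parallel branches while reusing each stage's output as the next stage's input, which is the intuition behind combining progressive filtering with multi-scale fields-of-view.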

Library of Congress Subject Headings

Computer vision; Deep learning (Machine learning); Gesture recognition (Computer science); Image segmentation

Publication Date

3-2022

Document Type

Dissertation

Student Type

Graduate

Degree Name

Electrical and Computer Engineering (Ph.D)

Department, Program, or Center

Electrical Engineering (KGCOE)

Advisor

Matthew Dye

Advisor/Committee Member

Andres Kwasinski

Advisor/Committee Member

Raymond Ptucha

Recommended Citation

Artacho, Bruno, "Multi-Scale Architectures for Human Pose Estimation" (2022). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/11105

Campus

RIT – Main Campus

Plan Codes

ECE-PHD
