Current methods in computer vision and object detection rely heavily on neural networks and deep learning. This active area of research has applications in autonomous driving, aerial imaging, defense, and surveillance. State-of-the-art object detection methods localize an object by drawing an axis-aligned rectangular bounding box around it. Such boxes ignore object pose, which reduces localization accuracy and limits downstream tasks such as object understanding and tracking. To overcome these limitations, this research presents object detection improvements that yield tighter and more precise detections. In particular, we modify the anchor box definition in two ways: first, to include rotation along with height and width, and second, to allow arbitrary four-corner-point shapes. Further, the new anchor boxes give the model additional freedom to model objects that are centered about a 45-degree axis of rotation. The resulting network makes minimal compromises in speed and reliability while providing more accurate localization. We present results on the DOTA dataset that show the value of flexible object boundaries, especially for rotated and non-rectangular objects.
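To make the contrast between axis-aligned and rotated boxes concrete, here is a minimal geometric sketch. It is an illustration only and not the thesis implementation: the parameterization (center, width, height, angle in degrees) and the function names `rotated_box_corners`, `enclosing_axis_aligned_box`, and `polygon_area` are assumptions chosen for this example.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Four corners of a w-by-h box centered at (cx, cy), rotated by theta_deg.

    Hypothetical parameterization for illustration; the thesis may encode
    rotated anchors differently.
    """
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Corner offsets from the center before rotation, in consistent winding order.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply a standard 2D rotation to each offset, then translate to the center.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def enclosing_axis_aligned_box(corners):
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) containing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(corners):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# A long, thin object (40 x 10) lying along a 45-degree diagonal.
corners = rotated_box_corners(0.0, 0.0, 40.0, 10.0, 45.0)
xmin, ymin, xmax, ymax = enclosing_axis_aligned_box(corners)
rotated_area = polygon_area(corners)
aligned_area = (xmax - xmin) * (ymax - ymin)
print(f"rotated box area: {rotated_area:.1f}")
print(f"axis-aligned box area: {aligned_area:.1f}")
```

In this toy case the axis-aligned box covers roughly three times the area of the rotated box, which is the kind of loose fit the abstract attributes to boxes that ignore object pose. The same four-corner output format could also describe a non-rectangular quadrilateral, although this sketch only generates rectangles.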
Library of Congress Subject Headings
Computer vision; Pattern recognition systems
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Bhat, Aneesh, "Aerial Object Detection using Learnable Bounding Boxes" (2019). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus