Serverless computing is an integral part of the recent success of cloud computing, offering cost and performance efficiency for small- and large-scale distributed systems. As developers increasingly integrate distributed computing techniques into deep learning frameworks for better performance, many have chosen serverless infrastructures to host their applications. However, this computing architecture imposes resource limitations that challenge the successful completion of many deep learning jobs.
In our research, we present an approach to address timeout and memory limitations, two key obstacles to deep learning on serverless infrastructures. Focusing on Apache OpenWhisk as the serverless platform and TensorFlow as the deep learning framework, our solution follows an in-depth assessment of OpenWhisk and failed attempts at tackling resource constraints through system-level modifications. The proposed approach employs data parallelism and ensures the concurrent execution of separate cloud functions. The intermediate models are then combined by weighted averaging into an ensemble model ready for evaluation. Through a fine-grained system design, our solution executed and completed deep learning workflows on OpenWhisk with a 0% failure rate. Moreover, a comparison with a traditional deployment on OpenWhisk shows that our approach uses 45% less memory and reduces execution time by 58%.
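The data-parallel split and the weighted averaging step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `shard_dataset` and `weighted_average`, the representation of a model as a list of flat per-layer parameter lists, and the choice of coefficients (for example, each worker's shard size) are all assumptions made for illustration. The abstract does not state how the averaging weights are chosen.

```python
from typing import List, Sequence

# Hypothetical representation: a model is a list of layers,
# each layer a flat list of float parameters.
Weights = List[List[float]]


def shard_dataset(samples: Sequence, num_workers: int) -> List[Sequence]:
    """Split samples into num_workers contiguous, near-equal shards
    (one shard per cloud function in a data-parallel run)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(len(samples), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)  # first shards absorb the remainder
        shards.append(samples[start:start + size])
        start += size
    return shards


def weighted_average(models: List[Weights], coeffs: Sequence[float]) -> Weights:
    """Combine intermediate models parameter by parameter.

    coeffs are assumed non-negative (e.g. shard sizes); they are
    normalized by their sum, so they need not add up to 1.
    """
    if len(models) != len(coeffs) or not models:
        raise ValueError("need exactly one coefficient per model")
    total = float(sum(coeffs))
    if total <= 0:
        raise ValueError("coefficients must sum to a positive value")
    averaged: Weights = []
    for layer_idx in range(len(models[0])):
        num_params = len(models[0][layer_idx])
        averaged.append([
            sum(c * m[layer_idx][p] for m, c in zip(models, coeffs)) / total
            for p in range(num_params)
        ])
    return averaged
```

In an actual TensorFlow workflow, the per-layer lists would typically come from each function's trained model (for instance via `model.get_weights()`), and the averaged result would be loaded into a fresh model before evaluation.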
Library of Congress Subject Headings
Machine learning; Cloud computing; Electronic data processing--Distributed processing; Parallel processing (Electronic computers)
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
M. Mustafa Rafique
Assogba, Kevin Tunder Elom, "A Data-parallel Approach for Efficient Resource Utilization in Distributed Serverless Deep Learning" (2020). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus