Achieving Programmability and Performance for Data-Intensive Computations Using Reduction-Based APIs

by Dr. Gagan Agrawal


In developing applications for data-intensive computations, there has typically been a tradeoff between programmability and performance, with frameworks like MapReduce favoring programmability over performance. Our work has demonstrated that high-level APIs can be supported without compromising performance. Specifically, we introduced a variant of MapReduce based on the idea of a reduction object. This talk will describe three different systems we have developed using this API and its extensions. The first system focuses on in-situ analytics for scientific simulations. This system, Smart, provides a high-level API for developing such applications, achieves much higher performance than systems like Spark, and delivers performance nearly comparable to that of MPI. The second system processes streaming data in a fault-tolerant fashion. Finally, the third system targets an IoT environment where data is processed using a combination of edge devices and a centralized system; for this environment, we focus on a set of computer vision applications. By generalizing our reduction-based API, we show how a pattern-based framework can achieve both programmability and performance, and can even enable optimizations that are not normally feasible.
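The following is a minimal sketch of the general reduction-object idea, offered only for illustration. It assumes that each worker folds input elements directly into a local accumulator (the reduction object) instead of emitting intermediate key-value pairs that must be shuffled and grouped, and that the per-worker objects are then merged into a final result. The names MeanByKey, accumulate, merge, result, and run are hypothetical; this is not the API of Smart or of any other system described in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical reduction object: a per-worker accumulator for a per-key mean.
# Input elements are folded into it in place, so no intermediate
# (key, value) pairs are stored or shuffled between workers.
@dataclass
class MeanByKey:
    sums: Dict[str, float] = field(default_factory=dict)
    counts: Dict[str, int] = field(default_factory=dict)

    def accumulate(self, key: str, value: float) -> None:
        # Local reduction step: update running sum and count for this key.
        self.sums[key] = self.sums.get(key, 0.0) + value
        self.counts[key] = self.counts.get(key, 0) + 1

    def merge(self, other: "MeanByKey") -> None:
        # Global combination step: fold another worker's object into this one.
        for key, s in other.sums.items():
            self.sums[key] = self.sums.get(key, 0.0) + s
            self.counts[key] = self.counts.get(key, 0) + other.counts[key]

    def result(self) -> Dict[str, float]:
        # Finalize: convert (sum, count) pairs into means.
        return {k: self.sums[k] / self.counts[k] for k in self.sums}


def run(records: List[Tuple[str, float]], num_workers: int) -> Dict[str, float]:
    # Partition the input round-robin; each chunk stands in for one worker.
    chunks = [records[i::num_workers] for i in range(num_workers)]
    local_objects = []
    for chunk in chunks:
        obj = MeanByKey()
        for key, value in chunk:
            obj.accumulate(key, value)
        local_objects.append(obj)
    # Merge all local reduction objects into the first one.
    final = local_objects[0]
    for obj in local_objects[1:]:
        final.merge(obj)
    return final.result()


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)]
    print(run(data, 2))
```

Because merge is associative and commutative here, the result does not depend on how the input is partitioned, which is what lets such a scheme replace a shuffle-and-group phase with a simple combination of per-worker objects.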


Gagan Agrawal is a Professor in the School of Computer and Cyber Sciences at Augusta University. Agrawal received his undergraduate degree from the Indian Institute of Technology, Kanpur, in 1991, and his MS and PhD degrees from the University of Maryland, College Park, in 1994 and 1996, respectively. He served as an Assistant Professor at the University of Delaware from 1996 to 2001, and as an Associate and then Full Professor at Ohio State University from 2001 to 2019. Agrawal's research interests include high performance computing (with emphasis on programming and application frameworks, performance modeling, and resilience), parallel machine learning and data mining, cloud and edge computing, and scientific data management. He has published more than 275 papers in these areas over his career and has supervised 30 PhD dissertations. Several of his works have received best paper awards or best paper finalist nominations, including at conferences such as SC, HPDC, ICS, CCGRID, and HiPC. His work has been extensively funded by the National Science Foundation and the Department of Energy. He has served on the editorial boards of IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Cloud Computing, and the Journal of Parallel and Distributed Computing. He also served on the ACM Dissertation Award Committee, including chairing it for one year. He has also served as a program committee member, area chair, or program co-chair for several conferences in his research areas.

School of Computer and Cyber Sciences, Augusta University