TR-19-01: A Cloud-Agnostic Framework for Geo-Distributed Data-Intensive Applications

As demand for cloud computing grows, valuable datasets are stored in the cloud across various geographical regions, platforms, and providers. This distribution of data imposes three major challenges for data-driven analysis and applications: the heterogeneity of resources across clouds; low network throughput over the wide-area network; and the high monetary cost of moving data into and out of cloud regions. In this work, we propose a cloud-agnostic framework named PIVOT that builds on open-source technologies and abstraction principles to create the illusion of a single computer for applications and users. We have deployed a prototype across AWS and GCP and evaluated its effectiveness with synthetic workloads. Using a combination of advanced middleware techniques and data-locality-aware and cost-aware scheduling strategies, we show that PIVOT achieves up to a 4x improvement in network throughput and reduces monetary cost by more than 60%.