TR-19-02: PIVOT: Cost-Aware Scheduling of Data-Intensive Applications in a Cloud-Agnostic System

Big data applications are increasingly hosted across a variety of cloud vendors, and they produce and consume enormous volumes of data every day. Traditional cluster computing frameworks struggle with this data volume and with geo-distributed, cross-cloud data placement because their scalability and adaptability across heterogeneous clouds are limited. Moreover, running data-intensive applications across clouds arbitrarily is cost-inefficient and can incur substantial expenses. Hence, we introduce PIVOT, a cloud-agnostic system with a novel cost-aware scheduling algorithm that enables data-intensive applications to run and scale across clouds instantly and cost-efficiently. We evaluate the system and its scheduling algorithm extensively, both in simulation and with real-world big data applications on a deployment spanning 11 regions on AWS and GCP. The experimental results show that PIVOT saves over 55% of VM subscription expenses and up to 92% of egress network traffic expenses compared to state-of-the-art baselines. Notably, cost-aware scheduling also achieves up to a 10x speedup in data transfers for data-intensive applications.