Abstract
Effectiveness of MapReduce as a big data processing framework depends on efficiencies of scale for both map and reduce phases. While most map tasks are preemptive and parallelizable, the reduce tasks typically are not easily decomposed and often become a bottleneck due to constraints of data locality and task complexity. By assuming that reduce tasks are non-parallelizable, we study offline scheduling of minimizing makespan and minimizing total completion time, respectively. Both preemptive and non-preemptive reduce tasks are considered. On makespan minimization, for preemptive version we design an algorithm and prove its optimality, for non-preemptive version we design an approximation algorithm with the worst ratio of 3/2-1/2h where h is the number of machines. On total complete time minimization, for non-preemptive version we devise an approximation algorithm with worst case ratio of 2-1/h, and for preemptive version we devise a heuristic. We confirm that our algorithms outperform state-of-art schedulers through experiments.
Original language | English |
---|---|
Title of host publication | Proceedings - IEEE INFOCOM |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 2166-2174 |
Number of pages | 9 |
ISBN (Print) | 9781479933600 |
DOIs | |
Publication status | Published - 2014 Jan 1 |
Event | 33rd IEEE Conference on Computer Communications, IEEE INFOCOM 2014 - Toronto, ON, Canada Duration: 2014 Apr 27 → 2014 May 2 |
Other
Other | 33rd IEEE Conference on Computer Communications, IEEE INFOCOM 2014 |
---|---|
Country/Territory | Canada |
City | Toronto, ON |
Period | 14/4/27 → 14/5/2 |
ASJC Scopus subject areas
- Computer Science(all)
- Electrical and Electronic Engineering