Cloud computing is a broad term for the delivery of services, from storage to databases, over the Internet. Popular cloud providers like Microsoft, Amazon, and Google frequently reallocate their resources among customers, offering a wide array of services within their cloud ecosystems that can power businesses of all kinds.
To ensure fairness and efficiency, cloud platforms rely on optimization to allocate resources effectively. Those resources range from storage and databases to isolated network environments, serving operations with different needs and demands. However, conventional optimization methods have hindered operators’ ability to adapt to sudden changes in the environment or in customer demands, which can vary significantly from one customer to another.
Inspired by this problem, fifth-year PhD student Pooria Namyar and his collaborators developed Soroush, a set of optimization-based methods to create a fairer and faster allocation process within the cloud system.
“Large cloud providers make substantial investments in provisioning and operating their cloud infrastructure. Therefore, it is necessary for them to prioritize customer satisfaction by allocating all these resources fairly and efficiently within a reasonable time,” said Namyar.
Namyar, whose primary research centers on studying networks and optimizing the performance of large-scale cloud systems, became interested in developing a system capable of quickly adapting to customer demand. Despite recent advancements, existing allocators remain too slow to keep pace with changing customer needs. This results in unfairness and potential discrepancies, where some users might not receive the services they want while others receive more than they need.
Fair resource allocation has been a bottleneck for cloud operators. Max-min fairness is a principle under which no user's allocation can be increased without reducing the allocation of another user whose allocation is equal or smaller. However, existing max-min fair resource allocators must solve numerous optimizations iteratively before they arrive at an allocation. Namyar and his collaborators observed that even a fast approximate iterative max-min fair solver can miss nearly half of its deadlines for finishing computations, which leads to a 60% reduction in fairness and a 30% reduction in efficiency.
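To make the max-min fairness principle concrete, here is a minimal Python sketch of the classic iterative "water-filling" approach for a single shared resource. It is an illustration only, not Soroush's method and not any provider's allocator: the function name `max_min_fair`, the single-resource setting, and the example numbers are all assumptions made for this sketch. Real cloud allocators deal with many resources and paths at once, which is where repeated optimization becomes slow.

```python
def max_min_fair(capacity, demands):
    """Illustrative water-filling for ONE resource (hypothetical helper).

    Returns a list of shares, one per entry in `demands`, such that no
    share can grow without shrinking an equal or smaller share.
    """
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Indices of users not yet fully served, smallest demand first.
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])

    while unsatisfied and remaining > 1e-12:
        # Equal split of what is left among users still waiting.
        share = remaining / len(unsatisfied)
        smallest = unsatisfied[0]
        if demands[smallest] <= share:
            # The smallest demand fits inside an equal split: serve it
            # fully and redistribute the leftover in the next round.
            allocation[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            unsatisfied.pop(0)
        else:
            # Nobody left can be fully served: everyone gets the equal split.
            for i in unsatisfied:
                allocation[i] = share
            remaining = 0.0
            unsatisfied = []
    return allocation


if __name__ == "__main__":
    # Assumed example: 10 units of capacity, four users with these demands.
    shares = max_min_fair(10, [2, 2.6, 4, 5])
    print([round(s, 3) for s in shares])
```

In this assumed example, the two smallest demands (2 and 2.6) are met in full, and the two larger ones split the remaining capacity evenly at roughly 2.7 units each. Each pass through the loop corresponds to one round of the iteration; with many users and many coupled resources, each round becomes an optimization of its own, which is the cost the article describes.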
To address this issue, the researchers created Soroush to compute max-min fair allocations more quickly. Soroush is unique in that it can find a fair allocation by solving only a single optimization. It is substantially faster than existing methods and has yet to miss a deadline.
“Pooria’s work is rare in managing to be both theoretically elegant while also having a significant impact,” said Ramesh Govindan, a USC computer science professor and Namyar’s advisor. “His observation that you can solve fair allocation problems in a single-shot optimization is a significant advance in the networking literature.”
“The fact that his algorithm is used in one of the largest networks in the world (Microsoft’s OneWAN) makes it a really exciting accomplishment,” Govindan added.
Soroush has been deployed in the production pipeline at Microsoft Azure WAN. It achieves an average speedup of 2.4 times, and as much as 5.4 times, over Microsoft’s previous allocator while matching it in fairness and efficiency.
“We expect Soroush’s benefit to grow in the future as the network size and number of users increase,” said Namyar.
Originally from Iran, Namyar said Soroush was named after a messenger angel in ancient Persian mythology.
“We named our system Soroush since it brings happiness to users by improving fairness and quality of service,” Namyar added.
Soroush is one of the projects on which Namyar has collaborated closely with leading researchers at Microsoft, Behnaz Arzani and Srikanth Kandula, since his first internship in the summer of 2021. He presented the paper on this work at the USENIX Symposium on Networked Systems Design and Implementation (NSDI) in April 2024 in Santa Clara.
Namyar is on track to graduate in the spring of 2025. Afterward, he hopes to continue working as an industry researcher or become a faculty member to pursue his passion for working on impactful research problems.
Published on July 10th, 2024
Last updated on October 3rd, 2024