July / August 2014
Special Issue on Data Center Networks
This issue of Cloud-Link is about data center networks, which provide the basic infrastructure for supporting cloud computing. Innovations in data center technologies aim to provide higher resilience, better performance, greener computing and/or storage, scalable resource provisioning, and adaptable high performance networks. One special journal issue and ten related articles have been selected to provide a general overview of this important topic.
The IEEE Journal on Selected Areas in Communications special issue on “Switching and Routing for Scalable and Energy-Efficient Networking” (January 2014) includes ten articles related to data center networks. Furthermore, the article "Trends and Challenges in Cloud Datacenters" provides an overview of cloud data centers in general. To cope with huge data traffic, optical networking technologies are expected to play an important role in supporting data center networks. Three articles: "Enhancing Performance of Cloud Computing Data Center Networks by Hybrid Switching Architecture", "OSA: An Optical Switching Architecture for Data Center Networks with Unprecedented Flexibility" and "Scalable Data Center Network Architecture with Distributed Placement of Optical Switches and Racks" address different architectural issues related to optical networking for data center networks. Another important issue is related to resources and workload management. The article "An Online Integrated Resource Allocator for Guaranteed Performance in Data Centers" studies the problem of allocating server resources and network bandwidth and the article "Proactive Workload Management in Hybrid Cloud Computing" investigates workload management for hybrid (private plus public) data centers. With the growing concern for global warming, there is also a need to study energy efficiency issues. Three articles "Energy and Network Aware Workload Management for Sustainable Data Centers with Thermal Storage", "Energy Efficiency of an Integrated Intra-data-center and Core Network with Edge Caching" and "Harnessing Renewable Energy in Cloud Datacenters: Opportunities and Challenges" study energy-related issues for data centers. Last but not least, data center networks should support multicast communications and the article "Reliable Multicast in Data Center Networks" studies this important issue.
We hope that the aforementioned special journal issue and selected articles provide useful references for further exploring this important and interesting topic. Articles have been selected for this issue based on various considerations (e.g., variety, relevance, anticipated readers' interests); unavoidably, many other useful and insightful articles could not be included. You are also encouraged to search IEEE Xplore and other databases for further reading.
The theme of the next issue (September/October 2014) of Cloud-Link will be “Cloud Education”. If you would like to recommend any articles, please email email@example.com. Furthermore, we are looking for topics for the upcoming issues. If you have any suggestions, please also let us know.
Henry Chan, Victor Leung, Jens Jensen and Tomasz Wiktor Wlodarczyk
Reliable Multicast in Data Center Networks
By Dan Li ; Mingwei Xu ; Ying Liu ; Xia Xie
Published in IEEE Transactions on Computers, August 2014
Multicast benefits data center group communication by both saving network traffic and improving application throughput. Reliable packet delivery is required in data center multicast for data-intensive computations. However, existing reliable multicast solutions for the Internet are not suitable for the data center environment, especially with regard to keeping multicast throughput from degrading upon packet loss, which is the norm rather than the exception in data centers. We present RDCM, a novel reliable multicast protocol for data center networks. The key idea of RDCM is to minimize the impact of packet loss on multicast throughput by leveraging the rich link resources in data centers. A multicast-tree-aware backup overlay is explicitly built on group members for peer-to-peer packet repair. The backup overlay is organized so that it imposes little individual repair burden, control overhead, or overall repair traffic. RDCM also implements window-based congestion control to adapt its sending rate to traffic conditions in the network. Simulation results in typical data center networks show that RDCM achieves higher application throughput and a smaller traffic footprint than other representative reliable multicast protocols. We have implemented RDCM as a user-level library on the Windows platform. Experiments on our testbed show that RDCM handles packet loss without obvious throughput degradation during high-speed data transmission, responds gracefully to link and receiver failures, and adds less than 10% CPU overhead to data center servers.
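The abstract above mentions window-based congestion control as RDCM's rate-adaptation mechanism. As a rough, generic illustration of that mechanism (an AIMD window in the style of TCP, not RDCM's actual rules, which the paper defines), a minimal sketch in Python:

```python
# Illustrative sketch only: a minimal additive-increase /
# multiplicative-decrease (AIMD) congestion window, the general
# mechanism behind window-based congestion control.
class AimdWindow:
    """AIMD congestion window measured in packets."""

    def __init__(self, cwnd=1.0, max_cwnd=64.0):
        self.cwnd = cwnd          # current window, in packets
        self.max_cwnd = max_cwnd  # cap from receiver/buffer limits

    def on_ack(self):
        # Additive increase: grow about one packet per window of ACKs.
        self.cwnd = min(self.max_cwnd, self.cwnd + 1.0 / self.cwnd)

    def on_loss(self):
        # Multiplicative decrease: halve the window on detected loss.
        self.cwnd = max(1.0, self.cwnd / 2.0)


w = AimdWindow()
for _ in range(32):
    w.on_ack()   # window grows gradually while ACKs arrive
w.on_loss()      # window halves when loss is detected
```

Halving on loss while growing slowly on acknowledgments is the classic AIMD pattern; RDCM's contribution is pairing such a window with overlay-based packet repair so that loss recovery does not stall the multicast tree.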
Energy and Network Aware Workload Management for Sustainable Data Centers with Thermal Storage
By Yuanxiong Guo ; Yanmin Gong ; Yuguang Fang ; Khargonekar, P.P.
Published in IEEE Transactions on Parallel and Distributed Systems, August 2014
Reducing the carbon footprint of data centers is becoming a primary goal of large IT companies. Unlike traditional energy sources, renewable energy sources are usually intermittent and unpredictable. How to better utilize the green energy from these renewable sources in data centers is a challenging problem. In this paper, we exploit the opportunities offered by geographical load balancing, opportunistic scheduling of delay-tolerant workloads, and thermal storage management in data centers to facilitate green energy integration and reduce the cost of brown energy usage. Moreover, bandwidth cost variations between users and data centers are considered. Specifically, this problem is first formulated as a stochastic program, and then, an online control algorithm based on the Lyapunov optimization technique, called Stochastic Cost Minimization Algorithm (SCMA), is proposed to solve it. The algorithm can enable an explicit trade-off between cost saving and workload delay. Numerical results based on real-world traces illustrate the effectiveness of SCMA in practice.
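The Lyapunov optimization technique behind SCMA follows a general drift-plus-penalty pattern: maintain a backlog queue of delay-tolerant work and, each time slot, weigh queue pressure against a cost term scaled by a trade-off parameter V. A toy sketch of that generic pattern (not SCMA itself, which also handles geographical load balancing and thermal storage):

```python
# Toy drift-plus-penalty controller: serve queued delay-tolerant work
# only when backlog outweighs V times the current energy price.
# This illustrates the generic Lyapunov pattern, not the paper's SCMA.
def drift_plus_penalty(arrivals, prices, V, capacity=5.0):
    """Return per-slot service decisions and the final backlog.

    arrivals : work arriving in each slot
    prices   : energy price in each slot (brown-energy cost proxy)
    V        : trade-off knob -- larger V favors cost over delay
    """
    q = 0.0        # backlog (the Lyapunov "queue")
    served = []
    for a, p in zip(arrivals, prices):
        q += a
        # Serve at full capacity iff queue pressure exceeds weighted price.
        s = capacity if q > V * p else 0.0
        s = min(s, q)
        q -= s
        served.append(s)
    return served, q
```

With a larger V the controller defers work to cheap-energy slots at the price of longer queueing delay, which is exactly the explicit cost/delay trade-off the abstract describes.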
An Online Integrated Resource Allocator for Guaranteed Performance in Data Centers
By Divakaran, D.M. ; Tho Ngoc Le ; Gurusamy, M.
Published in IEEE Transactions on Parallel and Distributed Systems, June 2014
As bandwidth is shared in a best-effort way in today's data centers, traffic generated between one set of VMs (virtual machines) affects the traffic between another set of VMs (possibly belonging to another tenant) sharing the same physical links, leading to unpredictable performance of applications running on these VMs. This article addresses the problem of allocating not only server resources (computational and storage) but also network bandwidth, to provide performance guarantees in multi-tenant data centers. Bandwidth being a critical shared resource, we formulate the problem as an optimization problem that minimizes bandwidth demand between clusters of VMs of a tenant, and we prove it is NP-hard. We develop fast online heuristics as an integrated resource allocator (IRA) that decides on the admission of dynamically arriving requests and allocates resources for the accepted ones. We also present a modified version of IRA, called B-IRA, that bounds the cost of bandwidth allocation while exploring a smaller search space. We demonstrate that IRA accommodates a significantly higher number of requests than a load-balancing resource allocator (LBRA) that does not consider reducing bandwidth between clusters of VMs. IRA also outperforms B-IRA when the traffic demands of VMs in an input are not localized.
Trends and Challenges in Cloud Datacenters
By Bilal, K. ; Malik, S.U.R. ; Khan, S.U. ; Zomaya, A.Y.
Published in IEEE Cloud Computing, May 2014
Next-generation datacenters (DCs) built on virtualization technologies are pivotal to the effective implementation of the cloud computing paradigm. To deliver the necessary services and quality of service, cloud DCs face major reliability and robustness challenges.
Enhancing Performance of Cloud Computing Data Center Networks by Hybrid Switching Architecture
By Xiaoshan Yu ; Huaxi Gu ; Kun Wang ; Gang Wu
Published in Journal of Lightwave Technology, May 2014
Cloud computing services have driven a new design of data center networks. Hybrid switching architecture is one of the promising solutions, since it makes a better tradeoff between network performance and technical feasibility. However, as existing hybrid networks deploy only one-hop optical circuit switching (OCS) in the top layer, their flexibility and scalability are limited. To address this problem, a distributed OCS model is proposed. To reduce the high blocking ratio, WDM and SDM technologies are introduced to increase the connectivity of the optical network. Moreover, a multi-wavelength optical switch based on microring resonators is designed to enable fast switching. Based on this model, a multi-rooted-tree-based hybrid architecture with deep integration of optical connections is constructed. A new way to solve the mixed traffic scheduling problem is also provided, by delivering small flows and large flows through different networks. The simulation results indicate that the multi-rooted-tree-based hybrid architecture achieves better performance under various traffic patterns. It also introduces less control overhead than existing traffic scheduling schemes.
Energy Efficiency of an Integrated Intra-data-center and Core Network with Edge Caching
By Fiorani, M. ; Aleksic, S. ; Monti, P. ; Chen, J.
Published in IEEE/OSA Journal of Optical Communications and Networking, April 2014
The expected growth of traffic demand may lead to a dramatic increase in network energy consumption, which needs to be handled in order to guarantee the scalability and sustainability of the infrastructure. There are many efforts to improve energy efficiency in communication networks, ranging from component technology to architectural and service-level approaches. Because data centers and content delivery networks are responsible for the majority of the energy consumption in the information and communication technology sector, in this paper we address network energy efficiency at the architectural and service levels and propose a unified network architecture that provides both intra-data-center and inter-data-center connectivity together with interconnection toward legacy IP networks. The architecture is well suited for the carrier cloud model, where both the data-center and telecom infrastructure are owned and operated by the same entity. It is based on the hybrid optical switching (HOS) concept for achieving high network performance and energy efficiency; we therefore refer to it as an integrated HOS network. The main advantage of integrating core and intra-data-center networks is the possibility of avoiding the energy-inefficient electronic interfaces between data centers and telecom networks. Our results verify that the integrated HOS network offers greater energy efficiency and lower network delays than the conventional nonintegrated solution. At the service level, recent studies demonstrated that distributed video cache servers can be beneficial in reducing the energy consumption of intra-data-center and core networks. However, these studies only consider conventional network solutions based on IP electronic switching, which are characterized by relatively high energy consumption.
When a more energy-efficient switching technology, such as HOS, is employed, the advantage of using distributed video cache servers becomes less obvious. In this paper we evaluate the impact of video servers employed at the edge nodes of the integrated HOS network, to understand whether edge caching could benefit carrier cloud operators utilizing a HOS network architecture. We demonstrate that if the distributed video cache servers are not properly dimensioned, they may reduce the benefits obtained by the integrated HOS network.
OSA: An Optical Switching Architecture for Data Center Networks with Unprecedented Flexibility
By Kai Chen ; Singla, A. ; Singh, A. ; Ramachandran, K.
Published in IEEE/ACM Transactions on Networking, April 2014
A detailed examination of evolving traffic characteristics, operator requirements, and network technology trends suggests a move away from nonblocking interconnects in data center networks (DCNs). As a result, recent efforts have advocated oversubscribed networks with the capability to adapt to traffic requirements on-demand. In this paper, we present the design, implementation, and evaluation of OSA, a novel Optical Switching Architecture for DCNs. Leveraging runtime reconfigurable optical devices, OSA dynamically changes its topology and link capacities, thereby achieving unprecedented flexibility to adapt to dynamic traffic patterns. Extensive analytical simulations using both real and synthetic traffic patterns demonstrate that OSA can deliver high bisection bandwidth (60%-100% of the nonblocking architecture). Implementation and evaluation of a small-scale functional prototype further demonstrate the feasibility of OSA.
Scalable Data Center Network Architecture with Distributed Placement of Optical Switches and Racks
By Jie Xiao ; Bin Wu ; Xiaohong Jiang ; Pattavina, A.
Published in IEEE/OSA Journal of Optical Communications and Networking, March 2014
Cloud services are fundamentally supported by data center networks (DCNs). With the fast growth of cloud services, the scale of DCNs is increasing rapidly, leading to great concern about system scalability due to multiple constraints. This paper proposes a scalable DCN architecture based on optical switching and transmission, with distributed placement of optical switches and server racks at different nodes in a given optical network. This addresses the scalability issue by relaxing power and cooling constraints, by reducing the number of (electronic) switches using high-capacity optical switches, and by simplifying DCN internal connections using wavelengths in the optical network. Moreover, the distributed optical switches provide service access interfaces that meet demand within their areas, and thus reduce the transmission cost of external traffic. The major concern is the additional delay and cost of remote transmissions of DCN internal traffic. To this end, we study the component placement problem in DCNs under a given set of external demands and internal traffic patterns. By balancing multiple conflicting factors, such as the scalability and internal overhead of the DCN as well as the transmission cost of external traffic, we propose both an integer linear program and a heuristic to minimize the system cost of a DCN while satisfying all service demands in the network. This addresses both scalability and cost minimization from a network point of view.
Proactive Workload Management in Hybrid Cloud Computing
By Hui Zhang ; Guofei Jiang ; Yoshihira, K. ; Haifeng Chen
Published in IEEE Transactions on Network and Service Management, March 2014
The hindrances to the adoption of public cloud computing services include service reliability, data security and privacy, regulatory compliance requirements, and so on. To address these concerns, we propose a hybrid cloud computing model that users may adopt as a viable and cost-saving methodology to make the best use of public cloud services along with their privately owned (legacy) data centers. As the core of this hybrid cloud computing model, an intelligent workload factoring service is designed for proactive workload management. It enables federation between on- and off-premise infrastructures for hosting Internet-based applications, and the intelligence lies in the explicit segregation of base workload and flash crowd workload, the two naturally different components composing the application workload. The core technology of the intelligent workload factoring service is a fast frequent data item detection algorithm, which enables factoring incoming requests not only on volume but also on data content, under changing application data popularity. Through analysis and extensive evaluation with real-trace-driven simulations and experiments on a hybrid testbed consisting of a local computing platform and the Amazon cloud service platform, we show that the proactive workload management technology can enable reliable workload prediction in the base workload zone (with simple statistical methods), achieve resource efficiency (e.g., 78% higher server capacity than that in the base workload zone) and reduce data cache/replication overhead (by up to two orders of magnitude) in the flash crowd workload zone, and react quickly (with an X^2 speed-up factor) to changing application data popularity upon the arrival of load spikes.
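The core of the workload factoring service is a fast frequent data item detection algorithm. The paper's algorithm is its own; as a generic illustration of detecting hot items in a request stream with small memory, the classic Misra-Gries summary can be sketched as:

```python
# Generic streaming frequent-item detection (the classic Misra-Gries
# summary), shown for illustration only -- not the paper's algorithm.
# Tracks at most k-1 candidate hot items over a stream in O(k) space.
def misra_gries(stream, k):
    """Return candidate items occurring more than len(stream)/k times."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; evict those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Any item occurring more than len(stream)/k times is guaranteed to survive in the counters, so a flash-crowd object is flagged while the detector uses only O(k) memory regardless of stream length.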
Harnessing Renewable Energy in Cloud Datacenters: Opportunities and Challenges
By Wei Deng ; Fangming Liu ; Hai Jin ; Bo Li
Published in IEEE Network, January/February 2014
The proliferation of cloud computing has promoted the wide deployment of large-scale datacenters with tremendous power consumption and high carbon emissions. To reduce power cost and carbon footprint, an increasing number of cloud service providers have considered green datacenters with renewable energy sources, such as solar or wind. However, unlike the stable supply of grid energy, renewable energy is challenging to utilize due to its uncertain, intermittent, and variable nature. In this article, we provide a taxonomy of state-of-the-art research on applying renewable energy in cloud computing datacenters from five key aspects: generation models and prediction methods of renewable energy, capacity planning of green datacenters, intra-datacenter workload scheduling, and load balancing across geographically distributed datacenters. By exploring new research challenges involved in managing the use of renewable energy in datacenters, this article attempts to address why, when, where, and how to leverage renewable energy in datacenters, with a focus on future research avenues.
Published in IEEE Journal on Selected Areas in Communications, January 2014
GreenDCN: A General Framework for Achieving Energy Efficiency in Data Center Networks
Low-Emissions Routing Algorithms for Cloud Computing in IP-over-WDM Networks with Data Centers
Sharing Bandwidth by Allocating Switch Buffer in Data Center Networks
LTTP: An LT-Code Based Transport Protocol for Many-to-One Communication in Data Centers
Datacenter Applications in Virtualized Networks: A Cross-Layer Performance Study
On-Line Multicast Scheduling with Bounded Congestion in Fat-Tree Data Center Networks
Scalable Packet Classification for Datacenter Networks
Compressing Forwarding Tables for Datacenter Scalability