Amazon Web Services distinguished engineer James Hamilton was the first keynote presenter at this year's re:Invent conference. His presentation reviewed AWS data center and infrastructure design principles and made a compelling case that AWS cloud infrastructure is enterprise ready.
Data Center Design
Today AWS deploys enough server capacity every day to support Amazon's entire 2005 needs. In 2005, Amazon was an $8.49 billion business. This growth in AWS capacity is being supported by the expansion of existing regions and the addition of four new regions next year. While the size and number of AWS regions are growing, the size of each data center remains relatively conservative. Each AWS data center is architected to support 50-80K servers and draw 25-32 MW of power. AWS has kept its data centers at this size to limit the fault zone while maintaining cost efficiency. AWS designs its own infrastructure and runs its data centers with an overhead of only 10-12%, which helps keep its cost to serve low. Each region consists of two to five availability zones, and each availability zone consists of one to eight data centers. The AWS data center architecture is highly optimized for enterprise IT availability and cost. James has written about optimizing data center design costs on his blog here.
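To put the server and power figures above in perspective, here is a rough back-of-the-envelope sketch of the average power budget per server. The function name `watts_per_server` is made up for illustration, and the interpretation of "overhead" as the fraction of IT load spent on power distribution and cooling is an assumption, not something stated in the keynote.

```python
def watts_per_server(facility_mw: float, servers: int, overhead: float = 0.0) -> float:
    """Average watts available per server (illustrative helper).

    Assumption: facility power = IT power * (1 + overhead), where overhead
    is the share of IT load spent on power distribution and cooling.
    """
    it_watts = facility_mw * 1_000_000 / (1 + overhead)
    return it_watts / servers


# Extremes of the quoted ranges: 25-32 MW and 50-80K servers.
low = watts_per_server(25, 80_000)    # smallest facility, most servers
high = watts_per_server(32, 50_000)   # largest facility, fewest servers
print(f"Before overhead: {low:.1f} W to {high:.1f} W per server")

# Same extremes with the quoted 10-12% overhead applied (worst and best case).
low_it = watts_per_server(25, 80_000, overhead=0.12)
high_it = watts_per_server(32, 50_000, overhead=0.10)
print(f"After overhead:  {low_it:.1f} W to {high_it:.1f} W per server")
```

These are averages across an entire facility, so they say nothing about how power is actually distributed among server, storage, and network racks.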
Network design is another area where AWS has taken a unique approach. James stated that early on the scale of AWS "broke the standard vertical network router and switch architectures" and that AWS had to build its own web scale network. AWS designs all its networking hardware and writes all its networking software. This lets AWS minimize costs and maximize agility, since the network is optimized for a single purpose. AWS has found that this approach actually improved network reliability and enabled it to introduce new capabilities faster. Two interesting topics James discussed were AWS's bet on 25 GbE instead of the industry standard 40 GbE used in most enterprise IT data centers, and its use of network ASICs.
The case for 25 GbE is based on the cost of optics. The AWS networking architecture is built on 100 GbE, which is four 25 Gbps waves. Most enterprise IT data center designs use 40 GbE, which is four 10 Gbps waves. Minimizing the cost of the optics in web scale data center designs has major cost and efficiency benefits. From an engineering perspective, it is simple to design and maintain a 25 GbE top of rack switch that aggregates into a single 100 GbE data center switch.
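The lane arithmetic behind this argument can be sketched in a few lines. The `EthernetLink` class below is a hypothetical helper for illustration only, and the idea that optics cost tracks the number of lanes more than the speed of each lane is a simplifying assumption drawn from the paragraph above, not a pricing model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthernetLink:
    """Illustrative model of a link as a number of equal-speed lanes (waves)."""
    name: str
    lanes: int
    gbps_per_lane: int

    @property
    def total_gbps(self) -> int:
        return self.lanes * self.gbps_per_lane


link_40 = EthernetLink("40 GbE", lanes=4, gbps_per_lane=10)
link_100 = EthernetLink("100 GbE", lanes=4, gbps_per_lane=25)
port_25 = EthernetLink("25 GbE", lanes=1, gbps_per_lane=25)

# Both uplinks use four lanes, so under the lane-count assumption their
# optics are comparable, but the 100 GbE link carries 2.5x the traffic.
bandwidth_ratio = link_100.total_gbps / link_40.total_gbps

# A single 100 GbE uplink splits evenly into 25 GbE server-facing ports.
ports_per_uplink = link_100.total_gbps // port_25.total_gbps

print(f"{link_100.name} vs {link_40.name}: {bandwidth_ratio}x bandwidth on the same lane count")
print(f"One {link_100.name} uplink maps onto {ports_per_uplink} x {port_25.name} ports")
```

The even four-to-one split is what makes a 25 GbE top of rack design map so cleanly onto a 100 GbE aggregation layer.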
The second interesting network design principle that James shared was AWS's use of network ASICs. It is often stated that web scale IT service providers like AWS leverage commodity hardware. While they do rely on built-to-spec hardware, they are also embracing custom silicon to gain a competitive advantage. AWS acquired Annapurna Labs in January of 2015, giving AWS the ability to design and optimize the network silicon in addition to the hardware and software. This silicon design capability is being used to build specialized application-specific integrated circuits (ASICs) that offload repetitive tasks from the hardware. Offloading repetitive tasks to custom network ASICs reduces the power consumed by those tasks while improving performance. This is another example of how AWS is using its ability to build purpose-built infrastructure to provide differentiation.
AWS has also long been optimizing its infrastructure through server design.
James shared AWS's design philosophy, which is based on simplicity and on optimizing for power and cooling efficiency over density. Power and cooling cost more than data center space, as James has explained in his blog posts here. James showed an older AWS server design for comparison and highlighted how much more efficient it is than traditional enterprise IT designs and denser commercial server models.
AWS has committed to powering its data centers with 100% renewable energy (https://aws.amazon.com/about-aws/sustainability/). James reported that AWS has reached 40% renewable energy today and expects to reach 50% by the end of 2017. Meeting these goals is complicated by AWS's explosive infrastructure growth. New AWS projects will generate 2.6 million MWh of energy annually using a combination of solar and wind farms. This type of sustainable energy commitment is critical to our environment. Although power consumption by data centers has plateaued in the past few years, data centers still consume 2% of all US electricity according to US Department of Energy estimates.
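For a sense of scale, the annual generation figure above can be converted into an average continuous output and compared with the 25-32 MW data center size quoted earlier. This is only a rough sketch: the helper names are made up, and it assumes a data center draws its full rated power all year, which real facilities do not.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def average_mw(annual_mwh: float) -> float:
    """Average continuous output implied by an annual energy total."""
    return annual_mwh / HOURS_PER_YEAR


def annual_mwh(continuous_mw: float) -> float:
    """Energy used in a year by a load drawing continuous_mw around the clock."""
    return continuous_mw * HOURS_PER_YEAR


renewable_mwh = 2_600_000  # annual generation figure from the keynote
avg = average_mw(renewable_mwh)
print(f"Average output: {avg:.1f} MW")

# Compare against the quoted data center size range (assumed full-load draw).
for mw in (25, 32):
    equivalent = renewable_mwh / annual_mwh(mw)
    print(f"Roughly {equivalent:.1f} data centers drawing {mw} MW continuously")
```

Because solar and wind output varies through the day and year, the average figure is useful for comparing annual totals but not for matching supply to demand at any given hour.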
I thought it was interesting that AWS chose to kick off re:Invent with an overview of its data center and infrastructure design. Many people believe infrastructure no longer matters and offers little differentiation. After watching James' presentation, I think you will agree that infrastructure done right can provide differentiation and that hardware design is still important to a well-run enterprise IT environment. A recording of James' keynote is available here and his presentation is available here.