The Great AWS Data Center Gamble

How to Juggle GPUs, Hyperscalers, and the AI Tsunami

1. CapEx Allocation and the Race for AI Superiority

Imagine you have a bunch of money—like, billions of dollars. Now, imagine you're Amazon Web Services (AWS), and you’re figuring out how to use that money. Your options? Build gigantic data centers filled with servers, buy a ton of fancy GPUs from NVIDIA, or invest in building your own custom AI chips. You’re in a high-stakes game where every other player is Microsoft Azure, Google Cloud, or even that one uncle who spends big at the casino because he "feels lucky."

AWS has taken a measured approach in its CapEx allocation strategy—choosing to spend carefully rather than go on a GPU shopping spree. For hyperscalers like AWS, CapEx isn’t just about the cost of GPUs or servers; it’s about everything from land and building shells to HVAC systems and power lines. And, unlike your uncle, AWS isn’t just taking a chance. They’re making calculated bets based on what workloads they expect—AI/ML? General compute? A mix? You bet.

AWS's CapEx strategy is a delicate balancing act aimed at maximizing infrastructure flexibility while managing costs. In 2023, AWS focused its CapEx on scalable data centers, deploying a mix of NVIDIA H100 GPUs and custom Graviton chips to handle AI/ML workloads alongside general compute. By refusing to oversaturate any one workload type, AWS stays adaptable to diverse customer needs. For investors, this measured allocation underscores AWS's commitment to long-term sustainability and to minimizing overcommitment risk in volatile hardware markets like GPUs.

Distribution of Capital Expenditures by AWS across Key Infrastructure Categories


2. Why It’s Not All About the H100 GPUs

You’d think with all the buzz around AI that AWS would have just gone all-in on H100 GPUs from NVIDIA, right? But AWS thinks differently. Instead of building entire data centers around just one type of workload—such as AI/ML—they diversify. The philosophy? "Don’t put all your chips (pun intended) on one GPU."

Imagine building a garage, and every space is only sized for a Ferrari. Sure, it’s impressive, but what if you need to park an SUV or a Vespa? AWS opts for a "balanced portfolio" of workloads. They build data centers that can handle a variety of compute needs—general-purpose, AI workloads, GPU-intensive, etc. It’s about efficiency, flexibility, and not overspending on fancy tech that could end up gathering dust.
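The "balanced portfolio" idea can be made concrete with a toy allocation model. The workload shares below are invented for illustration; they are not AWS's actual mix:

```python
# Toy "balanced portfolio" of data center capacity across workload types.
# The shares are hypothetical, chosen only to illustrate the idea of not
# dedicating all capacity to a single workload such as AI/ML.

HYPOTHETICAL_MIX = {
    "general_compute": 0.40,
    "ai_ml": 0.30,
    "gpu_intensive": 0.20,
    "storage_and_other": 0.10,
}

def allocate_racks(total_racks, mix=HYPOTHETICAL_MIX):
    """Split rack capacity across workload types by share."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return {workload: round(total_racks * share) for workload, share in mix.items()}

print(allocate_racks(1_000))
```

The point of the sketch is the shape, not the numbers: no single row gets everything, so a surge in one workload type never strands the rest of the fleet.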

AWS's reluctance to rely solely on NVIDIA’s H100 GPUs highlights a preference for workload diversity over GPU specialization. AWS’s infrastructure combines a variety of compute resources tailored for flexibility rather than a single workload type. This portfolio strategy minimizes dependency on high-cost GPUs, appealing to investors by reducing risk and enhancing AWS's positioning across diverse cloud applications. Furthermore, AWS’s custom Graviton chips represent a significant technological investment, reducing external supply reliance and reinforcing their competitive edge.

Percentage-Based Analysis of Workload Types in AWS Data Centers


3. Building the Data Center—Legos, but for Grown-Ups

Building a data center sounds easy, right? Buy some land, throw up a building, slap some servers in there, and you're good to go. For AWS, though, it's more like building a life-sized LEGO set where the bricks are worth billions, and if you mess up, the whole world feels it.

AWS data centers involve intricate planning:

  • Land and Shell: This is the LEGO baseplate. It’s 15-20% of the cost and is where everything starts.

  • Electrical Systems & Cooling: Imagine running all those LEGOs on a power supply and cooling system, so they don’t catch on fire—this takes up around 40-45% of the cost.

  • Servers and Racks: The actual LEGOs. This is where the magic happens—a significant part of the budget goes here.
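For the arithmetic-minded, the split above can be sketched in a few lines of Python. The land/shell and power/cooling percentages come straight from the list (using midpoints), the servers-and-racks share is simply whatever is left over, and the dollar figures are illustrative, not AWS's actual budget:

```python
# Illustrative breakdown of a data center build-out budget using the
# ranges above. Land/shell and electrical/cooling midpoints come from the
# text; servers and racks are treated as the remainder. Hypothetical figures.

def capex_breakdown(total_usd, land_shell=0.175, power_cooling=0.425):
    """Split a total build-out budget across the three broad categories.

    Defaults are the midpoints of the 15-20% and 40-45% ranges cited above.
    """
    servers_racks = 1.0 - land_shell - power_cooling  # remainder, ~40%
    return {
        "land_and_shell": total_usd * land_shell,
        "electrical_and_cooling": total_usd * power_cooling,
        "servers_and_racks": total_usd * servers_racks,
    }

# Example: a hypothetical $1B build-out
breakdown = capex_breakdown(1_000_000_000)
for category, cost in breakdown.items():
    print(f"{category}: ${cost / 1e6:,.0f}M")
```

Run against a hypothetical $1B build-out, that puts roughly $175M into land and shell, $425M into power and cooling, and about $400M into the racks themselves.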

AWS also plans for density. They try to squeeze as many servers into a rack as possible while still making sure those servers don't overheat. This is where AWS's own chips come in: custom hardware like Graviton processors makes it easier for AWS to meet a diverse set of customer needs without solely depending on NVIDIA.
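That density planning is, at heart, a budgeting problem: every rack has a power (and matching cooling) envelope, and every server eats a slice of it. Here's a toy sketch; the wattages are hypothetical, not real AWS or vendor figures:

```python
# Toy model of rack density planning: how many servers fit in a rack
# before the rack's power/cooling budget is exhausted. All wattages are
# hypothetical; real per-SKU figures vary widely.

def servers_per_rack(rack_power_w, server_power_w, headroom=0.10):
    """Max servers per rack, reserving some headroom for power spikes."""
    usable_w = rack_power_w * (1.0 - headroom)
    return int(usable_w // server_power_w)

# Hypothetical comparison in the same 17 kW rack: a lower-power
# Graviton-style CPU server vs. a power-hungry GPU-dense server.
rack_w = 17_000
print(servers_per_rack(rack_w, server_power_w=400))    # efficient CPU server
print(servers_per_rack(rack_w, server_power_w=3_000))  # GPU-heavy server
```

Same rack, wildly different counts: efficient chips let you pack far more machines under the same cooling budget, which is exactly the lever custom silicon pulls.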

AWS's approach to data center construction balances high-density server racks with efficient power and cooling systems. Each build-out, from land acquisition to specialized cooling setups, dedicates roughly 20-30% of CapEx to sustainable infrastructure. That allocation reduces both energy costs and physical-space requirements, which matters for investor confidence in AWS's operational efficiency. Custom chips such as Graviton are designed with density and power efficiency in mind, ensuring these data centers can meet diverse customer demands with minimal energy waste.

Table 1: Cost Distribution for Data Center Construction Components


4. Microsoft Azure vs. AWS - A Game of Risk

Azure has been much more aggressive than AWS, spending big on H100 GPUs and expanded data center capacity, almost like they're on a mission to catch up. AWS, on the other hand, is focused on diversification. The goal? To be prepared for whatever future workloads may come while ensuring they aren't overcommitting to speculative trends.

This isn’t just a difference in strategy; it’s a difference in mindset. Azure likes to take moonshots—big bets with big risks. AWS? They play it a little closer to the chest. They’re ready to make the big bets too, but they also want to make sure it’s the right bet.

Azure’s aggressive GPU investment contrasts with AWS’s diversified approach, highlighting Azure’s “moonshot” strategy versus AWS’s pragmatism. AWS’s gradual approach to GPU deployment and custom chip development resonates with investors seeking stable, risk-managed growth. Unlike Azure, which heavily invests in high-cost H100 clusters, AWS’s CapEx strategy is more conservative, ensuring broader financial and technological flexibility without overreliance on speculative GPU demand spikes.

Comparison of Cost Efficiency Between AWS Graviton and H100 GPU Strategies


5. Amazon’s Own Chips - The Graviton of the Situation

Why does AWS bother building its own chips like Graviton and Inferentia when they could just buy more NVIDIA GPUs? Well, it comes down to flexibility and control. If you have your own chips, you can optimize them for your workloads, give customers more options, and reduce dependency on other vendors. AWS doesn’t want to be at the mercy of the GPU market or limited by third-party chip constraints.

Plus, it’s about control over cost and efficiency. AWS is in the business of offering choices, and by having their own chips, they get to offer more choices at better price points for different types of users—because let’s be real, not everyone needs the Ferrari-level performance of an H100.
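That "not everyone needs a Ferrari" point is really a cost-per-unit-of-work argument. A hedged sketch follows, with made-up prices and throughput numbers; these are not AWS list prices or real benchmark figures:

```python
# Illustrative cost-per-work comparison between a cheaper, slower option
# and a faster, pricier one. Prices and throughputs are invented for the
# example and are NOT real AWS or NVIDIA figures.

def cost_per_unit_work(hourly_price_usd, work_units_per_hour):
    """Dollars spent per unit of useful work delivered."""
    return hourly_price_usd / work_units_per_hour

# Hypothetical: a Graviton-style instance vs. a GPU instance on a
# workload that doesn't need GPU-class throughput.
cheap = cost_per_unit_work(hourly_price_usd=0.10, work_units_per_hour=50)
fast = cost_per_unit_work(hourly_price_usd=4.00, work_units_per_hour=800)

print(f"cheap option: ${cheap:.4f}/unit")  # $0.0020/unit
print(f"fast option:  ${fast:.4f}/unit")   # $0.0050/unit
```

In this made-up scenario the slower box wins on dollars per unit of work, which is the whole pitch for offering cheaper custom silicon alongside top-end GPUs: the right tool per workload, not the fastest tool for everything.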

AWS’s decision to develop custom chips like Graviton and Inferentia reduces dependency on third-party suppliers and positions AWS as a unique player in cloud computing hardware. This autonomy provides cost control and scalability advantages, especially valuable during supply chain constraints. By offering cost-effective, optimized options for various compute needs, AWS appeals to a wider range of customers, from budget-conscious users to high-demand enterprises, making it an attractive choice for long-term investors.

The Graviton Effect: AWS’s Custom Chip Strategy

AWS isn’t just spending for the sake of spending. They’re solving a massive, incredibly complex equation—one involving power, workload diversity, cost efficiency, and future-proofing against uncertainty. They could have bought all the GPUs out there and made a bunch of AI-specific data centers, but instead, they diversified.

AWS’s approach is a long game—using Graviton chips, custom designs, and a strategic investment in workloads. They’re making sure they’re ready for whatever the future holds without overextending themselves in a way that could hurt in the long run.

So, next time you think about the cloud, remember: there’s a lot more than meets the eye when it comes to how these gigantic providers make decisions, and AWS is playing to be the steady winner of this very complicated race.

AWS’s CapEx investments reflect a balanced approach to data center scalability, workload diversity, and technological autonomy. By combining their custom chips with flexible infrastructure, AWS aligns its CapEx strategy with sustainable, growth-oriented goals. This diversified approach not only mitigates risks associated with overspecialized hardware investments but also positions AWS as a resilient, forward-thinking player in the cloud market—a crucial factor for investors focused on stable, long-term returns.

The AWS CapEx Equation: Balancing Growth and Efficiency