The Unseen Machinery
AWS, GPU Wars, and the Future of AI Infrastructure



Let’s talk about an intricate dance of partnerships, GPUs, and AI models. It’s a world where big words like "disaggregated GPU compute" and "hyperscalers" are tossed around, but the real game is about who’s packing the most silicon horsepower—and how effectively they can wield it.
Amazon Web Services (AWS) plays a pivotal role on this rapidly evolving stage, alongside major players like NVIDIA, Google Cloud, and Together.AI. What’s at stake? A leading role in the next-gen infrastructure that will handle generative AI. Grab a virtual seat; we’re about to dissect the relationship between AWS, GPU availability, and partnerships, and how all of this will shape the AI of tomorrow. See the first infographic:
GPU Market Stack: The Tower of Compute Power

1. GPU - The Hardware Battle Royale
At the heart of the AI game is GPU access. There are two factions: those who have GPUs and those who are scrambling to get them. AWS's Head of Digital Transformation paints a scene where cutting-edge GPUs, like NVIDIA's H100, are the golden tickets of Willy Wonka's Chocolate Factory—the shortage is real, and access is more exclusive than a celebrity yacht party.
AWS and its competitors are currently building high-performance GPU clusters that look like Frankenstein’s monster of compute power. AWS is using high-speed connectivity to interlink GPUs scattered in different places—think of it like connecting pieces of a giant robot with impossibly fast invisible strings.
The AI race has evolved into a battle for GPU access. High-performance GPUs, like NVIDIA's H100, are essential for handling extensive datasets and training cutting-edge AI models. AWS, a leader in cloud compute, leverages its infrastructure to cluster GPUs across data centers, creating high-speed interlinks that improve efficiency.
AWS’s strategy to interconnect high-performance GPU clusters highlights its ability to meet demand despite scarcity. The “disaggregated GPU compute” model allows AWS to optimize resources, reduce latency, and provide a scalable infrastructure—a critical advantage as the demand for AI resources continues to grow.
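The "disaggregated GPU compute" idea above can be sketched as a toy scheduler: a job requests some number of GPUs, and an allocator fills the request from pools in different data centers, preferring the pools with the fastest interconnect. Everything here—the pool names, latencies, and greedy policy—is a hypothetical illustration, not AWS's actual placement logic.

```python
from dataclasses import dataclass

@dataclass
class GpuPool:
    # A pool of identical GPUs in one data center (toy model).
    region: str
    free_gpus: int
    interconnect_latency_us: float  # latency to the shared fabric

def place_job(pools, gpus_needed):
    """Greedy placement: fill from the lowest-latency pools first,
    spanning multiple pools only when no single pool can host the job."""
    candidates = sorted(pools, key=lambda p: p.interconnect_latency_us)
    placement, remaining = {}, gpus_needed
    for pool in candidates:
        if remaining == 0:
            break
        take = min(pool.free_gpus, remaining)
        if take > 0:
            placement[pool.region] = take
            pool.free_gpus -= take
            remaining -= take
    if remaining > 0:
        # Not enough capacity anywhere: roll back the partial reservation.
        for region, take in placement.items():
            next(p for p in candidates if p.region == region).free_gpus += take
        return None
    return placement

pools = [
    GpuPool("us-east-1", free_gpus=4, interconnect_latency_us=5.0),
    GpuPool("us-west-2", free_gpus=8, interconnect_latency_us=12.0),
]
print(place_job(pools, 6))  # -> {'us-east-1': 4, 'us-west-2': 2}
```

A real scheduler would also weigh topology (NVLink islands vs. cross-rack links) and job priority, but the core trade-off—latency versus available capacity—is the one sketched here.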
Adoption Rates of A100 and H100 GPUs Across Major Cloud Providers

2. Software - The Orchestra of Complexity
Next is the software layer—imagine that the GPUs are an orchestra, and the software is the conductor keeping them all in harmony. AWS knows this orchestra has room to grow—there’s still work to be done before the music plays smoothly.
Together.AI's APIs are a standout in this orchestra. They're powerful, open-source friendly, and pretty easy to use. AWS is happy to play a guarded tune—working with models like Meta's LLaMA—and has been improving orchestration via initiatives like Amazon Bedrock, where it rolls out pre-trained models like cupcakes in a bakery display. AWS may not have the perfect conductor yet, but the groundwork has been laid.
The software orchestration layer is essential for AI compute. AWS's proprietary offerings, like Amazon Bedrock, expose pre-trained models and integrate smoothly with partner software stacks, such as Together.AI's APIs. While Together.AI excels at open, developer-friendly APIs, AWS takes a more guarded approach, emphasizing secure, managed orchestration.
Software orchestration drives resource optimization and makes infrastructure flexible. AWS’s orchestration improvements position it to serve more complex AI workloads and scale enterprise applications, potentially attracting long-term customer contracts. Together.AI’s API-driven approach, while effective, will need more development for large-scale orchestration to maintain a competitive edge.
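For a concrete feel of what "orchestration via Bedrock" means at the API level, here is a minimal sketch of calling a hosted model through Amazon Bedrock's `InvokeModel` API using the `boto3` SDK. The request body follows Bedrock's documented Anthropic messages format, but the model ID and parameters are illustrative assumptions; the network call requires AWS credentials, so it is isolated in a function that is never invoked here.

```python
import json

def build_bedrock_body(prompt, max_tokens=256):
    # Anthropic-style request body for Bedrock's InvokeModel API.
    # Field names follow Bedrock's documented message schema.
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def invoke(prompt, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    # Requires AWS credentials and the boto3 SDK; not executed in this sketch.
    import boto3
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(modelId=model_id, body=build_bedrock_body(prompt))
    return json.loads(resp["body"].read())

body = json.loads(build_bedrock_body("Summarize our GPU usage."))
print(body["messages"][0]["role"])  # -> user
```

The point is less the specific model than the shape of the interface: one managed endpoint, one request schema, many interchangeable models behind it—which is exactly the orchestration layer AWS is betting on.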
Comparison of Model Library Distribution Between AWS and Together.AI Across Categories


3. Strategic Partnerships - Friends or Frenemies?
AWS’s relationship with different players can best be described as, well, complicated. Partners and competitors mingle in an ecosystem that makes Game of Thrones look like a summer picnic. For instance, Hugging Face is a “strategic partner” (meaning they share their secret blueprints confidentially), while Together.AI is more of a marketplace partner—AWS is happy to list them in their store, but the deeper alliances are yet to form.
This mixed-bag strategy is AWS's way of hedging its bets. If you can’t beat them, well, invite them to dinner but don’t let them see the secret family recipe.
AWS balances competition and collaboration, maintaining partnerships with key players like Hugging Face and Together.AI while competing with Google Cloud. Hugging Face serves as a strategic partner, aligning its roadmap with AWS, whereas Together.AI is a marketplace partner. The Bedrock program also lets AWS diversify its partnerships, securing both collaborative and competitive advantages.
AWS’s ecosystem approach strategically aligns with partners, hedging against technology shifts and securing a diverse portfolio of capabilities. These partnerships enhance AWS’s value proposition for enterprise customers seeking both flexibility and scalability, solidifying its place as a frontrunner in AI infrastructure.
Assessment of AWS's Collaborative Partnerships with Key Industry Leaders


4. Commodity Risk - From Crown Jewels to Common Goods
In today’s AI landscape, GPUs are the crown jewels—the “have-nots” are left gnawing at the leftovers. But if there’s one constant in technology, it’s commoditization. AWS knows that the scarcity that makes GPUs special today won’t last forever. Google Cloud, with its new H100 clusters, and others are leveling the playing field by making GPUs more accessible.
This means companies like Together.AI have a ticking clock on their competitive advantage—if they don’t evolve, the next wave of democratized hardware could sweep them into commodity status. AWS is banking on their ability to continually innovate their orchestration systems and build strategic partnerships. And they’re betting that their position as a hyperscaler will give them the reach that niche players simply won’t be able to match.
GPUs are in high demand but face a potential commoditization risk. With new clusters on the horizon from providers like Google Cloud, AI hardware access may democratize, potentially reducing Together.AI’s hardware advantage. AWS bets on sustained orchestration innovation, aiming to keep its infrastructure flexible and valuable even as hardware becomes more widely available.
AWS’s orchestration-first approach offers a safety net against hardware commoditization. By focusing on orchestration and interconnectivity, AWS can retain a sustained edge that niche providers may struggle to match, even as the underlying silicon becomes interchangeable.
Forecast of GPU Model Commoditization Trends from 2024 to 2029


5. Together.AI - The Challenger
Together.AI gets a solid 7 out of 10 in AWS’s eyes. They’re well-positioned, have great hardware, and have done some impressive work on their APIs—but they’re missing the orchestration and customer deployment needed to elevate them to a 9 or 10. The success of models like RedPajama is key to their future, and they’re banking on keeping their hardware edge as the space continues to grow.
The overall sentiment here is positive but realistic. Together.AI could become a serious contender if it can crack the orchestration problem and expand beyond its niche.
Together.AI holds a promising position in hardware, utilizing NVIDIA H100 GPUs, yet lacks robust orchestration capabilities at the enterprise level. Their powerful APIs are a competitive strength, but Together.AI’s ability to become a serious contender depends on expanding orchestration capabilities, potentially through models like RedPajama, which focuses on rapid, fine-tuned model deployment.
Together.AI’s strength lies in hardware and APIs, but expanding enterprise-level orchestration capabilities is crucial for growth. Investors should monitor their progress with orchestration and customer deployment, as this would elevate Together.AI from a strong niche player to a market contender.
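Together.AI's "powerful APIs" are, in practice, an OpenAI-compatible HTTP interface. The sketch below builds—but deliberately does not send—a chat-completion request against what its public docs describe as the `https://api.together.xyz/v1/chat/completions` endpoint; the model slug and API key are placeholders, not working credentials.

```python
import json
import urllib.request

# OpenAI-compatible chat endpoint (per Together's public docs; treat as an assumption).
TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(model, prompt, api_key):
    """Assemble an authenticated POST request in the OpenAI chat schema."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        TOGETHER_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # placeholder key, never sent here
            "Content-Type": "application/json",
        },
    )

# Hypothetical model slug for illustration only.
req = build_chat_request("meta-llama/Llama-3-8b-chat-hf", "Hello", api_key="sk-demo")
print(req.get_full_url())  # -> https://api.together.xyz/v1/chat/completions
```

That schema compatibility is precisely the low switching cost the commoditization section warns about: if the request shape is standard, customers can point it at whichever provider has GPUs that week.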
Radar Chart Highlighting Together.AI's Core Strengths and Areas for Improvement


In this intricate AI ecosystem, the real question is who can build the best orchestra of hardware and software, while keeping their partners close and their frenemies closer. AWS is playing a long-term game, evolving as the space shifts from pure compute battles to smarter orchestration and deeper, production-level AI applications.
In the short term, it’s all about the silicon—but the real story will be told by who can adapt fastest when GPUs become commodities, and when enterprises finally shift from experimentation to full-scale production.
GPU Strategy Showdown: AWS vs. Google Cloud

