AI infrastructure review: CoreWeave GPU cloud
Is CoreWeave’s high-performance GPU cloud infrastructure for AI training and inference a cheaper alternative to the big three?
Introduction
CoreWeave is a cloud computing provider specializing in high-performance GPU infrastructure for AI model training and inference. The company offers access to a portfolio of NVIDIA GPUs and positions itself as a competitive alternative to traditional cloud providers such as AWS, Azure, and Google Cloud. This blog examines CoreWeave's infrastructure, notable features, and the claims made about its performance.
CoreWeave's GPU Cloud Infrastructure
CoreWeave offers access to a variety of high-end NVIDIA GPUs, including the GB200 NVL72/HGX B200, HGX H100/H200, HGX A100, L40, L40S, and A40.
H200 Tensor Core GPUs – Claimed to deliver 1.9x higher performance than the H100, with 1,128GB of total GPU memory per server (each H200 GPU carries 141GB of HBM3e, and eight GPUs per server yield the total; the arithmetic is sketched just after this list) and 3.2 Tbps (3,200 Gbps) of NVIDIA Quantum-2 InfiniBand networking for low-latency GPU-to-GPU connectivity and distribution of training workloads. These GPUs are complemented by NVIDIA BlueField-3 DPUs, which offload networking and storage tasks.
H100 Tensor Core GPUs – Stated to deliver up to 4x the performance of the A100 and widely used for large-scale AI training, including GPT-3-scale workloads (e.g., a reported deployment of 3,500 H100s).
L40S GPUs – Reported to deliver twice the performance of the A40, with 733 TFLOPS of peak compute, also supported by BlueField-3 DPUs.
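To make the per-server memory figure above concrete, here is a minimal arithmetic sketch. It only reproduces the number quoted for the HGX H200 configuration (eight GPUs at 141GB each); the constants are the inputs, not CoreWeave-specific values, and other SKUs would follow the same pattern with their own capacities.

```python
# Reproduce the per-server HBM total quoted above for an 8-GPU HGX H200 node.
# The per-GPU capacity (141 GB of HBM3e) and the GPU count (8) are the only inputs.

GPUS_PER_SERVER = 8
HBM_PER_H200_GB = 141  # HBM3e capacity of a single H200 GPU

total_hbm_gb = GPUS_PER_SERVER * HBM_PER_H200_GB
print(f"Total HBM per HGX H200 server: {total_hbm_gb} GB")  # -> 1128 GB
```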
In addition to offering virtual servers on bare-metal infrastructure, CoreWeave emphasizes its software stack, which provides low-level observability metrics and workload fungibility—allowing users to rapidly switch and share resources across AI workloads.
AI Infrastructure Features
CoreWeave highlights several AI infrastructure advancements, including:
Memory Pooling – Pooling the HBM of the GPUs within a server so workloads can draw on the combined GPU memory.
Liquid Cooling Integration – Starting in 2025, CoreWeave plans to incorporate liquid cooling across its data centers, including clusters built on the NVIDIA GB200 NVL72.
High-Density GPU Clusters – Supporting up to 130 kW per rack, allowing for ultra-high server density in GPU cluster racks.
Advanced Monitoring and Observability – Managed Grafana dashboards provide real-time visibility into infrastructure performance (InfiniBand bandwidth, GPU temperature, power consumption) along with real-time alerting.
Performance Optimization via Tensorizer – Designed to enable asynchronous checkpointing, reducing the impact of checkpointing on Model FLOPS Utilization (MFU).
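CoreWeave's Tensorizer itself is not reviewed here, so the snippet below is only a generic, hypothetical illustration of why asynchronous checkpointing helps MFU: the slow disk write happens on a background thread while the training loop keeps issuing work. The `train_step` function and `model_state` dictionary are placeholders, not CoreWeave APIs.

```python
# Generic sketch of asynchronous checkpointing (not CoreWeave's Tensorizer API):
# snapshot the model state cheaply, then serialize it to disk on a background
# thread so the training loop is not stalled on I/O.
import copy
import pickle
import threading

def train_step(step):
    # Placeholder for one training iteration (forward/backward/optimizer update).
    pass

def save_checkpoint_async(model_state, path):
    # Take an in-memory snapshot so training can keep mutating the live state,
    # then hand the slow write to a daemon thread.
    snapshot = copy.deepcopy(model_state)

    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t  # caller can join() later to confirm the write finished

model_state = {"weights": [0.0] * 1000, "step": 0}
pending = None
for step in range(1, 101):
    train_step(step)
    model_state["step"] = step
    if step % 50 == 0:          # checkpoint every 50 steps without blocking the loop
        if pending is not None:
            pending.join()      # avoid overlapping writes to the same file series
        pending = save_checkpoint_async(model_state, f"ckpt_{step}.pkl")
if pending is not None:
    pending.join()
```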
Performance Claims: Model FLOPS Utilization (MFU) and Goodput
CoreWeave makes several claims regarding performance, particularly around Model FLOPS Utilization (MFU) and workload efficiency:
MFU Exceeding 50% on NVIDIA Hopper GPUs – The company asserts that its infrastructure achieves roughly 20% higher MFU than publicly reported foundation model training runs, which typically land between 35% and 45%. Model FLOPS Utilization (MFU) is a key efficiency metric: it measures how much of a GPU's available floating-point throughput (FLOPS) is actually used productively while training a model. Higher MFU means faster training times, lower cost per run, more efficient use of GPUs, and less wasted energy; a worked example appears after this list.
96% Goodput Rate – CoreWeave attributes this to proactive node replacement, cluster validation, and deep observability, which together keep workloads running on healthy infrastructure (goodput being the share of wall-clock time spent on productive training rather than on failures and recovery).
Competitive Advantage Over AWS, GCP, and Azure – By maintaining higher MFU, CoreWeave claims that its infrastructure enables faster training times and reduced energy waste.
High-Performance Kubernetes on Bare Metal – Running Kubernetes directly on bare-metal servers via CoreWeave Kubernetes Service (CKS) is positioned as a differentiator, maximizing efficiency and performance.
High-Speed AI Object Storage – CoreWeave’s AI storage solution reportedly delivers 2GB/s per GPU across hundreds of thousands of GPUs, minimizing the impact of data transfers on MFU.
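As a worked example of the MFU and goodput claims above, the sketch below runs the standard back-of-the-envelope calculation. All inputs are hypothetical: it uses the common approximation of about 6 FLOPs per parameter per training token for dense transformers, and the roughly 989 TFLOPS dense BF16 peak commonly quoted for an H100 SXM GPU.

```python
# Back-of-the-envelope MFU and goodput calculations under hypothetical inputs.
# MFU = FLOPs/s actually spent on the model / aggregate theoretical peak FLOPs/s.

PARAMS = 70e9                # hypothetical model size (parameters)
TOKENS_PER_SEC = 1.2e6       # hypothetical measured cluster-wide token throughput
NUM_GPUS = 1024              # hypothetical cluster size
PEAK_FLOPS_PER_GPU = 989e12  # approx. H100 SXM dense BF16 peak

achieved_flops = 6 * PARAMS * TOKENS_PER_SEC   # ~6 FLOPs per parameter per token
peak_flops = NUM_GPUS * PEAK_FLOPS_PER_GPU
mfu = achieved_flops / peak_flops
print(f"MFU: {mfu:.1%}")                       # ~50% with these inputs

# Goodput: share of wall-clock time spent on productive training rather than
# failures, restarts, or replaying work since the last checkpoint.
total_hours = 720.0   # hypothetical month of training
lost_hours = 29.0     # hypothetical downtime plus recomputed work
goodput = (total_hours - lost_hours) / total_hours
print(f"Goodput: {goodput:.1%}")               # ~96% with these inputs
```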
Slurm on Kubernetes (SUNK)
CoreWeave offers Slurm on Kubernetes (SUNK), which enables customers to run Slurm-based workloads across more than 32,000 GPUs. Scheduling is topology-aware, designed to fully exploit the InfiniBand fabric for efficient node-to-node communication and higher MFU. Note: Slurm is a job scheduler commonly used in HPC; with topology-aware scheduling, it places workloads on GPUs and compute nodes according to their network topology (e.g., GPUs with low-latency connections to one another are grouped together).
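Slurm configures this behavior through its topology plugin rather than user code, so the snippet below is only a simplified, hypothetical illustration of the idea: given which leaf switch each free node hangs off, prefer allocations packed under as few switches as possible so GPU-to-GPU traffic crosses the fewest InfiniBand hops. The node and switch names are made up for the example.

```python
# Simplified, hypothetical sketch of topology-aware placement: pick the nodes for
# a job from as few leaf switches as possible to minimize inter-switch traffic.
from collections import defaultdict

def topology_aware_pick(node_to_switch, nodes_needed):
    # Group free nodes by the leaf switch they are attached to.
    by_switch = defaultdict(list)
    for node, switch in node_to_switch.items():
        by_switch[switch].append(node)

    # Greedy heuristic: fill from the switches with the most free nodes first,
    # which keeps the allocation packed under few switches.
    picked = []
    for switch, nodes in sorted(by_switch.items(), key=lambda kv: -len(kv[1])):
        for node in nodes:
            if len(picked) == nodes_needed:
                return picked
            picked.append(node)
    raise RuntimeError("not enough free nodes for this job")

free_nodes = {
    "node01": "leaf-a", "node02": "leaf-a", "node03": "leaf-a",
    "node04": "leaf-b", "node05": "leaf-b",
    "node06": "leaf-c",
}
print(topology_aware_pick(free_nodes, 3))  # -> all three nodes under leaf-a
```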
Conclusion
CoreWeave presents a high-performance alternative for AI training and inference workloads, with a focus on GPU efficiency, low-latency networking, and workload optimization. The company's claims around higher MFU, goodput rates, and infrastructure efficiency suggest a strong focus on maximizing compute performance while minimizing resource waste.
This summary was prepared from a review of CoreWeave's public website, without a detailed review of the technical papers CoreWeave references. Performance claims should therefore be verified against the cited studies.