
AI Infrastructure: The Synchronized Capacity Problem

April 29, 2026

The AI buildout is a chain of critical-path activities, each with its own lead time - and the industry has already figured out that sequential execution cannot meet time-to-market demands.

Despite the prevailing narrative in the press, the industry is not short of GPUs. It is short of synchronized capacity.

Every layer of the AI infrastructure stack  -  capital, land, permits, grid, transmission, transformers, switchgear, turbines, generators, chips, memory, packaging, cooling, fiber, water, electricians, operators  -  carries a lead time denominated in quarters or years. None of these bottlenecks is fatal on its own. What is fatal is treating them as a sequence. What follows is a stack-ordered inventory with sourced current lead times, then a recalibration of what real-world parallel execution has actually delivered, then the answer.

Layer 1: Capital

Investment capital exists, but it is getting scarcer and the terms on which it is deployed are changing. The "obvious" or traditional money has been investing for several years straight; those investors are now either overweight the sector or simply more cautious and stricter on terms. Mahdi Yahya wrote about this in his post on the receding tide.

The Big Four hyperscalers plan up to $630 billion in 2026 capex, 62% above 2025's record $388 billion, according to Data Center Richness’ Rich Miller. Isabel Juniewicz at Epoch.ai tags it even higher at $770B. That means capex now exceeds internal cash generation at every hyperscaler except Microsoft. Tomasz Tunguz projects Oracle could hit a liquidity wall by November 2026 on its current trajectory, and the WSJ has weighed in similarly.

The five issued ~$121 billion in new debt in 2025, with over $90 billion in the final three months alone. BNY Mellon estimates $300 billion in AI/data-center bond issuance in the next year and $1.5 trillion over five years, making AI debt 15–20% of most corporate bond indices - a larger share than US banks.

As a result, financing downstream has moved into structures that transmit risk rather than absorb it. SoftBank is seeking a $10 billion margin loan secured by its OpenAI shares, on top of the $40 billion loan signed last month. Then there is the circularity concern: the largest commitments run in loops. OpenAI has allocated $1.15 trillion across seven vendors - Broadcom $350B, Oracle $300B, Microsoft $250B, NVIDIA $100B, AMD $90B, AWS $38B, CoreWeave $22B - many of which are simultaneously OpenAI investors.

At 1 GW scale, cost is the gate. All-in data center construction (land, power, chips, cooling, electrical) averages $35–45 million per MW. Account for 10–15% economies of scale and a 1 GW build lands at roughly $30–40 billion. This is why capital availability and terms are tightening considerably.
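
A quick sanity check on that arithmetic - a sketch using only the per-MW range and scale discount quoted above:

```python
# Sanity check on the ~$30-40B figure: 1 GW at the quoted per-MW cost,
# less the quoted economies of scale. Inputs are the ranges stated above.
mw = 1_000
for per_mw_usd, scale_discount in [(35e6, 0.15), (45e6, 0.10)]:
    all_in = mw * per_mw_usd * (1 - scale_discount)
    print(f"${all_in / 1e9:.1f}B")   # ~$29.8B and ~$40.5B
```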

Current Lead Time: 4–12 weeks (bond issuance), 8–16 weeks (structured margin loans), 26–52 weeks (syndicated project finance). Increasingly fragile as issuance volume compounds.

Layer 2: Powered Land

Acreage is easy. The relevant unit is not acreage - it is deployable capacity: land with grid headroom, water rights, fiber access, and political permission. That is scarce, and it has repriced.

Parcels near substations and hydroelectric facilities trade at 2x–4x their prices of 18 months ago. Loudoun County industrial land now trades above $4 million per acre. Salt Lake City, Des Moines, and Reno are seeing 20–40% YoY increases. Grid-power capacity for existing projects is largely booked through 2030 in most markets. Central Ohio farmland once priced at $30,000 per acre now exceeds $150,000 when rezoned.

The market has bifurcated. Sites offering power access within 18 to 36 months are highly sought after - that is the fast lane. Everything else is speculative and, in the face of mounting political opposition, fraught.

Current Lead Time: 8–16 weeks to purchase a parcel. 78–156 weeks to convert it to buildable, powered, permitted land.

Layer 3: Permitting and Zoning

Permitting timelines in legacy markets have stretched from 6–12 months to 2–3 years. In 2025, lawmakers across all 50 states introduced 238 data center bills; more than 40 were enacted in 21 states, the majority addressing energy use. Loudoun County eliminated by-right data center development in March 2025 - all applications now require special exception approval with Planning Commission and Board of Supervisors hearings. Maine's legislature just passed a data center ban, though the Governor has vetoed it.

Current Lead Time: 26–52 weeks in by-right jurisdictions. 104–156+ weeks in scrutiny markets, with meaningful probability of outright denial.

Layer 4: Grid Interconnection

Grid interconnection is the binding constraint for most announced US capacity. See the Snap datacenter loan as an example: $2.6 billion just to hold a place in line.

PJM (the Pennsylvania-New Jersey-Maryland Interconnection) has seen its timeline from interconnection application to commercial operation rise from less than 2 years in 2008 to over 8 years in 2025. Starting summer 2026, PJM will have just enough power to keep the grid reliable. ERCOT added ~23 GW between 2024 and 2025 with another 9 GW slated for early 2026, but generation interconnection requests total ~432 GW. Large-load requests have quadrupled in a year.

FERC has intervened. On December 18, 2025, FERC directed PJM to establish new colocation rules, finding PJM's tariff "unjust and unreasonable."

The live example of what happens when a site runs ahead of the grid: OpenAI and Oracle stopped expanding beyond 1.2 GW at Abilene after power supply delays pushed grid availability more than a year out. Six of eight completed buildings now need a new tenant.

Current Lead Time: 208–416 weeks (PJM new interconnection). 104–260 weeks (ERCOT). 52–156 weeks for sites with pre-existing interconnection rights.

Layer 5: Transmission

New interstate transmission lines take an average of more than 4 years and sometimes up to 11 years to receive permits before construction can begin. NEPA impact statements alone take more than 4 years on average, with a quarter of federal reviews taking 6 years or more. DOE's CITAP program targets a binding 2-year federal review for qualifying 230 kV+ projects, but adoption is slow. Construction adds another 104–208 weeks after permits.

Current Lead Time: 208–572 weeks (4–11 years) for new high-voltage transmission. 52–156 weeks for substation upgrades on existing corridors.

Layer 6: Gas Turbines

The smart response to grid delay is "build behind the meter." The market has identified that trade, even if it has not yet executed it effectively.

Mitsubishi says turbines ordered today won't deliver until 2028–2030. Siemens reports a record backlog of €131 billion. GE Vernova's Q1 2026 backlog grew to 100 GW and is expected to reach 110 GW by year-end; CEO Scott Strazik expects turbine reservations sold out through 2030 by the end of 2026. However, the constraint is not uniform across all equipment types. The tightest conditions apply to heavy-duty, utility-scale gas turbines, which are most relevant for large AI campuses. Smaller turbines and engine-based solutions (such as reciprocating engines) can still be procured on shorter timelines, though often at higher operating cost or lower efficiency.

For large-frame turbines, current delivery timelines reflect both manufacturing backlog and project integration requirements.

Current Lead Time: 156–364 weeks (3–7 years) for new heavy-duty gas turbines. Existing slot reservations trade at a premium.

Layer 7: Transformers and Switchgear

Every watt delivered must pass through transformers and switchgear - and that market is also tight.

Since 2019, demand for generator step-up transformers has grown 274% and power transformers 116% - driven by data centers and grid modernization, not pandemic recovery. Lead times remain at 128 weeks (power transformers) and 144 weeks (GSUs) as of Q2 2025. Wood Mackenzie projects a 30% structural shortfall for power transformers in 2025 and warns demand continues to outpace any realistic near-term increase in manufacturing capacity.

Domestic supply remains limited, leaving the U.S. reliant on imports. Switchgear has improved from peak delays but remains elevated at ~6–9 months for standard systems, longer for high-voltage configurations. High-voltage cable procurement now typically requires 18–24+ months.

Manufacturers including Eaton and Siemens Energy are adding capacity, but most new supply will not come online until ~2027.

The implication is straightforward: these components are no longer routine purchases - they are schedule-defining constraints.

Current Lead Time: ~104–156 weeks (transformers), ~26–78 weeks (switchgear), ~78–104+ weeks (HV cable).

Layer 8: Backup and On-Site Generation

Ironically, the backup is now backed up. Because of the move to on-site, behind-the-meter power, the backup market is also stretched. Cummins has delivered over 39 GW of power equipment to data centers and roughly doubled production capacity in 2025. Caterpillar increased data center engine manufacturing capacity 125% compared to two years ago.

Aeroderivative turbines - jet engines repurposed as 50 MW gas generators - have become the fastest permanent power available. GE Vernova's aircraft-derived turbine orders jumped 33% in 2025. ProEnergy has delivered over 1 GW of 50-MW gas turbines built on Boeing 747 engine cores. Boom Supersonic has pivoted the company into this space. It is a good sign of how quickly smart ideas are flooding the market.

Current Lead Time: 26–52 weeks (diesel gensets), 52–104 weeks (aeroderivative turbines), 52–78 weeks (UPS/BESS).

Layer 9: Chips (GPUs, ASICs, Substitution)

This is the layer that gets the headlines. It is the wrong layer to worry about.

NVIDIA is sold out and will keep selling out. Blackwell is sold out through mid-2026 with a 3.6 million unit backlog. But here again, the market is surprisingly resilient.

Substitution is real and accelerating. NVIDIA commanded ~87% of AI accelerator revenue in 2024, roughly 80% today, and is projected to fall to ~75% by 2026  -  not because NVIDIA is shrinking (it just hit new highs with a $5T market cap), but because the market is growing faster than any single vendor can keep up with. TrendForce projects custom ASIC sales grow 45% in 2026 vs. 16% for GPUs.

The substitutes are shipping now:

  • Google TPU. Google's total expected TPU shipments: 4.3 million units in 2026, scaling to 35 million by 2028. Broadcom commands >70% of the custom AI accelerator market, targeting $100 billion in AI chip revenue by 2027. Anthropic alone committed to ~3.5 GW of next-gen TPU-based compute starting in 2027. As the folks at Semianalysis reported, Google's v7 Ironwood TPU narrows the performance gap to NVIDIA's flagship to a few quarters.
  • AWS Trainium. Project Rainier went from announcement to ~500 MW operational in 12 months, running ~500,000 Trainium2 chips for Anthropic, targeting 1 million by year-end. It is AWS's largest AI infrastructure project, deployed less than a year after announcement. 
  • Broadcom multi-customer ASIC. Broadcom now builds custom silicon for Meta, OpenAI, and Anthropic alongside Google. OpenAI's commitment covers a 10-gigawatt program of co-developed accelerators.
  • Microsoft Maia, Meta MTIA, AMD MI-series, Qualcomm A1200. All shipping now. Custom ASICs deliver 40–65% lower total cost of ownership versus GPUs for specific workloads at scale. 
  • Huawei Ascend in China. Huawei is mass-shipping Ascend 910C and preparing Ascend 920. DeepSeek tests show Ascend 910C delivers ~60% of H100 inference performance. US sanctions will limit Ascend use outside China, but inside China, Huawei expects to ship 700,000+ Ascend 910-series in 2025.

Inside an allocated customer, chip delivery cadence is in weeks, not years.

Current Lead Time: 26–52 weeks for new customer allocation. 4–12 weeks for incremental shipments against existing allocation. Substitution tier expanding 45% year-over-year.

Layer 10: HBM Memory

The tightest silicon-layer constraint and the one substitution cannot solve - because every GPU, every TPU, every Trainium, and every Huawei Ascend needs HBM.

HBM capacity is sold out through 2026 across all three suppliers (SK Hynix, Micron, Samsung). SK Hynix holds 62% market share; NVIDIA accounts for ~90% of SK Hynix's HBM supply. SK Hynix CFO: "We have already sold out our entire 2026 HBM supply." HBM consumes approximately three times the wafer capacity of standard DRAM per gigabyte. You know this is a constraint when NVIDIA cuts gaming GPU production 30–40% (H1 2026) due to GDDR7 constraints. Samsung and SK Hynix raised HBM3E prices ~20% for 2026.

Current Lead Time: 52–104 weeks for new allocation. 2027 is the battleground.

Layer 11: Advanced Packaging (CoWoS)

This is the gate between functional silicon and usable AI accelerators.

Even when wafers are available, chips cannot ship without advanced packaging - particularly CoWoS (Chip-on-Wafer-on-Substrate), which integrates GPUs with high-bandwidth memory.

This layer remains structurally constrained. C. C. Wei has stated that CoWoS capacity is tight and effectively sold out into 2026, and NVIDIA has similarly indicated that packaging capacity remains a binding constraint on its ability to ship AI systems.

The industry is expanding capacity aggressively. TSMC is scaling CoWoS output materially through 2026, but capacity remains tightly allocated.

While packaging is not the gate today, it can become a hidden governor when wafer capacity exists but cannot be converted into deployable systems without access to CoWoS and related technologies.

Current Lead Time: Effectively allocation-driven; new entrants face ~9–18 month timelines, while existing customers operate within pre-committed capacity programs.

Layer 12: Cooling

Air cooling is operationally impractical at scale at GB200-class density. The GB200 NVL72 generates 120 kW+ per rack — 5x to 15x beyond air cooling's 8–25 kW ceiling. The GPU die operates at 500–600 W/cm² heat flux, comparable to nuclear reactor fuel rods. Liquid cooling is mandatory. The more common NVL36x2 configuration runs ~66 kW per rack (132 kW per pair), with 115 kW liquid cooled and 17 kW air cooled. Vera Rubin NVL144 approaches 600 kW per rack.

CDU lead times currently run 12–18 months for quality units, despite CDUs being the fastest-growing segment in data center cooling. Cold plate fabrication equipment itself has lead times exceeding 6 months, exacerbated by persistent shortages of high-grade copper.

Current Lead Time: 52–78 weeks for CDUs. 26+ weeks for cold plates and manifolds. Tightening as GB300 and Vera Rubin ramp. Cooling is a parallel procurement item that must start alongside transformers, not after them.

Layer 13: Fiber

Fiber optic cable lead times have reached up to 60 weeks  -  the longest since the early 2000s build-out cycle. Prices rose up to 70% between 2021 and 2024. A modern AI facility requires 10–36x more fiber than a legacy counterpart. The U.S. must nearly double its fiber route miles by the end of the decade. Each new hyperscale site requires ~135 route miles of connectivity on average. 

Current Lead Time: 26–60 weeks for cable. 52–104 weeks for new long-haul route construction.

Layer 14: Water

Regionally binding, not universally. Texas data centers will use 49 billion gallons in 2025 and up to 399 billion by 2030. The problem is that two-thirds of data centers built since 2022 are in water-stressed regions. Cooling water use may rise 870% in the coming years, per Brookings.

Current Lead Time: 26–78 weeks for water rights procurement; indefinite in acute-scarcity regions.

Layer 15: Construction Labor

Microsoft president Brad Smith identified electrical talent shortages as the #1 problem slowing US data center expansion. Microsoft employs electricians commuting up to 75 miles or temporarily relocating. The construction industry faces a shortage of ~439,000 workers as of November 2025, most in skilled trades. Data center contractors face backlogs close to a year. 300,000+ new electricians are needed just to meet AI-driven demand while 200,000 retire over the decade. Electrical work is 45–70% of total data center construction cost per IBEW.

The EU is investing in reskilling workers from oil, gas, and mining into electrification roles — the European Commission estimates retraining costs of €350 million to €1.4 billion per year. Globally, the IEA finds two-thirds of oil and gas workers have the base skills to transition with targeted retraining, but estimates new qualified entrants would need to rise 40% by 2030 just to prevent the gap from widening.

Current Lead Time: 44-week average backlog for skilled electrical crews.

The Coordination Penalty: What the Market Actually Tells Us

I could build a toy model with coordination variance σ, buffer erosion B, rework probability p, and rework delay r. But real projects have already run the experiment. Five case studies answer the question of what sequential vs. parallel execution actually produces.

Case 1  -  Sequential greenfield (hypothetical). If a developer attempted to execute this stack sequentially - finish permits, then order transformers, then order turbines, then lock labor, then order chips - the longest chain would stack roughly as follows: permitting (104 weeks) → grid interconnection new-construction (260 weeks) → gas turbines ordered against that (156 weeks, partially parallel) → transformers (128 weeks, partially parallel) → site construction (104 weeks) → commissioning (12 weeks). Net: 400–600 weeks, or 7.5–11.5 years. No one actually runs this playbook, but it is instructive.
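
A minimal sketch of that stacking arithmetic, using the lead times quoted above. The overlap fractions for the "partially parallel" items, and the assumption that a pre-entitled site has already absorbed permits, interconnection, and long-lead orders, are illustrative rather than sourced:

```python
# Case 1 stacking vs. parallel execution on a pre-entitled site (all figures in weeks).
permitting, interconnection = 104, 260
gas_turbines, transformers = 156, 128        # "partially parallel" in Case 1
construction, commissioning = 104, 12

def sequential_weeks(overlap: float) -> int:
    """Critical path when turbine/transformer orders are only partly hidden
    inside the interconnection wait (overlap = fraction hidden)."""
    exposed = (1 - overlap) * max(gas_turbines, transformers)
    return round(permitting + interconnection + exposed + construction + commissioning)

print(sequential_weeks(1.0), sequential_weeks(0.5))  # 480 and 558 -> the ~400-600 week band

# Pre-entitled parallel build: permits, interconnection, and equipment orders were
# absorbed before groundbreaking (Lancium's ~208 weeks at Abilene), so the visible
# critical path collapses toward construction plus commissioning.
print(construction + commissioning)                   # 116 -> inside the 57-139 week range of Cases 3-4
```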

Case 2  -  Net-new greenfield with utility co-development (Meta Hyperion). Project Hyperion broke ground in December 2024 on 2,250 acres in Richland Parish, Louisiana. First phase operational 2028. 2 GW by 2030. 5 GW later. Entergy is building 2.26 GW of dedicated gas generation, 240 miles of 500 kV transmission, and battery storage in parallel with construction. Meta signed financing with Blue Owl at 80%/20% ownership, contributing $27 billion total. First phase: ~182 weeks (3.5 years). Full 2 GW: ~260 weeks (5 years). This is the benchmark for a fully-permitted, fully-gridded, fully-permanent net-new campus.

Case 3  -  Parallel execution on pre-entitled site (Stargate Abilene / Crusoe-Lancium). Lancium held the Abilene site from 2020 - four years of utility coordination, permitting, and infrastructure prep before Crusoe arrived. Construction began June 2024. Powered primarily by on-site GE Vernova and Solar Turbines gas turbines (permits filed January 2025); effectively off-grid. 7,000 electricians flown in. First 200 MW: ~65 weeks. Full 1.2 GW: ~139 weeks. But this compresses only because Lancium did 4 years (~208 weeks) of unseen site prep.

Case 4  -  Parallel execution on transmission-ready industrial site (AWS Project Rainier). AWS began scouting the Indiana site in spring 2023, working with American Electric Power's Indiana Michigan Power subsidiary on transmission. Groundbreaking October 2024. Ultimate target: 2.2 GW across 30 buildings. Sited specifically for transmission access; utility co-developed infrastructure. First 500 MW: ~57 weeks. This is parallel execution with pre-existing transmission, custom silicon (Trainium2, bypassing CoWoS allocation fights), and a single-vendor stack from the same parent company.

Case 5  -  Gray-path emergency build (xAI Colossus). Site selected March 2024 - a decommissioned Electrolux factory with 15 MW of existing industrial power. Operational July 2024 with 100,000 H100s - 122 days. Powered by 35+ unpermitted gas turbines (~420 MW) with cooling from a quarter of the US mobile cooling supply. NVIDIA CEO Jensen Huang: "projects of this scale typically take around four years." 250 MW operational: ~17 weeks. But the project now carries environmental permit liability, community opposition, and a temporary-infrastructure tail that has to be replaced. Speed bought by mortgaging legal and operational certainty.

The Observable Pattern

Sequentially executed greenfield: 400–600 weeks. Net-new greenfield with utility co-development: 182–260 weeks to first phase. Parallel execution on pre-entitled site: 57–139 weeks to first phase. Gray-path emergency build: 17–26 weeks, with permit/operational debt.

The system does not move at the speed of its fastest component. It moves at the speed of its slowest coordinated component.

The penalty for sequential execution is not 20% or 50%. It is an order of magnitude. A sequential developer ships capacity in 2032. A parallel developer on a pre-entitled site ships in 2026. Both ordered the same transformers. Local optimization does not produce global efficiency in a tightly coupled system.

The Capital Penalty

The coordination penalty is not just time. On a $35 billion, 1 GW campus financed at an 11% blended cost of capital, a 26-week slip costs roughly $1.9 billion in carrying cost alone — capital deployed against non-productive assets. The revenue penalty is worse: at $12.5 million per MW per year, every week of delay on a 1 GW site is ~$240 million in foregone revenue. A 26-week slip is $6+ billion in revenue that a competitor with a synchronized build is capturing instead. On a project modelled at a 15% IRR over a seven-year horizon, shifting the entire cash flow curve six months right compresses returns by 300–400 basis points — the difference between a project that creates value and one that merely returns its cost of capital. Idle capital, lost revenue, underutilization, contractual risk, IRR compression — these are not five separate problems. They’re one problem: the synchronization penalty, denominated in dollars instead of weeks.
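
For readers who want the arithmetic spelled out, here is a sketch that reproduces those figures from the stated assumptions ($35 billion of capex, an 11% blended cost of capital, $12.5 million per MW-year of revenue); the inputs are the assumptions in the paragraph above, not project actuals:

```python
# Back-of-envelope reproduction of the synchronization-penalty figures.
capex_usd = 35e9              # 1 GW campus, all-in
cost_of_capital = 0.11        # blended
revenue_per_mw_year = 12.5e6  # $/MW-year
site_mw = 1_000
slip_weeks = 26

carrying_cost = capex_usd * cost_of_capital * (slip_weeks / 52)
revenue_per_week = revenue_per_mw_year * site_mw / 52
foregone_revenue = revenue_per_week * slip_weeks

print(f"Carrying cost of the slip: ${carrying_cost / 1e9:.2f}B")    # ~$1.93B
print(f"Foregone revenue per week: ${revenue_per_week / 1e6:.0f}M")  # ~$240M
print(f"Foregone revenue, total:   ${foregone_revenue / 1e9:.2f}B")  # ~$6.25B
```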

The Integration Answer: Parallel Execution and Site Pre-Entitlement

I want to hammer home three things. 

First, chips are the easiest problem in the stack. New NVIDIA allocation is 26–52 weeks and incremental shipments are 4–12 weeks - an order of magnitude shorter than transformers (128 weeks), turbines (156–364 weeks), transmission (208–572 weeks), permitting (104+ weeks in scrutiny markets), or skilled electrical labor (44-week backlog plus 208 weeks to train a journeyman). And the market is responding faster to the chip constraint than to any other due to improvements in NVIDIA’s supply chain plus substitutes coming online. 

Second, the operational answer is pre-entitled sites plus parallel procurement. The projects actually delivering capacity in 2026 are the ones that pre-committed every long-lead item before construction began:

  • Transformers ordered at land option signing, not permit approval.
  • Turbine slots reserved before permit submission, using deposits and swap rights.
  • Electrician crews put on retainer before foundations pour; flown in from other states if necessary.
  • Commissioning staff hired 18 months before energization.
  • Liquid cooling architecture frozen at building design, not rack delivery.
  • Financing structured against the longest-lead item, not the shortest.

This is the operational signature shared by Stargate Abilene, Project Rainier, xAI Colossus, and Meta Hyperion. It is not shared by the projects that are slipping.
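
One way to see why those commitments must land that early is to back-schedule each long-lead item from a target energization date. The sketch below uses representative lead times drawn from the layer inventory above; the target date and the item list are illustrative assumptions, not a project plan:

```python
# Back-scheduling sketch: given a target energization date, when must each
# long-lead item be committed? Lead times (weeks) are representative values
# from the ranges quoted in the layer inventory; the date is hypothetical.
from datetime import date, timedelta

TARGET_ENERGIZATION = date(2028, 6, 1)

LONG_LEAD_WEEKS = {
    "heavy-duty gas turbine slots": 156,
    "power transformers": 128,
    "HV cable": 104,
    "CDUs (liquid cooling)": 78,
    "switchgear": 52,
    "skilled electrical crews": 44,
}

for item, weeks in sorted(LONG_LEAD_WEEKS.items(), key=lambda kv: -kv[1]):
    order_by = TARGET_ENERGIZATION - timedelta(weeks=weeks)
    print(f"{item:30s} commit by {order_by}  ({weeks}-week lead)")
```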

Third, the defensible moat is pre-entitlement. Pre-entitled land - parcels with executed interconnection agreements, grandfathered permits, existing substation headroom, and political cover - is the only part of the stack that compounds into a durable asset. From Stargate to Rainier, the winners are pre-entitlement stories disguised as construction stories.

The firms that will own the 2026–2030 buildout are the ones that quietly accumulated powered land between 2020 and 2024, or that partner directly with those who did.

Capacity can be built in 2026. It is being built in 2026. But not by anyone still assembling the stack from the top down or from the chip out. The industry is not short of GPUs. It is short of synchronized capacity  -  and synchronization is a capability, not a commodity.
