AI Token Costs Are Visible, AI Output Value Is Not—Long Governance Tooling, Short Hyperscalers
On March 4, 2026, a mid-sized enterprise quietly amended its AI procurement contract. The amendment added a clause requiring quarterly ROI attribution for all LLM token spend. The clause will force the vendor—a hyperscaler whose name the enterprise will not disclose—to instrument every prompt, trace every response, and tie token consumption to measurable business outcomes. The hyperscaler cannot do this. Its platform bills for tokens processed, not value delivered. The enterprise is now evaluating third-party observability tooling to do the job the hyperscaler will not. This is not an edge case. It is the leading edge of a structural shift that the market has not yet priced.
Enterprise AI spend jumped from $11.5 billion in 2024 to $37 billion in 2025, a 3.2x increase in a single year. Surveys found that 72% of organizations expected LLM spending to rise further in 2025, and some firms report AI consuming up to half of IT budgets. Yet 80–85% of enterprises miss AI infrastructure forecasts by more than 25% because of weak cost governance, and most cannot tie spend to specific, measurable business outcomes. The problem is structural: AI's costs (tokens, compute, watts) are immediately visible and auditable, while the economic value of AI output remains opaque. SemiAnalysis calls this "Dark Output": the work AI does that national accounts cannot see and enterprises cannot measure. Incoming Fed Chairman Kevin Warsh made the macro version of this point in December 2025: "if you're looking at the data, my view is you're backward looking. You're going to be late. You're not going to realize the country is able to have non-inflationary growth faster."
The market is pricing hyperscaler AI revenue growth as durable and margin-accretive. Azure's AI services contributed 12–16 percentage points of growth in FY25. AWS revenue grew 19% in Q4 2024, with AI workloads driving re-acceleration. Google Cloud grew 48% year-over-year in Q4 2025, with Gemini processing 10 billion tokens per minute. The gap: consensus is not pricing the risk that enterprises will rationalize token spend once they realize they cannot measure output value. CFO surveys rank cost optimization among top priorities for 2025–2026. AI budgets are protected for now, but that protection becomes irrational the moment finance leaders realize they are funding a cost center with no attribution. The 2015–2020 cloud FinOps playbook provides the template: enterprises will demand visibility, governance, and cost control, compressing hyperscaler margins and accelerating adoption of observability tooling.
The measurement problem is not new—but AI makes it structural
The 1990s computer revolution created a productivity measurement problem that took a decade to resolve. Robert Solow's quip—"You can see the computer age everywhere, but in the productivity statistics"—captured the gap between visible technology adoption and invisible economic output. A 2013 methodology revision added R&D and intellectual property investment to GDP accounting, retroactively boosting 1990s production by roughly $3.6 trillion, nearly 30% of full-year 2000 GDP. The revision was spread evenly across years, so growth rates barely budged in official accounts, but the magnitude of the correction revealed how badly national statistics lagged the actual economic transformation.
AI is creating the same measurement problem, but faster and with a critical difference. The 1990s productivity gap was invisible to everyone: enterprises, governments, statisticians. AI's gap is asymmetric. Costs are visible in real time. Hyperscalers bill for tokens processed and GPU-hours consumed. These are clean, auditable line items that hit the income statement immediately. What enterprises get in return, the output of those tokens, has no standard unit of measurement. A thousand tokens might draft an email, generate a code snippet, summarize a contract, or hallucinate nonsense. The cost is the same. The value is unknowable without context, evaluation, and workflow integration.
This asymmetry is not an accounting quirk. It is a structural feature of how AI workloads are priced and consumed. Hyperscalers have little incentive to make output value visible, because doing so could expose how much token spend produces marginal or zero incremental business value. The New Stack reported in early 2026 that GPT-5.4, Claude, and Gemini cannot agree on basic, real-world facts. If frontier models disagree on ground truth, how does an enterprise measure the ROI of a million-token contract summarization job? The answer: it cannot, unless it instruments the workflow, evaluates the output, and ties token consumption to a measurable business outcome. Hyperscalers do not provide this instrumentation. Third-party observability tooling does.
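The instrumentation described above can be sketched in a few lines. This is a minimal illustration and not any vendor's product. It assumes three things: each LLM call is already logged with its token counts and the workflow it served, some evaluator (human review or an LLM judge) has marked each output as accepted or rejected, and prices are per million tokens. The `CallRecord` schema, the prices, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged LLM call (hypothetical schema)."""
    workflow: str       # business workflow the call served
    input_tokens: int
    output_tokens: int
    accepted: bool      # evaluator verdict: was the output usable?


def call_cost(rec: CallRecord, usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Token cost of one call at per-million-token prices."""
    return (rec.input_tokens * usd_per_m_in + rec.output_tokens * usd_per_m_out) / 1_000_000


def cost_per_accepted_output(records, usd_per_m_in: float, usd_per_m_out: float) -> dict:
    """Total spend per workflow divided by the outputs that passed evaluation.

    Rejected outputs still cost money, so they raise the unit cost of the
    outputs that were actually usable. A workflow with no accepted outputs
    gets an infinite unit cost.
    """
    spend: dict = {}
    accepted: dict = {}
    for rec in records:
        spend[rec.workflow] = spend.get(rec.workflow, 0.0) + call_cost(rec, usd_per_m_in, usd_per_m_out)
        accepted[rec.workflow] = accepted.get(rec.workflow, 0) + int(rec.accepted)
    return {
        wf: (spend[wf] / accepted[wf] if accepted[wf] else float("inf"))
        for wf in spend
    }


# Hypothetical log: one contract summary passed review, one did not.
log = [
    CallRecord("contract_summary", 200_000, 5_000, accepted=True),
    CallRecord("contract_summary", 200_000, 5_000, accepted=False),
    CallRecord("email_draft", 1_000, 500, accepted=True),
]
print(cost_per_accepted_output(log, usd_per_m_in=3.0, usd_per_m_out=15.0))
```

With these made-up prices, a single contract-summary call costs $0.675. The effective cost per usable summary is $1.35, because half the spend produced nothing the business could use. That is the gap a raw token bill hides.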
The cloud FinOps playbook is repeating—compressed
The cloud cost optimization playbook from 2015–2020 offers a template. Early cloud adopters like Adobe and Intuit built internal practices in the early 2010s to bring financial accountability to cloud spend. These were mostly custom efforts rather than a named discipline. By the mid-2010s, global enterprises were hitting the same core problem: managing cloud costs at scale as workloads moved from pilots to production. The FinOps Foundation was created in 2019 to provide a community, a shared vocabulary, and a best-practice framework. In 2020 it still had only 12 corporate members and a community of a few thousand practitioners, a sign that FinOps was still at the early-adopter stage. Adoption then accelerated, and corporate membership grew to more than 175 by 2024.
The 2015–2020 period was the formative, high-learning phase. AI cost governance is now entering that same phase, but on a compressed timeline. The difference: cloud costs were at least tied to visible infrastructure such as servers, storage, and bandwidth. Enterprises could see what they were paying for, even if they could not control it. AI costs are tied to invisible output. The New Stack reported that AI cost observability has emerged as a watchdog category, with token-based pricing and consumption models becoming one of the fastest-growing expense lines in corporate tech budgets. The forecast-miss data points the same way. Around 80–85% of enterprises miss their AI infrastructure forecasts by more than 25%, largely because of weak cost governance and underestimated usage growth. "Shadow AI" (teams expensing AI tools outside central IT and finance) is accelerating both spend and risk. It creates duplicate tools, security gaps, and unplanned AI premiums on existing SaaS.
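Both governance failures named here are cheap to detect once spend data is centralized. The following is a minimal sketch, assuming two things: forecast and actual spend are available per period, and expense lines carry a vendor name and a category tag. The 25% threshold comes from the statistic in this paragraph. The expense-line schema and the vendor names are hypothetical.

```python
def forecast_miss(forecast_usd: float, actual_usd: float) -> float:
    """Signed relative miss: 0.30 means actual spend came in 30% over forecast."""
    if forecast_usd <= 0:
        raise ValueError("forecast must be positive")
    return (actual_usd - forecast_usd) / forecast_usd


def misses_forecast(forecast_usd: float, actual_usd: float, threshold: float = 0.25) -> bool:
    """True when spend lands more than `threshold` away from forecast, in either direction."""
    return abs(forecast_miss(forecast_usd, actual_usd)) > threshold


def flag_shadow_ai(expense_lines: list, approved_vendors: set) -> list:
    """Return AI-tagged expense lines paid to vendors central IT has not approved."""
    approved = {v.lower() for v in approved_vendors}
    return [
        line for line in expense_lines
        if line["category"] == "ai" and line["vendor"].lower() not in approved
    ]


# Hypothetical quarter: $100k forecast, $130k actual, plus a team-expensed AI tool.
print(misses_forecast(100_000, 130_000))  # True (30% over forecast)
expenses = [
    {"vendor": "ApprovedLLM", "category": "ai", "usd": 90_000},
    {"vendor": "TeamCardAI", "category": "ai", "usd": 4_000},
    {"vendor": "OfficeSupplies", "category": "other", "usd": 500},
]
print(flag_shadow_ai(expenses, {"ApprovedLLM"}))
```

Real tooling would pull these inputs from billing exports and expense systems rather than literals. The checks themselves stay this simple.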
The catalyst is visible. CFO surveys show cost optimization as a top priority for 2025–2026, with more than half putting "enterprise-wide cost optimization" among their top focus areas. Yet while about two-thirds of CFOs were cutting costs in mid-2025, they were generally protecting AI and automation budgets, with some describing AI as "the last thing" to cut. That protection is rational if AI is strategic. It becomes irrational the moment CFOs realize they are funding a cost center with no ROI attribution. The FinOps playbook shows what happens next: enterprises demand visibility, governance, and cost control. Cloud spend did not stop growing after FinOps emerged, but it stopped growing unchecked. Reserved instances, savings plans, rightsizing, and chargeback became standard. Hyperscaler revenue growth moderated, most visibly during the 2022–2023 optimization wave, and margins came under pressure. The same dynamic is now setting up for AI.
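Chargeback is the FinOps mechanism that ports most directly to token spend: meter usage per team, then split the shared bill in proportion. The following is a minimal sketch, assuming token counts per team are already metered (for example via per-team API keys, a hypothetical setup here). It works in integer cents and uses the largest-remainder method, so the allocations always reconcile exactly to the invoice.

```python
def chargeback(invoice_cents: int, tokens_by_team: dict) -> dict:
    """Split one invoice across teams in proportion to metered tokens.

    Works in integer cents and hands leftover cents to the teams with the
    largest fractional remainders, so the allocations sum exactly to the invoice.
    """
    total = sum(tokens_by_team.values())
    if total <= 0:
        raise ValueError("no metered usage to allocate against")
    # Floor of each team's proportional share, in whole cents.
    shares = {team: invoice_cents * used // total for team, used in tokens_by_team.items()}
    # Order teams by the fractional cent each floor dropped (largest first);
    # ties keep their original order because Python's sort is stable.
    by_remainder = sorted(
        tokens_by_team,
        key=lambda team: invoice_cents * tokens_by_team[team] % total,
        reverse=True,
    )
    leftover = invoice_cents - sum(shares.values())
    for team in by_remainder[:leftover]:
        shares[team] += 1
    return shares


# Hypothetical month: a $100.00 bill shared by three teams with equal usage.
print(chargeback(10_000, {"search": 1_000_000, "support": 1_000_000, "legal": 1_000_000}))
# {'search': 3334, 'support': 3333, 'legal': 3333}
```

Working in cents avoids floating-point drift when many small teams share a large bill. The stray cent goes to whichever team's share was truncated most.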
Hyperscaler revenue growth is material but margin is at risk
Hyperscaler AI revenue exposure is already material but not yet dominant. AWS revenue was $28.8 billion in Q4 2024, up 19% year-over-year. That implies roughly $4.6 billion of incremental revenue over a prior-year base of about $24.2 billion. AI-driven workloads likely represented a high-single-digit to low-teens percentage of revenue, but a much higher share of that incremental growth. If AI-related services contributed roughly half of it, AWS was earning at least $2.3 billion of quarterly AI-related revenue by late 2024, consistent with a $2–4 billion range. Microsoft's Intelligent Cloud segment, which includes Azure, reported $25.5 billion in revenue for calendar Q4 2024. AI already accounted for a very large share of that segment's growth, even if not yet the majority of absolute revenue. Google Cloud revenue was $12 billion in Q4 2024, with AI's share smaller than in 2025 but already material. By Q4 2025, Google Cloud was growing 48% with a $70+ billion run-rate, and gen-AI product revenue grew nearly 400% year-over-year.
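The AWS arithmetic above can be reproduced directly. The revenue and growth figures come from the text. Reading "high-single-digit to low-teens" as 8–13% of revenue is an assumption made for this check.

```python
def growth_decomposition(current_rev: float, yoy_growth: float) -> tuple:
    """Split a quarter's revenue into prior-year base and year-over-year increment."""
    prior = current_rev / (1 + yoy_growth)
    return prior, current_rev - prior


aws_q4_2024 = 28.8  # $B, from the text
prior, incremental = growth_decomposition(aws_q4_2024, 0.19)
ai_from_half_of_growth = 0.5 * incremental             # "roughly half" of incremental growth
share_band = (0.08 * aws_q4_2024, 0.13 * aws_q4_2024)  # assumed 8-13% of revenue

print(f"prior-year base  ≈ ${prior:.1f}B")
print(f"incremental      ≈ ${incremental:.1f}B")
print(f"half of growth   ≈ ${ai_from_half_of_growth:.1f}B")
print(f"8-13% of revenue ≈ ${share_band[0]:.1f}B to ${share_band[1]:.1f}B")
```

Both routes land inside the $2–4 billion range. Half of the roughly $4.6 billion increment is about $2.3 billion, and 8–13% of revenue is about $2.3–3.7 billion.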
The problem: token volumes are exploding, and unit costs are collapsing. Google revealed that serving unit costs for Gemini dropped 78% over 2025. That implies roughly 4.5x more tokens per dollar of serving cost (1 / 0.22 ≈ 4.5), or per GPU-hour if the hourly cost of hardware held flat. By Q1 2026, Gemini was processing 16 billion tokens per minute. This sets up a race to the bottom in inference pricing, with revenue growth fueled by volume times slightly lower per-token prices, not price hikes. Margins depend on efficiency gains keeping pace with price cuts. Enterprises may rationalize token spend through reserved capacity, governance tooling, or migration of workloads to cheaper inference layers. If they do, hyperscaler revenue growth faces a structural headwind. The market is pricing this growth as durable; this thesis argues it is not.
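The unit-economics logic of this paragraph reduces to two ratios. The following is a minimal sketch. The 78% cost decline is from the text. The per-million-token price and cost in the example are hypothetical round numbers, chosen only to show how margin depends on whether price cuts outrun efficiency gains.

```python
def throughput_multiple(unit_cost_drop: float) -> float:
    """Tokens served per dollar after a fractional drop in unit serving cost."""
    return 1 / (1 - unit_cost_drop)


def gross_margin(price: float, cost: float) -> float:
    """Gross margin on a token sold at `price` that costs `cost` to serve."""
    return (price - cost) / price


print(f"{throughput_multiple(0.78):.2f}x tokens per dollar")  # 1 / 0.22

# Hypothetical starting point: $1.00 price, $0.50 serving cost per million tokens.
cost_after = 0.50 * (1 - 0.78)  # serving cost after the 78% decline
for price_cut in (0.50, 0.78, 0.90):
    price_after = 1.00 * (1 - price_cut)
    print(f"price cut {price_cut:.0%}: margin {gross_margin(price_after, cost_after):.0%}")
```

With these made-up numbers, the margin depends on how fast price falls relative to cost:

- If price falls half as fast as cost, margin widens to 78%.
- If price falls exactly as fast as cost, margin stays at the original 50%.
- If price falls 90%, margin turns negative.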
The structural reason the market has not priced this: AI output measurement standards do not yet exist. Enterprises today use a blended approach. They score AI output quality with standardized evaluation frameworks (benchmarks, LLM-as-a-judge, human evaluation). They then tie those metrics to business ROI through observability data, baselines, and financial models. There is still no single global standard, but 2024–2025 saw strong moves toward common patterns, tooling, and de facto metrics across LLM evaluation and MLOps observability. Modern LLM quality frameworks decompose "quality" into dimensions and map them to business impact. The dimensions are correctness/groundedness, relevance, completeness, safety, style/UX, latency, and robustness. The problem: these frameworks are not yet embedded in procurement, budgeting, or financial reporting. Until they are, CFOs will keep funding token spend without ROI attribution. Once they are, spend that cannot show attributed value becomes the obvious line to cut, and hyperscaler revenue growth meets that headwind.
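One way to make that decomposition operational is a weighted score per output, with safety treated as a hard gate. The score can then convert a token bill into cost per unit of delivered quality. This is a sketch under stated assumptions. The dimension names follow the list above. The weights, the 0.9 safety floor, and the gating rule are hypothetical policy choices. The per-dimension scores would come from whatever evaluator (benchmark, LLM judge, human review) an enterprise already runs.

```python
DIMENSIONS = (
    "correctness", "relevance", "completeness", "safety",
    "style", "latency", "robustness",
)


def quality_score(scores: dict, weights: dict, safety_floor: float = 0.9) -> float:
    """Weighted mean of per-dimension scores, each in [0, 1].

    Safety is treated as a gate: an output below the floor scores zero
    no matter how well it does on the other dimensions.
    """
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    if scores["safety"] < safety_floor:
        return 0.0
    total_weight = sum(weights.get(d, 0.0) for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights.get(d, 0.0) * scores[d] for d in DIMENSIONS) / total_weight


def quality_adjusted_cost(cost_usd: float, score: float) -> float:
    """Token cost per unit of delivered quality; infinite if the output is worthless."""
    return cost_usd / score if score > 0 else float("inf")


# Hypothetical contract-review policy: correctness matters most.
weights = {"correctness": 4, "relevance": 1, "completeness": 2, "safety": 1,
           "style": 0.5, "latency": 0.5, "robustness": 1}
scores = {"correctness": 0.9, "relevance": 0.95, "completeness": 0.8, "safety": 1.0,
          "style": 0.7, "latency": 0.9, "robustness": 0.85}
score = quality_score(scores, weights)
print(f"quality {score:.2f}, cost per quality unit ${quality_adjusted_cost(0.675, score):.3f}")
```

Here the weighted score is 0.88, so a hypothetical $0.675 call effectively costs about $0.77 per unit of quality. Encoding the gate and weights as explicit parameters is what would let procurement and finance review them, rather than leaving them buried in an evaluation script.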
Workflow integration is happening—ROI measurement is not
Enterprise workflow integration is real. A New Stack reviewer rated Cursor's new Jira integration "5 stars, no notes," one sign that AI coding tools are moving into enterprise ticketing workflows. Cursor competes directly with GitHub Copilot, Tabnine, Windsurf, Amazon Q Developer, and Replit for enterprise AI coding. Pricing sits in roughly the same band, but usage and governance trade-offs differ. GitHub Copilot reached about 20 million cumulative users and over 1.3 million paid subscribers by mid-2025, with deployment reported at around 90% of Fortune 100 companies. Market-share analyses put Copilot at roughly 42% of the paid AI coding tools market by 2025. Cursor is the closest competitor at about 18% market share, with over $500 million in annualized recurring revenue.
Measurement, however, lags integration. The New Stack reported that the DIY platform trap is burning out engineering teams, with automation complexity growing as teams build custom AI workflows on top of Jira, Linear, GitHub, and internal tooling. These workflows consume tokens at unpredictable rates, and most organizations lack the instrumentation to tie token spend to workflow outcomes. IP and licensing risks are also surfacing: The New Stack reported that developer Gavriel Cohen found his own code inside OpenClaw and walked away. If training-data provenance is opaque and output ownership is contested, enterprises face legal and reputational risk on top of uncontrolled spend.
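Closing that gap starts with tagging every call with the work item it served. The following is a minimal sketch, assuming a single blended per-million-token price and ticket keys in a Jira-like `PROJ-123` format. The `TokenLedger` class is hypothetical. It stands in for whatever tracing or observability pipeline actually carries the tags.

```python
from collections import defaultdict
from typing import Optional


class TokenLedger:
    """Hypothetical in-memory ledger tagging each LLM call with a ticket key."""

    def __init__(self, usd_per_m_tokens: float):
        self.usd_per_m_tokens = usd_per_m_tokens
        self.tokens = defaultdict(int)  # ticket key (None = untagged) -> tokens

    def record(self, tokens: int, ticket: Optional[str] = None) -> None:
        self.tokens[ticket] += tokens

    def _usd(self, tokens: int) -> float:
        return tokens * self.usd_per_m_tokens / 1_000_000

    def report(self, resolved: set) -> dict:
        """Split spend into resolved-ticket, open-ticket, and unattributed buckets."""
        resolved_usd = sum(self._usd(t) for k, t in self.tokens.items() if k in resolved)
        open_usd = sum(self._usd(t) for k, t in self.tokens.items()
                       if k is not None and k not in resolved)
        unattributed_usd = self._usd(self.tokens.get(None, 0))
        return {
            "resolved_usd": resolved_usd,
            "open_usd": open_usd,
            "unattributed_usd": unattributed_usd,
            "usd_per_resolved_ticket": resolved_usd / len(resolved) if resolved else float("inf"),
        }


ledger = TokenLedger(usd_per_m_tokens=10.0)
ledger.record(400_000, ticket="PROJ-1")
ledger.record(100_000, ticket="PROJ-2")
ledger.record(500_000)  # a call nobody tagged
print(ledger.report(resolved={"PROJ-1"}))
```

In this toy run, half the spend ($5 of $10) cannot be attributed to any ticket at all. That is the measurement gap in miniature.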
The structural gap: enterprises are integrating AI into workflows faster than they are instrumenting those workflows for cost and quality measurement. This creates a window where token spend grows unchecked and output value remains unmeasured. The window closes when CFOs demand attribution. The companies that provide the instrumentation—observability, governance, identity, and security tooling—capture the incremental budget. The companies that provide the tokens—hyperscalers—face margin compression.
The portfolio—long governance tooling, short hyperscalers (by omission)
This portfolio expresses the thesis through four structural layers:
- (1) AI observability and cost-attribution tooling that makes token spend and output value visible (Datadog, Elastic);
- (2) AI governance and workflow integration platforms that tie AI consumption to measurable business outcomes (Palantir, Snowflake);
- (3) identity and security infrastructure that controls AI access and enforces policy (Okta, Palo Alto Networks, CrowdStrike);
- (4) alternative inference and data layers that let enterprises reduce hyperscaler token dependency (Cloudflare, MongoDB).

The portfolio is long-only by design. The bearish view on hyperscalers is expressed by not owning them and by owning the tooling layer that captures value when enterprises rationalize token spend.
Datadog (DDOG) · 18% · target $325 · 180 days. Datadog's LLM Observability product instruments prompts, traces token usage, surfaces evaluation metrics, and ties AI spend to business KPIs—the exact workflow the thesis predicts enterprises will demand. The company's platform already monitors cloud infrastructure, applications, and security; LLM observability extends this to AI workloads. Datadog is the cleanest public expression of AI cost governance, but valuation leaves no room for execution stumbles. At 651x P/E and 22.17x P/B, the stock prices in flawless execution and accelerating adoption. Sized at 18% to reflect high conviction tempered by valuation risk.
Palantir (PLTR) · 15% · target $425 · 180 days. Palantir's AIP (Artificial Intelligence Platform) is the thesis in ticker form: it explicitly ties AI token spend to operational KPIs and makes AI Dark Output visible and measurable. Much of Palantir's 2025 revenue acceleration (56% year-over-year) has been attributed to AIP adoption. This thesis reads that adoption as enterprises using AIP to rationalize AI spend and prove ROI. The platform integrates with existing enterprise workflows (ERP, CRM, supply chain) and surfaces AI-generated insights tied to specific business outcomes. Valuation at 149x P/E and 40.29x P/B prices in flawless execution, so the position is sized at 15% rather than 20% to limit single-name risk.
Cloudflare (NET) · 15% · target $350 · 270 days. Cloudflare's Workers AI is the cleanest structural exposure to inference workload migration away from hyperscalers. Running inference at the edge can cut latency and per-request cost at the same time. Workers AI is still usage-billed, metered in Cloudflare's own "Neurons" units, so the shift is toward a cheaper metered layer rather than away from consumption pricing. If enterprises rationalize token spend by migrating workloads to cheaper inference layers, Cloudflare is positioned to capture that migration. Valuation at 61.32x P/B is venture-like, pricing in rapid adoption and margin expansion. Sized at 15% to reflect high structural sensitivity tempered by execution risk.
CrowdStrike (CRWD) · 12% · target $950 · 180 days. CrowdStrike's Falcon platform and Vanta investment position it at the intersection of AI security, governance, and compliance. As AI workloads proliferate, security and identity layers become the choke point where enterprises enforce policy and measure risk. CrowdStrike's observability, identity, and compliance stack directly supports the shift from uncapped AI spend to governed, ROI-driven AI budgets. Valuation at 42.64x P/B assumes the thesis plays out, so sized at 12% to reflect clean structural exposure with limited upside if adoption merely meets expectations.
Elastic (ESTC) · 10% · target $85 · 270 days. Elastic's observability stack is positioned to become a substrate for AI cost attribution: enterprises can use it to trace token spend, model calls, and latency across LLM workloads. Its real-time search and analytics capabilities also suit AI FinOps dashboards. Valuation at 18.17x P/E and 5.24x P/B is undemanding relative to peers, but competitive pressure from Datadog limits upside. Sized at 10% as a value play on AI observability with capped upside.
Snowflake (SNOW) · 10% · target $310 · 360 days. Snowflake's Cortex AI and data-governance layer position it as infrastructure for AI cost observability—the system of record for token spend and ROI attribution. If token costs become the new labor costs requiring granular tracking, Snowflake's data platform becomes mandatory infrastructure. Valuation at 42.96x P/B flags premium pricing and hyperscaler encroachment risk. Sized at 10% to reflect thesis alignment with execution and competitive risk.
Okta (OKTA) · 8% · target $155 · 360 days. Okta is the public-market proxy for AI governance infrastructure—identity and access management (IAM) becomes the control plane for tracking which users, teams, and apps consume which models at what cost per seat. The AI governance use case remains implicit rather than proven in financials, creating asymmetric upside if the thesis materializes but downside if adoption lags. Sized at 8% to reflect structural fit with product-roadmap uncertainty.
Palo Alto Networks (PANW) · 8% · target $350 · 360 days. Palo Alto Networks' Prisma Cloud and Cortex XSIAM address AI governance and identity tooling, positioning the company as the governance layer between enterprises and hyperscaler AI services. Valuation at 242.56x P/E and 7.39x P/B already prices in this outcome, making this consensus rather than contrarian. Sized at 8% as a quality name priced for perfection.
MongoDB (MDB) · 4% · target $450 · 360 days. MongoDB's Atlas vector search and operational database sit at the layer where enterprises consolidate fragmented data to reduce prompt bloat and token waste. The company has yet to prove vector search drives incremental Atlas consumption at scale, and valuation at 10.08x P/B already prices in AI tailwinds. Sized at 4% as a marginal exposure to the data-layer rationalization thesis.
Instruments
| Ticker | Weight | Target | Horizon |
|---|---|---|---|
| DDOG | 18% | $325 | 180d |
| PLTR | 15% | $425 | 180d |
| NET | 15% | $350 | 270d |
| CRWD | 12% | $950 | 180d |
| ESTC | 10% | $85 | 270d |
| SNOW | 10% | $310 | 360d |
| OKTA | 8% | $155 | 360d |
| PANW | 8% | $350 | 360d |
| MDB | 4% | $450 | 360d |
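
The table's weights can be rolled up into the four layers named at the top of this section as a consistency check. The ticker-to-layer mapping follows the portfolio description; the code is just bookkeeping.

```python
WEIGHTS = {  # percent of portfolio, from the table above
    "DDOG": 18, "PLTR": 15, "NET": 15, "CRWD": 12, "ESTC": 10,
    "SNOW": 10, "OKTA": 8, "PANW": 8, "MDB": 4,
}
LAYERS = {
    "observability and cost attribution": ("DDOG", "ESTC"),
    "governance and workflow integration": ("PLTR", "SNOW"),
    "identity and security": ("OKTA", "PANW", "CRWD"),
    "alternative inference and data": ("NET", "MDB"),
}


def layer_exposure(weights: dict, layers: dict) -> dict:
    """Sum position weights by layer, checking every ticker is assigned exactly once."""
    assigned = [t for tickers in layers.values() for t in tickers]
    if sorted(assigned) != sorted(weights):
        raise ValueError("each ticker must belong to exactly one layer")
    return {layer: sum(weights[t] for t in tickers) for layer, tickers in layers.items()}


assert sum(WEIGHTS.values()) == 100
for layer, pct in layer_exposure(WEIGHTS, LAYERS).items():
    print(f"{pct:>3}%  {layer}")
```

The roll-up, summing to 100%:

- Observability and cost attribution: 28%
- Identity and security: 28%
- Governance and workflow integration: 25%
- Alternative inference and data: 19%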
Assumptions and falsification
- Enterprises will demand measurable ROI for AI spend within 12–18 months as CFOs realize token costs are growing faster than attributable business outcomes. Falsified if: enterprise AI budgets continue to grow uncapped through 2027 without adoption of cost-governance tooling, or if CFO surveys show AI remains "the last thing to cut" regardless of ROI visibility.
- AI output measurement standards will emerge and become embedded in enterprise procurement and financial reporting by 2027–2028, following the 2015–2020 cloud FinOps adoption curve. Falsified if: no consensus evaluation frameworks or observability standards gain traction by end of 2026, or if enterprises continue to procure AI services without ROI attribution clauses in contracts.
- Hyperscaler AI revenue growth will face margin compression as enterprises rationalize token spend through reserved capacity, governance tooling, and workload migration to cheaper inference layers. Falsified if: AWS, Azure, and Google Cloud maintain or expand AI service margins through 2026–2027, or if token pricing remains inelastic to enterprise cost-control efforts.
- Observability, governance, and identity tooling vendors will capture incremental enterprise budget as AI cost rationalization becomes a board-level mandate. Falsified if: hyperscalers successfully bundle observability and governance into their AI platforms at marginal cost, preventing third-party tooling vendors from gaining pricing power.
- Workload migration to edge inference (Cloudflare Workers AI) and on-prem alternatives will accelerate as enterprises seek to escape hyperscaler per-token pricing. Falsified if: data gravity and integration costs keep AI workloads centralized on hyperscaler infrastructure, or if hyperscalers cut token prices faster than edge/on-prem alternatives can compete.
Risks
Valuation risk. Datadog, Palantir, Cloudflare, CrowdStrike, Palo Alto Networks, and Snowflake all trade at premium multiples (100x+ P/E or 40x+ P/B) that price in flawless execution. Any earnings miss, product delay, or competitive pressure could trigger violent multiple compression. The portfolio is concentrated in names where the thesis is already partially priced in.
Hyperscaler bundling. AWS, Azure, and Google Cloud could bundle observability, governance, and identity tooling into their AI platforms at marginal cost, preventing third-party vendors from capturing incremental budget. Microsoft's integration of Azure Monitor and Sentinel into Azure AI services is the template. If bundling accelerates, the thesis fails.
Adoption timeline. If AI cost rationalization takes longer than the 2015–2020 FinOps analogy suggests (e.g., 5–7 years instead of 3–5 years), the portfolio holds premium-multiple names through a prolonged period of uncertainty, risking de-rating before the thesis materializes.
Regulatory and compliance shocks. If AI output provenance and IP/licensing risks (the Gavriel Cohen / OpenClaw dynamic) escalate into regulatory mandates or class-action litigation, enterprises may pause AI adoption entirely rather than accelerate governance tooling adoption, creating a demand shock.
Crowded-trade risk. AI governance and observability may already be consensus themes by mid-2026. If institutional positioning is heavy in Datadog, Palantir, CrowdStrike, and Cloudflare, any macro shock or sector rotation could trigger forced selling that overwhelms fundamental support.
Liquidity and borrow. Elastic and MongoDB have lower average volumes than mega-cap peers, creating slippage risk on entry/exit. None of the positions are hard-to-borrow, but Palantir and Cloudflare have elevated short interest, which could amplify volatility.
Sources
1. SemiAnalysis, "AI Dark Output: The Visible Cost of Invisible Output"
2. The New Stack, "Why GPT-5.4, Claude, and Gemini can’t agree on basic, real-world facts"
3. The New Stack, "I tested Cursor’s new Jira integration and it’s 5 stars, no notes. Here’s why."
4. The New Stack, "The DIY platform trap that’s burning out engineering teams"
5. The New Stack, "Gavriel Cohen found his own code inside OpenClaw, so he walked away"
6. The New Stack, "The AI cost crisis finally has a watchdog — just not the companies causing it"