№ C2 · Learn With Darin · Comparison

Bedrock vs Vertex AI vs Foundry: a decision matrix.

You've already decided you want a hyperscaler-managed inference platform instead of going direct to Anthropic, OpenAI, or Google. Now you have to pick which one. This is the head-to-head you reach for when the choice is real, not theoretical.

Updated May 2026 · ~16 min read · Three platforms, one production decision
Part 01

The decision frame

Eight times out of ten, the choice between Bedrock, Vertex AI, and Foundry is settled before you sit down to evaluate them. The deciding factor is your existing cloud footprint, not the feature checklist. AWS-native shops should run inference in AWS. GCP-native shops should run it in GCP. Microsoft-native shops should run it in Azure. The auth, audit, billing, and procurement benefits of staying in your existing cloud are larger than they look on paper, and they compound across every team that has to operate the thing.

The other two times out of ten, the deciding factor is one of three things, in order:

  1. The model you specifically need. If you must have GPT-5 with enterprise compliance, Foundry is the only managed home. If you must have Gemini in production, Vertex is the only path. If you must have the latest Claude with the broadest region story, Bedrock is the obvious pick.
  2. A binding compliance constraint. US federal workloads push toward GovCloud or Azure Government. Specific EU residency contracts push toward whichever sovereign-cloud surface your contract names.
  3. The team's operational competence. The platform that matches your existing skill profile ships fastest. A team fluent in IAM and CloudWatch will be productive on Bedrock in days. The same team will spend weeks getting comfortable with Entra and Azure Monitor before they ship anything on Foundry.

If none of those three apply and you genuinely have a clean slate, the choice opens up. That's the case this page is built for: when the decision is actually live, here's how to make it efficiently. For the deeper background on each platform, the cloud platforms field guide covers the model lineups, agent frameworks, and pitfalls in detail. This page is the head-to-head; that one is the encyclopedia.

Note: Notice that "which model is best" is not the first question. It's the third. Cloud footprint dominates because it's the cost you pay every day forever. Picking a second cloud just to get a marginally better model is rarely worth the operational tax, and the model gaps shrink every quarter.

What "existing footprint" actually means

The phrase gets thrown around without precision. What I mean by it: where does your team already have an organizational AWS / GCP / Azure account with billing, IAM, audit, and a security review pipeline that someone trusts? A pile of personal AWS accounts on a credit card does not count. A single project in GCP that one person owns does not count. The signal is whether your security and finance teams can answer the question "who owns this account, and what's already approved to run in it" without a meeting.

That bar matters because the cloud platform's value proposition is auth, audit, residency, and procurement. If you don't already have those four plumbed into one of the hyperscalers, you're not getting value from running inference there; you're paying for the privilege of standing them up. That's a fine choice when you have a compliance constraint forcing the issue. It's a bad choice when you're picking a platform for a prototype that hasn't shipped yet.

Part 02

At a glance

One paragraph per platform, plus the strongest case for each. If one of these obviously fits your situation, you can probably stop reading and skim the matrix below to confirm.

AWS Bedrock

The widest model catalog of the three (Claude, Llama, Mistral, Cohere, AI21, Nova, Stability, plus a moving DeepSeek-and-others tail). Auth is plain IAM with no separate API keys. The Converse API gives you a unified chat shape across model families, so you can swap the model behind your code without rewriting the call. Bedrock Agents covers the simple chatbot-with-tools shape; Bedrock AgentCore covers the production-grade case (Runtime, Memory, Identity, Gateway, Browser, Code Interpreter, Observability as separable services). The killer feature is having Anthropic's Claude as a first-class managed model.
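To make the unified-shape claim concrete, here's a minimal Converse call through boto3. The model ID, region, and prompt are illustrative, not a recommendation; availability depends on what your account has enabled.

```python
import boto3

# Bedrock rides on standard AWS credentials; no separate API key to manage.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The model ID is illustrative; check your account's enabled models.
# Swapping it for a Llama or Nova ID leaves the rest of the call unchanged.
response = client.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our Q3 risks."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```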

Strongest case: AWS-native shop that wants Claude in production with IAM and audit, and may want Llama or Nova alongside it without changing platforms.

Google Vertex AI

Native Gemini (the only enterprise-controlled path to Gemini in production), plus Claude in a few regions, plus Llama, Mistral, and the broader Model Garden. Auth is GCP IAM with service accounts. The MLOps tooling (Vertex AI Pipelines, Model Monitoring, Experiments) is the strongest of the three by a noticeable margin, especially if you came from a Kubeflow or MLflow background. Vertex AI Search is the strongest managed-RAG retrieval layer in the bunch (it's Google).
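A minimal sketch of the same call shape on Vertex, using the google-genai SDK with Vertex routing. The project, location, and model name are placeholders; check your own project for current model versions.

```python
from google import genai

# Auth rides on Application Default Credentials (a service account in
# production); no separate API key, same as the rest of GCP.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

# Model name is illustrative; the Vertex model list has current versions.
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Extract the renewal dates from this contract: ...",
)

print(response.text)
```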

Strongest case: Google-org shop that runs Workspace, wants Gemini with enterprise controls, and has any kind of MLOps-shaped workload alongside the inference path.

Azure AI Foundry

The only managed home for OpenAI's frontier models (GPT-5, the o-series, DALL-E, Whisper) with Azure compliance, residency, and Microsoft Entra auth. Plus the Foundry Catalog of open-weight options (Llama, Mistral, Phi, Cohere, the HuggingFace tail) deployable as managed serverless endpoints or dedicated capacity. Slots into Purview for data classification, Sentinel for security logging, and the Microsoft 365 Copilot plumbing if you want to extend that surface.
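And the Foundry equivalent through the openai package's Azure client, with Entra handling auth instead of an API key. The endpoint, API version, and deployment name are placeholders; note that on Azure the `model` argument is your deployment name, not the raw model ID.

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Entra-based auth: no API key in the codebase; tokens come from an
# identity your org already manages (managed identity, az login, etc.).
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    azure_ad_token_provider=token_provider,
    api_version="2024-10-21",
)

# "model" is the deployment name you chose in Foundry, not the model ID.
response = client.chat.completions.create(
    model="my-gpt5-deployment",
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
)

print(response.choices[0].message.content)
```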

Strongest case: Microsoft 365 / Entra shop that needs OpenAI models with contractual data residency and the Azure compliance certifications (FedRAMP High, IL5, HIPAA, the rest).

Part 03

Side-by-side matrix

The dimensions that actually move decisions. There is no overall winner; there is a winner per row, and the right pick is whichever set of rows lines up with your constraints.

| Dimension | Bedrock | Vertex AI | Foundry |
| --- | --- | --- | --- |
| Best-fit cloud footprint | AWS-native | GCP-native | Azure / M365-native |
| Auth plane | IAM (no separate keys) | GCP IAM, service accounts | Microsoft Entra |
| Native first-party model | Amazon Nova | Google Gemini | OpenAI GPT-5, o-series |
| Anthropic Claude | full, broad regions | 2 to 3 regions | not hosted |
| OpenAI GPT-5 / o-series | not hosted | not hosted | full |
| Google Gemini | not hosted | full | not hosted |
| Open-weight catalog | broad, growing | Model Garden | Foundry Catalog (largest) |
| Latest-model freshness vs direct | days for Claude, varies elsewhere | day-zero for Gemini, days for Claude | days to weeks for OpenAI |
| Region / residency story | per-model matrix, GovCloud | Gemini broad, Assured Workloads | most fragmented, strongest sovereign |
| Agent framework (simple) | Bedrock Agents | Agent Builder | Agent Service |
| Agent framework (production) | Bedrock AgentCore | Agent Builder + Vertex Pipelines | Agent Service + Prompt Flow |
| Managed RAG | Knowledge Bases | Vertex AI Search (strongest) | Foundry Indexes |
| MLOps tooling | SageMaker (separate product) | Pipelines, Monitoring, Experiments | Azure ML (adjacent) |
| Sovereign / GovCloud | AWS GovCloud | Assured Workloads | Azure Government, Azure China |
| Cost shapes | on-demand, Provisioned Throughput | on-demand, Provisioned Throughput | on-demand, PTUs, serverless |
| Cross-region inference | first-class profiles | via routing config | via deployment regions |
| Guardrails layer | Bedrock Guardrails | Safety filters | Content Safety |

The shorthand I use, repeated from the field guide because it actually holds up: Bedrock for breadth and IAM, Vertex for Gemini and MLOps, Foundry for OpenAI and Microsoft. Each is the obvious choice in its primary lane and a respectable second choice in the others.

What's not in the matrix (and why)

Three things people ask about that I deliberately left out:

  • Raw model quality benchmarks. Every platform hosts roughly the same Claude, Llama, and Mistral; benchmark differences come from the model and the prompt, not the platform underneath.
  • List-price token cost. The per-token rates for the same model are within a few percent of each other across platforms, and the total bill is dominated by your usage shape (provisioned vs on-demand, regional spread, RAG vs not) rather than the headline rate.
  • Latency. All three are fine; none beat going direct to the source vendor, and the difference between them on the same model is usually within noise.

Things that do matter and didn't fit cleanly in a table: the maturity of the docs (Bedrock and Vertex both ahead of Foundry post-rebrand), the size of the practitioner community (AWS largest, Azure second, GCP third by a clear margin), and how often the product surface gets renamed (Foundry the most volatile, Vertex middle, Bedrock the most stable). Weight those by how much they'll actually affect your team's day-to-day.

Part 04

Strongest case for each

One paragraph per platform on who should actually pick what. Honest, not balanced.

You should pick Bedrock if

Your org runs on AWS and you want Claude in production. That alone is enough. Bedrock is the only managed Claude path on AWS, the IAM integration removes a whole class of credential management, and the Converse API is the cleanest unified chat surface across vendors. Add in Bedrock AgentCore for production agents (the framework-agnostic Runtime, Memory, Identity, and Observability primitives are genuinely better than what the other two ship), the breadth of the model catalog (Llama, Mistral, Nova, Cohere, AI21 all in one place), and the cross-region inference profiles that dodge throttling at zero list-price cost. The pieces that bite are the per-account model-enable dance, the per-model region matrix, and the verbose API. None of those are disqualifying.

You should pick Vertex AI if

Your org runs on GCP, or your org runs Google Workspace and wants its inference path to read Workspace data through the Google security boundary instead of through a third-party connector. Vertex is the only enterprise-controlled path to Gemini, and Gemini is genuinely the right model for several workload shapes (long-context document analysis, anything that needs to ground against Drive or Gmail, and multimodal tasks where Gemini's vision quality leads). The MLOps tooling is the strongest of the three; if you have any kind of training, monitoring, or experiment-tracking workload alongside inference, Vertex slots in cleanly where the others would force you into a separate product. The pieces that bite are GCP-only deployment, narrower model breadth than Bedrock, and the sprawling product surface (AI Studio vs Vertex AI Studio vs Agent Builder vs Vertex AI Search).

You should pick Foundry if

Your org runs on Microsoft 365 and Entra, and you need GPT-5 with the Azure compliance certifications. That's the case Foundry was built for, and it executes it well. Entra ties auth to identities your org already manages, Purview classifies the data flowing through, Sentinel catches anomalies, and the Microsoft 365 Copilot plumbing lives on the same surface if you want to extend it. The Foundry Catalog covers the open-weight tail with the broadest selection of the three. The pieces that bite are the OpenAI version lag (new GPT releases land on Azure days to weeks after the OpenAI direct API), the docs catching up to the Foundry rebrand, and the most fragmented region matrix of any of the three platforms. If you're not already in the Microsoft ecosystem, Foundry is harder to justify.

Part 05

The and-not-or pattern

Most teams I work with end up using more than one. The question is rarely which platform; it's which is the primary one. — TWD

The pure single-platform team is rarer than the slide decks suggest. By the time a team has a real production AI footprint, it usually has at least two of these three in some form. The patterns I see most often:

  • Bedrock primary, Foundry for one OpenAI workload. The shop runs on AWS and uses Claude on Bedrock for almost everything. There's one specific feature (a vision task that GPT-5 happens to do better, or a customer request that needs OpenAI specifically) that runs on Foundry. The Foundry footprint is small and stable; the Bedrock footprint grows.
  • Vertex primary, Bedrock for Anthropic capacity. The shop runs on GCP and uses Gemini for the workloads that benefit from Workspace integration or long context. Anthropic's direct API is fine for prototyping, but the production path needs more Claude capacity than Vertex's Claude region story comfortably handles, so production Claude calls land on Bedrock in a separate AWS account that exists mostly for that purpose.
  • Foundry primary, Vertex for Gemini-specific tasks. The shop runs on Microsoft and uses GPT-5 on Foundry for the bulk of inference. A small Vertex footprint exists for the one workflow that needs Gemini's specific capabilities (often a multimodal or long-document task). Costs are dwarfed by the Azure spend; the Vertex line is a rounding error.
  • Direct vendor APIs alongside any of the above. Nearly universal. Anthropic, OpenAI, and Google AI Studio direct keys exist in the prototyping bench and in the eval harness even when production runs entirely on a cloud platform. The friction of standing up a new model on a cloud platform is high enough that nobody does it for sketches.

The implication for your design: build the model choice behind a clean abstraction from the start. Converse on Bedrock, the Vertex SDK with the model parameter, the Foundry SDK on Azure, and the vendor SDKs direct. If your application calls out to a single function that takes a model identifier and a message list, swapping platforms or adding a second one later is a configuration change, not a refactor. Skip that abstraction and you're paying for it the day procurement asks for the OpenAI workload to move to Azure.

Tip: The clean abstraction does not have to be heavyweight. A thin internal client that picks between three SDKs based on a model-id prefix is enough (a sketch follows below). You don't need a framework; you need a function. The teams that build elaborate model-routing layers up front almost always over-engineer them; the teams that skip the abstraction entirely almost always regret it within a year.
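Here's a minimal sketch of that thin client with just two routes, Bedrock and Anthropic direct, to show the shape. The prefixes and model IDs are made up for illustration; a Vertex or Foundry branch would follow the same pattern.

```python
import boto3
from anthropic import Anthropic

def complete(model_id: str, messages: list[dict[str, str]]) -> str:
    """Route a plain [{'role': ..., 'content': ...}] message list by prefix."""
    if model_id.startswith("bedrock/"):
        resp = boto3.client("bedrock-runtime").converse(
            modelId=model_id.removeprefix("bedrock/"),
            messages=[
                {"role": m["role"], "content": [{"text": m["content"]}]}
                for m in messages
            ],
        )
        return resp["output"]["message"]["content"][0]["text"]
    if model_id.startswith("anthropic/"):
        resp = Anthropic().messages.create(
            model=model_id.removeprefix("anthropic/"),
            max_tokens=1024,
            messages=messages,
        )
        return resp.content[0].text
    raise ValueError(f"no route for model id: {model_id}")

# Swapping platforms later is a config change, not a refactor:
#   complete("bedrock/anthropic.claude-sonnet-4-20250514-v1:0", msgs)
#   complete("anthropic/claude-sonnet-4-20250514", msgs)
```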
Part 06

The version-lag bite

The part of the cloud-platform story that gets undersold in the marketing: the cloud platforms lag the source vendor's direct API on new model releases by days to weeks. A new Claude generation lands on Anthropic's direct API on launch day; it lands on Bedrock anywhere from same-day to a couple of weeks later, depending on the release. A new GPT-5 version ships on the OpenAI API; it shows up on Azure Foundry days to weeks after, and the regional rollout spreads over more weeks. A new Gemini drops on AI Studio; the Vertex production endpoints follow on a similar lag (shorter, because it's Google's own product, but real).

That gap matters less than it used to. In 2024 it was often months; by May 2026 it's usually days. But it isn't zero, and it bites in three specific ways:

  • Feature lag, not just version lag. Even when the model name lands on the cloud platform on the same day, the feature surface around it sometimes lags. Prompt caching, a new tool-use shape, a new reasoning mode, a new long-context ceiling: those can land on the direct API first and on the platform later. If your application depends on a brand-new feature, check the platform's release notes alongside the vendor's, not just the model availability page.
  • Quota lag. Even when the model and feature are both available on the platform, default per-account quotas for new models start low and grow over weeks as the platform observes usage. Production workloads that tried to flip to the new version on day one have hit throttle-induced outages because the quota wasn't there yet.
  • Behavior drift on minor versions. The cloud platforms occasionally pin to a slightly older minor version even when the model name matches. Default safety filters can also be more aggressive than on the direct API, causing refusals you didn't see in prototype. If you're seeing a behavior gap between your prototype (direct API) and production (cloud platform), log both responses and diff them (a minimal sketch follows this list); the difference is usually a parameter default, a guardrail intercept, or a minor-version pin rather than the model itself.
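A minimal sketch of that diff, assuming Claude as the model under test. Both model IDs are illustrative; pin them to the same advertised version before you trust the comparison.

```python
import anthropic
import boto3

PROMPT = "Summarize the attached incident report in three bullets."

# Prototype path: the direct Anthropic API (model ID illustrative).
direct = anthropic.Anthropic().messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=512,
    messages=[{"role": "user", "content": PROMPT}],
)

# Production path: the same prompt through Bedrock's Converse API.
platform = boto3.client("bedrock-runtime").converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": PROMPT}]}],
    inferenceConfig={"maxTokens": 512},
)

print("direct:  ", direct.content[0].text)
print("platform:", platform["output"]["message"]["content"][0]["text"])
# Consistent divergence usually traces to a parameter default, a guardrail
# intercept, or a minor-version pin, not the model weights themselves.
```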

What this means for production planning: do not assume "we'll be on the new model on day one" if your production path is a cloud platform. Build the assumption into your roadmap that production lags the prototype by one to four weeks for a major release, and longer if you also need a quota increase. The teams that get burned here are usually the ones that promised a customer the new model on launch day based on the press release, not on the actual platform availability.

Warn: The honest middle path is to keep direct vendor API keys in the prototype bench and the eval harness, even when production is entirely on a cloud platform. The day a new model ships, you can evaluate it against your own data within hours instead of waiting weeks for the platform path to catch up. When the platform availability lands, you already know whether you want to roll forward.

Which platform handles the lag best

Vertex is the strongest on this dimension for one specific reason: when the model in question is Gemini, there is effectively no lag (it's Google's own product, shipped on Google's own platform). For everyone else's models, Vertex is in the middle of the pack. Bedrock is the strongest for new Claude releases (Anthropic ships to Bedrock close to direct, often within days), and it's middle-of-the-pack for the other vendors. Foundry has the most consistent lag behavior across vendors, because OpenAI ships everything to Azure on a similar cadence; it's not the fastest, but it's the most predictable. If your roadmap depends on a specific model on a specific date, the answer is to verify against the source vendor first and the platform second, every single time.

Part 07

A decision-flow recipe

Six scenarios, with the call I'd make for each. If your situation matches one of these, the answer is probably the matching call. If it doesn't, the frame in Part 01 is the path to your own answer.

i.

"We're an AWS-native shop already."

Bedrock. The IAM integration alone is worth it; you skip a whole class of credential and audit work. Default to Claude (Sonnet for most things, Opus for the hard stuff), use the Converse API so you can swap models later, and budget a half-day to learn the per-account model-enable dance and the per-model region matrix. Add Foundry only if you have a specific workload that requires OpenAI compliance.

ii.

"We're a GCP-native shop already."

Vertex AI. Default to Gemini for anything that benefits from long context, multimodal input, or Workspace integration; use Claude on Vertex (in us-east5 or europe-west1) for the workloads where you specifically want Claude. Lean on Vertex AI Search for managed RAG; the retrieval quality is the strongest of the three. The MLOps tooling is real value if you have any training or monitoring needs alongside inference.

iii.

"We're a Microsoft 365 / Entra shop already."

Foundry. Azure OpenAI for GPT-5 with the compliance and residency controls your enterprise procurement is going to demand anyway, plus the Foundry Catalog for open-weight options when you need them. The integration with Purview, Sentinel, and the M365 Copilot plumbing is worth the platform price. Plan around the OpenAI version lag; keep an OpenAI direct key in the prototype bench so you can evaluate new releases without waiting on Azure.

iv.

"We have no existing cloud footprint."

Stay direct with the model vendor for now. The cloud platforms' main value is auth, audit, residency, and procurement; if none of those bind for you yet, you're paying complexity for nothing. Use Anthropic, OpenAI, or Google AI Studio direct, build the abstraction so swapping is cheap, and revisit the platform decision when you actually have a compliance or scale reason. Don't pick a cloud just for a model.

v.

"We need OpenAI specifically with enterprise compliance."

Foundry. There is no other managed path to GPT-5 with the Azure compliance certifications (FedRAMP High, IL5, HIPAA, the EU sovereign options). If your org isn't already on Azure, the answer is still Foundry; you'll absorb the operational cost of standing up an Azure footprint specifically for this workload. Plan for the version lag and the quota dance.

vi.

"We need the latest Claude on day one."

Use Anthropic direct for the prototype and eval path, even if production lives on Bedrock. Anthropic ships features (prompt caching shapes, computer use, the Files API) on direct first, and Bedrock catches up days later. The cost of running both is small (the keys are free; you only pay per call), and the value is being able to test new releases the day they land instead of waiting on the platform path.

If you walked through these six and none of them match cleanly, you're in the genuine clean-slate case. Reread Part 01 with your specific constraints in hand, work through the three tie-breakers, and the answer usually presents itself within an afternoon. The worst outcome is overthinking this; the platforms are converging fast enough that the wrong pick is rarely catastrophic, and the abstraction in Part 05 makes switching costs bounded.

One closing observation

The choice between these three is more durable than it feels in the moment. Once your team has IAM patterns, audit pipelines, and CI/CD wired into one cloud's inference platform, switching is a real project, not a refactor. That's not a reason to agonize over the pick; it's a reason to make the obvious one and move on. The obvious pick is almost always your existing cloud footprint. If that pick gives you 80% of what you need on a given workload, ship it; the remaining 20% goes on a second platform, kept small and stable, with a clean abstraction between them.

And if any of this is out of date by the time you read it, the deeper coverage in the cloud platforms field guide updates on the same cadence as this page. The model names and version numbers in here will go stale faster than the framing will; the framing is the part that's been holding up across the last two years of platform churn.

One last note on the local models field guide: if your decision is actually "cloud platform vs running models in our own data center," that's a different question with a different answer, and the local-models guide is where to start. This page assumes you've already decided that managed inference is the right shape; the local-vs-cloud question is upstream of everything here.