№ 14 · Learn With Darin · Field Guide

Cloud platforms: a practitioner's field guide.

AWS Bedrock, Google Vertex AI, and Azure AI Foundry are the three big managed inference platforms. They all do roughly the same job (one API, many models, your cloud's auth and audit), and they each bite in their own ways. This is how I pick between them, and when I skip them entirely.

Updated May 2026 · ~25 min read · Covers Bedrock, Vertex AI, Azure AI Foundry
Part 01

What "cloud platform" means here

A cloud-hosted model platform is a managed multi-model inference service. You bring an account with one of the hyperscalers (AWS, Google Cloud, Azure). They give you a single API surface, a single billing relationship, and a single auth boundary that fronts a catalog of foundation models from many vendors. Anthropic's Claude. Meta's Llama. Mistral. Cohere. Amazon's Nova. OpenAI's GPT, on Azure. Google's own Gemini, on Vertex. And, increasingly, an open-weight long tail.

The pitch is simple. Instead of holding direct vendor relationships with Anthropic, OpenAI, Google, Meta, Mistral, and Cohere (each with its own API keys, contracts, billing, and audit trail), you hold one with your cloud. The cloud handles the underlying contracts, the regional availability, the data residency, and the security posture. Your application code talks to a single endpoint and you swap models behind it.

The three platforms in this guide are the ones with real production uptake as of May 2026:

  • AWS Bedrock. AWS's LLM gateway. The widest model selection of the three, and the only managed home for Anthropic's Claude on AWS. Auth is plain IAM.
  • Google Vertex AI. Google Cloud's LLM platform. Native Gemini, plus Claude, Llama, Mistral, and a Model Garden of open-weight options. Auth is GCP IAM with service accounts.
  • Azure AI Foundry. Microsoft's umbrella for Azure-hosted models. The Azure OpenAI Service (GPT-5, the o-series, DALL-E, Whisper) plus the Foundry Model Catalog. Auth is Microsoft Entra.

Out of scope for this guide: the model vendors' own APIs (Anthropic, OpenAI, Google AI Studio), other inference services (Together, Fireworks, Groq, Replicate, Cloudflare Workers AI), and anything self-hosted. They matter, they're real options, and they don't fit the same shape as a hyperscaler-managed gateway.

Note Cloud platforms are an "and" with direct-vendor APIs, not an "or." Most teams I work with use both. Direct APIs for the prototyping bench and for accessing day-zero model releases; a cloud platform for the production path that has to live inside the org's compliance and auth story. Picking one platform is a real decision; picking it as the only thing you ever use is usually a mistake.

The reason these platforms exist, structurally, is the same reason RDS exists. Running models is operationally expensive. Holding contracts with six vendors is administratively expensive. The cloud already has your IAM, your VPC, your audit pipeline, and your finance approvals. Putting models behind that same plane is worth real money to teams who already pay the cloud tax. For teams that don't, it's overhead they don't need.

Part 02

AWS Bedrock

Bedrock is the AWS-native LLM gateway. One IAM-scoped API surface in front of a long list of model families. As of May 2026, the lineup that matters in production:

  • Anthropic Claude: Sonnet, Opus, and Haiku. Bedrock is the supported managed home for Claude on AWS, and for many enterprise teams it's the only way procurement will let Claude into production.
  • Meta Llama: the 3.x family and the Llama 4 generation. The open-weight workhorse choices when you want full control over fine-tuning or steady predictable cost.
  • Mistral: the Mistral Large family plus the smaller Mixtral mixture-of-experts models.
  • Cohere Command: Command R+ and the embeddings models.
  • AI21 Jamba: the long-context Mamba-style models.
  • Amazon Nova: Amazon's own foundation family. Nova Pro, Lite, and Micro for text; Nova Canvas for image; Nova Reel for video. Cheaper than Claude on like-for-like tasks and integrates well when you want everything in one vendor.
  • Stability: Stable Diffusion family for image generation.
  • DeepSeek and others: added regionally as availability shifts. The roster moves; check the console.

Surfaces you'll actually use

  • Bedrock console: chat playground, model evaluation jobs, prompt management, and the place you go to enable models for an account (yes, you have to enable them; this is the part that catches first-timers).
  • Bedrock Runtime API: InvokeModel for the per-vendor native shape, and Converse for a unified chat surface across model families. Use Converse unless you have a reason not to. It's the API that makes "swap the model behind the call" actually work; a minimal call is sketched just after this list.
  • Bedrock Agents: the original hosted agent framework. AWS-managed loop with action groups, knowledge bases, and tool calling. Quick to stand up, opinionated about how the loop runs.
  • Bedrock AgentCore: the newer, production-grade agent runtime. Framework-agnostic (bring LangGraph, CrewAI, Strands, or your own), with managed Runtime, Memory, Identity, Gateway, Browser, Code Interpreter, and Observability as separate building blocks.
  • Bedrock Knowledge Bases: managed RAG over S3 plus a vector store of your choice (OpenSearch Serverless, Pinecone, Aurora pgvector, Redis). Handles chunking, embedding, indexing, and retrieval.
  • Bedrock Guardrails: prompt-time and response-time content filters with policies for denied topics, PII redaction, and contextual grounding.
  • Provisioned Throughput: reserved per-model capacity for predictable, fixed-cost inference. The right shape for steady production load.
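
A minimal Converse call from Python with boto3, as a sketch: it assumes your AWS credentials carry Bedrock invoke permissions and that the model is already enabled in the region; the model ID is the one used later in this guide and should be checked against your console.

```python
import boto3

# Assumes AWS credentials with Bedrock invoke permissions and a model that has
# been enabled for this account in this region (see the Warn box below).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-sonnet-4-7-20260315-v1:0",  # verify in your console
    system=[{"text": "You are a concise analyst."}],
    messages=[{"role": "user", "content": [{"text": "Summarize our Q3 risks in three bullets."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```

The same messages and inferenceConfig shape works across model families, which is what makes the swap-the-model-ID workflows later in this guide practical.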

Bedrock Agents vs AgentCore

This is the choice that confuses every team building agents on AWS. Both ship under the Bedrock umbrella, both can do tool calling, and both will technically run the same workloads. They are not the same product, and picking the wrong one costs weeks.

Bedrock Agents

  • One AWS-managed object: an agent definition with action groups (Lambda functions), an attached knowledge base, and a system prompt. Console-configurable.
  • AWS owns the loop. You describe tools and instructions; Bedrock decides when to call them, when to retrieve, when to answer.
  • Fast to stand up. A working tool-using chatbot in an afternoon.
  • Limited control over the loop. Hard to plug in a custom framework, hard to do unusual orchestration patterns, hard to debug when the agent decides something you didn't expect.
  • Best when the workload is "chatbot with a few tools and a knowledge base," and you want AWS to make the orchestration decisions.

AgentCore

  • A set of separable services: Runtime (serverless host for long-running agent processes), Memory (managed short-term and long-term agent memory), Identity (delegated auth so agents act as a real user against Slack, Gmail, Salesforce), Gateway (host MCP servers and tool catalogs), Browser, Code Interpreter, Observability.
  • You own the loop. AgentCore Runtime hosts whatever agent code you bring, in any framework, with sessions that survive across requests.
  • Production-grade primitives: durable sessions, isolated execution, identity-aware tool calls, OpenTelemetry-shape traces.
  • More setup. You're picking which AgentCore services to adopt and wiring them to your code, not filling out a console form.
  • Best when the workload is a real agent (multi-step, long-running, stateful) and you want production infrastructure under it without writing your own runtime.
How to choose Three rules of thumb. If you don't need an agent framework at all (single-turn or short multi-turn with tool calls), stay on the Converse API and skip both. If you want a chatbot with a few tools and AWS to run the loop, use Bedrock Agents; you'll ship faster. If you're deploying a real agent (LangGraph, CrewAI, Strands, your own framework) and you need durable memory, identity for tool calls, and production observability, use AgentCore. The two are not mutually exclusive: I've seen teams use Bedrock Agents for an internal helpdesk bot and AgentCore Runtime for a separate customer-facing workflow agent in the same account.

Pricing and billing

Per-token by model, with both on-demand and provisioned tiers. Cross-region inference profiles spread requests across regions to dodge throttling, and they bill the same per-token rate as the originating region. Knowledge Bases bills the underlying storage and embeddings separately; budget for that, not just the model calls.

Auth and region story

Auth is plain IAM. There are no separate Bedrock API keys; you sign requests with your normal AWS credentials. That's the killer feature for AWS-native shops: one access plane, one audit trail, one set of policies. Region availability is the part that bites: not every model is in every region. As of May 2026, Anthropic models are in us-east-1, us-west-2, eu-central-1, ap-northeast-1, and a growing list. Read the region matrix in the Bedrock docs before you design; assuming "the model is everywhere" will burn an afternoon.

Warn Bedrock requires explicit per-account model access. In the console, under "Model access," you grant your account permission to use each model family. New accounts have nothing enabled by default, and the request can take minutes to hours to approve depending on the model. If your InvokeModel call returns AccessDeniedException with a clear "model not enabled" message, that's where to look first.

Where it bites

The API is verbose. The model availability matrix is annoying to navigate. Latency is fine but rarely beats going direct to the source vendor. Costs add up if you don't reserve provisioned throughput for steady workloads. The AWS console is its own learning curve, and Bedrock's surface (console, runtime, agents, knowledge bases, guardrails) is enough product to need a champion on the team. None of this is disqualifying. All of it is real.

Part 03

Google Vertex AI

Vertex AI is Google Cloud's LLM platform. The pitch in one sentence: it's the only place to run Gemini with full enterprise controls (IAM, VPC Service Controls, data residency, audit), and it carries enough other models that you can land most multi-model workloads here without leaving GCP.

Model lineup

  • Gemini: the same Gemini 2.5 Flash and Pro that power gemini.google.com, plus the developer-only variants and the embedding models. Vertex is the supported production path for Gemini; you do not run Gemini in production from an AI Studio key, you run it from Vertex.
  • Anthropic Claude: Sonnet and Opus, available in specific regions (us-east5 and europe-west1 as of May 2026). The feature parity with Anthropic's direct API is close but not perfect; some newer features land later.
  • Meta Llama: the 3.x and 4 generations, available as both Vertex-managed endpoints and Model Garden deployments.
  • Mistral: Mistral Large and the smaller open variants.
  • Model Garden: a long list of open-weight options (Gemma, Falcon, several HuggingFace models) deployable as managed endpoints with a few clicks. Quality of "deployable" varies; some entries are reference deployments, others are production-grade.

Surfaces

  • Vertex AI Studio: the GCP-side playground. Confusingly named, because Google also runs an "AI Studio" at aistudio.google.com which is the consumer-keyed cousin. Vertex AI Studio is the enterprise-keyed one; AI Studio is the rapid-prototyping one. Use AI Studio for sketches, Vertex for production.
  • Vertex AI API: the REST and gRPC surfaces. The Python SDK is the most polished; the others are catching up. A minimal call is sketched just after this list.
  • Agent Builder: Google's hosted agent framework, layered on top of the model surface.
  • Vertex AI Search and Conversation: managed RAG, the Google equivalent of Knowledge Bases. Strong on the search side (it's Google), reasonable on the conversational composition side.
  • Pipelines, Model Monitoring, Experiments: the MLOps tooling Google has been building for years. This is the part of Vertex that doesn't have a good equivalent on Bedrock or Foundry.
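
A minimal Gemini call through the Vertex AI Python SDK, as a sketch: it assumes Application Default Credentials and a project with the Vertex AI API enabled; the project ID and location are placeholders.

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Assumes Application Default Credentials (gcloud auth application-default login
# or a service account) and a GCP project with the Vertex AI API enabled.
vertexai.init(project="my-project-id", location="us-central1")  # placeholders

model = GenerativeModel("gemini-2.5-flash")
response = model.generate_content("Summarize our Q3 risks in three bullets.")
print(response.text)
```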

Pricing, auth, region

Per-token for managed models, with optional Provisioned Throughput for steady workloads. A free tier exists for experimentation. Auth is GCP IAM with service accounts, Application Default Credentials, or Workload Identity Federation if you're calling from outside GCP. Gemini models are broadly available; Claude on Vertex is restricted to a few regions; full data-residency controls are available with VPC Service Controls and Customer-Managed Encryption Keys.

Where it shines

  • Native Gemini access with enterprise controls. If you want Gemini in production with audit, IAM, and residency, Vertex is the only supported path.
  • The Workspace data integration story. With proper consent, Vertex can read your org's Workspace data through the Google security boundary. No third-party connector hops.
  • MLOps tooling. Pipelines for orchestrating model workflows, Model Monitoring for drift detection, Experiments for tracking runs. If you came from a Kubeflow or MLflow background, this is the most familiar of the three platforms.

Where it bites

It's GCP-only, which is a real cost if your org isn't already there. Model breadth is narrower than Bedrock's; the Model Garden helps but the production-quality tier is smaller. The product surface is a sprawl: AI Studio, Vertex AI Studio, Gemini Code Assist, Agent Builder, Vertex AI Search, and Vertex AI Pipelines all overlap in confusing ways. And some Anthropic features lag the direct Anthropic API by days or weeks; if you need a brand-new Claude capability on day one, Vertex isn't the path.

Part 04

Azure AI Foundry

Azure AI Foundry is Microsoft's umbrella product for Azure-hosted models. It absorbed the older "Azure AI Studio" branding and now covers both the Azure OpenAI Service and the broader Foundry Model Catalog. The pitch: it's the only managed home for OpenAI's frontier models with Azure's compliance, residency, and Microsoft 365 integration story.

What it hosts

  • Azure OpenAI Service: GPT-5 and its variants, the o-series reasoning models, GPT-4 generations still kept around for compatibility, DALL-E 3, Whisper, and the OpenAI embeddings. This is the meaningful part for most Azure-anchored teams: GPT models with Azure's contract, audit, and residency.
  • Foundry Model Catalog: hundreds of additional models (Llama, Mistral, Phi, Cohere, Stable Diffusion, plus a long HuggingFace tail) deployable as managed serverless endpoints or as dedicated capacity.

Surfaces

  • Foundry portal: the playground plus a project workspace concept. Projects scope deployments, evaluations, and prompt assets together.
  • Azure OpenAI REST API: the OpenAI-compatible API on Azure endpoints. If your code already speaks the OpenAI Python SDK, you can point it at Azure with a base URL and an API version and most things work; a minimal call is sketched just after this list.
  • Foundry SDK: the new SDK that wraps both the OpenAI surface and the broader catalog. Useful if you want one client across both.
  • Prompt Flow: visual orchestration for prompt pipelines, with evaluation and deployment built in.
  • Content Safety: the Azure equivalent of Bedrock Guardrails. Prompt and response filtering, jailbreak detection, groundedness checks.
  • Evaluations: a managed eval harness with built-in metrics and the ability to plug in custom ones.
  • Agent Service: Microsoft's hosted agent framework. Newer and more enterprise-flavored than the OpenAI Assistants API it descends from.
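
A minimal chat call through the OpenAI Python SDK pointed at Azure, as a sketch: the endpoint, API version, key, and deployment name are placeholders for your own resource, and the key can be swapped for an Entra token provider.

```python
from openai import AzureOpenAI

# Endpoint, API version, key, and deployment name are placeholders for your resource.
client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_version="2024-10-21",
    api_key="<key-or-entra-token>",
)

response = client.chat.completions.create(
    model="gpt-5-chat",  # the *deployment* name you created, not the raw model family name
    messages=[{"role": "user", "content": "Summarize our Q3 risks in three bullets."}],
)
print(response.choices[0].message.content)
```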

Pricing, auth, region

Per-token for OpenAI models, varying by model and region. Standard (per-token, on-demand), Provisioned Throughput Units, and the newer serverless options for catalog models are all available. Free tier through Azure trial credits. Auth is Microsoft Entra (Azure AD) with managed identities for in-Azure callers and Entra ID app registrations for external ones. Region availability is the most fragmented of the three platforms: specific OpenAI versions live in specific regions, and Sovereign Cloud (Azure Government, Azure China) options exist for regulated workloads.

Where it shines

  • Enterprise integration. Entra for auth, Purview for data classification, Sentinel for security logging, Defender for threat detection. If your org runs on Microsoft, Foundry slots into a security plane that's already there.
  • The OpenAI residency story. Azure OpenAI is the only managed path to GPT-5 with contractual data residency, no-training commitments, and the Azure compliance certifications (FedRAMP High, IL5, HIPAA, and the rest).
  • Microsoft 365 integration. Copilot's underlying plumbing lives here. If you want to extend the M365 Copilot story or build on top of Graph data with managed AI, Foundry is the surface.

Where it bites

The OpenAI version lag. New OpenAI model releases sometimes land on Azure days or weeks after the consumer ChatGPT and OpenAI direct APIs. For most production teams that's fine; for teams chasing the bleeding edge it's a real cost. The Foundry brand is new and the docs are catching up; older Microsoft Learn articles still reference Azure AI Studio and Azure Machine Learning Studio. The quota system requires capacity requests for many model SKUs and the request flow can be slow. And the product naming is its own challenge: Foundry, Azure OpenAI, Azure ML, Copilot Studio, Power Platform AI Builder all overlap in ways that take time to map.

Part 05

A three-way comparison

Side by side on the dimensions that actually move decisions. There is no overall winner. There's a winner per dimension, and the right pick is whichever lines up with the constraints you actually have.

AWS Bedrock

  • Model breadth: widest. Claude, Llama, Mistral, Cohere, AI21, Nova, Stability, plus DeepSeek and a moving list.
  • Latest-model freshness: good for Claude (Anthropic ships to Bedrock close to direct), uneven for the others.
  • Latency: fine. Rarely beats going direct to the source vendor.
  • Native ecosystem: AWS-shaped. IAM, S3, CloudWatch, the rest.
  • Region story: model-by-model matrix. Annoying. Read it.
  • Agent framework: Bedrock Agents.
  • Managed RAG: Bedrock Knowledge Bases. Solid.

Vertex AI

  • Model breadth: medium. Gemini, Claude, Llama, Mistral, plus the Model Garden tail.
  • Latest-model freshness: best for Gemini (it's Google's own), good for Claude with some lag.
  • Latency: very good for Gemini calls, comparable to others elsewhere.
  • Native ecosystem: GCP-shaped. IAM, GCS, BigQuery, Workspace.
  • Region story: Gemini broad, Claude restricted, residency strong.
  • Agent framework: Agent Builder.
  • Managed RAG: Vertex AI Search and Conversation. Strong on retrieval.

Azure AI Foundry

  • Model breadth: medium-deep. Azure OpenAI plus Foundry Catalog (Llama, Mistral, Phi, Cohere, the HF tail).
  • Latest-model freshness: trails OpenAI direct by days to weeks for new GPT releases.
  • Latency: good. Provisioned Throughput Units help for steady load.
  • Native ecosystem: Azure-shaped. Entra, Purview, Sentinel, M365.
  • Region story: most fragmented; sovereign cloud options exist.
  • Agent framework: Agent Service.
  • Managed RAG: Foundry Indexes. Reasonable.

| Dimension | Bedrock | Vertex AI | Foundry |
| --- | --- | --- | --- |
| Anthropic Claude | full | most regions | not hosted |
| OpenAI GPT-5 / o-series | not hosted | not hosted | full |
| Google Gemini | not hosted | full | not hosted |
| Meta Llama | full | full | full |
| Mistral | full | full | full |
| Open-weight catalog | limited | Model Garden | Foundry Catalog |
| Native auth plane | IAM | GCP IAM | Entra ID |
| Provisioned capacity | Provisioned Throughput | Provisioned Throughput | PTUs |
| Cross-region inference | first-class | via routing config | via deployment regions |
| Managed RAG | Knowledge Bases | Vertex AI Search | Foundry Indexes |
| Agent framework | Bedrock Agents | Agent Builder | Agent Service |
| Guardrails layer | Bedrock Guardrails | Safety filters | Content Safety |
| MLOps tooling | SageMaker (separate) | Pipelines, Monitoring | Azure ML (adjacent) |
| Sovereign / GovCloud | AWS GovCloud | Assured Workloads | Azure Government |

The shorthand I use: Bedrock for breadth and IAM, Vertex for Gemini and MLOps, Foundry for OpenAI and Microsoft. Each is the obvious choice in its primary lane, and a respectable second choice in the others.

Part 06

Cross-cutting concepts

Four ideas show up on all three platforms with slightly different names and largely the same shape. Knowing them once is enough.

Provisioned throughput

Paying for reserved per-model capacity instead of paying per token at on-demand rates. Bedrock calls it Provisioned Throughput. Vertex calls it Provisioned Throughput. Azure calls it Provisioned Throughput Units (PTUs). Same idea, three names. The right shape when you have steady, predictable load and you're consistently hitting the on-demand rate cap; the wrong shape for spiky, exploratory workloads where you'll pay for capacity you don't use. The cost crossover is usually somewhere around 60-70% sustained utilization. Below that, on-demand is cheaper; above that, provisioned wins.
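
A toy version of that break-even check, with every number a made-up placeholder rather than a real list price, just to show the shape of the arithmetic:

```python
# Every number below is an illustrative placeholder, not a real list price.
ON_DEMAND_PER_1K_TOKENS = 0.003         # $ per 1K tokens on demand (hypothetical)
PROVISIONED_UNIT_PER_MONTH = 5_000.0    # $ per reserved unit per month (hypothetical)
UNIT_CAPACITY_TOKENS = 2_500_000_000    # tokens one unit can serve per month (hypothetical)

def compare(monthly_tokens: float) -> str:
    on_demand = monthly_tokens / 1_000 * ON_DEMAND_PER_1K_TOKENS
    utilization = monthly_tokens / UNIT_CAPACITY_TOKENS
    winner = "provisioned" if on_demand > PROVISIONED_UNIT_PER_MONTH else "on-demand"
    return f"{utilization:.0%} utilization -> {winner} (on-demand would be ${on_demand:,.0f}/mo)"

print(compare(1_000_000_000))  # ~40% utilization: on-demand usually wins
print(compare(2_200_000_000))  # ~88% utilization: provisioned usually wins
```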

Cross-region inference

Spreading requests across regions to dodge throttling and improve resilience. Bedrock has it as a first-class concept (cross-region inference profiles) where you call one logical endpoint and it routes to the least-loaded region. Vertex and Foundry achieve the same effect through deployment-region configuration and client-side routing. The pattern matters because a single region's per-model rate limit will bite you in production even when your aggregate volume is small; using two or three regions buys headroom without increasing cost.
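
On Bedrock, switching to a cross-region inference profile is usually just a change of identifier: you pass a geography-prefixed profile ID instead of the single-region model ID. A sketch, with the profile ID as an assumption to copy from your console rather than from here:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Cross-region inference profiles are addressed by a geography-prefixed ID
# (illustrative here); copy the real one from the Bedrock console.
response = bedrock.converse(
    modelId="us.anthropic.claude-sonnet-4-7-20260315-v1:0",
    messages=[{"role": "user", "content": [{"text": "ping"}]}],
)
```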

Guardrails and content safety

Every platform now offers a prompt-time and response-time filtering layer. Bedrock Guardrails, Vertex safety filters, Azure Content Safety. They cover roughly the same ground: denied topics, PII redaction, jailbreak detection, and (newer) groundedness checks against retrieved context. They are not a substitute for application-level review and they will not catch a determined adversarial prompt, but they catch the obvious. Treat them as a defense-in-depth layer, not the primary defense.

Managed RAG

Every platform offers a "drop documents into a bucket, get retrieval" service: Bedrock Knowledge Bases, Vertex AI Search, Foundry Indexes. They handle the chunking, the embedding, the indexing, and the retrieval call from your model. Quality varies. Bedrock Knowledge Bases is the most flexible on backing store choice; Vertex AI Search is the strongest on retrieval quality (it's Google); Foundry Indexes is the most integrated with the rest of the Foundry stack.

Tip Before committing to managed RAG, build a rough hand-rolled version on the same documents and compare retrieval quality at the same top-k. The managed services are convenient but they're opinionated about chunking, and their default chunking is rarely optimal for any specific corpus. If hand-rolled wins on quality, the operational cost of running it yourself is often less than the time you'll spend tuning the managed one to match.

One more cross-cutting note. All three platforms expose a no-training-on-your-data commitment in their enterprise terms, and all three have the certifications most enterprise procurement asks for (SOC 2, ISO 27001, the cloud-specific equivalents). The compliance differences between them are real but small; they rarely decide a platform choice on their own.

Part 07

Practical workflows

Six patterns I keep coming back to. They aren't tied to a specific platform; the same shape works on all three with the names changed.

i.

Switching from direct Anthropic to Bedrock for compliance.

You've prototyped against Anthropic's direct API. Procurement now wants the call inside your AWS account, with IAM and audit. The migration is small if you've used the SDK well: swap the client to the Bedrock runtime client, change the model ID to the Bedrock variant (e.g., anthropic.claude-sonnet-4-7-20260315-v1:0), and use the Converse API to keep the message shape stable. Test on the same prompts; the answers should be effectively identical. Watch for a one-time gap on the very newest features; they reach Bedrock a few days behind direct.
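
If your prototype is on Anthropic's Python SDK, one route (distinct from the Converse route sketched in Part 02, and worth verifying against the SDK docs for your version) is the SDK's Bedrock client, which keeps the messages.create shape and swaps only the auth and the model ID:

```python
# Before: direct Anthropic API
# from anthropic import Anthropic
# client = Anthropic()  # uses ANTHROPIC_API_KEY

# After: same SDK, Bedrock transport -- requests are signed with AWS credentials.
from anthropic import AnthropicBedrock

client = AnthropicBedrock(aws_region="us-east-1")
message = client.messages.create(
    model="anthropic.claude-sonnet-4-7-20260315-v1:0",  # Bedrock model ID; verify in your console
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize our Q3 risks in three bullets."}],
)
print(message.content[0].text)
```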

ii.

Building managed RAG without writing it from scratch.

For a corpus under a few hundred thousand documents that updates daily, managed RAG (Knowledge Bases, Vertex AI Search, Foundry Indexes) gets you to "good enough" in an afternoon. Drop the docs in object storage, point the service at it, configure chunk size and embedding model, and call the retrieve-and-generate endpoint. Don't ship until you've sampled retrieval results manually on twenty or thirty real queries. The defaults work; verifying they work for your corpus is the part you can't skip.
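
On Bedrock the retrieve-and-generate call looks roughly like this; the knowledge base ID and model ARN are placeholders, the response handling is a sketch to verify against the docs, and Vertex AI Search and Foundry Indexes have their own equivalents:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy for annual plans?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-sonnet-4-7-20260315-v1:0",  # placeholder
        },
    },
)
print(response["output"]["text"])

# Spot-check what was retrieved, not just the generated answer.
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(ref["content"]["text"][:120])
```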

iii.

Spreading across regions to avoid throttle.

In Bedrock, enable a cross-region inference profile for the model you're using. In Vertex and Foundry, deploy the same model in two or three regions and load-balance at the client. Single-region rate limits are the most common production surprise; two regions usually doubles your effective throughput at no additional list price. Watch the egress story if your data has to stay in one region for compliance.
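
For the client-side version on Vertex or Foundry, a minimal rotation across two regional deployments might look like this; the endpoints, keys, and deployment name are placeholders, and real code should retry and fail over on throttle errors rather than just rotate:

```python
import itertools
from openai import AzureOpenAI

# Two deployments of the same model in different regions (placeholder endpoints/keys).
clients = [
    AzureOpenAI(azure_endpoint="https://my-eastus.openai.azure.com",
                api_version="2024-10-21", api_key="<key-east>"),
    AzureOpenAI(azure_endpoint="https://my-westus3.openai.azure.com",
                api_version="2024-10-21", api_key="<key-west>"),
]
rotation = itertools.cycle(clients)

def chat(messages):
    client = next(rotation)  # naive round-robin; add 429 retry/failover in real code
    return client.chat.completions.create(model="gpt-5-chat", messages=messages)
```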

iv.

Swapping models behind a feature flag.

Use the unified API on each platform (Converse on Bedrock, the Vertex SDK with the model parameter, the Foundry SDK) and route the model choice through a feature flag. When a new model version lands, ramp traffic 1% then 10% then 50%, with output-quality evals on a held-out set at each step. This is the workflow that makes "we run on three different models behind the same surface" actually safe instead of theoretical.
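
A minimal sketch of the ramp itself; the flag store, traffic fraction, and model IDs are placeholders for whatever your org actually uses:

```python
import random

# Placeholder flag values; in practice these come from your feature-flag service.
FLAGS = {
    "stable_model": "anthropic.claude-sonnet-4-7-20260315-v1:0",
    "candidate_model": "candidate-model-id",      # placeholder
    "candidate_traffic": 0.10,                    # ramp 0.01 -> 0.10 -> 0.50 as evals pass
}

def pick_model_id() -> str:
    if random.random() < FLAGS["candidate_traffic"]:
        return FLAGS["candidate_model"]
    return FLAGS["stable_model"]

# The call site stays the same either way:
# bedrock.converse(modelId=pick_model_id(), messages=...)
```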

v.

Evaluating a new model on your data.

All three platforms now have a managed evaluation surface (Bedrock Model Evaluation, Vertex Experiments, Foundry Evaluations). Upload a small held-out set with reference outputs or judge criteria, run the same prompts against the candidate model and your current production model, and compare. The managed tooling saves you from rebuilding eval infrastructure each time. Cap the eval at a few hundred examples; a 5,000-example eval rarely tells you more than a 200-example one if the examples are well chosen.
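
The hand-rolled shape of what those harnesses do, as a sketch; call_model, judge, and the held-out set are placeholders you'd supply:

```python
# Sketch only: compare a candidate and the production model on the same held-out set.
# call_model(prompt) -> str and judge(output, reference) -> bool are placeholders.
def score(call_model, judge, examples) -> float:
    passed = sum(judge(call_model(ex["prompt"]), ex["reference"]) for ex in examples)
    return passed / len(examples)

# heldout = load_examples()[:200]   # ~200 well-chosen examples is usually enough
# print("candidate:",  score(candidate_call, judge, heldout))
# print("production:", score(production_call, judge, heldout))
```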

vi.

Cost control for a steady workload.

Run the workload on on-demand for a week. Pull the actual token volume from CloudWatch (Bedrock), Cloud Monitoring (Vertex), or Azure Monitor (Foundry). If you're sustaining more than 60-70% of a Provisioned Throughput unit's nominal capacity, switch. The cost difference is usually 30-40% in favor of provisioned at high utilization, and the latency is more predictable as a bonus. Re-evaluate every quarter; usage shifts.
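
On the AWS side, pulling a week of token volume might look roughly like this; the AWS/Bedrock namespace, metric name, and dimension are my assumptions to confirm in the CloudWatch console before relying on them:

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)

# Namespace, metric, and dimension names are assumptions -- confirm them in the
# CloudWatch console before wiring this into a cost decision.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="InputTokenCount",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-sonnet-4-7-20260315-v1:0"}],
    StartTime=end - timedelta(days=7),
    EndTime=end,
    Period=86_400,          # daily buckets
    Statistics=["Sum"],
)
weekly_tokens = sum(point["Sum"] for point in stats["Datapoints"])
print(f"input tokens over the last 7 days: {weekly_tokens:,.0f}")
```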

Part 08

Choosing a platform (or staying direct)

Pick the cloud you already pay. Pick the model you already trust. Don't pick both at once. — TWD

The decision frame I run for clients, in order:

  1. What's your existing cloud footprint? If you're 90% AWS, Bedrock is the default and you should need a real reason to pick something else. Same for GCP-Vertex and Azure-Foundry. The auth, audit, and finance benefits of staying in your existing cloud are larger than they look on paper, and they compound over time. Picking a second cloud just for a model is rarely worth the operational tax.
  2. What's the compliance story? If you have a contract that says "data stays in EU" or "data does not leave the US government cloud," that often picks the platform for you. Sovereign cloud options exist on all three, but they're not equivalent. Azure Government and AWS GovCloud are the most mature for US federal; Azure has the strongest EU sovereign story; GCP's Assured Workloads is the youngest but credible.
  3. Which model do you actually need? If the answer is "GPT-5 specifically," Foundry is the only managed home for it. If the answer is "Claude specifically," Bedrock is the broadest path. If the answer is "Gemini specifically," Vertex is the only path. If the answer is "Llama or Mistral," all three work and the cloud footprint question dominates.
  4. How much do you care about model freshness? If you need day-zero access to new model versions, you want the direct vendor API for at least your prototype path, regardless of which platform you pick for production. Cloud platforms lag by days to weeks on new releases. That gap matters less than it used to (it was months in 2024) but it isn't zero.
  5. What's your team's skill profile? AWS-fluent teams ship on Bedrock fastest. GCP-fluent teams ship on Vertex fastest. Microsoft-fluent teams ship on Foundry fastest. The platform that matches your existing operational competence is almost always the right one even when another scores marginally better on a feature checklist.

When to stay direct with the model vendor

The case for skipping cloud platforms entirely:

  • You're prototyping. The friction of setting up a Bedrock account, enabling models, configuring IAM, and waiting on quota is real. For exploration, an Anthropic or OpenAI key is faster.
  • You want the latest model on day one. Direct APIs get new versions first.
  • You don't have enterprise compliance requirements yet. The cloud platform's main value is auth, audit, residency, and procurement; if none of those bind for you, you're paying complexity for nothing.
  • You want the full feature surface. Anthropic ships features like prompt caching, computer use, and the Files API on direct first. The cloud platforms catch up, but they're behind.

The honest middle path that most teams I work with land on: direct vendor APIs for the prototype bench and for early access; one cloud platform for the production path that has to live inside compliance. The two coexist. The same prompt template, the same eval set, the same model family on both sides; the platform changes, the work doesn't.

Part 09

Pitfalls and gotchas

The places these platforms are most likely to disappoint, in the order you'll hit them.

"The model isn't available in my region"

Most common first-day surprise. Each platform publishes a region-by-model matrix, and not every model is in every region. Anthropic on Bedrock is in maybe a dozen regions; Anthropic on Vertex is in two or three; new OpenAI models on Foundry land in a few regions and propagate over weeks. Read the matrix before designing. If you need a specific model in a specific region and it isn't there, your options are wait, switch regions, or switch models. There is no client-side workaround.

"The version on the platform lags the direct API"

This is the part rarely talked about. A new Claude generation or a new GPT-5 version may take days to weeks to land on the cloud platform versus the source vendor's direct API. If your application depends on a brand-new feature (a new tool-use shape, a new prompt-caching behavior, a new reasoning mode) that feature may not be available on your cloud platform yet even when the model name is. Read each platform's release notes alongside the vendor's; don't assume the names tell you everything.

"My request is being throttled"

Default per-account, per-region rate limits are lower than people expect. Bedrock and Foundry both enforce TPM (tokens per minute) and RPM (requests per minute) caps. The fix is usually a quota increase request through the console plus enabling cross-region inference. Don't expect quota to be granted instantly; build the request into your launch timeline. Provisioned Throughput buys you a higher ceiling, but it's a contract, not a quick fix.

"The Foundry / Bedrock / Vertex docs contradict the console"

All three platforms have moved fast and the docs lag. Bedrock renamed several APIs in 2024 and old guides still reference the old names. Foundry absorbed Azure AI Studio and the Microsoft Learn surface is still catching up. Vertex has overlapping product names (Vertex AI Studio vs AI Studio, Agent Builder vs Vertex AI Conversation) that mix in old articles. When the docs and the console disagree, trust the console; when both look wrong, check the platform's official blog or release notes for the most recent change.

"I enabled the model but the API still says access denied"

On Bedrock specifically: model access is per-account, per-region. Enabling a model in us-east-1 does not enable it in us-west-2. The error message usually points at this but not always; if you're hitting AccessDeniedException on what should be a working call, double-check the region of both the call and the enablement. The same shape exists on Foundry as deployment scoping; a deployment in West US 3 is not callable from East US.

"The bill is much higher than I forecasted"

Three usual suspects. First, you enabled Provisioned Throughput for testing and forgot to release it; PTU and Provisioned Throughput bill whether you're using them or not. Second, your managed RAG is re-embedding documents on every update instead of incrementally; check the embedding job logs. Third, your guardrails layer is making a second model call per request without you realizing it; some guardrail policies do that. Pull the cost breakdown by service line, not just by model.

"The agent framework is fighting me"

Bedrock Agents, Agent Builder, and Agent Service are all opinionated about how the agent loop runs. They're great when their opinions match yours and frustrating when they don't. If you find yourself fighting the framework on every iteration, the right move is usually to drop down to raw model calls plus your own loop. The managed agent surfaces are best when you accept their shape; they're worse than a small custom loop when you don't.

"The model is on the platform but it isn't behaving like the direct API"

Subtle and real. Default safety filters on the cloud platforms are sometimes more aggressive than on the direct API, which can cause refusals you didn't see in prototype. Cloud platforms also occasionally pin to slightly older minor versions even when the model name matches. If you're seeing a behavior gap, log both responses and compare; the difference is usually a parameter default (temperature, top-p, system prompt handling) or a guardrail intercept rather than the model itself.

One closing observation

The three cloud platforms are converging. In 2024 they each had distinct strengths and obvious gaps. By May 2026 the gaps are smaller, the feature lists rhyme, and the choice is mostly about which cloud you already live in and which models you actually need. That convergence is a good thing for buyers; it means the wrong pick is rarely catastrophic, and switching costs are real but bounded. Build your code with the model choice behind a clean abstraction (Converse, the Vertex SDK, the Foundry SDK) and the platform-level decision becomes one you can revisit later with less pain than you'd expect.

And if any of this is out of date by the time you read it: aws.amazon.com for Bedrock, cloud.google.com for Vertex, and techcommunity.microsoft.com for Foundry. All three move fast; assume the version numbers in this guide are wrong by the time you read it, and verify before you ship.