№ 08 · Learn With Darin · Field Guide
Antigravity: a practitioner's field guide.
Google's agentic IDE, public preview since November 2025. A forked VS Code with a multi-agent posture: not "one assistant in your editor," but a Kanban-shaped control plane for several agents working in parallel, plus a more familiar Editor view for the one-task-at-a-time work you already know.
What Antigravity actually is
Antigravity is Google's first serious entry into the agentic-IDE category, and the first one to put multi-agent supervision at the center of the product instead of bolting it onto a chat sidebar. It launched in public preview in November 2025 and has been shipping weekly since. As of May 2026 it is still in preview, still free, and still actively reshaping itself in ways that are worth knowing about before you commit a workflow to it.
The shape of the product, in plain terms:
- It's a forked VS Code. The editor surface, the extension model, the keybindings, the file tree, the terminal: all the things you already know from VS Code, Cursor, or Windsurf. If you've used any of those, the muscle memory transfers within minutes.
- It has two top-level surfaces, not one. The Manager view is a dashboard of agents in flight; the Editor view is the familiar IDE chat. They are the same product but quite different work environments. Which one you live in determines what kind of work you do.
- It is multi-agent first. The Manager view assumes you will dispatch several tasks in parallel and supervise them, the way a tech lead supervises a small team. The Editor view assumes you will work one thread at a time the way you do in Cursor.
- It is heavily Gemini-flavored, but not Gemini-only. Gemini 3 Pro is the default for high-effort agentic work; Claude Sonnet 4.5 is selectable, along with a small set of other frontier models. The model picker matters more than it does in most IDEs.
- It produces Artifacts, not just diffs. Agents return structured outputs (task lists, screenshots, browser walkthroughs, test results) that you review without re-running anything yourself. This is the part that genuinely changes the review loop.
- It can drive a browser. A native Chrome integration lets the agent click around a deployed site, capture screenshots, and run smoke tests. This is the differentiator versus Cursor and Claude Code, both of which see code but not the rendered page.
The thing to internalize before opening it for the first time: Antigravity is not "Cursor with a Google logo." It is a deliberate bet that the next IDE generation belongs to people who supervise several agents at once. Whether that bet pays off depends on whether you find a Kanban-style dashboard a natural fit for code, or a category error. Both reactions are reasonable, and you'll know which camp you're in within an hour.
Where it sits in the agentic-IDE category
The category itself is barely three years old. The lineage, in the order it arrived:
- Cursor (2023): the first VS Code fork to bet on AI as the primary feature, not an extension. Single agent, editor-shaped, very polished.
- Windsurf (2024): another VS Code fork, slightly different agent harness, similar single-agent posture.
- Claude Code (2025): Anthropic's terminal-native answer. Not an IDE at all. A CLI agent that runs alongside whatever editor you already use.
- Antigravity (November 2025): Google's first attempt. The first to ship a native multi-agent dashboard as a top-level surface, not as an experimental feature.
What this means in practice: every other tool in the category assumes one agent at a time. Antigravity is the first to take the opposite stance and build the product around it. The risk Google is taking is that "one agent at a time" might be the right answer and the multi-agent posture might be a solution looking for a problem. The opportunity is that if multi-agent supervision is what comes next, the product that gets the surface right early has a real lead.
Installing and getting in
Installation is a single download from antigravity.google: native installers for macOS, Windows, and Linux. Sign-in is a Google account; preview access is open without a waitlist as of May 2026. The first launch walks you through importing your VS Code keybindings and extensions (most extensions work; Cursor-specific ones don't). Total time from download to dispatching a first task is about ten minutes if you already have a Google account ready.
Two settings to change immediately after install. Turn on the per-task scope confirmation in Settings → Agents → Defaults: it asks you to confirm file and network scope each time you brief a task, which feels like friction for the first day and saves you from a runaway dispatch by the second. And under Settings → Browser, set the default viewport to match the device class your users actually use. The factory default is desktop; if you ship to mobile, that default is wrong.
Manager view vs Editor view
The single most important thing to understand about Antigravity is which view you should be in for the work in front of you. The product opens in Manager view by default, and that default frames the whole experience. Here is what each surface actually is:
Manager view
- Surface: a Kanban-shaped board across the whole window. Columns for queued, running, awaiting review, complete. Each card is one agent task.
- Posture: you are dispatching and supervising. You write a task brief, attach context (files, repo paths, MCP servers, browser scope), and the agent runs without you watching.
- Best for: small-to-medium tasks you can describe well in a paragraph. Triaging an issue list. Running parallel exploration on a refactor. Anything where the work is "clear enough to brief and bounded enough to review."
- What you give up: you can't watch the agent reason in real time without clicking into a card. The temptation is to dispatch too many at once and review badly.
Editor view
- Surface: a familiar IDE. File tree, editor pane, terminal, chat sidebar. Looks and behaves much like Cursor or Windsurf.
- Posture: you and one agent, in conversation, working on one thing. You watch the diffs land. You approve commands as they come. Standard pair-programming ergonomics.
- Best for: exploratory work, debugging, anything where you don't yet know what "done" looks like. Also any task where you want to learn from the agent's process, not just the result.
- What you give up: parallelism. While you're working in the Editor view on one task, you're not dispatching others. This is fine, and matches how most IDEs work, but it means the Manager view's whole pitch is dormant.
The mental model that helped me most: Manager view is for work you can describe; Editor view is for work you have to discover. If you can write the task as a paragraph and know what success looks like, dispatch it. If you'd have to write three paragraphs and you'd still rewrite half of them after seeing the first attempt, stay in the editor.
Switching between the two surfaces
The product does not force a choice. You can dispatch a Manager-view task, then immediately open the Editor view to do unrelated work, and the two run in parallel. When the Manager-view task finishes, you get a notification in the corner of the editor; clicking it takes you to the artifact. This bidirectional flow is the part that took me longest to internalize, and the part that pays off most.
The honest accounting: I spend roughly 60% of my Antigravity time in the Editor view, 30% in Manager view dispatching and reviewing, and 10% in the seam between them (writing a brief, deciding whether the task is well enough scoped to dispatch). The 10% is where most of the leverage hides. A good brief saves you twenty minutes of editor-view back-and-forth; a bad brief produces an artifact you have to throw away.
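For concreteness, here is the shape of a brief that dispatches well; every specific in it (paths, names, commands) is illustrative:

```
Task: Convert the date-formatting helpers in src/utils/dates.ts to
use Intl.DateTimeFormat instead of the hand-rolled formatters.

Scope: src/utils/ and its tests only. No public signature changes,
so call sites stay untouched.

Done means: all existing tests under src/utils/ pass unchanged.

Verify: run the test suite for src/utils/ and attach the results.
```

One paragraph of task, a line of scope, a line of done, a line of verification. If you can't write the done line, the task isn't ready for the Manager view yet.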
Models and how they're chosen
Antigravity is a multi-model product, but it has a default and the default matters. As of May 2026:
- Gemini 3 Pro is the default for agentic work in Manager view. It is what runs when you don't pick anything. Google has tuned the agent harness around this model, and most of the polish (tool use, browser control, artifact generation) is best with it.
- Claude Sonnet 4.5 is selectable per-task and is the strong second choice for code-heavy agentic work. In my use, it produces tighter code on refactors and is more conservative about touching unrelated files. Tool use through Antigravity's harness is reliable but slightly less integrated than with Gemini.
- A handful of other frontier models (the lineup shifts; check the picker) are available for specific tasks. They tend to be appropriate for narrow uses: a faster, cheaper model for trivial edits, a longer-context option for large-repo summarization.
The model picker lives in two places. In Manager view, it's per-task: each card has a model attached and you choose it when you brief the task. In Editor view, it's per-conversation, the way Cursor's picker works. There is no automatic routing across models in either view; you pick or you take the default.
Where the model choice changes outcomes
Three categories of work where the model picker matters more than the marketing suggests:
- Long browser walkthroughs. Gemini 3 Pro is currently better at multi-step navigation through a deployed UI: it loses track of state less often and produces cleaner screenshot sequences. If the task involves "go look at the page and tell me what's wrong," start with Gemini.
- Cross-file refactors in unfamiliar code. Sonnet 4.5 is currently better at staying inside the lines of what you asked for. Gemini sometimes "improves" things you didn't ask it to touch, which is wonderful when right and exhausting to review when wrong.
- Producing tests that match an existing style. This goes either way and depends on the codebase. Run the same task on both and pick the output that looks more like what's already in the repo.
What I've stopped doing: trying to pick the "right" model up front. The cost of running a Manager-view task twice (once on each model) is small enough that for anything important, doing both and comparing is faster than agonizing over the choice. The artifact-bundle structure makes the comparison cheap: scan two task lists, glance at two diffs, choose the one that looks more like what you'd have written. Treat the model picker as a draw-two card, not a single bet.
Artifacts
Artifacts are the part of Antigravity that genuinely changes how you review agent work. Every Manager-view task produces a structured set of outputs that you can read without re-running anything: a task list with each step the agent took, screenshots of any browser or UI state it touched, results of any tests it ran, and the diff itself. The diff is no longer the only thing you review; it's one artifact among several.
The four artifact types you'll meet most:
- Task lists. The agent writes its plan as it works, ticking off steps and noting where it backtracked. This is the closest thing the product has to "reasoning traces you can actually use," and it's how you spot tasks that succeeded for the wrong reason.
- Screenshots. Anything the agent looked at in a browser, an in-IDE preview, or an external tool gets captured as a screenshot attached to the task. You scrub through them like a film strip.
- Browser walkthroughs. A more structured form of screenshots: a recorded sequence of clicks and navigations with timestamps. Useful for verifying that an agent that "fixed the login flow" actually reached the post-login page.
- Test results. If the agent ran tests (and you can configure tasks to require it), the results are attached. Pass/fail counts, stack traces, and the command line that produced them.
The reason this matters: in single-agent IDEs, the only durable record of what the agent did is the diff plus the chat transcript. The diff tells you the destination and the transcript tells you the journey, but neither tells you whether the agent verified its work. Artifacts close that gap. If the task says "fix the broken settings page," you can look at the screenshot of the settings page after the fix and decide whether it actually got fixed without checking out the branch.
Artifacts are also where the multi-agent posture starts to pay off. When you have four agents working in parallel, you cannot watch four chat transcripts simultaneously. You can scan four artifact bundles in the time it takes to read one transcript, because each bundle is structured the same way. This is the Manager view's actual loop: dispatch, wait, scan artifacts, approve or send back.
How to actually read an artifact
The order I've settled on, after enough false starts to have an opinion:
- Read the task list first, not the diff. The diff tells you what changed; the task list tells you whether the agent understood the problem. If the task list says "step 3: rewrote the entire authentication module" and you asked for a one-line bug fix, stop there.
- Look at the screenshots before the code. If the task involved a UI, the screenshots tell you whether the agent saw what you'd see. Surprisingly often, the agent fixed the wrong thing because it was looking at the wrong page.
- Check the test results next. If tests were supposed to run and didn't, that's its own signal. If they ran and failed, the artifact is telling you the work isn't done.
- Then read the diff, with the context of everything above. By this point you know whether to read it skeptically or trustingly. That priors-loaded review is faster than a cold read.
Browser control and MCP
Antigravity ships with a native Chrome integration. The agent can open a browser, navigate to URLs, click elements, fill forms, capture screenshots, read the rendered DOM, and run JavaScript in the page context. This is meaningfully different from "the agent has a search tool" or "the agent can fetch a URL." It is closer to a junior engineer who can actually look at the deployed site.
What that enables, in practice:
- Visual verification. After making a CSS change, the agent can load the page and confirm that the layout looks right. It will not catch every regression a designer would, but it will catch broken pages.
- End-to-end smoke tests without a framework. "Log in, create a record, delete it, log out" can be expressed as a brief and run as a Manager-view task without writing Playwright (for a sense of what that saves, see the sketch after this list). The walkthrough artifact is your record of what happened.
- Bug reproduction from a description. "Users say checkout breaks if you have two cards saved" can be tried directly against staging. The agent reproduces, captures screenshots, and writes up what it saw. Often faster than getting the user on a call.
- Side-by-side comparisons. Open the same page on two URLs (production and a preview deployment, say) and ask the agent what's different. Useful for catching the visual regressions a diff doesn't surface.
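For a sense of scale, here is roughly the same smoke flow from the second item above, hand-scripted in Playwright; the URL, selectors, and credentials are placeholders, not anything Antigravity generates:

```typescript
import { test, expect } from "@playwright/test";

// The "log in, create a record, delete it, log out" flow,
// hand-scripted. Every selector and URL is a placeholder.
test("record lifecycle smoke test", async ({ page }) => {
  await page.goto("https://staging.example.com/login");
  await page.getByLabel("Email").fill("test@example.com");
  await page.getByLabel("Password").fill(process.env.TEST_PASSWORD!);
  await page.getByRole("button", { name: "Log in" }).click();
  await expect(page).toHaveURL(/\/dashboard/);

  await page.getByRole("button", { name: "New record" }).click();
  await page.getByLabel("Title").fill("smoke-test record");
  await page.getByRole("button", { name: "Save" }).click();
  await expect(page.getByText("smoke-test record")).toBeVisible();

  await page.getByRole("button", { name: "Delete" }).click();
  await expect(page.getByText("smoke-test record")).not.toBeVisible();
  await page.getByRole("button", { name: "Log out" }).click();
});
```

The scripted version is deterministic and worth keeping for flows you run hundreds of times. The brief version is re-interpreted on every run, which is exactly what makes it cheap enough for flows you would otherwise never script at all.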
Alongside the browser, Antigravity supports MCP (Model Context Protocol) servers the same way Claude Code and a growing set of other clients do. You attach servers in settings, and tasks can use whatever capabilities those servers expose. Database query servers, GitHub servers, internal tooling, the same ecosystem.
One specifically useful pattern: a database read-only MCP server attached to a Manager-view task with a tight scope. The agent can investigate "why did this customer's order fail" against real data, capture the relevant rows, and produce an artifact summarizing what it found. You get a written investigation, scoped to the question, without having to drop into a SQL prompt yourself. The browser and the database together cover most of what a junior debugger would do on a real ticket.
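If you've never built one, a read-only MCP server is smaller than it sounds. A minimal sketch with the official TypeScript SDK (@modelcontextprotocol/sdk); the server name, table, and columns are illustrative, and the read-only guarantee comes from exposing nothing but a parameterized SELECT:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { Pool } from "pg";

// Read-only by construction: the only tool this server exposes runs
// one parameterized SELECT. Table and column names are illustrative.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const server = new McpServer({ name: "orders-readonly", version: "0.1.0" });

server.tool(
  "lookup_order",
  { orderId: z.string().describe("Order ID to investigate") },
  async ({ orderId }) => {
    const { rows } = await pool.query(
      "SELECT id, status, failure_reason, updated_at FROM orders WHERE id = $1",
      [orderId]
    );
    return { content: [{ type: "text", text: JSON.stringify(rows, null, 2) }] };
  }
);

// stdio transport: the client launches this process and speaks MCP
// over stdin/stdout.
await server.connect(new StdioServerTransport());
```

The scoping happens twice: once in the server (it can only SELECT) and once in the task brief (it should only look at the order in question). Belt and suspenders, and both are cheap.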
What it can't do (yet)
The browser integration has real edges as of May 2026. It cannot drive a mobile browser; everything renders at desktop viewport unless you set the viewport in the task brief. It does not handle multi-tab workflows well: opening a second tab sometimes loses focus on the first, and the agent has to re-orient. Authentication that requires a hardware key (WebAuthn, Passkeys with a separate device) blocks the agent the way it would block a Selenium test. For sites that depend on those, plan to pre-authenticate a session and pass cookies in, the way you would for any browser automation.
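The pre-authentication trick is the same one browser automation has always used: do the hardware-key login yourself once, save the session, and hand the cookies to the automation. A Playwright sketch of the capture side; the URL and file path are placeholders:

```typescript
import { chromium } from "playwright";

// One-time, interactive: open a headed browser, complete the login
// yourself (including the hardware-key step), then save the session.
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext();
const page = await context.newPage();
await page.goto("https://app.example.com/login");

// page.pause() opens the Playwright inspector; finish the login in
// the browser window, then hit resume.
await page.pause();

// storageState captures cookies and localStorage to a file you can
// feed into whatever automation runs next.
await context.storageState({ path: "auth-session.json" });
await browser.close();
```

How you hand the saved session to an Antigravity task is the half I'd double-check against the current build; the capture side above is the stable half.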
The sandbox and approval model
Because the agent can touch files, run terminal commands, control a browser, and call MCP tools, the question of "what is it allowed to do without asking" matters a lot. Antigravity's posture, as of May 2026:
- Per-surface defaults. The Manager view defaults to higher autonomy (file edits and most terminal commands run without prompting). The Editor view defaults to more confirmation prompts. You can adjust both, but the gap is intentional: Manager view assumes you're not watching, so it asks less.
- Configurable scopes. Each task can have its file scope (this directory only, this repo only), its network scope (allowed domains for the browser), and its tool scope (which MCP servers and which terminal commands) restricted. The defaults are sensible; the explicit scopes are worth setting on anything that touches production.
- Approval gates. You can require explicit approval for certain command patterns (anything matching git push, anything touching files outside the workspace, anything calling a specific MCP tool). These are checked before the action runs, regardless of which surface you're in.
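The gate logic itself is just pattern matching against the action before it executes. A hypothetical sketch of the shape, not Antigravity's actual configuration format:

```typescript
// Hypothetical sketch of pattern-gated approval. The real product
// configures this in settings; the matching idea is what matters.
const requireApproval: RegExp[] = [
  /^git\s+push/,      // anything that publishes
  /^rm\s+-rf?\b/,     // destructive deletes
  /\.\.\//,           // paths escaping the workspace
  /^curl\b.*--data/,  // outbound writes over HTTP
];

function isGated(command: string): boolean {
  return requireApproval.some((pattern) => pattern.test(command));
}

// isGated("git push origin main") -> true: pause and ask
// isGated("git status")           -> false: runs unprompted
```

The useful discipline is writing the list down at all. Most people can name their three scariest command shapes in under a minute; gating exactly those covers most of the risk.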
Practical workflows
Six workflows that have held up across several months of preview. They lean on what the product is good at: multi-agent dispatch, browser control, structured artifacts.
Dispatch a small refactor while you keep working.
From Manager view, brief a small, well-scoped refactor (rename this function across the repo, extract this duplicated block, convert these tests to the newer fixture style). Set a tight file scope. Switch to Editor view and continue your real work. When the artifact appears, scan the task list and the diff. Approve, send back, or close. The point is parallelism: the refactor finishes while you do the harder work yourself.
Multi-agent triage of an issue list.
Take a list of five to ten small bugs from your tracker. Dispatch one Manager-view task per bug, each with the issue text as the brief and the relevant directory as the scope. Walk away for an hour. Come back to a column of artifacts. Some will be ready to merge, some will need a follow-up brief, some will reveal that the bug was misunderstood. This is the workflow that pays back the Manager view's existence.
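A per-bug brief, for concreteness; the issue number, repro, and paths are invented for illustration:

```
Task: Issue #412, "Export button disabled after filter change."
Repro per the report: apply any filter on /reports and the Export
button grays out and never re-enables.

Scope: src/reports/ only. Browser scope: the staging URL.

Done means: the button re-enables after a filter change, existing
tests under src/reports/ pass, and a screenshot of the fixed state
is attached.
```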
Browser-driven smoke test against a preview deploy.
After every PR builds a preview URL, dispatch a Manager-view task: "open [URL], log in as the test user, walk through the core flows (create, edit, delete one record of each type), and report anything that broke." The walkthrough artifact is your record. This is faster than running Playwright locally and catches the kind of breakage that doesn't show up in unit tests.
"Go look at the deployed site and tell me what's wrong."
The vaguest possible brief, and one of the most useful. Point an agent at production (or staging) and ask it to find anything broken, slow, ugly, or unexpected. You'll get a screenshot-heavy artifact with a list of observations. Most will be noise. One or two will be real. Worth running once a week on anything you ship to users.
Migration scaffolding.
For larger migrations (framework upgrade, library swap, language version bump), use the Editor view to do the first one or two files yourself, then dispatch the rest as a Manager-view task with your converted files as examples in the brief. The agent extends a pattern much better than it invents one. Review each artifact carefully; this is the workflow most likely to produce subtly wrong code that compiles.
Code review from a known-good branch.
Brief a Manager-view task with the diff between two branches and ask for a structured review: bugs, style issues, missing tests, unclear naming. The artifact you get back is a written review you can paste into the PR comments after editing. Worth doing as a second pass after your own review, not a replacement for it. Disagreements between you and the agent are usually the most useful signal in the artifact.
Limits and pitfalls
Antigravity is genuinely interesting and not yet finished. The places it currently disappoints, in roughly the order you'll meet them.
- Preview reliability. Tasks occasionally hang in the running state and never produce an artifact. The fix is usually to cancel and re-dispatch. The Manager view does not always surface this clearly; if a task has been running for 20 minutes on something that should take 2, suspect a hang and intervene.
- Manager-view ergonomics are opinionated. The Kanban metaphor works for some kinds of work and feels actively wrong for others. If you spend most of your day in deep, exploratory code that doesn't decompose into briefable tasks, the Manager view is dead weight. That isn't a bug in the product; it's a fit question, and not everyone fits.
- The single-cursor mental model dies hard. The first week feels off because the IDE expects you to think differently about your work: as a queue of dispatchable tasks, not as a stream of edits. People who arrive expecting "Cursor with extra features" tend to bounce. People who arrive expecting "a tool for supervising several juniors" tend to stick.
- Latency on Manager-view tasks is real. Even small tasks take a minute or two before the agent starts producing visible work, because each task spins up its own agent context. This is fine when you're dispatching parallel work; it's frustrating if you wanted instant feedback.
- The browser integration is Chrome-flavored. Sites that behave differently across browsers (older enterprise apps especially) may render correctly in Antigravity's browser and badly in your users' browsers. Don't treat the agent's screenshots as cross-browser evidence.
- Cost is unknown. Free during preview, with no announced pricing as of May 2026. Whatever you build into your workflow should also work without it, because the eventual price could land anywhere from "Cursor-comparable" to "enterprise-only." Worth keeping a fallback path.
- Surface area to learn is bigger. There are simply more features here than in Claude Code or Cursor. The two-view product, the artifact system, the per-task scopes, the model picker, the MCP attachments. None of it is hard, but all of it takes time. Budget a week of moderate use before judging the product.
- Workspace and account context. Antigravity uses a Google account for sign-in and (currently) ties some features to specific account tiers. If you're signed into multiple Google accounts in your browser, the IDE can pick the wrong one and you'll get strange permission errors. Sign in explicitly with the account you intend to use.
The temptation to over-dispatch
The single most common failure mode I've seen (in myself and in others) is over-dispatching. The Manager view makes it trivially easy to queue ten tasks before lunch, walk away, and return to a column of artifacts you don't have the attention to review properly. The result is either approving work you should have sent back, or sending back work that was actually fine because you didn't read carefully enough to tell.
The fix is a self-imposed cap. Two or three tasks running in parallel is the sweet spot for most work; five is the upper limit for anything you intend to actually review; ten is a sign you should have done the work yourself in the Editor view. The product happily lets you go past those limits. Don't.
Antigravity vs Claude Code vs Cursor
The honest comparison, kept short. All three are good products. They are not interchangeable.
Antigravity
- Posture: GUI-first, multi-agent, dashboard-shaped.
- Strengths: parallel task dispatch, browser control, structured artifacts, multi-model picker.
- Weaknesses: preview reliability, surface area, opinionated about how you work, unknown pricing.
- Use when: you have several discrete tasks you can brief well and you want to supervise them rather than do them.
Claude Code
- Posture: terminal-native, single-thread, minimalist.
- Strengths: deep code reasoning, conservative edits, scriptable, integrates with whatever editor you already use.
- Weaknesses: no GUI, no native browser control, no parallel-task dashboard, all the affordances are text.
- Use when: the work is mostly code, you already live in a terminal, and you want minimum ceremony per task. See the Claude Code guide for more.
Cursor
- Posture: GUI-first, single-agent, editor-shaped.
- Strengths: best pure pair-programming surface; fast, polished, mature; the muscle memory matches VS Code exactly.
- Weaknesses: one agent at a time, no native browser control, multi-agent workflows have to be faked with multiple windows.
- Use when: you want one capable agent inside the editor you already trust, and you don't need the multi-agent dashboard.
Use Antigravity when the work is many small things you can brief. Use Claude Code when the work is one hard thing you have to think through. Use Cursor when the work is the editor itself. — TWD
The other thing worth saying out loud: these tools are not exclusive. I use all three in the same week, often in the same day. Antigravity for batch dispatch and browser-driven verification. Claude Code for tight-loop work in big repos where I want a terminal and a model that stays out of my way. Cursor for the conventional pair-programming sessions where I want one strong agent in a familiar editor. The choice is per-task, not per-quarter.
A short note on team adoption
If you are evaluating Antigravity for a team rather than for yourself, two things to test before committing. First, give two engineers the same week-long set of tickets and have one work primarily in Manager view, the other primarily in Editor view. Compare what shipped, what didn't, and how each one felt about the experience. The Kanban-shaped surface produces strong reactions in both directions; the team mix matters. Second, audit the artifact bundles produced over a week. If most of them are being skim-approved without real review, the multi-agent posture is creating risk faster than it is creating leverage, and the team should pull back to one or two parallel tasks until review discipline catches up.
The product rewards teams that already practice careful PR review and have the discipline to refuse a merge when a brief was misunderstood. It punishes teams that rubber-stamp. Both directions amplify. Know which one you are before you scale up.
One closing observation
The Manager view is the genuinely new idea in this category. Whether or not Antigravity itself wins, the idea that the IDE should let you supervise several agents instead of partner with one is going to spread. Cursor will likely add something similar. So will Claude Code, eventually, in a terminal-shaped form. The question Antigravity is asking is the right question: what does an IDE look like when the bottleneck is your attention, not the agent's capability? You should try the answer Google ships, even if it isn't the one you stick with.
The honest summary, after several months of preview use: Antigravity is the first IDE that has made me think differently about how I structure my own day. The first hour, I was annoyed at having to learn a new dashboard. By the end of the first week, I was queuing tasks before lunch in a way I hadn't done before, and noticing that some of the work I used to do myself was better dispatched. Whether that is leverage or distraction depends on the task. Both are real. The tool is worth installing for the question it forces you to answer about your own workflow, even if the answer turns out to be that you preferred how you worked before.
If any of this is out of date by the time you read it: blog.google/technology/google-labs is where Antigravity announcements land, and the in-product changelog (Help → What's new) is more current than the blog. Both lag the actual rollouts by a few days, and preview behavior shifts week to week.