This post was handcrafted in German and translated to English using AI. For the original, see the German version.
AWS Lambda MicroVMs: A Hands-On Test

Nicolai Lang

I help teams ship reliable, cost-efficient cloud systems on AWS.

On 2026-06-22, AWS launched Lambda MicroVMs: isolated compute instances on Firecracker, the same platform Lambda itself is built on. But this isn't just Lambda with a longer runtime. These are real VMs that can run for up to 8h, keep their state, and handle more than one request in parallel. According to AWS, typical use cases are sandboxes for developers or AI agents. At launch, Lambda MicroVMs are only available in a limited set of regions: us-east-1, us-east-2, us-west-2, ap-northeast-1, and eu-west-1.

Key Facts

  • VMs are based on an image snapshot created up front
  • Baseline runtime costs are calculated from a memory and CPU baseline
  • The VM "scales" to four times the baseline (with its own burst pricing)
  • Baselines range from 0.5 GB RAM / 0.25 vCPU up to 8 GB RAM / 4 vCPU; with the 4× burst, the largest baseline gives you a solid 32 GB RAM / 16 vCPU
  • Depending on the sizing, you get 8, 16, or 32 GB of disk space
  • Costs are essentially made up of memory (and vCPUs) and storage
  • You pay at least the chosen baseline (RAM + CPU) for as long as the VM runs, even while idle. Under load, burst costs are added on top
  • Compute is billed by the second
  • Snapshots (VM images) are billed per GB-month, with a one-week minimum
  • Suspend and resume each incur a charge: writing the memory dump on suspend, reading it on resume
  • On suspend, a memory and disk snapshot is stored, which also costs storage
  • While suspended, there are no compute costs, only storage
  • A VM has a maximum lifetime of 8h. Within that window it keeps its state and can be suspended and resumed as often as you like; after that, it is hard-terminated
  • It starts instantly and gives you an HTTPS endpoint with its own token mechanism for auth
  • Lifecycle policies can auto-suspend VMs, and VMs can also resume automatically (and very quickly) on incoming traffic
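To make the billing rules above a bit more concrete, here's a minimal cost sketch in JavaScript (Node.js). All rates are hypothetical placeholders, not AWS prices. Folding vCPUs into a single memory rate, rounding up to whole seconds, and prorating a GB-month over 30 days are my own simplifying assumptions; the suspend/resume dump charges are left out.

```javascript
// HYPOTHETICAL placeholder rates, NOT real AWS prices.
const HYPOTHETICAL_RATES = {
  baselinePerGbSecond: 0.00001, // baseline memory (vCPUs scale with it), per GB-second
  burstPerGbSecond: 0.00002,    // usage above baseline, per GB-second
  snapshotPerGbMonth: 0.05,     // snapshot storage, per GB-month
};

const DAYS_PER_MONTH = 30; // simplification for prorating GB-months

function estimateSessionCost({
  baselineGb,     // chosen baseline memory
  runningSeconds, // time the VM was running (suspended time is not compute)
  burstGbSeconds, // integrated memory usage above the baseline
  snapshotGb,     // size of the stored image / suspend snapshot
  storedDays,     // how long the snapshot is kept
  rates = HYPOTHETICAL_RATES,
}) {
  // Baseline is paid for every running second, even while idle.
  const baseline = baselineGb * Math.ceil(runningSeconds) * rates.baselinePerGbSecond;
  // Burst is only added on top for usage above the baseline.
  const burst = burstGbSeconds * rates.burstPerGbSecond;
  // Snapshots: billed per GB-month with a one-week minimum.
  const billedDays = Math.max(storedDays, 7);
  const storage = snapshotGb * (billedDays / DAYS_PER_MONTH) * rates.snapshotPerGbMonth;
  return { baseline, burst, storage, total: baseline + burst + storage };
}
```

Plug in the current prices for your region; the point is the structure: a fixed per-second floor while running, burst on top, and storage that keeps ticking while suspended.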

A Quick Tour of How It Works

VM Images

First, you build a MicroVM image from a Dockerfile and a source.zip. Lambda spins up the container in the background and takes a snapshot of the running application in its initialized state. The actual VMs are then created from that snapshot, so in a sense you always get a "warm" container. Building an image took about two to three minutes in my tests, and you can follow the process in CloudWatch Logs. The image settings are: memory; environment variables; which lifecycle hooks fire and with what timeout; an optional network connector to attach to a VPC; an IAM role for the build; and the CloudWatch logging configuration. As you'd expect from Lambda, sizing is derived from the memory setting. Images are versioned automatically (and storage costs grow with the number of versions).

MicroVMs

Now that we have an image, we can start a VM from it. Here you pick the image and its version, and you can pass an execution role and a few more settings, such as a lifecycle policy, a log group, and a custom payload for the run hook. You also choose whether ingress allows only HTTP or WebSockets for shell access as well. For egress, you can choose between the internet and a network connector for VPCs, and that's it. The VM starts, and for HTTP we get a URL in the format https://<ID>.lambda-microvm.<REGION>.on.aws to talk to it. To use it, we create an access token, with a configurable validity of up to 60 minutes, and send it in an X-Aws-Proxy-Auth header. Response times for a simple "Hello world!" endpoint were roughly 35ms in my tests. After a suspend with auto-resume enabled, the first request to the suspended VM took around 500ms, so the resume is very fast and barely noticeable.
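On the client side, that boils down to building the URL and setting one header. A minimal sketch, assuming the token has already been created (the token-creation call isn't shown) and that it goes into X-Aws-Proxy-Auth as the raw value; vmId, region, and the helper names are hypothetical.

```javascript
// Build the endpoint URL in the format observed above:
// https://<ID>.lambda-microvm.<REGION>.on.aws
function microvmUrl(vmId, region, path = "/") {
  const base = `https://${vmId}.lambda-microvm.${region}.on.aws`;
  return new URL(path, base).toString();
}

// Assumption: the proxy expects the raw token value in this header.
function proxyAuthHeaders(token) {
  return { "X-Aws-Proxy-Auth": token };
}

// Call the VM; global fetch is available in Node.js 18+.
async function callMicrovm(vmId, region, path, token) {
  const res = await fetch(microvmUrl(vmId, region, path), {
    headers: proxyAuthHeaders(token),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} from ${vmId}`);
  return res.text();
}
```

Since the token expires after at most 60 minutes, a longer-running client would need to refresh it before it runs out.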

Hands-On Test

I tested all of this with a small Node.js server: Hello World, plus a small /items API that writes to and reads from DynamoDB. As the base image I used the AWS-recommended microvms:al2023-minimal and installed a few handy tools like htop and fio on top. After the first start, I connected to a terminal inside the VM (built into the AWS console) and ran htop. It immediately showed that not just the baseline but, in my case, the full 4 vCPUs and 8 GB of memory are present. So there's no scaling happening in that sense; it's simply a billing mode. Up to the baseline, you pay a fixed minimum even if you use less, and anything above it is billed dynamically.
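If you'd rather check this from code than with htop, a few lines using Node's built-in os module are enough. A small sketch; expect the reported memory to be slightly below the nominal value, since the kernel reserves part of it.

```javascript
const os = require("node:os");

// What the guest actually sees. Per the test above, this is the
// burst ceiling (e.g. 4 vCPUs / ~8 GB), not the billed baseline.
function describeVisibleResources() {
  return {
    vcpus: os.cpus().length,
    memoryGiB: Number((os.totalmem() / 1024 ** 3).toFixed(1)),
    arch: os.arch(), // "arm64" on Graviton
  };
}

console.log(describeVisibleResources());
```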

What Throughput Does the API Deliver?

The proxy that AWS puts in front of the MicroVM has fixed rate limits built in. According to the docs, those are 40 RPS at 4 vCPU / 8 GB and 160 RPS at 16 vCPU / 32 GB.

Why the limits are quoted for the burst sizes is a bit of a mystery. My own tests with a small k6 script couldn't confirm the numbers either: I never got anywhere near 40 requests/s. Even at 35 rps there were still occasional errors; at 30 rps it ran cleanly.

HTTP
http_req_duration..............: avg=40.82ms  min=35.13ms  med=38.65ms  max=127.65ms p(90)=44.29ms  p(95)=45.15ms
  { expected_response:true }...: avg=40.82ms  min=35.13ms  med=38.65ms  max=127.65ms p(90)=44.29ms  p(95)=45.15ms
http_req_failed................: 0.00%  0 out of 1804
http_reqs......................: 1804   29.980445/s

Response times are in a reasonable range (measured from Germany over the public internet), and a single instance comfortably delivers enough throughput to run a small API and fire the odd test at it. What it is not, and what AWS doesn't intend it to be either, is an easier replacement for ECS Fargate for running a production API.
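If you do put a small API or test client on it, throttling on the client side keeps you below the proxy's limit. Here's a minimal token-bucket sketch. The 30 requests/s default is simply the rate that ran cleanly in my test, not a documented value, and the class is a hypothetical helper, not part of any AWS SDK.

```javascript
// Token bucket: refills at `ratePerSecond`, holds at most `burst` tokens.
// One token = one request to the MicroVM endpoint.
class TokenBucket {
  constructor(ratePerSecond = 30, burst = ratePerSecond, now = () => Date.now()) {
    this.rate = ratePerSecond;
    this.capacity = burst;
    this.tokens = burst;
    this.now = now; // injectable clock (milliseconds)
    this.last = now();
  }

  // Add tokens for the time elapsed since the last refill, capped at capacity.
  refill() {
    const t = this.now();
    this.tokens = Math.min(this.capacity, this.tokens + ((t - this.last) / 1000) * this.rate);
    this.last = t;
  }

  // Returns true (and consumes a token) if a request may be sent now.
  tryTake() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until the next token is available (0 if one is available now).
  msUntilNext() {
    this.refill();
    return this.tokens >= 1 ? 0 : Math.ceil(((1 - this.tokens) / this.rate) * 1000);
  }
}
```

In a request loop you'd call tryTake() before each request and, if it returns false, wait msUntilNext() milliseconds. Passing the clock in as a function keeps the class testable without real timers.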

Snapshots, Randomness, and the Clock

As we already know, when a MicroVM starts, the platform restores a snapshot it took during the image build. The entire memory thaws back out, including the PRNG state. For Math.random() that means: the seed lives in the snapshot, and every VM from the same snapshot rolls the same sequence.

I tested this with a small /rand API. At first it confused me, because different values came back. The reason: AWS currently builds two snapshot images, one each for Graviton 3 and Graviton 4. You can see this in the build logs in CloudWatch as well. Each snapshot naturally gets its own seed. But if you start two VMs from the same snapshot (that is, the same Graviton generation), the sequences are identical. By the way, you can tell from the bootId in the run-hook logs which snapshot a VM came from.
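The effect is easy to reproduce without a MicroVM: copy a PRNG's full state, and both copies roll the same numbers. The sketch below uses mulberry32 as a stand-in (V8's Math.random actually uses xorshift128+, whose state isn't accessible from JavaScript); the snapshot() method plays the role of the image snapshot.

```javascript
// Small seeded PRNG (mulberry32) standing in for Math.random's hidden state.
class SeededRng {
  constructor(state) {
    this.state = state >>> 0; // 32-bit unsigned state
  }

  next() {
    this.state = (this.state + 0x6d2b79f5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  }

  // "Snapshot": copy the complete internal state, like a memory image.
  snapshot() {
    return new SeededRng(this.state);
  }
}

const imageRng = new SeededRng(Date.now()); // seeded once, during the "build"
imageRng.next(); // some values consumed during init

// Two "VMs" restored from the same snapshot...
const vmA = imageRng.snapshot();
const vmB = imageRng.snapshot();
const seqA = [vmA.next(), vmA.next(), vmA.next()];
const seqB = [vmB.next(), vmB.next(), vmB.next()];

// ...roll the same sequence: prints true.
console.log(JSON.stringify(seqA) === JSON.stringify(seqB));
```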

crypto.randomBytes() behaves differently, and here the base image matters. Node pulls the bytes via OpenSSL, and the al2023 image ships an OpenSSL that reseeds on resume (Firecracker also reseeds the kernel RNG). That's why every VM gets its own values. On a third-party base image like node:24-slim, OpenSSL doesn't reseed on resume by itself, so crypto.randomBytes() may collide there too, though I didn't test that. So it's important to know what you build your IDs and tokens on, and how the PRNG underneath is seeded.
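One practical consequence: anything random you generate at startup, during the image build, is frozen into the snapshot and shared by every VM restored from it. A minimal sketch, assuming the al2023 image (where, per my test, OpenSSL reseeds on resume) and assuming nothing calls these functions during the build: generate per-VM values lazily on first use instead of at module load, and bypass the entropy cache of crypto.randomUUID(), which Node.js fills ahead of time and which could otherwise have been filled before the snapshot. The function names are hypothetical.

```javascript
const crypto = require("node:crypto");

// Anti-pattern: evaluated at module load, i.e. before the snapshot is taken,
// so every VM from the same snapshot would see the same value.
// const INSTANCE_ID = crypto.randomUUID();

let instanceId = null;

// Per-VM ID, generated on first use (assumed: only after the VM is running).
function getInstanceId() {
  if (instanceId === null) {
    // disableEntropyCache: draw fresh random data instead of Node's
    // pre-generated UUID cache, which could live in the snapshot.
    instanceId = crypto.randomUUID({ disableEntropyCache: true });
  }
  return instanceId;
}

// Request-scoped tokens: always draw fresh bytes at the time of use.
function newRequestToken(bytes = 16) {
  return crypto.randomBytes(bytes).toString("hex");
}
```

Note that lazy generation only moves the draw past the snapshot; it still relies on the RNG being reseeded on restore, which is exactly what you can't take for granted on other base images.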

The clock behaves exactly as you'd expect. Date returns the real time after a resume, so you don't have to worry about it. process.uptime() and every other monotonic clock freeze across the snapshot and suspend gap: they only count active runtime, not the time spent frozen. In the normal case, that's exactly what you want.
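The difference between the two clocks can even be useful: comparing how far Date and performance.now() advanced between two samples gives a rough estimate of how long the process was frozen. A minimal sketch that relies on the behavior observed above (wall clock jumps, monotonic clock doesn't); the 1-second threshold and the function names are arbitrary.

```javascript
const { performance } = require("node:perf_hooks");

function takeSample() {
  return { wallMs: Date.now(), monoMs: performance.now() };
}

// Wall-clock delta minus monotonic delta approximates the frozen time.
function frozenGapMs(prev, next) {
  const wallDelta = next.wallMs - prev.wallMs;
  const monoDelta = next.monoMs - prev.monoMs;
  // Small negative differences are clock noise; clamp at zero.
  return Math.max(0, wallDelta - monoDelta);
}

// Usage idea: check on every incoming request and log a resume.
let last = takeSample();
function onRequest() {
  const now = takeSample();
  const gap = frozenGapMs(last, now);
  last = now;
  if (gap > 1000) console.log(`resumed after ~${Math.round(gap / 1000)}s frozen`);
}
```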

Disk

There's no documented IOPS limit for disk, so I measured it myself. The write numbers are reliable because they go through fsync: roughly 500 MiB/s sequential, and around 100,000 IOPS for random 4k. The read numbers look spectacular at around 3 GB/s, but they mostly come from the host cache and are therefore less meaningful. What I was actually after with this test, though, was to see whether you get a full-fledged, fast file system, and that's definitely the case.
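I used fio for the numbers here. If you just want a quick sanity check from Node.js without installing anything, a rough sequential-write measurement can look like this. It's a sketch, not a substitute for fio: it issues a single fsync at the end, and the sizes are arbitrary. Point `dir` at the VM's disk; on some images /tmp may be a RAM-backed tmpfs, which would measure memory instead.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Rough sequential-write throughput in MiB/s, including a final fsync so
// the page cache can't inflate the result.
function measureSeqWriteMiBps({ dir = process.cwd(), totalMiB = 256, blockKiB = 1024 } = {}) {
  const file = path.join(dir, `seqwrite-${process.pid}.bin`);
  const block = Buffer.alloc(blockKiB * 1024, 0xab);
  const blocks = Math.ceil((totalMiB * 1024) / blockKiB);
  const fd = fs.openSync(file, "w");
  let seconds;
  try {
    const start = process.hrtime.bigint();
    for (let i = 0; i < blocks; i++) fs.writeSync(fd, block);
    fs.fsyncSync(fd); // wait until the data is actually on disk
    seconds = Number(process.hrtime.bigint() - start) / 1e9;
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(file); // clean up the test file
  }
  return (blocks * blockKiB) / 1024 / seconds;
}
```

Call it as measureSeqWriteMiBps() from a directory on the disk, or pass { dir } explicitly.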

The exact numbers vary from run to run; here's one example run:

Logging to CloudWatch

Logging works just like you're used to from Lambda: stdout goes to CloudWatch Logs with no extra effort. One thing to know: this only works if you explicitly pass a logging configuration to the RunMicrovm call via --logging. The logging property on the image only applies to the build and isn't carried over to the VM automatically.
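Since everything on stdout ends up in CloudWatch, writing one JSON object per line is an easy win: CloudWatch Logs Insights discovers fields in JSON log events automatically, so you can filter on them later. A minimal sketch; the field names are arbitrary.

```javascript
// One JSON object per line on stdout.
function logLine(level, message, fields = {}) {
  const line = JSON.stringify({ ts: new Date().toISOString(), level, message, ...fields });
  process.stdout.write(line + "\n");
  return line;
}

logLine("info", "item stored", { table: "items", durationMs: 12 });
```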

Shell Access

If you need a shell in the VM, you have to pass SHELL_INGRESS as an ingress option at start time; enabling it on a running VM after the fact doesn't work. You then open a terminal session via the Connect button in the AWS console. You can of course also connect via websocat, but being able to connect directly without much fuss is super helpful. Here, too, auth runs through a token.

Quotas

As with Lambda, where you sometimes get only 10 of the default 1,000 concurrent executions, AWS once again pulled a strange one: you get (probably depending on the account) a whopping 8 GB out of the 400 GB default memory quota. That means you can't even start the larger VMs at first, and even with 4 GB of RAM only two run at the same time. So I couldn't test the larger ones. My limit increase request (to 401 GB 🙄) is still pending and, sadly, wasn't waved straight through by an AI.

Conclusion

Spinning up a container in the cloud quickly and cheaply, one that shuts itself down when you're on a break or working on something else but wakes back up in under a second, is a pretty cool thing for development, whether you're a human or an agent. Sure, you can do that locally too, but the moment you need to be inside a VPC, it gets genuinely interesting. I don't yet have the exact use case I'd reach for it in my day-to-day work. But that doesn't matter: once you have one more tool in your toolbelt, the day eventually comes when you need it.

You'll find the (admittedly largely vibe-coded) code for my tests as a quick start on GitHub. Have fun trying it out!
