Affiliate disclosure: Some links in this article are affiliate links. If you sign up through them, we may earn a commission at no extra cost to you. Recommendations are based on documented platform capabilities and official pricing as of May 2026.
How to Deploy an MCP Server on Fly.io in 2026 (Step-by-Step)
Fly.io is the right platform for MCP servers when Railway’s single-region limitation becomes a constraint. If your MCP clients are distributed across geographies — a team split across Tokyo, London, and San Francisco, or a product serving users worldwide — Fly.io’s 35+ region anycast routing is the feature no other PaaS offers.
This guide covers deploying a Python or TypeScript MCP server to Fly.io with Streamable HTTP transport, persistent state, custom domain, and proper authentication. It assumes you have a working local MCP server and want it running in production.
Why Fly.io for MCP
Multi-region routing. Fly.io deploys your container to multiple datacenters simultaneously and routes each incoming connection to the nearest healthy instance. For an MCP server with a global user base, this can reduce latency meaningfully: a client in Tokyo hitting an nrt region instance instead of a US West one can save 150+ ms per tool call.
Machines can stay allocated or auto-suspend. Unlike serverless platforms that may cold-start after idle periods, Fly Machines can be configured to stay running 24/7 (matching Railway’s always-on behavior) or to suspend when no connections are active and resume in 300–500 ms. For low-traffic MCP servers, auto-suspend drops idle cost toward zero.
Persistent volumes are mature. Fly Volumes attach to a machine and survive redeploys. Unlike Railway’s volumes (which work but lack snapshot tooling), Fly volumes support snapshots and can be backed up to Fly’s Tigris object storage. For MCP servers that need to persist data between restarts, this matters.
Per-second billing. A Fly machine running 100% uptime on a shared-cpu-2x (512 MB) costs roughly $4–5/month. If your MCP server handles bursty traffic, auto-suspend drops that to near zero for idle periods.
Prerequisites
- A working MCP server in Python or TypeScript using Streamable HTTP transport (not stdio)
- Docker installed locally
- Fly CLI (flyctl) installed: curl -L https://fly.io/install.sh | sh
- A Fly.io account: [Sign up here](https://hostingpundit.com/go/fly-io) (the free tier includes 3 VMs and 3 GB storage)
Step 1: Prepare your MCP server for Fly.io
Use Streamable HTTP transport
Your MCP server must use Streamable HTTP transport and bind to 0.0.0.0 on the port given in $PORT (this guide sets PORT to 8080 in fly.toml and falls back to 8080 in code).
Python (FastMCP):
import os
from mcp.server.fastmcp import FastMCP

# With the official MCP Python SDK, host and port are server settings
# passed to the constructor rather than to run().
mcp = FastMCP(
    "my-server",
    host="0.0.0.0",
    port=int(os.environ.get("PORT", 8080)),
)

# ... define your tools ...

if __name__ == "__main__":
    mcp.run(transport="streamable-http")
TypeScript:
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";
const app = express();
app.use(express.json());
// ... define your server and tools ...
const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
await server.connect(transport);
app.post("/mcp", (req, res) => transport.handleRequest(req, res, req.body));
app.get("/mcp", (req, res) => transport.handleRequest(req, res));
app.delete("/mcp", (req, res) => transport.handleRequest(req, res));
const port = parseInt(process.env.PORT ?? "8080");
app.listen(port, "0.0.0.0", () => {
  console.log(`MCP server listening on ${port}`);
});
Add a Dockerfile
Fly.io detects Dockerfiles automatically. Create one in your project root:
Python:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "server.py"]
Node.js:
FROM node:20-slim
WORKDIR /app
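COPY package*.json ./
# Install all dependencies: the build step usually needs devDependencies (e.g. TypeScript)
RUN npm ci
COPY . .
# Build, then drop devDependencies from the final image
RUN npm run build && npm prune --omit=dev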
EXPOSE 8080
CMD ["node", "dist/index.js"]
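Gotcha: Fly.io routes traffic to the internal_port in fly.toml, so it is safest not to assume $PORT is injected for you. This guide sets PORT = "8080" under [env] in fly.toml; keep that value, internal_port, the Dockerfile EXPOSE, and your application code consistent.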
Step 2: Initialize the Fly.io app
Run flyctl launch from your project directory:
flyctl launch
flyctl will:
- Detect your Dockerfile
- Ask for an app name (or generate one)
- Ask which region to deploy to (pick the one closest to most of your users; use fly platform regions to list options)
- Create fly.toml in your project directory
Important: edit the generated fly.toml before deploying. The defaults need adjustment for an MCP server:
app = "your-mcp-server"
primary_region = "nrt" # or your chosen region
[build]
# Fly auto-detects your Dockerfile
[[services]]
internal_port = 8080
protocol = "tcp"
[services.concurrency]
type = "connections"
hard_limit = 100
soft_limit = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.ports]]
handlers = ["http"]
port = 80
force_https = true
[[services.http_checks]]
interval = 15000
timeout = 5000
grace_period = "10s"
method = "get"
path = "/health"
[env]
PORT = "8080"
Key configuration points:
- internal_port = 8080 must match the port your server binds to
- [[services.http_checks]] with path /health: add a health endpoint to your server
- force_https = true: redirect HTTP to HTTPS automatically
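The health check above expects GET /health to return 200, so the server needs such a route. One way to add it is sketched below, assuming your server is exposed as an ASGI application. The names with_health and call_asgi are hypothetical helpers for illustration and are not part of any SDK.

```python
import asyncio
import json


def with_health(app, path="/health"):
    """Wrap an ASGI app so GET <path> answers 200 without reaching the app."""
    body = json.dumps({"status": "ok"}).encode()

    async def wrapper(scope, receive, send):
        is_health = (
            scope["type"] == "http"
            and scope.get("method") == "GET"
            and scope.get("path") == path
        )
        if not is_health:
            # Everything else (MCP traffic, lifespan events) passes through.
            await app(scope, receive, send)
            return
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [
                (b"content-type", b"application/json"),
                (b"content-length", str(len(body)).encode()),
            ],
        })
        await send({"type": "http.response.body", "body": body})

    return wrapper


def call_asgi(app, method, path):
    """Test helper: push one HTTP request through an ASGI app, return sent messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": method, "path": path, "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent
```

If you use the official Python SDK, one plausible wiring is to wrap the ASGI app it exposes and serve the result with an ASGI server. Treat that wiring as an assumption to verify against your SDK version.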
Step 3: Set secrets (environment variables)
Never put credentials in fly.toml or Dockerfiles. Use Fly’s secrets system:
fly secrets set MCP_AUTH_TOKEN=$(openssl rand -hex 32)
fly secrets set MY_API_KEY=your_api_key_here
fly secrets set DATABASE_URL=your_database_url
Fly injects these as environment variables at runtime. They are encrypted at rest and never appear in build logs.
To verify secrets are set (shows names but not values):
fly secrets list
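Because secrets arrive as plain environment variables, it can help to fail fast at startup when one is missing rather than discover it on the first tool call. A minimal sketch follows. The function name load_secrets and the REQUIRED_SECRETS tuple are illustrative assumptions, so list whichever secrets your server actually needs.

```python
import os

# Names of the secrets this server cannot run without (adjust to your app).
REQUIRED_SECRETS = ("MCP_AUTH_TOKEN",)


def load_secrets(required=REQUIRED_SECRETS, environ=None):
    """Return the required secrets as a dict, or raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    return {name: environ[name] for name in required}
```

Calling load_secrets() once before the server starts makes a misconfigured deploy fail its health check immediately, which is easier to diagnose than a 500 at request time.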
Step 4: Deploy
fly deploy
flyctl will:
- Build your Docker image
- Push it to Fly’s image registry
- Deploy to your configured region(s)
- Run health checks
- Print your app’s URL on success
A successful deploy looks like:
==> Verifying app config
==> Building image
...
==> Pushing image to registry
==> Creating release
==> Monitoring deployment
Machine e784567d create started ... started
✓ Machine e784567d [app] is healthy [HTTP GET /health - 200]
==> Visit your newly deployed app at https://your-mcp-server.fly.dev
Your MCP endpoint is live at: https://your-mcp-server.fly.dev/mcp
Step 5: Add more regions (optional but powerful)
This is Fly.io’s killer feature for MCP. To deploy to additional regions:
fly regions add fra # Frankfurt
fly regions add lax # Los Angeles
fly scale count 3 # One machine per region
Fly.io’s anycast routing automatically sends each user to the nearest healthy instance. Your MCP clients don’t need to know which region they’re hitting, because the routing happens at the network layer rather than in your client configuration.
To see which machines are running and where:
fly status
Step 6: Custom domain
- Add your domain in the Fly dashboard: your-app → Certificates → Add Certificate
- Fly provides a DNS record to add at your registrar (typically an A record or CNAME)
- Fly provisions a Let’s Encrypt certificate automatically
Alternatively, via CLI:
fly certs create mcp.yourdomain.com
fly certs show mcp.yourdomain.com # Shows required DNS records
Once propagated, your MCP endpoint is: https://mcp.yourdomain.com/mcp
Step 7: Connect to Claude Code / Claude Desktop
Claude Code (.mcp.json in the project root for project scope, or ~/.claude.json for user scope):
{
"mcpServers": {
"my-server": {
"type": "http",
"url": "https://your-mcp-server.fly.dev/mcp",
"headers": {
"Authorization": "Bearer YOUR_MCP_AUTH_TOKEN"
}
}
}
}
Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"my-server": {
"transport": {
"type": "http",
"url": "https://your-mcp-server.fly.dev/mcp"
},
"headers": {
"Authorization": "Bearer YOUR_MCP_AUTH_TOKEN"
}
}
}
}
Test without a client:
curl -X POST https://your-mcp-server.fly.dev/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer YOUR_MCP_AUTH_TOKEN" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
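The same smoke test can be scripted with the Python standard library. The sketch below builds the JSON-RPC request separately from sending it, so the payload and headers can be inspected without touching the network. The helper names build_mcp_request and send_mcp_request are hypothetical. The Accept header lists both JSON and event-stream responses, which Streamable HTTP clients are expected to advertise. Some servers may also require an initialize call before tools/list.

```python
import json
import urllib.request


def build_mcp_request(url, token, method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 POST request for an MCP endpoint (nothing is sent yet)."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
        },
    )


def send_mcp_request(request, timeout=10):
    """Send the request and return (status, body text). Raises on HTTP errors."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode()
```

For example, send_mcp_request(build_mcp_request("https://your-mcp-server.fly.dev/mcp", token, "tools/list")) mirrors the curl command above.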
Persistent storage (when you need it)
If your MCP server needs to persist data (embeddings cache, conversation history, tool state), create a Fly Volume:
fly volumes create mcp_data --size 10 --region nrt
Mount it in fly.toml:
[mounts]
source = "mcp_data"
destination = "/data"
Your MCP server can then write to /data/ and the data persists across restarts and redeploys.
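Writes to the volume are worth making atomic, so a restart or redeploy in the middle of a write does not leave a half-written file behind. A minimal sketch, assuming JSON-serializable state and the /data mount point from the fly.toml snippet above. The function names save_state and load_state and the file name state.json are illustrative.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(state, data_dir="/data", filename="state.json"):
    """Write state as JSON via a temp file plus rename, so readers never see a partial file."""
    target = Path(data_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def load_state(data_dir="/data", filename="state.json", default=None):
    """Read the saved state, or return default if nothing has been saved yet."""
    target = Path(data_dir) / filename
    if not target.exists():
        return default
    return json.loads(target.read_text())
```

Passing data_dir explicitly also makes local development easy: point it at a scratch directory when no volume is mounted.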
Note on multi-region and volumes: Fly Volumes are attached to a single machine in a single region. If you run machines in multiple regions and each needs persistent storage, each region’s machine gets its own volume. For shared state across regions, use an external Postgres (Fly Postgres, Supabase, Neon) or Fly’s Tigris object storage.
Cost breakdown
| Configuration | Monthly cost |
|---|---|
| Free tier (3 shared-cpu-1x, 256 MB, 1 region) | $0 |
| Single shared-cpu-2x (512 MB), 1 region, always-on | ~$4.50 |
| Single shared-cpu-2x, 1 region, auto-suspend (low traffic) | ~$0.50–2.00 |
| 3-region deployment (nrt, fra, sjc), shared-cpu-2x each | ~$13–15 |
| + 10 GB volume | +$1.50/month |
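The rows above follow from simple arithmetic, sketched here so you can plug in your own numbers. The figures are taken from this table (about $4.50/month for an always-on shared-cpu-2x, and $1.50 for 10 GB, or $0.15 per GB-month). The function name and the active_fraction parameter are illustrative, and the model assumes suspended machines cost nothing for compute, which ignores any storage charges for stopped machines.

```python
def estimate_monthly_cost(
    always_on_usd,
    machines=1,
    active_fraction=1.0,
    volume_gb=0,
    volume_usd_per_gb=0.15,
):
    """Rough monthly estimate: compute scaled by uptime, plus volume storage."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    compute = always_on_usd * machines * active_fraction
    storage = volume_gb * volume_usd_per_gb
    return round(compute + storage, 2)


single_always_on = estimate_monthly_cost(4.50)                   # 4.5
three_regions = estimate_monthly_cost(4.50, machines=3)          # 13.5
mostly_idle = estimate_monthly_cost(4.50, active_fraction=0.2)   # 0.9
with_volume = estimate_monthly_cost(4.50, volume_gb=10)          # 6.0
```

With per-second billing, active_fraction is simply the share of the month the machine spends running, so a server that is awake 20% of the time lands inside the auto-suspend range shown in the table.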
Gotcha: Fly’s free tier uses shared-cpu-1x machines with 256 MB RAM. Python MCP servers using FastMCP, LangChain, or similar libraries routinely exceed 256 MB at startup. Budget for at least a shared-cpu-2x (512 MB) if you’re running Python. Node.js MCP servers typically fit within 256 MB for simple tools.
Common gotchas
Wrong internal_port in fly.toml. If internal_port doesn’t match the port your server binds to, Fly’s health checks fail and the deploy loops indefinitely. Double-check that fly.toml’s internal_port matches your app’s $PORT.
Auth token required. Your Fly.io app URL is publicly reachable. Without bearer token authentication, anyone can invoke your MCP tools. Set MCP_AUTH_TOKEN as a secret and validate it on every /mcp request.
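One way to enforce this is to check the Authorization header before a request reaches the MCP handler. The sketch below assumes the server is exposed as an ASGI application. The name require_bearer is hypothetical, and the token would come from the MCP_AUTH_TOKEN secret.

```python
import asyncio
import hmac


def require_bearer(app, token):
    """Wrap an ASGI app so HTTP requests without the right bearer token get a 401."""
    expected = f"Bearer {token}".encode()

    async def wrapper(scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and other non-HTTP events pass straight through.
            await app(scope, receive, send)
            return
        headers = dict(scope.get("headers", []))
        supplied = headers.get(b"authorization", b"")
        # compare_digest avoids leaking how much of the token matched via timing.
        if hmac.compare_digest(supplied, expected):
            await app(scope, receive, send)
            return
        await send({
            "type": "http.response.start",
            "status": 401,
            "headers": [
                (b"www-authenticate", b"Bearer"),
                (b"content-length", b"0"),
            ],
        })
        await send({"type": "http.response.body", "body": b""})

    return wrapper
```

If Fly's health check calls an unauthenticated /health route, apply this wrapper only to the MCP routes, or let the health path bypass it, so health checks do not start failing with 401.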
Multi-region state. If you scale to multiple regions, avoid in-memory session state, since different requests may hit different machines. Fly Volumes are attached to a single machine, so they only suit state that can stay local to one instance; use an external database for state that must be consistent across instances.
Health check grace period. If your server takes >10 seconds to start (common with large Python dependencies), Fly may kill the machine before it’s ready. Set grace_period = "30s" in your health check config.
SSE connections and Fly’s idle timeout. Fly’s load balancer closes connections idle for over 75 seconds by default. For MCP clients holding long-lived SSE connections, send keepalive pings (from the client or the server) at an interval shorter than that timeout. Note that [services.tcp_checks] configures health checks, not the idle timeout, so it will not keep idle connections open.
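To illustrate the keepalive idea, here is a generic sketch of a server-side SSE stream that emits a comment line whenever no real event has been sent for a while. SSE lines starting with a colon are comments that clients ignore, so they keep the connection active without affecting the protocol. This is not how any particular MCP SDK implements its transport. The function name, the queue-based design, and the 30-second default are assumptions for illustration.

```python
import asyncio


async def sse_with_keepalive(queue, interval=30.0):
    """Yield SSE chunks from a queue, inserting a comment ping after each idle interval.

    Put None on the queue to end the stream.
    """
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            # No event within the interval: send an SSE comment as a keepalive.
            yield ": keepalive\n\n"
            continue
        if item is None:
            return
        yield f"data: {item}\n\n"
```

Choosing an interval well below the proxy's idle timeout leaves headroom for scheduling delays and slow networks.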
Railway vs. Fly.io: when to choose each
If you’re deciding between these two platforms specifically for an MCP server:
- Single region, fast deploys, minimal config → Railway
- Multi-region, global users, per-second billing → Fly.io
- Need GPU alongside MCP tools → Railway
For a full side-by-side, see Railway vs Fly.io for AI Agents.
Prices verified May 2026. Check official docs before committing, as hosting pricing changes frequently.