Category: Deploy Guides

Step-by-step guides for deploying MCP servers, AI agents, and AI-native applications to cloud platforms.

  • How to Deploy an MCP Server on Railway in 2026 (Complete Guide)

    Affiliate disclosure: This post contains affiliate links. If you sign up for Railway through my link, I earn a small commission at no extra cost to you. I only recommend tools I actually use.


    The problem nobody warned you about

    You built an MCP server. It works perfectly over stdio — Claude Desktop picks it up, tools fire, life is good. Then someone on your team tries to use it. Or you want to expose it to a Claude Code agent running in CI. Or you are building a product and your users need to connect their own Claude instances to your backend.

    Suddenly stdio is a dead end. It is a local subprocess transport. It does not cross a network boundary. It does not survive a container restart. It does not work when “the server” is not on the same machine as the client.

    You need Streamable HTTP transport and you need to deploy the thing somewhere.

    This guide documents the fastest path from working MCP server to production-ready endpoint. Based on platform research and community-reported experiences, Railway consistently comes out as the lowest-friction option for this specific workload: the deploy from git push to live TLS-terminated endpoint takes under 20 minutes, without touching nginx, systemd, or SSL configuration.

    This guide is for developers who already have a working MCP server (Python or TypeScript) and want it running in production with a real URL, real uptime, and a cost they can justify.


    Why Railway for MCP servers

    Not all hosting platforms suit MCP equally well. Here is why Railway earns the top spot for this workload specifically.

    Persistent containers, not serverless functions. MCP’s Streamable HTTP transport relies on long-lived HTTP connections with optional SSE streaming back to the client. Serverless platforms (Vercel, Netlify, Lambda) enforce per-invocation time limits, often 10–30 seconds by default, and do not maintain in-memory session state across invocations. Railway runs your code in a container that stays up, so no function timeout or cold start kills your SSE stream mid-tool-call.

    Git-driven deploys. Connect a GitHub repo, set a start command, push a commit, and Railway builds and deploys automatically. No YAML pipeline to maintain, no Docker registry to push to manually. Railpack (Railway’s builder and the successor to Nixpacks) detects Python or Node automatically; you can override with a Dockerfile when you need determinism.

    HTTP transport is a first-class citizen. Railway generates a public HTTPS URL for every service automatically. You get TLS termination, a stable .up.railway.app domain, and optional custom domain — all without touching nginx or Caddy config.

    $5/mo Hobby plan is genuinely usable. The Hobby tier costs $5/month and includes $5 of resource credits. For a low-traffic MCP server idling at 0.1 vCPU and 256 MB RAM, your actual compute bill is well under $5, which means the base fee covers it. I have run a personal MCP server for two months without paying a cent beyond the $5 plan fee.

    One-command databases. If your MCP server needs a Redis cache or Postgres store, you add it from the Railway dashboard in two clicks. Connection strings inject as environment variables automatically. That alone is worth the platform lock-in for small projects.


    Prerequisites

    Before you start, you need:

    • A working MCP server in Python (using the mcp SDK ≥ 1.27 or FastMCP ≥ 3.0) or TypeScript (@modelcontextprotocol/sdk ≥ 1.x)
    • Code in a GitHub repo (public or private — Railway handles both)
    • A Railway account — [sign up here](https://hostingpundit.com/go/railway) and grab the $5 free trial credit
    • Railway CLI installed: npm install -g @railway/cli (optional but useful for env var wiring)
    • Basic familiarity with environment variables and Docker/container concepts

    If you are still on stdio and want to understand what Streamable HTTP actually is before migrating, read the official transport specification at modelcontextprotocol.io first. It is short and worth 10 minutes.


    Step 1: Prepare your MCP server for production

    Switch from stdio to Streamable HTTP transport

    This is the only real code change. In Python with FastMCP:

    import os

    # Before (stdio, local only)
    if __name__ == "__main__":
        mcp.run()

    # After (Streamable HTTP, deployable)
    if __name__ == "__main__":
        mcp.run(
            transport="streamable-http",
            host="0.0.0.0",   # Must bind to all interfaces, not just localhost
            port=int(os.environ.get("PORT", 8000)),
        )

    With the official Python SDK directly:

    import os

    from mcp.server.fastmcp import FastMCP

    # In the official SDK, host and port are server settings passed to the
    # constructor; run() only selects the transport.
    # The /mcp endpoint is the standard path clients expect.
    app = FastMCP("my-server", host="0.0.0.0", port=int(os.environ["PORT"]))
    # ... your tools ...
    app.run(transport="streamable-http")

    In TypeScript:

    import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
    import express from "express";
    
    const app = express();
    app.use(express.json());
    
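    // `server` is your existing MCP server instance with its tools already registered.
    // sessionIdGenerator: undefined runs the transport in stateless mode.
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);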
    
    app.post("/mcp", (req, res) => transport.handleRequest(req, res, req.body));
    app.get("/mcp", (req, res) => transport.handleRequest(req, res));
    app.delete("/mcp", (req, res) => transport.handleRequest(req, res));
    
    app.listen(Number(process.env.PORT ?? 8000), "0.0.0.0");

    Gotcha: Always use 0.0.0.0 as the bind address, never 127.0.0.1 or localhost. Railway’s networking routes traffic from outside the container and your process must be reachable on the container’s network interface. I burned an hour on this the first time.

    Add a health check endpoint

    Railway can probe a health check path you configure (this guide uses /health) to confirm your container is alive. Add a dead-simple route:

    # FastAPI/Starlette, or use FastMCP's built-in if on v3+
    @app.get("/health")
    async def health():
        return {"status": "ok"}

    The Express equivalent in TypeScript:

    app.get("/health", (_req, res) => res.json({ status: "ok" }));

    Externalize configuration as environment variables

    import os
    
    API_KEY     = os.environ["MY_API_KEY"]       # required — will crash on missing
    DEBUG       = os.environ.get("DEBUG", "false") == "true"
    AUTH_TOKEN  = os.environ.get("MCP_AUTH_TOKEN")  # Bearer token for auth

    Never hardcode credentials. Railway injects env vars at runtime; you set them in the dashboard. You do not need a .env file in the repo.


    Step 2: Deploy to Railway

    Connect your GitHub repo

    1. Log in to [railway.com](https://hostingpundit.com/go/railway) and click New Project.
    2. Choose Deploy from GitHub repo.
    3. Select your repository. If it is private, authorize Railway’s GitHub app.
    4. Railway immediately starts a build. It will fail the first time if you have not set required env vars — that is fine, you will fix it next.

    Configure the start command

    Railway auto-detects Python and Node. For Python, it looks for requirements.txt or pyproject.toml. For Node, it looks for package.json. If Railpack picks the wrong start command, override it:

    • In the dashboard: Service → Settings → Deploy → Start Command
    • Python: python server.py
    • Node: node dist/index.js (or npm start)

    If your project needs a specific Python version or system dependency that Railpack misses, drop a Dockerfile in the repo root:

    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    EXPOSE 8000
    CMD ["python", "server.py"]

    Railway auto-detects the Dockerfile and uses it instead of Railpack.

    Set environment variables

    In the service dashboard, go to Variables and add:

    | Variable | Value |
    |---|---|
    | PORT | 8000 (Railway also sets this automatically; just be consistent) |
    | MCP_AUTH_TOKEN | A long random string, e.g. the output of openssl rand -hex 32 |
    | MY_API_KEY | Your upstream API key |

    Gotcha: Railway injects its own PORT variable. If you hardcode port 8000 in your Dockerfile’s EXPOSE and your app does not read $PORT, Railway’s health check will target the wrong port and your deploy will fail with a timeout. Always read the port from the environment.

    Configure the health check

    Go to Service → Settings → Deploy → Health Check Path and set it to /health. Set the timeout to 60 seconds to give your app time to boot on the first deploy.
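
    Before relying on Railway’s probe, it can help to hit the endpoint yourself. The following is a minimal sketch, assuming the /health route returns {"status": "ok"} as in Step 1. The function names are illustrative and not part of Railway’s tooling.

```python
import json
import time
import urllib.error
import urllib.request


def body_is_healthy(status_code: int, body: bytes) -> bool:
    """Return True for a 200 response whose JSON body has status == "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_health(url: str, attempts: int = 10, delay_s: float = 3.0) -> bool:
    """Poll the health URL until it reports ok or the attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if body_is_healthy(response.status, response.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the container may still be booting; try again
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

    Calling wait_for_health("https://your-service-name.up.railway.app/health") after a deploy returns True once the route answers, or False if it never does within the retry budget.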

    Trigger the deploy

    Push a commit to your main branch (or click Deploy manually in the dashboard). Watch the build logs in real time. A successful deploy looks like:

    ==> Detected Python project
    ==> Installing dependencies from requirements.txt
    ==> Starting service on port 8000
    ==> Health check passed at /health
    ==> Deployment successful

    Railway assigns a URL immediately: https://your-service-name.up.railway.app. Your MCP endpoint is live at https://your-service-name.up.railway.app/mcp.

    Auto-deploy on push

    By default, every push to your connected branch triggers a new deploy. You can change the watched branch in Service → Settings → Source. I keep main wired to production and use feature branches for local stdio testing.


    Step 3: Custom domain

    The .up.railway.app URL works but looks unserious for anything user-facing. Adding a custom domain takes about 5 minutes.

    1. In your service, go to Settings → Networking → Custom Domain and click Add Domain.
    2. Enter your domain, e.g. mcp.yourdomain.com.
    3. Railway gives you two DNS records to add at your registrar:

    – A CNAME pointing mcp.yourdomain.com to g05ns7.up.railway.app (value varies per service)

    – A TXT record at _railway-verify.mcp.yourdomain.com for ownership verification

    4. Add both records, wait for DNS propagation (usually under 10 minutes with Cloudflare).
    5. Railway provisions a Let’s Encrypt TLS cert automatically once both records resolve.

    If your domain is on Cloudflare, Railway now has a one-click OAuth flow that writes the DNS records for you — skip steps 3–4 entirely.

    Gotcha: Set Cloudflare’s proxy status to DNS only (grey cloud) during initial setup. Railway needs to reach your origin directly for certificate issuance. You can re-enable proxying after the cert is active.

    Your MCP server is now reachable at https://mcp.yourdomain.com/mcp.


    Step 4: Test from Claude Code / Claude Desktop

    Claude Code

    Add the server to a .mcp.json file in your project root (Claude Code reads project-scoped MCP servers from there rather than from .claude/settings.json):

    {
      "mcpServers": {
        "my-server": {
          "type": "http",
          "url": "https://mcp.yourdomain.com/mcp",
          "headers": {
            "Authorization": "Bearer YOUR_MCP_AUTH_TOKEN"
          }
        }
      }
    }

    Restart Claude Code, then run /mcp to confirm the server appears and its tools are listed.

    Claude Desktop

    In claude_desktop_config.json:

    {
      "mcpServers": {
        "my-server": {
          "transport": {
            "type": "http",
            "url": "https://mcp.yourdomain.com/mcp"
          },
          "headers": {
            "Authorization": "Bearer YOUR_MCP_AUTH_TOKEN"
          }
        }
      }
    }

    Gotcha: If your MCP server uses session state (e.g. stores context between tool calls in memory), you must ensure Railway is not running multiple replicas. In the Hobby plan, services default to one replica, so this is not an issue. On Pro, explicitly set replicas to 1 in Service → Settings → Deploy → Replicas until you implement sticky sessions or external session storage.

    To smoke-test without a client, hit the MCP endpoint directly:

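    curl -X POST https://mcp.yourdomain.com/mcp \
      -H "Content-Type: application/json" \
      -H "Accept: application/json, text/event-stream" \
      -H "Authorization: Bearer YOUR_MCP_AUTH_TOKEN" \
      -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'

    A server running in stateless mode responds with a JSON-RPC result listing your tools (possibly framed as an SSE event). A server that manages sessions will reject this request until the client has sent an initialize request and echoed back the Mcp-Session-Id header it received.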
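
    For servers that manage sessions, the same smoke test needs the initialize handshake first. Here is a rough sketch using only the Python standard library. The protocol version string, the client name, and all function names are assumptions chosen for illustration, and the code reads each response to the end, so it assumes the server closes the stream after replying.

```python
import json
import urllib.request

PROTOCOL_VERSION = "2025-03-26"  # assumption: a spec revision with Streamable HTTP


def build_headers(token, session_id=None):
    """Headers for a Streamable HTTP POST, with the session ID when known."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "Authorization": f"Bearer {token}",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    return headers


def jsonrpc(method, params=None, request_id=None):
    """Build a JSON-RPC 2.0 message; omit the id to send a notification."""
    message = {"jsonrpc": "2.0", "method": method}
    if request_id is not None:
        message["id"] = request_id
    if params is not None:
        message["params"] = params
    return message


def parse_body(content_type, body):
    """Decode a JSON reply, or the first data: line of an SSE reply."""
    if "text/event-stream" in content_type:
        for line in body.splitlines():
            if line.startswith("data:"):
                return json.loads(line[len("data:"):].strip())
        return None
    return json.loads(body) if body.strip() else None


def post(url, message, headers):
    """POST one message; return (session id header, decoded reply)."""
    request = urllib.request.Request(
        url, data=json.dumps(message).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        text = response.read().decode("utf-8")
        content_type = response.headers.get("Content-Type", "")
        return response.headers.get("Mcp-Session-Id"), parse_body(content_type, text)


def list_tools(url, token):
    """Run initialize, send the initialized notification, then call tools/list."""
    init_params = {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "smoke-test", "version": "0.1.0"},  # hypothetical client
    }
    session_id, _ = post(url, jsonrpc("initialize", init_params, 1), build_headers(token))
    headers = build_headers(token, session_id)
    post(url, jsonrpc("notifications/initialized"), headers)
    _, reply = post(url, jsonrpc("tools/list", request_id=2), headers)
    return reply
```

    Calling list_tools("https://mcp.yourdomain.com/mcp", "YOUR_MCP_AUTH_TOKEN") should return the decoded tools/list reply if the handshake succeeds.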


    Cost breakdown

    Railway charges $0.000463/vCPU/minute and $0.000231/GB-RAM/minute. A lightweight Python MCP server (FastMCP, no heavy dependencies) idles at roughly 0.05 vCPU and 128 MB RAM between requests.

    | Traffic tier | Assumed resources | Monthly compute | Plan fee | Total |
    |---|---|---|---|---|
    | 1k requests/mo (dev/testing) | 0.05 vCPU, 128 MB | ~$0.60 | $5 | $5.00 (covered by credit) |
    | 10k requests/mo (small prod) | 0.1 vCPU, 256 MB | ~$2.40 | $5 | $5.00 (still covered) |
    | 100k requests/mo (real traffic) | 0.3 vCPU, 512 MB sustained | ~$9.80 | $5 | ~$9.80 ($5 fee plus ~$4.80 beyond the credit) |

    Egress is $0.05/GB — negligible for MCP traffic which is small JSON payloads. At 100k requests averaging 2 KB per response, you are paying about $0.01 in egress.

    The practical threshold: if your MCP server stays under $5 of resource consumption per month, the Hobby plan costs you exactly $5. If you start bursting toward $10–15 of compute, upgrading to Pro ($20/mo with $20 credit) extends the headroom substantially.
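
    The billing rule and the egress estimate above can be checked with a few lines of arithmetic. This is a sketch that assumes the included credit is subtracted from usage before any overage is billed, as the paragraph above describes. The function names and the sample usage figures are illustrative only.

```python
HOBBY_FEE = 5.00      # monthly plan fee quoted above, in dollars
HOBBY_CREDIT = 5.00   # usage credit included in that fee
EGRESS_PER_GB = 0.05  # egress rate quoted above, in dollars


def egress_cost(requests: int, kb_per_response: float) -> float:
    """Egress charge in dollars, treating 1 GB as 1,000,000 KB."""
    gigabytes = requests * kb_per_response / 1_000_000
    return gigabytes * EGRESS_PER_GB


def monthly_bill(usage: float, fee: float = HOBBY_FEE, credit: float = HOBBY_CREDIT) -> float:
    """Plan fee plus whatever usage exceeds the included credit."""
    return fee + max(0.0, usage - credit)


print(round(egress_cost(100_000, 2), 4))           # 0.01
print(monthly_bill(3.00))                          # 5.0
print(monthly_bill(12.00))                         # 12.0
print(monthly_bill(12.00, fee=20.0, credit=20.0))  # 20.0
```

    The last line shows why Pro adds headroom: the same $12 of usage sits entirely inside the larger credit.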


    Alternatives worth knowing

    Fly.io is the main alternative I have tested. Its fly mcp launch command is a one-liner and it has 35+ global regions versus Railway’s 4, which matters if your MCP clients are geographically distributed. Idle costs approach zero because Fly Machines auto-suspend when no connections are active — good if traffic is bursty and unpredictable. The downside: the Machines abstraction is more infrastructure-y than Railway’s dashboard, and wiring a Postgres or Redis add-on takes more steps. At low traffic, Railway’s $5 flat fee actually costs less than Fly’s per-second billing if your server gets any sustained use.

    For a full breakdown of Railway, Fly.io, Render, and self-hosted options, see MCP server hosting platforms compared.


    Common gotchas

    Binding to localhost. Already mentioned but worth repeating because it accounts for maybe 40% of “my deploy fails immediately” reports. Always 0.0.0.0.

    PORT mismatch. Railway sets $PORT dynamically. Hardcoding port 8000 in your app without reading $PORT means your health check hits the wrong port and the deploy loops forever in “starting” state. Read the env var.

    No bearer token on a public URL. A Railway service URL is public by default. Without at least a shared bearer token check, anyone who discovers your endpoint can invoke your tools. This is especially bad if your tools have side effects. Add MCP_AUTH_TOKEN and validate it on every request.

    Stateful sessions vs. multiple replicas. If you scale to more than one replica and your MCP server stores session context in memory, requests from the same client may hit different instances and lose state. Either pin to one replica (fine for most indie projects) or externalize session state to Redis.

    Railpack missing a system dependency. Railpack is good but it does not know about every native library. If your Python package needs libxml2, ffmpeg, or anything non-pure-Python, provide a Dockerfile. The Railpack auto-detect is a convenience, not a guarantee.

    SSE connections and Railway’s timeout defaults. Railway has an inactivity timeout. If your MCP client holds an open SSE connection but sends no data for a while, Railway may close it. Configure your MCP client to send keepalive pings, or increase the timeout in Railway’s networking settings.
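
    For the bearer token gotcha, the check itself is small. The following is a minimal sketch, assuming the expected value comes from the MCP_AUTH_TOKEN environment variable; is_authorized is a hypothetical helper that you would call from whatever middleware hook your framework provides.

```python
import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Validate an "Authorization: Bearer <token>" header in constant time."""
    if not expected_token:
        return False  # refuse every request if no token is configured
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "authorization":  # header names are case-insensitive
            value = header_value
            break
    scheme, _, presented = value.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

    Rejecting requests when the token is unset is a deliberate choice here: a missing environment variable should fail closed rather than leave the endpoint open.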


    Wrapping up

    Railway is, right now, the fastest path from “MCP server working locally” to “MCP server running in production with a real HTTPS URL.” The $5 Hobby plan covers most indie workloads entirely. The git-push deploy loop is frictionless. And the gotchas above — binding address, PORT env var, bearer token auth — are all fixable in under five minutes once you know about them.

    If this guide saved you a debugging session, subscribe to the HostingPundit newsletter. I write one issue per week covering deployments, MCP infrastructure, and the hosting decisions that actually matter for indie devs shipping AI products. No vendor hype, no repackaged press releases.

    [Subscribe to the newsletter → hostingpundit.com/newsletter]

    For what comes next, read MCP server hosting platforms compared — I benchmarked Railway, Fly.io, Render, and self-hosted VPS on cold start, cost, and SSE reliability.


  • How to Migrate Your Lovable App to Vercel in 2026 (Complete Guide)

    Affiliate disclosure: Some links in this article are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you. I only recommend platforms I’ve actually used.


    If you’re reading this, your Lovable bill probably surprised you.

    Maybe you burned through your monthly credits debugging a layout issue on a Tuesday afternoon. Maybe you hit the wall mid-feature — “You’ve used all your messages for this period” — and felt a flash of genuine panic about a product you’ve shipped to real users. Or maybe you got the email about a plan change and sat there doing the math, realizing what you thought was a $25/month habit was about to cost you considerably more if you keep building.

    This pattern comes up repeatedly in builder communities: Lovable makes it fast to go from idea to deployed app, but once a tool has real users, the credit anxiety sets in. Every AI interaction costs something, and debugging on a platform that charges per message is a uniquely unpleasant experience.

    The good news: getting out is not hard. The code Lovable generated is yours. It’s a standard React + Vite app, and it will run anywhere that can serve a Vite build. This guide walks you through exactly what to do, including the parts other migration guides gloss over — the gotchas, the Supabase handoff, and the honest conversation about whether Vercel is even the right destination for you.


    Why People Leave Lovable

    Let’s name the frustrations clearly, because understanding the root cause changes which migration path makes sense for you.

    Credit burn is unpredictable. Lovable’s pricing model charges credits per AI message, and the cost scales with complexity. On the free plan, you get 5 credits a day, capped at roughly 30 per month, which is barely enough to evaluate the platform seriously. The Pro plan gives you 100 monthly credits, but a single debugging session can drain 10–15 credits when the AI needs to iterate. There’s no pay-as-you-go; when you hit the wall, you wait or you upgrade. The credit top-up system makes this worse: upgrading from 100 to 200 credits doesn’t give you 200 fresh credits, it gives you the 100 you need to reach the new cap. That’s a confusing and frustrating UX.

    Privacy and security concerns. As Lovable grows, questions about who can access your project data and AI conversation history are legitimate. If you have sensitive business logic in your prompts, or client data flowing through your app, owning your own deployment pipeline removes a layer of exposure. Review Lovable’s current privacy policy and data handling documentation before committing to the platform for anything sensitive.

    Deployment fragility. Lovable 2.0 introduced one-click deploys, but the “magic” is a wrapper over your GitHub repo and Lovable’s own CDN for certain assets. When that wrapper misbehaves — a broken build, a missing environment variable, a CDN asset that hard-codes to Lovable’s servers — you can’t always debug it from inside the Lovable UI. You need direct access to your deployment pipeline.

    Vendor lock-in anxiety. The moment you start building something that matters, the question surfaces: what happens if Lovable changes pricing again, gets acquired, or goes down? Owning your deployment removes that question entirely.


    What You’ll Lose vs. Gain

    Be honest with yourself here before you migrate.

    What you lose:

    • The AI-in-the-loop development experience. Lovable’s editor with inline AI editing is genuinely good. Post-migration, you’re in VS Code or Cursor, writing prompts manually or doing it yourself.
    • One-click deploys triggered by Lovable saves. Your new deploy pipeline requires a Git push.
    • Lovable’s support team, who are reasonably responsive and understand the Lovable-specific quirks of your codebase.

    What you gain:

    • Full control over your deployment pipeline.
    • No more credit anxiety. You can iterate, refactor, and debug without a meter running.
    • A standard codebase you can hand to any developer. React + Vite + Supabase is a completely normal stack.
    • Cost predictability. Vercel’s Hobby tier is free for personal/non-commercial projects. The Pro plan is $20/developer/month — a fixed, understandable number.
    • The ability to add backend logic, cron jobs, and custom server functions without asking Lovable’s AI to do it.

    The honest take: if you’re still in active AI-assisted development and your project isn’t in production, there’s a real cost to migrating early. The migration makes most sense once your app has real users and you’re in maintenance-and-iteration mode, not build mode.


    Pre-Flight Checklist

    Before you touch anything, gather these:

    • GitHub access: You need a GitHub account and your Lovable project connected to a repo. Paid plans support this natively; verify yours does before assuming.
    • Supabase credentials: Go to your Supabase project → Settings → API. Copy your Project URL, anon key, and service_role key. Store these somewhere secure (a local .env file, a password manager — not a sticky note).
    • Custom domain details: If you’re using a custom domain via Lovable, you’ll need access to your DNS registrar to update records. Know your registrar (Namecheap, Cloudflare, GoDaddy, etc.) and have your login ready.
    • Environment variable inventory: Open Lovable → Project Settings → Environment Variables. Screenshot or copy every variable. Lovable injects these automatically; Vercel does not.
    • Identify your backend dependencies: Does your app use Lovable Edge Functions, or only Supabase? Any Lovable-specific API endpoints in your codebase (search for lovable.dev in your code) need to be replaced before you cut over.
    • Note your current uptime: Don’t migrate during a period your users are active. Plan for 10–30 minutes of DNS propagation downtime.

    Step 1: Sync Your Lovable Project to GitHub

    This is the cleanest part of the migration. Lovable’s GitHub integration is solid and takes about two minutes.

    1. Open your Lovable project dashboard.
    2. Look for the GitHub icon in the top-right toolbar. If you don’t see it, check that your plan includes GitHub integration — this is a paid feature. Free plan users can use an unofficial browser extension, but the official integration is more reliable.
    3. Click Connect to GitHub. Lovable will ask you to authorize access to your GitHub account. Grant it.
    4. On first connection, Lovable creates a new repository under your account (or org). The repo name defaults to your project name — you can rename it in GitHub later.
    5. Once connected, Lovable enables two-way sync: every save in Lovable pushes a commit to the main branch, and every push to that branch syncs back into Lovable.
    6. Verify the connection: go to your GitHub account and confirm the repo exists, has recent commits, and the file structure looks right (you should see src/, public/, package.json, vite.config.ts, and a supabase/ directory if you’re using Supabase).

    One important note before moving on: once you establish your Vercel deployment, stop making changes in Lovable’s editor. The two-way sync is a feature when you’re still using Lovable, but during cutover you want a single source of truth. Make all code changes via Git pushes to GitHub, and let Vercel pick them up from there.


    Step 2: Decide Your Destination — Vercel vs. Alternatives

    Vercel is the default recommendation here, but it’s not the only option, and for some use cases it’s not even the best one. Here’s the honest breakdown:

    Vercel

    Best for: React/Vite apps, anyone who wants the path of least resistance. Framework detection is excellent: Vercel will identify your Vite project and configure the build command automatically. The Hobby tier is free but explicitly non-commercial (read: if your app earns money, you owe $20/month on Pro). The Pro tier’s $20/developer/month is predictable and includes $20 in usage credits monthly.

    Railway

    Best for: apps that need server-side compute, background workers, or managed databases alongside the frontend. Usage-based pricing (typically $8–15/month for a moderate app). No per-seat charges, which matters if you’re a team. Docker support means you’re never fighting framework detection.

    Cloudflare Pages

    Best for: apps that need to survive a traffic spike without a surprise invoice. Unlimited bandwidth on the free tier, no commercial-use restriction, 100K Workers requests/day on the free plan. The tradeoff: slightly more configuration, and the Workers runtime is not Node.js — any server-side logic needs to target the Cloudflare Workers API.

    Netlify

    Solid alternative, but note that Netlify moved to credit-based pricing in 2025 — which is ironic if you’re migrating to escape that model. Still a fine option if you’re already familiar with it.

    Self-hosted VPS (DigitalOcean, Hetzner, Vultr)

    Maximum control, lowest per-GB cost at scale. Not recommended unless you’re comfortable managing Nginx, SSL, and deployments. Hetzner’s CAX11 ARM instance runs ~€3.79/month and can host multiple apps. Good option if you have 3+ apps to consolidate.

    My recommendation for most Lovable migrants: start with Vercel. The friction is lowest, the Vite detection is reliable, and if you hit commercial-use concerns you can migrate to Cloudflare Pages later with minimal effort — both are static-first deployments from the same Git repo.


    Step 3: Import Your GitHub Repo to Vercel

    This is where most migrations succeed or fail based on the details.

    1. Go to [vercel.com](https://vercel.com) and sign in with GitHub.
    2. Click Add New → Project.
    3. Vercel scans your GitHub repos. Find the one Lovable created and click Import.
    4. Framework detection: Vercel should auto-detect Vite. Confirm the build settings:

    – Framework preset: Vite

    – Build command: npm run build (or vite build — check your package.json scripts)

    – Output directory: dist

    – Install command: npm install

    If Vercel guesses wrong here, the build will produce nothing and your deployment will be an empty shell. Fix the output directory to dist if it shows something else.

    5. Environment variables: this is the step most people skip and then wonder why their app is broken. Before clicking Deploy, click Environment Variables and add every variable you noted in your pre-flight checklist. At minimum:

    – VITE_SUPABASE_URL: your Supabase project URL

    – VITE_SUPABASE_ANON_KEY: your Supabase anon key

    Note the VITE_ prefix: Vite only exposes env vars to the browser if they start with VITE_. Variables without that prefix will be undefined at runtime in a Vite app.

    6. Click Deploy. Watch the build log. Common failure modes:

    – Module not found: Can't resolve 'some-package': a dependency Lovable was injecting is missing from your package.json. Add it locally with npm install some-package, commit the updated package.json and lockfile, and push.

    – Build succeeds but the app is a blank screen: check the browser console for a failed Supabase connection, usually a missing env var.

    – dist/index.html not found: the output directory is wrong.

    7. Once deployed, Vercel gives you a *.vercel.app subdomain. Test your app here before touching DNS.
    8. Custom domain: In your Vercel project → Settings → Domains, add your domain. Vercel will give you either an A record IP or a CNAME value to add at your DNS registrar. Propagation typically takes 5–30 minutes. If you’re moving from Lovable’s custom domain setup, remove Lovable’s DNS records first.

    One Vercel-specific gotcha for AI-heavy apps: Vercel Serverless Functions have a default timeout of 10 seconds on Hobby and 15 seconds on Pro. If your app makes calls to an LLM API or a slow third-party service, you’ll hit this. The fix is either Vercel’s Fluid Compute (which allows longer execution with different pricing) or moving those calls to a background queue on Railway or Supabase Edge Functions.


    Step 4: Migrate Supabase to Your Own Project

    If you provisioned Supabase through Lovable, you already have a Supabase project — but it may be tied to Lovable’s organization, and you want to own it directly.

    Check ownership first: Go to supabase.com/dashboard and log in with the account you used when building in Lovable. If you see your project listed and you’re the owner, you’re fine — just update your Vercel env vars to point to this existing project, and you’re done with this step.

    If you need to migrate to a new Supabase project:

    1. Install the Supabase CLI and log in. The CLI does not support npm install -g, so install it as a dev dependency and run it through npx (or install it with Homebrew or Scoop instead):

    ```bash
    npm install supabase --save-dev
    npx supabase login
    ```

    2. Link to your existing project:

    ```bash
    npx supabase link --project-ref <your-project-ref>
    ```

    The project ref is in your Supabase dashboard URL: supabase.com/dashboard/project/<your-project-ref>.

    3. Pull the remote schema into migration files:

    ```bash
    npx supabase db pull
    ```

    This populates supabase/migrations/ with a timestamped SQL file representing your current schema.

    4. Export your data: For a data migration, go to your Supabase dashboard → Table Editor → select a table → click the menu → Export to CSV. Do this for every table with data you care about. Alternatively, use supabase db dump --data-only, which wraps pg_dump, to export all table data in one file.
    5. Create a new Supabase project in your personal organization.
    6. Push the schema to the new project:

    ```bash
    npx supabase db push --db-url "postgresql://postgres:[password]@[host]:5432/postgres"
    ```

    7. Import your CSV data via the Supabase dashboard or psql.
    8. Update all Supabase env vars in Vercel to point to the new project.

    One note: if your Lovable project has Row Level Security policies, they’re included in the schema dump. Verify them on the new project — a missing policy is a silent security hole, not a loud error.


    Step 5: Disconnect from Lovable

    Once your app is live on Vercel and you’ve confirmed everything works:

    1. Verify, then verify again. Test every core user flow on the Vercel deployment, not just the homepage. Auth, data reads, data writes, file uploads — whatever your app does.
    2. Point your domain to Vercel. If you haven’t already done this in Step 3, update your DNS records now. Remove any Lovable-managed DNS entries.
    3. Disable Lovable’s GitHub sync. In your Lovable project → Settings → GitHub, disconnect the integration. This stops Lovable from pushing surprise commits to your repo if you ever accidentally open the editor.
    4. Cancel your Lovable plan. Go to Lovable → Account → Billing → Cancel Plan. Do this only after you’ve confirmed your Vercel deployment is stable for at least 24–48 hours. There’s no graceful downgrade — cancellation is immediate.
    5. Archive or delete the Lovable project itself. You own the code on GitHub; the Lovable project is now redundant.

    Common Gotchas (Read This Before You Deploy)

    These are the things I’ve seen trip up migrations that seemed straightforward.

    1. Lovable-specific package wrappers

    Search your codebase for imports from @lovable/ or any reference to lovable.dev. These are proprietary packages that won’t resolve on a standard npm install. The fix is to find what functionality they provide and replace with a standard equivalent — usually shadcn/ui components, sonner for toast notifications, or plain React.

    2. Hard-coded asset URLs pointing to Lovable’s CDN

    Run a search across your src/ directory for lovable.dev or gptengineer.app (Lovable’s legacy domain). Any hard-coded URLs pointing to Lovable-hosted assets will 404 after migration. Move those assets to your public/ folder and update the references.
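    If you would rather script that search than rely on your editor, here is a minimal sketch. The two domains come from the paragraph above; the function name find_lovable_references and the choice to skip unreadable files are my own assumptions, not anything Lovable ships.

```python
from pathlib import Path

# Domains named in the text above; extend the tuple if your project uses others.
LOVABLE_MARKERS = ("lovable.dev", "gptengineer.app")


def find_lovable_references(root: str) -> list[tuple[str, int, str]]:
    """Return (file path, line number, stripped line) for each match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(marker in line for marker in LOVABLE_MARKERS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

    Call find_lovable_references("src") from the project root and work through the list; an empty result means there are no hard-coded references left in text files.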

    3. Missing VITE_ prefix on environment variables

    Vite’s build system strips any env var that doesn’t start with VITE_ from the client bundle for security reasons. If you add SUPABASE_URL instead of VITE_SUPABASE_URL to Vercel, your app will connect to nothing and fail silently. Double-check every variable name.

    4. Vercel function timeouts for AI calls

    If your app calls OpenAI, Anthropic, or any LLM synchronously from a Vercel Serverless Function, you will hit the 10/15-second timeout under load. Restructure these as streaming responses (Vercel supports Server-Sent Events) or offload to a queue. This is not a Lovable problem — it’s a Vercel architecture consideration.

    5. The blank screen on first load

    Usually a Supabase connection failure or a missing env var. Open browser DevTools → Console before assuming your code is broken. 90% of blank screens after migration are a two-minute env var fix.

    6. Build command differences

    If your Lovable project has a custom vite.config.ts with a base path set (common if you were deploying to a subdirectory), Vercel’s auto-detection will be wrong. Check your config and set the output directory manually.


    Cost Comparison: Lovable vs. Vercel vs. Alternatives

    | Platform | Free Tier | Paid Entry | Commercial Use on Free? | Notes |
    | --- | --- | --- | --- | --- |
    | Lovable | 30 credits/month | $25/month (100 credits) | Yes | Credits burn on every AI interaction |
    | Vercel Hobby | Generous limits | - | No | Personal projects only |
    | Vercel Pro | - | $20/developer/month | Yes | $20 included usage credit |
    | Cloudflare Pages | Unlimited bandwidth | $5/month (Workers Paid) | Yes | Best for traffic spikes |
    | Railway | $5 trial credit | ~$8–15/month (usage) | Yes | Best for full-stack apps |
    | Netlify | 300 credits/month | $19/month | Yes | Also credit-based now |
    | Self-hosted VPS | - | ~$4–6/month (Hetzner) | Yes | You manage everything |

    The honest math: if you’re running a live app with real users, Lovable Pro at $25/month is not unreasonable — but you’re paying for the AI development loop, not the hosting. Once you stop needing AI-assisted generation, you’re paying $25/month for ~100 credits you barely use. Vercel Pro at $20/month or Cloudflare’s $5/month Workers plan delivers better value for a deployed, stable app.


    Wrapping Up

    Migrating off Lovable is not an emergency procedure — it’s a 2–4 hour project for a typical app, most of which is waiting for DNS propagation. The code Lovable generated is genuinely portable. The framework is standard. The biggest friction points are almost always the environment variables and the occasional Lovable-specific import that needs replacing.

    My recommendation: do it once your app is stable, not while you’re still building. The AI-in-the-loop experience Lovable provides is valuable during construction. But once you’re in maintenance mode — shipping bug fixes, tweaking copy, adding features you can spec clearly — you don’t need to pay per-message for that. Your code, your Git repo, your deploy pipeline.

    If this guide saved you a day of confusion, the best thing you can do is subscribe to the HostingPundit newsletter — I cover the indie dev hosting stack weekly, with real tests and no vendor nonsense.


  • How to Deploy an MCP Server on Fly.io in 2026 (Step-by-Step)

    Affiliate disclosure: Some links in this article are affiliate links. If you sign up through them, we may earn a commission at no extra cost to you. Recommendations are based on documented platform capabilities and official pricing as of May 2026.


    How to Deploy an MCP Server on Fly.io in 2026 (Step-by-Step)

    Fly.io is the right platform for MCP servers when Railway’s single-region limitation becomes a constraint. If your MCP clients are distributed across geographies — a team split across Tokyo, London, and San Francisco, or a product serving users worldwide — Fly.io’s 35+ region anycast routing is the feature no other PaaS offers.

    This guide covers deploying a Python or TypeScript MCP server to Fly.io with Streamable HTTP transport, persistent state, custom domain, and proper authentication. It assumes you have a working local MCP server and want it running in production.


    Why Fly.io for MCP

    Multi-region routing. Fly.io deploys your container to multiple datacenters simultaneously and routes each incoming connection to the nearest healthy instance. For an MCP server with a global user base, this reduces latency meaningfully — a client in Tokyo hitting a nrt region instance instead of a US West one saves 150+ ms per tool call.

    Machines can stay allocated or auto-suspend. Unlike serverless platforms that cold-start on every request, Fly Machines can be configured to stay running 24/7 (matching Railway’s always-on behavior) or to suspend when no connections are active and resume in 300–500 ms. For low-traffic MCP servers, auto-suspend drops idle cost toward zero.

    Persistent volumes are mature. Fly Volumes attach to a machine and survive redeploys. Unlike Railway’s volumes (which work but lack snapshot tooling), Fly volumes support snapshots and can be backed up to Fly’s Tigris object storage. For MCP servers that need to persist data between restarts, this matters.

    Per-second billing. A Fly machine running 100% uptime on a shared-cpu-2x (512 MB) costs roughly $4–5/month. If your MCP server handles bursty traffic, auto-suspend drops that to near zero for idle periods.


    Prerequisites

    • A working MCP server in Python or TypeScript using Streamable HTTP transport (not stdio)
    • Docker installed locally
    • Fly CLI (flyctl) installed: curl -L https://fly.io/install.sh | sh
    • A Fly.io account: [Sign up here](https://hostingpundit.com/go/fly-io) — the free tier includes 3 VMs and 3 GB storage

    Step 1: Prepare your MCP server for Fly.io

    Use Streamable HTTP transport

    Your MCP server must use Streamable HTTP transport and bind to 0.0.0.0 on the port read from $PORT (this guide sets PORT to 8080 in fly.toml).

    Python (FastMCP):

    import os
    from mcp.server.fastmcp import FastMCP
    
    # In the official MCP Python SDK, host and port are FastMCP settings,
    # not arguments to run().
    mcp = FastMCP(
        "my-server",
        host="0.0.0.0",
        port=int(os.environ.get("PORT", 8080)),
    )
    
    # ... define your tools ...
    
    if __name__ == "__main__":
        mcp.run(transport="streamable-http")

    TypeScript:

    import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
    import express from "express";
    
    const app = express();
    app.use(express.json());
    
    // ... define your server and tools ...
    
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);
    
    app.post("/mcp", (req, res) => transport.handleRequest(req, res, req.body));
    app.get("/mcp", (req, res) => transport.handleRequest(req, res));
    app.delete("/mcp", (req, res) => transport.handleRequest(req, res));
    
    const port = parseInt(process.env.PORT ?? "8080");
    app.listen(port, "0.0.0.0", () => {
      console.log(`MCP server listening on ${port}`);
    });

    Add a Dockerfile

    Fly.io detects Dockerfiles automatically. Create one in your project root:

    Python:

    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    EXPOSE 8080
    CMD ["python", "server.py"]

    Node.js:

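    FROM node:20-slim
    WORKDIR /app
    COPY package*.json ./
    # Install all dependencies; the build step needs devDependencies such as typescript
    RUN npm ci
    COPY . .
    RUN npm run build
    # Drop devDependencies from the final image
    RUN npm prune --omit=dev
    EXPOSE 8080
    CMD ["node", "dist/index.js"]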

    Gotcha: Fly.io routes traffic to the internal_port set in fly.toml (8080 in this guide). Keep the PORT environment variable, the Dockerfile EXPOSE, and the port your application binds to consistent with that value.


    Step 2: Initialize the Fly.io app

    Run flyctl launch from your project directory:

    flyctl launch

    flyctl will:

    1. Detect your Dockerfile
    2. Ask for an app name (or generate one)
    3. Ask which region to deploy to (pick the one closest to most of your users — use fly platform regions to list options)
    4. Create fly.toml in your project directory

    Important: edit the generated fly.toml before deploying. The defaults need adjustment for an MCP server:

    app = "your-mcp-server"
    primary_region = "nrt"  # or your chosen region
    
    [build]
      # Fly auto-detects your Dockerfile
    
    [[services]]
      internal_port = 8080
      protocol = "tcp"
    
      [services.concurrency]
        type = "connections"
        hard_limit = 100
        soft_limit = 80
    
      [[services.ports]]
        handlers = ["tls", "http"]
        port = 443
    
      [[services.ports]]
        handlers = ["http"]
        port = 80
        force_https = true
    
      [[services.http_checks]]
        interval = 15000
        timeout = 5000
        grace_period = "10s"
        method = "get"
        path = "/health"
    
    [env]
      PORT = "8080"

    Key configuration points:

    • internal_port = 8080 must match the port your server binds to
    • [[services.http_checks]] with path /health — add a health endpoint to your server
    • force_https = true — redirect HTTP to HTTPS automatically
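    The /health path in fly.toml needs something on the server side to answer it. One dependency-free way to sketch that is a small ASGI wrapper that replies to GET /health itself and passes everything else through. This assumes your MCP server is served as an ASGI application (with the Python SDK, that would be whatever app object you hand to your ASGI server); with_health_check is a name invented for this example.

```python
def with_health_check(app, path="/health"):
    """Wrap an ASGI app so GET <path> returns 200 without reaching the app."""

    async def wrapped(scope, receive, send):
        if (
            scope["type"] == "http"
            and scope["path"] == path
            and scope["method"] == "GET"
        ):
            # Answer the health probe directly with a tiny plain-text body.
            await send({
                "type": "http.response.start",
                "status": 200,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"ok"})
            return
        # Everything else (including lifespan events) goes to the real app.
        await app(scope, receive, send)

    return wrapped
```

    Keeping the check this cheap is deliberate: Fly probes it every 15 seconds with the config above, so it should not touch a database or an upstream API.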

    Step 3: Set secrets (environment variables)

    Never put credentials in fly.toml or Dockerfiles. Use Fly’s secrets system:

    fly secrets set MCP_AUTH_TOKEN=$(openssl rand -hex 32)
    fly secrets set MY_API_KEY=your_api_key_here
    fly secrets set DATABASE_URL=your_database_url

    Fly injects these as environment variables at runtime. They are encrypted at rest and never appear in build logs.
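    On the server side, those secrets are just environment variables. A minimal sketch of reading them at startup and failing fast when one is missing follows; load_required_env is a hypothetical helper, and the variable names are the ones used in the commands above.

```python
import os


def load_required_env(names, environ=os.environ):
    """Return {name: value} for each name, raising if any is missing or empty."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

    Calling load_required_env(["MCP_AUTH_TOKEN", "DATABASE_URL"]) before the server starts turns a forgotten secret into an immediate, readable crash in the deploy logs instead of a confusing failure on the first tool call.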

    To verify secrets are set (shows names but not values):

    fly secrets list

    Step 4: Deploy

    fly deploy

    flyctl will:

    1. Build your Docker image
    2. Push it to Fly’s image registry
    3. Deploy to your configured region(s)
    4. Run health checks
    5. Print your app’s URL on success

    A successful deploy looks like:

    ==> Verifying app config
    ==> Building image
    ...
    ==> Pushing image to registry
    ==> Creating release
    ==> Monitoring deployment
      Machine e784567d create started ... started
      ✓ Machine e784567d [app] is healthy [HTTP GET /health - 200]
    ==> Visit your newly deployed app at https://your-mcp-server.fly.dev

    Your MCP endpoint is live at: https://your-mcp-server.fly.dev/mcp


    Step 5: Add more regions (optional but powerful)

    This is Fly.io’s killer feature for MCP. To deploy to additional regions:

    fly regions add fra  # Frankfurt
    fly regions add lax  # Los Angeles
    fly scale count 3    # One machine per region

    Fly.io’s anycast routing automatically sends each user to the nearest healthy instance. Your MCP clients don’t need to know which region they’re hitting; the same hostname works everywhere and Fly’s network picks the machine.

    To see which machines are running and where:

    fly status

    Step 6: Custom domain

    1. Add your domain in the Fly dashboard: your-app → Certificates → Add Certificate
    2. Fly provides a DNS record to add at your registrar (typically an A record or CNAME)
    3. Fly provisions a Let’s Encrypt certificate automatically

    Alternatively, via CLI:

    fly certs create mcp.yourdomain.com
    fly certs show mcp.yourdomain.com  # Shows required DNS records

    Once propagated, your MCP endpoint is: https://mcp.yourdomain.com/mcp


    Step 7: Connect to Claude Code / Claude Desktop

    Claude Code (.claude/settings.json or ~/.claude/settings.json):

    {
      "mcpServers": {
        "my-server": {
          "type": "http",
          "url": "https://your-mcp-server.fly.dev/mcp",
          "headers": {
            "Authorization": "Bearer YOUR_MCP_AUTH_TOKEN"
          }
        }
      }
    }

    Claude Desktop (claude_desktop_config.json):

    {
      "mcpServers": {
        "my-server": {
          "transport": {
            "type": "http",
            "url": "https://your-mcp-server.fly.dev/mcp"
          },
          "headers": {
            "Authorization": "Bearer YOUR_MCP_AUTH_TOKEN"
          }
        }
      }
    }

    Test without a client:

    curl -X POST https://your-mcp-server.fly.dev/mcp \
      -H "Content-Type: application/json" \
      -H "Accept: application/json, text/event-stream" \
      -H "Authorization: Bearer YOUR_MCP_AUTH_TOKEN" \
      -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'

    Persistent storage (when you need it)

    If your MCP server needs to persist data (embeddings cache, conversation history, tool state), create a Fly Volume:

    fly volumes create mcp_data --size 10 --region nrt

    Mount it in fly.toml:

    [mounts]
      source = "mcp_data"
      destination = "/data"

    Your MCP server can then write to /data/ and the data persists across restarts and redeploys.

    Note on multi-region and volumes: Fly Volumes are attached to a single machine in a single region. If you run machines in multiple regions and each needs persistent storage, each region’s machine gets its own volume. For shared state across regions, use an external Postgres (Fly Postgres, Supabase, Neon) or Fly’s Tigris object storage.


    Cost breakdown

    | Configuration | Monthly cost |
    | --- | --- |
    | Free tier (3 shared-cpu-1x, 256 MB, 1 region) | $0 |
    | Single shared-cpu-2x (512 MB), 1 region, always-on | ~$4.50 |
    | Single shared-cpu-2x, 1 region, auto-suspend (low traffic) | ~$0.50–2.00 |
    | 3-region deployment (nrt, fra, sjc), shared-cpu-2x each | ~$13–15 |
    | + 10 GB volume | +$1.50/month |
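    To sanity-check a configuration that is not in the table, a rough estimator is enough. The sketch below uses only the table's own figures (about $4.50/month for an always-on shared-cpu-2x and $1.50 per 10 GB of volume, so $0.15/GB) and assumes cost scales linearly with uptime, which is a simplification of per-second billing. The function name and parameters are made up for illustration.

```python
def estimate_fly_monthly_cost(
    machines,
    uptime_fraction=1.0,
    volume_gb=0.0,
    machine_monthly=4.50,   # always-on shared-cpu-2x, from the table above
    volume_per_gb=0.15,     # $1.50 for 10 GB, from the table above
):
    """Rough monthly cost in USD, assuming cost scales linearly with uptime."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    compute = machines * machine_monthly * uptime_fraction
    storage = volume_gb * volume_per_gb
    return round(compute + storage, 2)
```

    For example, estimate_fly_monthly_cost(3) gives 13.5, which lines up with the 3-region row, and estimate_fly_monthly_cost(1, volume_gb=10) gives 6.0 for a single always-on machine with a 10 GB volume.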

    Gotcha: Fly’s free tier uses shared-cpu-1x machines with 256 MB RAM. Python MCP servers using FastMCP, LangChain, or similar libraries routinely exceed 256 MB at startup. Budget for at least a shared-cpu-2x (512 MB) if you’re running Python. Node.js MCP servers typically fit within 256 MB for simple tools.


    Common gotchas

    Wrong internal_port in fly.toml. If internal_port doesn’t match the port your server binds to, Fly’s health checks fail and the deploy loops indefinitely. Double-check that fly.toml’s internal_port matches your app’s $PORT.

    Auth token required. Your Fly.io app URL is publicly reachable. Without bearer token authentication, anyone can invoke your MCP tools. Set MCP_AUTH_TOKEN as a secret and validate it on every /mcp request.
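    What "validate it on every /mcp request" can look like, as a hedged sketch: a plain ASGI wrapper that compares the Authorization header against the expected bearer token in constant time and returns 401 otherwise. It assumes your server is served as an ASGI app; require_bearer_token and the exempt_paths parameter are names invented here, with /health exempted so Fly's health checks keep passing.

```python
import hmac


def require_bearer_token(app, token, exempt_paths=("/health",)):
    """Wrap an ASGI app and reject HTTP requests lacking the expected bearer token."""
    expected = f"Bearer {token}".encode()

    async def wrapped(scope, receive, send):
        # Non-HTTP events and exempt paths pass straight through.
        if scope["type"] != "http" or scope["path"] in exempt_paths:
            await app(scope, receive, send)
            return
        headers = dict(scope.get("headers") or [])
        supplied = headers.get(b"authorization", b"")
        # compare_digest avoids leaking the token through timing differences.
        if hmac.compare_digest(supplied, expected):
            await app(scope, receive, send)
            return
        await send({
            "type": "http.response.start",
            "status": 401,
            "headers": [
                (b"content-type", b"text/plain"),
                (b"www-authenticate", b"Bearer"),
            ],
        })
        await send({"type": "http.response.body", "body": b"Unauthorized"})

    return wrapped
```

    Read the token from the MCP_AUTH_TOKEN environment variable at startup rather than hard-coding it, so rotating the Fly secret is all it takes to rotate the credential.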

    Multi-region state. If you scale to multiple regions, avoid in-memory session state — different requests may hit different machines. Use Fly Volumes (per-region) or an external database for state that must be consistent across instances.

    Health check grace period. If your server takes >10 seconds to start (common with large Python dependencies), Fly may kill the machine before it’s ready. Set grace_period = "30s" in your health check config.

    SSE connections and Fly’s idle timeout. Fly’s load balancer closes connections idle for over 75 seconds by default. For MCP clients holding long-lived SSE connections, have the server or client send keepalive pings at a shorter interval so the stream never sits idle that long. Note that [services.tcp_checks] configures health checks and does not change the idle timeout.


    Railway vs. Fly.io: when to choose each

    If you’re deciding between these two platforms specifically for an MCP server:

    • Single region, fast deploys, minimal config → Railway
    • Multi-region, global users, per-second billing → Fly.io
    • Need GPU alongside MCP tools → Railway

    For a full side-by-side, see Railway vs Fly.io for AI Agents.


    Prices verified May 2026. Check official docs before committing — hosting pricing changes frequently.


  • Self-Host Ollama on a $7 VPS: Complete Setup Guide (2026)

    Affiliate disclosure: Some links in this article are affiliate links. If you sign up through them, we may earn a commission at no extra cost to you. Recommendations are based on documented platform capabilities and official pricing as of May 2026.


    Self-Host Ollama on a $7 VPS: Complete Setup Guide (2026)

    Running your own LLM inference server costs less than a streaming subscription. Ollama makes it straightforward to run models like Llama 3, Mistral, Qwen, and Phi locally or on a VPS — and a CPU-only server is enough for many use cases.

    This guide covers the complete setup: choosing a VPS, installing Ollama, picking models that fit the hardware, securing the API endpoint, and keeping costs predictable. No GPU required for the $7 tier.


    When CPU-only Ollama is actually useful

    GPU hosting is where Ollama shines for speed — but CPU-only inference is not useless. It works well for:

    • Development and experimentation — testing prompts, evaluating models, building prototypes before committing to GPU costs
    • Low-latency simple tasks — short classification, simple RAG queries on small corpora
    • Small models — Phi-3 Mini (3.8B), Gemma 2 (2B), Llama 3.2 (1B/3B) run adequately on a 4-core CPU with 8 GB RAM
    • Always-available fallback — if your primary GPU inference provider has an outage, a CPU Ollama instance handles the reduced workload

    For production inference on large models (13B+), GPU instances (Hetzner Cloud GPU, Lambda Labs, RunPod) are necessary. This guide targets the CPU use case explicitly.


    Hardware requirements by model size

    | Model | Parameters | Minimum RAM | Recommended RAM | CPU-only usable? |
    | --- | --- | --- | --- | --- |
    | Phi-3 Mini | 3.8B | 4 GB | 6 GB | Yes, reasonable speed |
    | Llama 3.2 | 1B/3B | 2–4 GB | 4 GB | Yes, fast |
    | Gemma 2 | 2B/9B | 4–8 GB | 8 GB | 2B yes; 9B slow |
    | Qwen 2.5 | 7B | 6 GB | 8 GB | Usable, slow |
    | Mistral 7B | 7B | 6 GB | 8 GB | Usable, slow |
    | Llama 3 | 8B | 6 GB | 8 GB | Slow on CPU |
    | Llama 3.1 | 70B | 40+ GB | 48 GB | CPU not practical |

    Rule of thumb: budget RAM at roughly 1.5× the model file size. A Q4-quantized 7B model is ~4 GB on disk, so it needs about 6 GB of available RAM (model plus inference overhead).
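    The rule of thumb is easy to turn into a quick check before you pick an instance. This is only a sketch of the 1.5× heuristic from the text; the function names are hypothetical, and real memory use also depends on context length and how much RAM the OS itself is holding.

```python
def estimate_ram_gb(model_file_gb, overhead_factor=1.5):
    """RAM to budget for a model, using the 1.5x-of-file-size heuristic."""
    return model_file_gb * overhead_factor


def fits_in_ram(model_file_gb, available_ram_gb, overhead_factor=1.5):
    """True if the heuristic RAM budget fits in the RAM actually available."""
    return estimate_ram_gb(model_file_gb, overhead_factor) <= available_ram_gb
```

    With the figures above, estimate_ram_gb(4.0) returns 6.0, so a ~4 GB Q4 7B model fits on an 8 GB instance (fits_in_ram(4.0, 8) is True) but not on a 4 GB one.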


    Choosing a VPS

    For CPU-only Ollama, the sweet spot is a 4 vCPU / 8 GB RAM instance. This handles 7B models (slowly) and smaller models (adequately).

    Hetzner Cloud (recommended)

    Hetzner offers the best price-to-performance for this use case:

    | Instance | vCPU | RAM | Monthly | Best for |
    | --- | --- | --- | --- | --- |
    | CX22 | 2 AMD | 4 GB | €4.85 (~$5.20) | Llama 3.2 3B, Phi-3 Mini |
    | CX32 | 4 AMD | 8 GB | €9.68 (~$10.40) | Mistral 7B, Qwen 7B |
    | CX42 | 8 AMD | 16 GB | €19.35 (~$21) | Larger models |

    Hetzner’s network locations: Nuremberg, Falkenstein (Germany), Helsinki (Finland), Hillsboro (US). Choose the one closest to your users.

    Sign up for Hetzner

    DigitalOcean

    | Droplet | vCPU | RAM | Monthly |
    | --- | --- | --- | --- |
    | Basic 4 GB | 2 | 4 GB | $24 |
    | Basic 8 GB | 4 | 8 GB | $48 |

    DigitalOcean is more expensive than Hetzner for equivalent specs but has a more beginner-friendly dashboard and more global datacenter locations.

    What to avoid

    • Shared CPU instances (Hetzner CX11, DigitalOcean Basic 1 GB) — insufficient RAM for any meaningful model
    • ARM instances — Ollama runs on ARM but binary availability varies; x86 is safer for initial setup

    Step 1: Provision your VPS

    This guide uses a Hetzner CX32 (4 vCPU, 8 GB RAM, Ubuntu 22.04).

    After creating the server in the Hetzner Cloud console:

    # SSH in with your key
    ssh root@your-server-ip
    
    # Update the system
    apt update && apt upgrade -y
    
    # Create a non-root user (recommended)
    useradd -m -s /bin/bash ollama
    usermod -aG sudo ollama

    Step 2: Install Ollama

    The official install script handles the binary, systemd service, and user setup:

    curl -fsSL https://ollama.com/install.sh | sh

    This installs Ollama as a systemd service that starts automatically. Verify:

    systemctl status ollama
    # Should show: active (running)
    
    ollama --version

    By default, Ollama listens on localhost:11434 and is not externally accessible. This is intentional — the API has no built-in authentication.


    Step 3: Pull your first model

    # As the ollama user or root
    ollama pull llama3.2:3b       # 2 GB download, good baseline
    ollama pull phi3:mini         # 2.3 GB, fast on CPU
    ollama pull mistral:7b-q4_K_M # 4.1 GB, better quality, slower on CPU
    
    # List downloaded models
    ollama list

    Test locally:

    ollama run phi3:mini "What is the capital of Japan?"

    For API usage (while on the server):

    curl http://localhost:11434/api/generate \
      -d '{"model":"phi3:mini","prompt":"What is the MCP protocol?","stream":false}'

    Step 4: Expose the API securely

    By default, Ollama only listens on localhost. To expose it externally, you have two options.

    Option A: nginx reverse proxy with bearer token auth (recommended)

    Install nginx and configure a reverse proxy with token authentication:

    apt install nginx -y

    Create /etc/nginx/sites-available/ollama:

    server {
        listen 443 ssl;
        server_name ollama.yourdomain.com;
    
        ssl_certificate /etc/letsencrypt/live/ollama.yourdomain.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/ollama.yourdomain.com/privkey.pem;
    
        location / {
            # Simple bearer token auth
            set $auth_token "Bearer YOUR_STRONG_TOKEN_HERE";
            if ($http_authorization != $auth_token) {
                return 401 "Unauthorized\n";
            }
    
            proxy_pass http://localhost:11434;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
    
            # Required for streaming responses
            proxy_buffering off;
            proxy_read_timeout 300s;
            proxy_connect_timeout 300s;
        }
    }

    Get a TLS certificate with Certbot:

    apt install certbot python3-certbot-nginx -y
    certbot --nginx -d ollama.yourdomain.com

    Enable and restart nginx:

    ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
    nginx -t && systemctl reload nginx

    Option B: Bind Ollama to all interfaces with firewall rules

    Edit Ollama’s systemd service to bind to 0.0.0.0:

    systemctl edit ollama

    Add:

    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"

    Then restrict access via UFW to specific IP ranges:

    ufw allow from YOUR_IP to any port 11434
    ufw enable

    Note: Option A with nginx is more robust — it gives you TLS, proper auth, and easy future expansion (rate limiting, multiple upstreams).


    Step 5: Configure for production use

    Set model memory limits

    Add to Ollama’s systemd override to control memory usage:

    systemctl edit ollama
    [Service]
    Environment="OLLAMA_MAX_LOADED_MODELS=1"
    Environment="OLLAMA_NUM_PARALLEL=2"

    OLLAMA_MAX_LOADED_MODELS=1 ensures only one model is resident in RAM at once — critical on 8 GB RAM. OLLAMA_NUM_PARALLEL=2 allows two concurrent requests to the same model.

    Set up log rotation

    Ollama logs can grow. Configure rotation:

    cat > /etc/logrotate.d/ollama << 'EOF'
    /var/log/ollama/*.log {
        daily
        rotate 7
        compress
        delaycompress
        missingok
        notifempty
    }
    EOF

    Enable automatic restarts

    Ollama’s default systemd unit includes Restart=always. Verify:

    systemctl cat ollama | grep Restart

    If it’s not set, add it via systemctl edit ollama.


    Step 6: Use with your applications

    OpenAI-compatible API

    Ollama exposes an OpenAI-compatible API at /v1/. Applications using the OpenAI Python or JavaScript SDK can talk to Ollama with a base URL override:

    from openai import OpenAI
    
    client = OpenAI(
        base_url="https://ollama.yourdomain.com/v1",
        # The SDK adds the "Bearer " prefix itself, so pass only the raw token
        api_key="YOUR_STRONG_TOKEN_HERE",
    )
    
    response = client.chat.completions.create(
        model="phi3:mini",
        messages=[{"role": "user", "content": "Explain Docker volumes briefly."}]
    )
    print(response.choices[0].message.content)

    LangChain integration

    from langchain_community.llms import Ollama
    
    llm = Ollama(
        base_url="https://ollama.yourdomain.com",
        model="phi3:mini",
        headers={"Authorization": "Bearer YOUR_STRONG_TOKEN_HERE"}
    )
    
    response = llm.invoke("What is the difference between Railway and Fly.io?")

    Claude Code agent fallback

    In a Claude Code agent, configure Ollama as a fallback for tasks where Claude isn’t needed:

    import os
    import anthropic
    from openai import OpenAI
    
    # Use Claude for complex reasoning
    claude = anthropic.Anthropic()
    
    # Use local Ollama for simple classification/routing
    local_llm = OpenAI(
        base_url=os.environ["OLLAMA_URL"] + "/v1",
        api_key=os.environ["OLLAMA_TOKEN"],
    )
    
    def classify_intent(text: str) -> str:
        """Simple classification — Ollama is fast enough for this."""
        response = local_llm.chat.completions.create(
            model="phi3:mini",
            messages=[{"role": "user", "content": f"Classify as 'question', 'command', or 'other': {text}"}]
        )
        return response.choices[0].message.content.strip()

    Cost comparison: self-hosted vs. API

    Running phi3:mini on a Hetzner CX32 ($10.40/month):

    | Scenario | Self-hosted (Hetzner) | Anthropic Claude Haiku | Notes |
    | --- | --- | --- | --- |
    | 100k tokens/month | ~$10.40 flat | ~$0.10 | API is cheaper at low volume |
    | 1M tokens/month | ~$10.40 flat | ~$1 | API is still cheaper |
    | 10M tokens/month | ~$10.40 flat | ~$10 | Break-even zone; self-hosted is cheaper above this |

    The real advantage of self-hosted is not cost for typical workloads — it’s data privacy and no per-token anxiety. If you need to process sensitive documents, experiment freely, or run high-volume batch jobs, the flat monthly rate makes sense.
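    The break-even point falls straight out of the table's numbers. As a quick sketch, assuming the roughly $1 per million tokens that the Haiku column implies (a blended figure, not an official price) and the $10.40 flat rate for the CX32; both function names are made up for this example:

```python
def monthly_api_cost(tokens, usd_per_million_tokens):
    """API cost in USD for a month's token volume at a flat per-million rate."""
    return tokens / 1_000_000 * usd_per_million_tokens


def break_even_tokens(flat_monthly_usd, usd_per_million_tokens):
    """Monthly token volume at which a flat-rate server costs the same as the API."""
    return flat_monthly_usd / usd_per_million_tokens * 1_000_000
```

    break_even_tokens(10.40, 1.0) comes out at about 10.4 million tokens per month. Below that volume the API is cheaper on price alone, which is why privacy and predictability are the stronger arguments for self-hosting.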


    Common issues

    OOM kills. If your server runs out of RAM mid-inference, reduce OLLAMA_MAX_LOADED_MODELS to 1 and consider a smaller quantization level (e.g., q4_K_S instead of q8_0).

    Slow inference on CPU. Expected. A 7B model on a 4-core CPU generates ~2–5 tokens/second. For interactive use, prefer models under 4B. For batch use, the speed is fine.
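    To see what that throughput means for a user waiting on a reply, divide the expected output length by the token rate. A tiny sketch, with generation_seconds as a hypothetical helper and the 2–5 tokens/second range taken from the paragraph above:

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Approximate wall-clock time to generate a response of the given length."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second
```

    A 300-token answer takes generation_seconds(300, 2) = 150 seconds at the slow end and generation_seconds(300, 5) = 60 seconds at the fast end, which is fine for a batch job and painful for a chat window.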

    Port 11434 not accessible. Check that UFW or Hetzner’s firewall rules allow the port, or use the nginx reverse proxy approach.

    Model download fails. Ensure the server has at least 2× the model size in free disk space during download (the download and the final model both take space temporarily). Hetzner CX32 includes 40 GB disk — sufficient.


    Next steps

    If you want GPU inference instead of CPU:

    • [Modal vs Replicate vs RunPod for GPU Inference](https://hostingpundit.com/modal-vs-replicate-vs-runpod/)
    • Hetzner Cloud GPU instances (CCX53, CCX63) — available in EU regions

    If you want to use Ollama as an MCP server backend:

    • Build an MCP server that wraps the Ollama API
    • Deploy the MCP server to Railway or Fly.io
    • Connect from Claude Code or Claude Desktop

    Prices verified May 2026. Check official documentation before provisioning.