MCP SSE deprecated: migrate to Streamable HTTP before your server breaks

8 min read

SSE pinned one client to one server. Streamable HTTP lets any instance answer any request, so you can scale MCP the same way you scale any HTTP service. Photo by Taylor Vick on Unsplash

The MCP Server-Sent Events transport was deprecated on April 1, 2026. Client libraries still ship SSE support for backward compatibility, but the next minor version of the base protocol removes it, and the public MCP registry now rejects new SSE-only listings. If your MCP server still answers on /sse and /messages, you have a migration window, not a forever window.

SSE held one TCP connection open per client. Two copies of your MCP server behind a load balancer produced split-brain sessions; horizontal scaling required sticky routing that most managed load balancers cannot express cleanly. Streamable HTTP uses standard request-response cycles with optional resumable streams, which means any instance can answer any request and every CDN in the world already knows how to cache and route it.

Here is the full migration path: the before-and-after server code, a datastore pattern for stateful tools, a Cloudflare Workers implementation, a FastAPI equivalent, and a .well-known metadata file so clients can discover your server without opening a connection first.

Step 1: Delete the SSE handler

The old code looks like this. Two routes, a session map held in process memory, and a transport that keeps the response open until the client disconnects:

// OLD: SSE transport, one long-lived connection per client
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import express from "express";

const app = express();
const server = buildMcpServer();
const transports = new Map();

app.get("/sse", async (req, res) => {
  const transport = new SSEServerTransport("/messages", res);
  transports.set(transport.sessionId, transport);
  await server.connect(transport);
});

app.post("/messages", async (req, res) => {
  const sessionId = req.query.sessionId;
  const transport = transports.get(sessionId);
  if (!transport) return res.status(404).end();
  await transport.handlePostMessage(req, res);
});

app.listen(3000);

Every problem with SSE shows up in that snippet. The transports Map is only visible to one process; restart the server and every open session dies; scale horizontally and half your clients hit a server that has never heard of them. Rip it out.

Step 2: Install Streamable HTTP

The new handler is smaller. One route answers both GET and POST; each request spins up a fresh server and transport pair; the garbage collector cleans up when the response closes:

// NEW: Streamable HTTP, stateless, fresh instance per request
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";

const app = express();
app.use(express.json());

app.all("/mcp", async (req, res) => {
  const server = buildMcpServer();
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined, // stateless: no session pinning
    enableJsonResponse: true,
  });

  res.on("close", () => {
    transport.close();
    server.close();
  });

  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);

Three details matter. sessionIdGenerator: undefined opts out of session pinning so the transport is fully stateless. enableJsonResponse: true returns a single JSON body for tools that do not emit progress, which keeps the path fast and cacheable. The res.on("close") cleanup prevents socket leaks when a client disconnects early.

Step 3: Move session state out of process memory

A stateless handler does not mean a stateless product. Long-running tools still need to report progress across multiple requests. Put that state in Redis, a Durable Object, DynamoDB, or Postgres; read it on entry, write it on exit:

// Session state lives in a datastore, not in memory
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { createClient } from "@redis/client";
import { z } from "zod";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function buildMcpServer() {
  const server = new McpServer({ name: "acme", version: "1.0.0" });

  server.tool("long_running_job", { args: z.object({ id: z.string() }) }, async ({ args }, ctx) => {
    // Resume progress from the last request in the same logical session
    const sessionId = ctx.request.headers["mcp-session-id"];
    const progress = sessionId
      ? Number((await redis.get(`progress:${sessionId}`)) ?? 0)
      : 0;

    for (let i = progress; i < 100; i++) {
      await ctx.progress?.(i, 100);
      if (sessionId) await redis.set(`progress:${sessionId}`, String(i), { EX: 300 });
      await sleep(100);
    }

    if (sessionId) await redis.del(`progress:${sessionId}`);
    return { content: [{ type: "text", text: "done" }] };
  });

  return server;
}

The Mcp-Session-Id header, if present, identifies the logical session; the handler uses it as a datastore key. A Last-Event-Id header from the client lets the transport resume a stream after a disconnect without restarting the tool call. Both headers are optional; stateless tools can ignore them entirely.
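To make the header handling concrete, here is a small helper; the function name and the key format are illustrative, not part of the SDK. It turns the two optional headers into a datastore key and a resume offset:

```typescript
// Hypothetical helper: derive datastore keys from the optional MCP headers.
// The header names come from the transport spec; the `progress:` key prefix
// is this article's convention, not part of the protocol.
type HeaderMap = Record<string, string | undefined>;

interface SessionKeys {
  progressKey: string | null; // where to persist tool progress, if any
  resumeFrom: number | null;  // last event the client saw, if resuming
}

function sessionKeys(headers: HeaderMap): SessionKeys {
  const sessionId = headers["mcp-session-id"];
  const lastEventId = headers["last-event-id"];
  return {
    progressKey: sessionId ? `progress:${sessionId}` : null,
    resumeFrom: lastEventId !== undefined ? Number(lastEventId) : null,
  };
}
```

A stateless tool simply gets two nulls back and proceeds; a stateful tool uses progressKey exactly the way the Redis example above does.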

Step 4: Deploy serverless

Streamable HTTP unlocks the thing SSE blocked: running an MCP server on Cloudflare Workers, AWS Lambda, Vercel Functions, or Fly Machines. Here is the full Cloudflare Workers server in 40 lines, hitting Botoi's IP Lookup endpoint as one sample tool:

// Cloudflare Workers: Streamable HTTP in under 40 lines
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";

function buildServer(env) {
  const server = new McpServer({ name: "acme-worker", version: "1.0.0" });

  server.tool("ip_lookup", { args: z.object({ ip: z.string() }) }, async ({ args }) => {
    const r = await fetch(`https://api.botoi.com/v1/ip/lookup`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": env.BOTOI_API_KEY,
      },
      body: JSON.stringify({ ip: args.ip }),
    });
    const { data } = await r.json();
    return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
  });

  return server;
}

export default {
  async fetch(req, env) {
    if (new URL(req.url).pathname !== "/mcp") return new Response("not found", { status: 404 });

    const server = buildServer(env);
    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: undefined,
      enableJsonResponse: true,
    });

    await server.connect(transport);
    return transport.fetch(req);
  },
};

No session map, no background timers, no keep-alive pings. The Worker spins up on demand, answers the request, and shuts down. One Worker handles any number of parallel clients because there is no shared mutable state. Cold starts for MCP servers on Workers come in at under 5 ms for most tool surfaces; on Lambda, expect 50 to 200 ms.

Step 5: Python migration with FastAPI

Python servers get the same shape. The FastAPI handler constructs the MCP server per request and delegates to the transport:

# Python equivalent for FastAPI
import hashlib

from fastapi import FastAPI, Request
from mcp.server import Server
from mcp.server.streamable_http import StreamableHTTPTransport

app = FastAPI()

def build_server() -> Server:
    server = Server("acme-py")

    @server.tool("hash_sha256")
    async def hash_sha256(text: str) -> str:
        return hashlib.sha256(text.encode()).hexdigest()

    return server

@app.post("/mcp")
@app.get("/mcp")
async def mcp_handler(request: Request):
    # Fresh server per request; no shared state
    server = build_server()
    transport = StreamableHTTPTransport(stateless=True)
    await server.connect(transport)
    return await transport.handle(request)

The pattern is the same across languages: build server per request, hand the HTTP request to the transport, return whatever the transport produces. Language runtimes differ; the architecture does not.

Step 6: Publish a server card for discovery

One of the gaps the 2026 roadmap closes is discovery without connection. Registries and crawlers used to need a live handshake to learn what your server did. Serve a JSON document at /.well-known/mcp/server-card.json and clients can learn the transport URL, authentication scheme, and capability set before they connect:

# /.well-known/mcp/server-card.json
{
  "schema_version": "1.0",
  "server": {
    "name": "Acme MCP",
    "version": "1.0.0",
    "transport": {
      "type": "streamable_http",
      "url": "https://api.acme.com/mcp"
    },
    "authentication": {
      "type": "oauth2",
      "authorization_url": "https://auth.acme.com/oauth/authorize",
      "token_url": "https://auth.acme.com/oauth/token",
      "scopes": ["tools:read", "tools:invoke"]
    },
    "capabilities": {
      "tools": true,
      "resources": true,
      "prompts": false
    }
  }
}

This is the piece that lets agent platforms index your server without probing it. Auth0 for Agents, Cloudflare Agent Cloud, and the MCP public registry all consume this format; add it once and your server becomes listable.
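If nothing else is serving static files for you, a few lines of Node's built-in http module are enough to host the card. This is a minimal sketch, with the card trimmed to a subset of the fields from the example above; Express users can do the same with a one-line app.get:

```typescript
// Sketch: serve the server card so registries can read it without an
// MCP handshake. Card contents mirror the example above, abbreviated.
import http from "node:http";

const serverCard = {
  schema_version: "1.0",
  server: {
    name: "Acme MCP",
    version: "1.0.0",
    transport: { type: "streamable_http", url: "https://api.acme.com/mcp" },
    capabilities: { tools: true, resources: true, prompts: false },
  },
};

const srv = http.createServer((req, res) => {
  if (req.url === "/.well-known/mcp/server-card.json") {
    res.writeHead(200, {
      "Content-Type": "application/json",
      "Cache-Control": "public, max-age=3600", // cards change rarely; let CDNs cache
    });
    res.end(JSON.stringify(serverCard));
  } else {
    res.writeHead(404).end();
  }
});

srv.listen(0); // pick any free port; in production, put this behind your router
```

The Cache-Control header is a deliberate choice: crawlers poll these cards, and an hour of caching keeps that traffic off your origin.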

Verify the migration

Before you flip DNS, run the inspector or curl against the new endpoint. The first surfaces a UI with every tool and resource exposed; the second confirms the wire format is right:

# Check your client still works against the new transport
npx @modelcontextprotocol/inspector@latest \
  --url https://api.acme.com/mcp \
  --header "Authorization: Bearer $API_KEY"

# Or raw curl
curl -X POST https://api.acme.com/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'

A successful tools/list response with your full tool catalog means the server is live. If the response comes back as text/event-stream when you expect JSON, the transport has enableJsonResponse disabled; flip the flag.
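For reference, a healthy tools/list response has roughly this shape; the tool entry here is illustrative, and your catalog will list every registered tool:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "ip_lookup",
        "description": "Look up details for an IP address",
        "inputSchema": { "type": "object", "properties": { "ip": { "type": "string" } } }
      }
    ]
  }
}
```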

Keep both transports running on different paths for a week after your cutover: /sse as the old listener, /mcp as the new one. Emit a log line every time a client connects to /sse with its user-agent. When the log goes quiet, you can delete the old handler. Most client libraries shipped Streamable HTTP support between December 2025 and February 2026; expect a long tail of old Cursor and Claude Desktop installs.
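The cutover logging can be as small as this sketch. The req shape below is a minimal stand-in for Express's Request, and the log format is an illustration; wire it in with app.use before the /sse route:

```typescript
// Hypothetical helper for the cutover week: log every hit on the legacy
// /sse path with its user-agent so you can tell when traffic has drained.
interface LegacyReq {
  path: string;
  headers: Record<string, string | undefined>;
}

function logLegacySse(
  req: LegacyReq,
  log: (line: string) => void = console.log,
): void {
  if (req.path !== "/sse") return; // only the legacy listener is interesting
  const ua = req.headers["user-agent"] ?? "unknown";
  log(`legacy-sse-connect ua="${ua}"`);
}
```

In the Express server above that would be app.use((req, _res, next) => { logLegacySse(req); next(); }). Once the log stays quiet for a few days, delete the /sse handler.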

What stays the same, what changes

Concern            | SSE transport                          | Streamable HTTP
-------------------|----------------------------------------|-----------------------------------
Connection model   | One long-lived per client              | Request-response, optional stream
Load balancing     | Sticky sessions required               | Any instance answers any request
Session state      | In-process memory                      | Datastore keyed on session ID
Serverless fit     | Blocked by connection duration limits  | Native
Progress streaming | Default                                | Opt-in via Accept header
Discovery          | Live handshake                         | /.well-known/mcp/server-card.json
Tool invocation    | JSON-RPC over SSE frames               | JSON-RPC over HTTP body

Key takeaways

  • SSE is deprecated, not removed. Clients still accept it through 2026, but new servers and registry listings require Streamable HTTP.
  • Build fresh per request. No in-process session maps; let the server object live only as long as the RPC takes.
  • Push state to a datastore. Redis, Durable Objects, or Postgres keyed on the Mcp-Session-Id header.
  • Serverless is back on the table. Cloudflare Workers, Lambda, Vercel Functions; none of them supported long-lived SSE well.
  • Publish a server card. /.well-known/mcp/server-card.json makes your server discoverable without a connection.

Botoi's MCP server ships on Streamable HTTP at api.botoi.com/mcp with 49 curated tools across IP lookup, email validation, DNS, hashing, JWT signing, and QR generation. Source is MIT-licensed and mirrors the pattern above; read the setup docs for Claude Desktop, Claude Code, Cursor, VS Code, and Windsurf configs.

Frequently asked questions

Is MCP SSE removed or just deprecated?
Deprecated as of April 1, 2026; runtime support still exists in client libraries for backward compatibility, but new MCP servers and registry listings require Streamable HTTP. The 2026 roadmap removes SSE from the base protocol entirely in the next minor version; start your migration now rather than at removal.
Why did MCP drop SSE?
SSE held one TCP connection open per client and made horizontal scaling painful. Two identical SSE servers behind a round-robin load balancer produced split-brain sessions: tool state stored on server A was invisible to server B. Streamable HTTP uses short request-response cycles with optional resumable streams, so load balancers and CDNs route every request without pinning a client to an instance.
What is fresh-instance-per-request?
Each incoming request constructs a new MCP server object, handles the RPC, and throws the instance away. No in-memory state between requests. State that needs to persist (progress tokens, tool sessions) lives in a datastore the handler reads on entry and writes on exit. This lets you run the same server on a serverless platform like Cloudflare Workers or AWS Lambda without the 15-minute connection-duration limit biting.
Do I still need WebSockets for streaming tool output?
No. Streamable HTTP includes an optional SSE-style stream inside a standard POST response for tools that emit partial results. The difference from old SSE is that the stream lives inside one HTTP request and ends with the tool call. You do not keep a socket open between tool calls.
How do I test a Streamable HTTP server locally?
Use the official MCP inspector (npx @modelcontextprotocol/inspector), which now speaks both transports. Or curl the endpoint with a JSON-RPC body and an Accept: text/event-stream header; you will see either a single JSON response or an event stream, depending on whether the tool emits progress. Session resume is testable with the Mcp-Session-Id and Last-Event-Id headers.
