Why Edge Functions for AI

Three reasons Edge Functions are the right choice for AI API calls:

1. **Security**: Your OpenAI or Anthropic API key lives as a server-side secret and is never sent to the client, so nobody inspecting your frontend code can extract it.

2. **Database access**: Edge Functions run inside Supabase's infrastructure and have direct, low-latency access to your PostgreSQL database. You can fetch user context, store results, and log usage in the same function that calls the AI.

3. **Streaming support**: Edge Functions support Response streaming, which lets you send AI output to the client word-by-word, dramatically improving perceived performance for long AI responses.

Basic OpenAI Proxy Function

The simplest Edge Function: receive a prompt, call OpenAI, return the response.

```typescript
import OpenAI from "npm:openai"

const openai = new OpenAI({ apiKey: Deno.env.get("OPENAI_API_KEY") })

Deno.serve(async (req) => {
  const { prompt } = await req.json()
  
  const chat = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
    max_tokens: 500
  })
  
  return new Response(
    JSON.stringify({ result: chat.choices[0].message.content }),
    { headers: { "Content-Type": "application/json" } }
  )
})
```

Deploy: `supabase functions deploy openai-proxy`. Secrets: `supabase secrets set OPENAI_API_KEY=sk-...`.

Adding Authentication and Rate Limiting

Every AI Edge Function should verify the user is authenticated and check their usage limits:

```typescript
import { createClient } from "npm:@supabase/supabase-js"

Deno.serve(async (req) => {
  // Verify the auth token sent by the client
  const token = req.headers.get("Authorization")?.replace("Bearer ", "")
  if (!token) return new Response("Unauthorized", { status: 401 })

  // The service role key bypasses RLS, so it must stay on the server
  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
  )
  const { data: { user }, error } = await supabase.auth.getUser(token)
  if (error || !user) return new Response("Unauthorized", { status: 401 })

  // Check rate limit (max 20 requests/hour); head: true returns only the count
  const oneHourAgo = new Date(Date.now() - 3600000).toISOString()
  const { count, error: countError } = await supabase
    .from("ai_usage_log")
    .select("*", { count: "exact", head: true })
    .eq("user_id", user.id)
    .gte("created_at", oneHourAgo)

  if (countError) return new Response("Usage check failed", { status: 500 })
  // count can be null, so treat a missing count as zero
  if ((count ?? 0) >= 20) return new Response("Rate limit exceeded", { status: 429 })

  // ... call OpenAI and log usage
})
```

Streaming Responses to WeWeb

Streaming sends AI output to the client progressively: users see text appearing word by word instead of waiting for the full response.

In the Edge Function:
```typescript
const stream = await openai.chat.completions.create({
  model: "gpt-4o",
  messages,
  stream: true
})

const readable = new ReadableStream({
  async start(controller) {
    for await (const chunk of stream) {
      const text = chunk.choices[0]?.delta?.content || ""
      controller.enqueue(new TextEncoder().encode(text))
    }
    controller.close()
  }
})

return new Response(readable, {
  headers: { "Content-Type": "text/event-stream" }
})
```

In WeWeb: use a custom JavaScript action to fetch the stream URL and update a page variable character by character as chunks arrive.

Building a RAG Pipeline (Retrieval Augmented Generation)

RAG improves AI answers by injecting relevant knowledge into the prompt at query time. Architecture:

1. **Knowledge ingestion** (run once per document, and again when a document changes): For each document in your knowledge base, call OpenAI's embedding API to get a vector (1536 dimensions with `text-embedding-3-small`). Store vectors in Supabase using the pgvector extension.

2. **Query time**: When a user asks a question, embed the question (same embedding API), then run a similarity search in Supabase: `SELECT content, 1 - (embedding <=> query_embedding) AS similarity FROM documents ORDER BY similarity DESC LIMIT 3`.

3. **Augmented prompt**: Inject the top 3 matching documents into the system prompt: "Answer using only the following context: [docs]. If the answer isn't in the context, say you don't know."

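Result: the AI is steered to answer from your documentation, which sharply reduces hallucination about things you haven't documented. It does not eliminate it, so keep the "say you don't know" instruction in the prompt.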
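
Step 3 can be sketched as a small pure function. This is a minimal illustration, not a definitive implementation: `MatchedDoc`, `ChatMessage`, and `buildRagMessages` are hypothetical names, and the sketch assumes the similarity search in step 2 has already returned rows with `content` and `similarity` fields.

```typescript
// Hypothetical row shape returned by the similarity search in step 2.
interface MatchedDoc {
  content: string
  similarity: number
}

interface ChatMessage {
  role: "system" | "user"
  content: string
}

// Build the augmented prompt from the best matches (illustrative helper).
function buildRagMessages(
  question: string,
  docs: MatchedDoc[],
  topK = 3
): ChatMessage[] {
  const context = [...docs]
    .sort((a, b) => b.similarity - a.similarity) // most similar first
    .slice(0, topK)
    .map((doc, i) => `[${i + 1}] ${doc.content}`)
    .join("\n\n")

  return [
    {
      role: "system",
      content:
        "Answer using only the following context. " +
        "If the answer isn't in the context, say you don't know.\n\n" +
        context,
    },
    { role: "user", content: question },
  ]
}
```

The returned array can be passed as `messages` to the chat completion call. Numbering the documents makes it possible to ask the model to cite which one it used.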

Cold Start Optimisation for Edge Functions

Supabase Edge Functions are Deno-based and run on Deno Deploy's global edge network. A cold start, the first invocation of a function that hasn't been called recently, typically takes 200–500ms. For AI features where users expect instant feedback, this cold start latency can be noticeable.

Several optimisation strategies reduce cold start impact. First, import only what you need. A function that imports the full OpenAI SDK carries more bundle weight than one that calls the API with `fetch` and imports only the `ChatCompletion` type, because type-only imports are erased at compile time. Use named imports and tree-shaking-friendly patterns. Second, pre-warm critical functions by calling them on a schedule. A Supabase cron job that sends a synthetic request every 5 minutes (288 invocations per day) helps keep the instance warm. Have the function return early for these pings so they never reach the AI API.
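
One way to keep warm-up pings cheap is to short-circuit them before any AI call. The sketch below assumes the scheduled job sends a marker header; the header name `x-warmup` and the helper names are hypothetical.

```typescript
// Hypothetical header name; any marker agreed with the scheduled job works.
const WARMUP_HEADER = "x-warmup"

// True when the scheduled job marked the request as a keep-warm ping.
function isWarmupRequest(headers: Headers): boolean {
  return headers.get(WARMUP_HEADER) === "1"
}

// Return an early 204 for pings, or null so the normal handler continues.
function handleWarmup(req: Request): Response | null {
  if (!isWarmupRequest(req.headers)) return null
  return new Response(null, { status: 204 })
}
```

Inside `Deno.serve`, the handler would start with `const early = handleWarmup(req); if (early) return early`, so a ping costs one function invocation and no tokens.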

Third, use response caching for deterministic prompts. If users frequently ask the same type of question (document summarisation, category classification), cache the output in a Supabase table keyed on a hash of the input. Repeated inputs then return the cached result immediately: no AI API call, no API cost, and a response that takes one database lookup instead of a full generation. The function itself can still cold start, but the slow part of the request is gone. This is particularly effective for classification tasks where the set of possible inputs is bounded.
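
A cache key of this kind can be derived with the Web Crypto API, which is available in Deno. This is a sketch under assumptions: the function name `cacheKey` is hypothetical, and collapsing whitespace before hashing is an illustrative choice that treats prompts differing only in spacing as the same input.

```typescript
// Derive a stable cache key from the model name and the normalised prompt.
async function cacheKey(model: string, prompt: string): Promise<string> {
  // Assumption: prompts that differ only in whitespace should share a key.
  const normalised = prompt.trim().replace(/\s+/g, " ")
  const bytes = new TextEncoder().encode(`${model}\n${normalised}`)
  const digest = await crypto.subtle.digest("SHA-256", bytes)

  // Hex-encode the 32-byte digest into a 64-character string.
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("")
}
```

Including the model name in the key means that switching models does not serve answers generated by the previous one.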

AI Inference at the Edge: Calling OpenAI and Anthropic From Edge Functions

Both the OpenAI SDK and the Anthropic SDK work in Deno's runtime environment, which is what Supabase Edge Functions use. You import them via `npm:` specifiers: `import OpenAI from 'npm:openai'` and `import Anthropic from 'npm:@anthropic-ai/sdk'`. Both SDKs handle the HTTPS calls, retry logic, and error handling for their respective APIs.

For a production AI feature, you should handle model selection dynamically. Store the model name as a Supabase Edge Function secret rather than hardcoding it. This lets you switch from `gpt-4o` to `gpt-4o-mini` (for lower-cost tasks) or from `claude-3-5-sonnet` to `claude-3-haiku` without redeploying the function. You update the secret and the next request uses the new model.
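
A minimal sketch of this lookup follows. The secret names `AI_MODEL` and `AI_MODEL_LOW_COST`, the tier labels, and the fallback values are assumptions for illustration. The environment getter is passed in so the function can be exercised outside Deno.

```typescript
type EnvGetter = (key: string) => string | undefined

// Resolve the model name from secrets, falling back to a default.
// Secret names and defaults here are hypothetical.
function resolveModel(getEnv: EnvGetter, tier: "standard" | "low_cost"): string {
  const key = tier === "low_cost" ? "AI_MODEL_LOW_COST" : "AI_MODEL"
  const fallback = tier === "low_cost" ? "gpt-4o-mini" : "gpt-4o"
  const value = getEnv(key)?.trim()
  return value ? value : fallback
}
```

In an Edge Function this would be called inside the request handler, for example `resolveModel((k) => Deno.env.get(k), "standard")`, so each request reads the current secret rather than a value captured at startup.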

Cost management is critical for AI functions with open user access. Log every API call with the token count returned in the response. Set monthly spending alerts in your OpenAI or Anthropic dashboard. For free-tier users, cap usage at a token budget per month and enforce it in the Edge Function before the API call is made. We build this cost-tracking layer into every AI feature we ship. It has prevented unexpected $3,000 monthly bills on more than one client project.
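
The budget check itself can be a small pure function that runs before the API call. The plan names, budget figures, and the `checkTokenBudget` helper below are hypothetical; the sketch assumes you already sum the logged token counts for the current month.

```typescript
// Hypothetical monthly token budgets per plan.
const MONTHLY_TOKEN_BUDGET: Record<string, number> = {
  free: 50_000,
  pro: 2_000_000,
}

interface BudgetDecision {
  allowed: boolean
  remaining: number
}

// Decide whether a call that may use up to maxTokensForCall fits the budget.
function checkTokenBudget(
  plan: string,
  usedThisMonth: number,
  maxTokensForCall: number
): BudgetDecision {
  const budget = MONTHLY_TOKEN_BUDGET[plan] ?? 0 // unknown plans get no budget
  const remaining = Math.max(0, budget - usedThisMonth)
  return { allowed: maxTokensForCall <= remaining, remaining }
}
```

If `allowed` is false, the function can return a 429 or 402 response without contacting the AI provider. After a successful call, the token usage reported in the response (for OpenAI chat completions, `usage.total_tokens`) is what gets added to the log.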

Streaming Responses: Server-Sent Events Pattern

The Server-Sent Events (SSE) pattern is the standard way to stream AI responses from a Supabase Edge Function to a browser. The function sets `Content-Type: text/event-stream` and writes chunks in the format `data: {text}\n\n` as they arrive from the AI API. The browser uses the native `EventSource` API or a `fetch` with `ReadableStream` to consume the stream incrementally. Note that `EventSource` only issues GET requests and cannot set custom headers, so a POST that carries a prompt and an `Authorization` header needs the `fetch` approach.
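
The framing described above can be sketched as follows. The helper names `toSseFrame` and `sseStream`, the JSON payload shape, and the `[DONE]` end marker are assumptions for illustration. JSON-encoding the text is one way to keep newlines inside a chunk from breaking the frame.

```typescript
const encoder = new TextEncoder()

// Format one chunk of text as a single SSE frame.
function toSseFrame(text: string): Uint8Array {
  return encoder.encode(`data: ${JSON.stringify({ text })}\n\n`)
}

// Wrap any async source of text chunks in an SSE-formatted byte stream.
function sseStream(chunks: AsyncIterable<string>): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const text of chunks) {
          if (text) controller.enqueue(toSseFrame(text)) // skip empty deltas
        }
        // Hypothetical end marker so the client knows the stream is complete.
        controller.enqueue(encoder.encode("data: [DONE]\n\n"))
        controller.close()
      } catch (err) {
        controller.error(err)
      }
    },
  })
}
```

The Edge Function would then return `new Response(sseStream(source), { headers: { "Content-Type": "text/event-stream" } })`, where `source` yields the text deltas from the AI API.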

In WeWeb, implementing SSE requires a custom JavaScript action because the built-in HTTP request actions wait for the full response before proceeding. The action opens a `fetch` request, reads the response body as a stream using `response.body.getReader()`, decodes each chunk, and appends it to a page variable. This page variable is bound to a text element on the canvas, so users see the text appearing character by character.
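
The reading loop described above might look like the sketch below. `readTextStream` is a hypothetical helper, and it treats the body as plain text; if the function emits `data:` frames, each frame would need parsing before display. The `onChunk` callback is where the WeWeb page variable would be updated.

```typescript
// Read a streamed fetch Response and hand each decoded chunk to a callback.
async function readTextStream(
  response: Response,
  onChunk: (text: string) => void
): Promise<string> {
  if (!response.body) throw new Error("Response has no body to stream")

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let full = ""

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters intact when split across chunks
    const text = decoder.decode(value, { stream: true })
    if (text) {
      full += text
      onChunk(text)
    }
  }

  // Flush any bytes still buffered in the decoder.
  const tail = decoder.decode()
  if (tail) {
    full += tail
    onChunk(tail)
  }
  return full
}
```

A caller would pass the result of `fetch(...)` and a callback that appends each chunk to the bound variable.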

The result is a dramatically better AI feature UX. A 300-word AI response from GPT-4o takes roughly 5 seconds to complete. Without streaming, the user sees a spinner for 5 seconds and then the full text appears. With streaming, they start reading the response within 300ms of submitting their prompt. In user testing, streaming consistently rates as more responsive and intelligent-feeling, even though the total generation time is the same.

Using Edge Functions as Webhook Handlers

Edge Functions are an excellent choice for handling webhooks from external services: Stripe payment events, GitHub push notifications, Twilio SMS, or any service that POSTs data to a URL. They're always available (no spinning up a server), globally distributed (low latency from the webhook sender's data centre), and have direct Supabase database access to update records in response to events.

A Stripe webhook Edge Function validates the webhook signature, parses the event type, and updates the relevant Supabase record. For a subscription upgrade event: verify the signature against the raw request body with your webhook secret (in Deno, use the asynchronous `stripe.webhooks.constructEventAsync()`, because the synchronous `constructEvent()` relies on Node's crypto module), extract the customer ID and new plan, update the `subscriptions` table in Supabase, and return a 200 response within 5 seconds (Stripe's timeout). This covers the complete payment lifecycle with no separate server.
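
The "extract the customer ID and new plan" step can be sketched as a pure mapping that runs after signature verification. The interface below covers only the fields used here and assumes `customer` is the unexpanded ID string; the column names in `SubscriptionUpdate` and the helper name are hypothetical.

```typescript
// Minimal slice of a Stripe `customer.subscription.updated` event.
interface SubscriptionEvent {
  type: string
  data: {
    object: {
      customer: string // assumes the customer field is not expanded
      status: string
      items: { data: { price: { id: string } }[] }
    }
  }
}

// Hypothetical column names for the `subscriptions` table.
interface SubscriptionUpdate {
  stripe_customer_id: string
  price_id: string
  status: string
}

// Map a verified event to the row update, or null if it should be ignored.
function toSubscriptionUpdate(event: SubscriptionEvent): SubscriptionUpdate | null {
  if (event.type !== "customer.subscription.updated") return null
  const subscription = event.data.object
  const priceId = subscription.items.data[0]?.price.id
  if (!priceId) return null
  return {
    stripe_customer_id: subscription.customer,
    price_id: priceId,
    status: subscription.status,
  }
}
```

Returning `null` for event types the function does not handle lets the handler still answer 200, so Stripe does not retry events you intend to ignore.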

For AI applications, webhooks are used for asynchronous processing. When a user uploads a document for AI analysis, you don't make them wait. Instead, you queue the task via a Supabase row insert, a background job picks it up and calls the AI API, and when the result is ready, a Supabase trigger sends a push notification or updates the dashboard via realtime. Edge Functions handle both the intake webhook and the outgoing notification.

Security Patterns for AI Edge Functions

The most important security measure for any AI Edge Function is verifying the caller's identity before making any API call. Always extract the JWT from the `Authorization: Bearer` header, call `supabase.auth.getUser(token)` to verify it, and check that the user has the appropriate role or subscription level for the requested operation. An unauthenticated AI function is a direct path to unlimited API spend by anyone who discovers your endpoint URL.

Secrets management in Supabase Edge Functions uses the `supabase secrets set` CLI command and `Deno.env.get()` at runtime. Never hardcode API keys in function source code, even in a private repository: secrets belong in the secrets store, not in source control. Rotate secrets immediately if a repository is accidentally made public or if a key is logged in error traces.

Input validation reduces the risk of prompt injection attacks, where a malicious user crafts an input that changes the AI's behaviour. Validate input length, strip control characters, and if the function uses a system prompt that includes user-supplied data, sanitise and clearly delimit the user input before interpolation. Validation alone cannot fully prevent prompt injection, so keep authentication and usage limits in place as well. A simple length check (`if (prompt.length > 2000) return new Response("Prompt too long", { status: 413 })`) eliminates a class of abuse where users craft extremely long prompts to maximise compute cost.
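
A validation step along these lines could look like the sketch below. The helper name, the status codes, and the choice to strip control characters while keeping tabs and newlines are assumptions; the 2000-character limit comes from the example above.

```typescript
const MAX_PROMPT_LENGTH = 2000

type Validation =
  | { ok: true; prompt: string }
  | { ok: false; status: number; message: string }

// Validate and clean a user-supplied prompt before it reaches the AI API.
function validatePrompt(input: unknown): Validation {
  if (typeof input !== "string") {
    return { ok: false, status: 400, message: "prompt must be a string" }
  }
  // Remove control characters but keep tab, newline, and carriage return.
  const cleaned = input
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "")
    .trim()
  if (cleaned.length === 0) {
    return { ok: false, status: 400, message: "prompt must not be empty" }
  }
  if (cleaned.length > MAX_PROMPT_LENGTH) {
    return { ok: false, status: 413, message: "prompt is too long" }
  }
  return { ok: true, prompt: cleaned }
}
```

In the handler, a failed result maps directly to `new Response(result.message, { status: result.status })`, and only `result.prompt` is passed on to the model.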