Why Edge Functions for AI

Three reasons Edge Functions are the right choice for AI API calls:

1. **Security**: Your OpenAI or Anthropic API key lives as a server-side secret and is never exposed to the client; no one reading your frontend code can extract it.

2. **Database access**: Edge Functions run inside Supabase's infrastructure and have direct, low-latency access to your PostgreSQL database. You can fetch user context, store results, and log usage in the same function that calls the AI.

3. **Streaming support**: Edge Functions support Response streaming, which lets you send AI output to the client word-by-word — dramatically improving perceived performance for long AI responses.

Basic OpenAI Proxy Function

The simplest Edge Function: receive a prompt, call OpenAI, return the response.

```typescript
import OpenAI from "npm:openai"

const openai = new OpenAI({ apiKey: Deno.env.get("OPENAI_API_KEY") })

Deno.serve(async (req) => {
  const { prompt } = await req.json()
  
  const chat = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
    max_tokens: 500
  })
  
  return new Response(
    JSON.stringify({ result: chat.choices[0].message.content }),
    { headers: { "Content-Type": "application/json" } }
  )
})
```

Deploy with `supabase functions deploy openai-proxy`, and set the secret with `supabase secrets set OPENAI_API_KEY=sk-...`.
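Once deployed, the function is reachable over HTTPS. A sketch of an example call, where `abcd1234` stands in for your own project ref, and the project's public anon key (not the OpenAI key) authorizes the request:

```shell
# Invoke the deployed function; YOUR_ANON_KEY is the project's public anon key
curl -X POST "https://abcd1234.supabase.co/functions/v1/openai-proxy" \
  -H "Authorization: Bearer YOUR_ANON_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about edge computing."}'
```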

Adding Authentication and Rate Limiting

Every AI Edge Function should verify the user is authenticated and check their usage limits:

```typescript
import { createClient } from "npm:@supabase/supabase-js"
Deno.serve(async (req) => {
  // Verify auth token
  const token = req.headers.get("Authorization")?.replace("Bearer ", "")
  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
  )
  const { data: { user }, error } = await supabase.auth.getUser(token)
  if (error || !user) return new Response("Unauthorized", { status: 401 })
  
  // Check rate limit (max 20 requests/hour)
  const oneHourAgo = new Date(Date.now() - 3600000).toISOString()
  const { count } = await supabase
    .from("ai_usage_log")
    .select("*", { count: "exact", head: true }) // head: true returns only the count, no rows
    .eq("user_id", user.id)
    .gte("created_at", oneHourAgo)
  
  if ((count ?? 0) >= 20) return new Response("Rate limit exceeded", { status: 429 })
  
  // ... call OpenAI and log usage
})
```
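The elided logging step can look like the sketch below. The `ai_usage_log` table is the one the rate-limit query reads; the `tokens_used` column and the `logUsage` / `hourWindowStart` names are assumptions for illustration, so adapt them to your schema:

```typescript
// Pure helper: the same one-hour window the rate-limit check uses
function hourWindowStart(now: number = Date.now()): string {
  return new Date(now - 3600000).toISOString()
}

// After the OpenAI call succeeds, record the request so future
// rate-limit checks against ai_usage_log can see it
async function logUsage(
  supabase: any, // a @supabase/supabase-js client, typed loosely in this sketch
  userId: string,
  tokensUsed: number
): Promise<void> {
  const { error } = await supabase.from("ai_usage_log").insert({
    user_id: userId,
    tokens_used: tokensUsed, // assumed column; adapt to your schema
  })
  if (error) console.error("usage log insert failed:", error.message)
}
```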

Streaming Responses to WeWeb

Streaming sends AI output to the client progressively — users see text appearing word by word instead of waiting for the full response.

In the Edge Function:
```typescript
const stream = await openai.chat.completions.create({
  model: "gpt-4o",
  messages,
  stream: true
})

const readable = new ReadableStream({
  async start(controller) {
    for await (const chunk of stream) {
      const text = chunk.choices[0]?.delta?.content || ""
      controller.enqueue(new TextEncoder().encode(text))
    }
    controller.close()
  }
})

return new Response(readable, {
  // These are raw text chunks, not SSE "data: ..." events, so text/plain
  // is the accurate content type; use text/event-stream only with SSE framing
  headers: { "Content-Type": "text/plain; charset=utf-8" }
})
```

In WeWeb: use a custom JavaScript action to fetch the stream URL and update a page variable character by character as chunks arrive.
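That custom JavaScript action boils down to a reader loop over the response body. A minimal sketch: `streamToVariable` is an illustrative name, and the `wwLib` call mentioned in the comment is the usual way to update a page variable from WeWeb custom JS (the variable uid is an assumption):

```typescript
// Read the Edge Function's streamed response and surface partial text.
// onChunk receives the accumulated text after every chunk; in WeWeb you
// would call wwLib.wwVariable.updateValue("<aiOutput variable uid>", text)
// inside that callback.
async function streamToVariable(
  response: Response,
  onChunk: (textSoFar: string) => void
): Promise<string> {
  const reader = response.body!.getReader()
  const decoder = new TextDecoder()
  let text = ""
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    text += decoder.decode(value, { stream: true })
    onChunk(text)
  }
  return text
}
```

Inside the action, `fetch` the function URL first, then hand the response to the loop: `const res = await fetch(functionUrl, { method: "POST", headers, body }); await streamToVariable(res, (t) => { /* update the page variable */ })`.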

Building a RAG Pipeline (Retrieval Augmented Generation)

RAG improves AI answers by injecting relevant knowledge into the prompt at query time. Architecture:

1. **Knowledge ingestion** (run once): For each document in your knowledge base, call OpenAI's embedding API to get a 1536-dimensional vector. Store vectors in Supabase using the pgvector extension.

2. **Query time**: When a user asks a question, embed it with the same embedding API, then run a similarity search in Supabase: `SELECT content, 1 - (embedding <=> query_embedding) AS similarity FROM documents ORDER BY embedding <=> query_embedding LIMIT 3` (ordering by the distance operator directly lets pgvector use its index).

3. **Augmented prompt**: Inject the top 3 matching documents into the system prompt: "Answer using only the following context: [docs]. If the answer isn't in the context, say you don't know."

Result: the AI answers from your documentation alone, which dramatically reduces (though does not fully eliminate) hallucination about things you haven't documented; the model can still misread the supplied context.
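The query-time steps above can be sketched as a single handler you would wire into `Deno.serve`. In the real Edge Function you would `import OpenAI from "npm:openai"` and `import { createClient } from "npm:@supabase/supabase-js"`; the clients are typed loosely here so the sketch stays self-contained, and `match_documents` is an assumed Postgres function wrapping the similarity SELECT from step 2:

```typescript
// Step 3 helper: inject retrieved docs into the system prompt
function buildAugmentedPrompt(docs: string[]): string {
  return (
    "Answer using only the following context:\n\n" +
    docs.map((d, i) => `[${i + 1}] ${d}`).join("\n\n") +
    "\n\nIf the answer isn't in the context, say you don't know."
  )
}

async function answerWithRag(
  openai: any,   // OpenAI client
  supabase: any, // supabase-js client
  question: string
): Promise<string> {
  // Step 2a: embed the question with the same model used at ingestion
  const emb = await openai.embeddings.create({
    model: "text-embedding-3-small", // 1536 dimensions, matching the stored vectors
    input: question,
  })

  // Step 2b: similarity search; match_documents is an assumed SQL function
  // wrapping SELECT content ... ORDER BY embedding <=> query_embedding LIMIT 3
  const { data: docs } = await supabase.rpc("match_documents", {
    query_embedding: emb.data[0].embedding,
    match_count: 3,
  })

  // Step 3: answer only from the retrieved context
  const chat = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: buildAugmentedPrompt(docs.map((d: { content: string }) => d.content)) },
      { role: "user", content: question },
    ],
  })
  return chat.choices[0].message.content
}
```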