Why You Should Never Call OpenAI Directly from the Frontend
A common first mistake for no-code builders is calling the OpenAI API directly from WeWeb or FlutterFlow, which exposes the API key in the client.
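Never do this. In a web app the key is visible in the browser's developer tools, and in a mobile app it can be extracted from the app bundle. Anyone who finds it can generate thousands of dollars in API calls at your expense. OpenAI's rate limits apply to your whole organisation rather than to individual callers, so they do nothing to stop a leaked key from being used up to your account's limits.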
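The correct pattern is to route all OpenAI calls through your backend (a Supabase Edge Function, a Xano endpoint, or any server). The frontend calls your backend, and the backend calls OpenAI with the key stored as a server-side environment variable. This adds one layer of indirection and keeps the key off the client entirely. Require authentication on that backend endpoint as well; otherwise it becomes an open proxy that anyone can use to spend your credits.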
Setting Up Supabase Edge Functions for OpenAI
Supabase Edge Functions are Deno-based serverless functions that run on the edge. They are one of the simplest ways to proxy OpenAI calls.
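```
// supabase/functions/ai-chat/index.ts
// CORS headers so browser clients such as WeWeb can call this function.
const corsHeaders = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Headers": "authorization, x-client-info, apikey, content-type",
}

Deno.serve(async (req) => {
  // Answer the browser's CORS preflight request.
  if (req.method === "OPTIONS") {
    return new Response("ok", { headers: corsHeaders })
  }
  // Accepting systemPrompt from the client lets any caller change it;
  // defining it here on the server is safer in production.
  const { messages, systemPrompt } = await req.json()
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      // The key comes from a server-side secret and never reaches the client.
      "Authorization": `Bearer ${Deno.env.get("OPENAI_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [{ role: "system", content: systemPrompt }, ...messages],
      max_tokens: 1000,
    }),
  })
  // Forward OpenAI's status code so the client can tell errors (e.g. 429) from success.
  return new Response(response.body, {
    status: response.status,
    headers: { ...corsHeaders, "Content-Type": "application/json" },
  })
})
```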
Deploy with `supabase functions deploy ai-chat`. Set the key: `supabase secrets set OPENAI_API_KEY=sk-...`.
OpenAI Cost Optimisation: Caching and Batching
In production, OpenAI API costs can grow quickly if left unmanaged. Before you touch model selection, the two most effective cost-reduction strategies are caching and batching. Caching means storing the output of an API call and returning the stored result when the same input is requested again. Create an `ai_response_cache` table in Supabase with columns for a hash of the full request (the messages plus the model and parameters such as temperature), the response text, the model used, and a `created_at` timestamp. Before every API call, check whether a cached response exists for that hash. Cache hit rates of 20–40% are common for content generation features where users often submit the same inputs. Note that an exact hash only matches identical requests, not merely similar ones.
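Here is a minimal sketch of the cache-key and lookup step. It assumes the `ai_response_cache` table described above. The function names and the in-memory `Map` standing in for that table are hypothetical; in production the lookup and insert would be queries against Supabase. The model and parameters are hashed together with the messages so that a cached answer is only reused for an identical request.

```typescript
// Hypothetical shape of a chat request; mirrors the fields sent to OpenAI.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  max_tokens?: number;
  temperature?: number;
}

// Build a stable string from everything that affects the output,
// so two requests share a cache entry only if they are identical.
function canonicalCacheInput(req: ChatRequest): string {
  return JSON.stringify([req.model, req.temperature ?? null, req.max_tokens ?? null, req.messages]);
}

// SHA-256 hex digest via the Web Crypto API (available in Deno and Node 20+).
async function hashPrompt(req: ChatRequest): Promise<string> {
  const bytes = new TextEncoder().encode(canonicalCacheInput(req));
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// In-memory stand-in for the ai_response_cache table (hypothetical).
const cache = new Map<string, string>();

// Return a cached response if one exists; otherwise call the model and store the result.
async function cachedCompletion(
  req: ChatRequest,
  callModel: (req: ChatRequest) => Promise<string>,
): Promise<string> {
  const key = await hashPrompt(req);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const text = await callModel(req);
  cache.set(key, text);
  return text;
}
```

Only cache features where returning the same text twice is acceptable. Creative generation with a high temperature may be a poor fit, because users might expect a fresh answer each time.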
Batching applies when you need to run many AI operations that are not user-facing in real time. Examples include generating summaries for 500 documents uploaded overnight or running sentiment analysis on yesterday's support tickets. OpenAI's Batch API processes requests asynchronously at 50% of the standard price and returns results within a 24-hour window. Submit your requests as a JSONL file, poll for completion, and retrieve the results. For background jobs that can tolerate that delay, batching should be your default.
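Here is a sketch of building the JSONL input. Each line of a Batch API input file is one JSON object with a `custom_id`, a `method`, a `url` and the request `body`. The `Doc` type, the summarisation prompt and the choice of `gpt-4o-mini` are illustrative assumptions. The upload through the Files API (purpose `batch`) and the batch creation call are not shown.

```typescript
// Hypothetical input: documents to summarise overnight.
interface Doc {
  id: string;
  text: string;
}

// One line of a Batch API input file targeting the Chat Completions endpoint.
interface BatchLine {
  custom_id: string; // used to match each result back to its document
  method: "POST";
  url: "/v1/chat/completions";
  body: {
    model: string;
    messages: { role: "system" | "user"; content: string }[];
    max_tokens: number;
  };
}

// Build the JSONL payload: one JSON object per line, newline-separated.
function buildBatchJsonl(docs: Doc[], model = "gpt-4o-mini"): string {
  return docs
    .map((doc): BatchLine => ({
      custom_id: `summary-${doc.id}`,
      method: "POST",
      url: "/v1/chat/completions",
      body: {
        model,
        messages: [
          { role: "system", content: "Summarise the document in three sentences." },
          { role: "user", content: doc.text },
        ],
        max_tokens: 300,
      },
    }))
    .map((line) => JSON.stringify(line))
    .join("\n");
}
```

Carrying the document ID in `custom_id` matters because OpenAI does not guarantee that output lines come back in input order.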
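A third technique is to set tight `max_tokens` limits. Suppose a feature only needs a 150-word output, which is roughly 200 tokens by OpenAI's rule of thumb of about 75 words per 100 tokens. Setting `max_tokens: 200` ensures you never pay for 2,000-token responses caused by prompt drift. Responses that hit the limit are cut off and returned with `finish_reason: "length"`, so check for that in your code. Monitor average token usage per endpoint for two weeks after launch, then lower each limit to the 95th percentile of what you actually observe.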
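To turn two weeks of logs into a limit, you can compute the 95th percentile of the `usage.completion_tokens` values that OpenAI returns with each response. This sketch uses the nearest-rank method. The 10% safety margin in `suggestMaxTokens` is an assumption, not a rule.

```typescript
// Nearest-rank percentile: the smallest observed value with at least p% of observations at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no observations");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Suggest a max_tokens limit from logged completion token counts,
// with a small headroom factor (assumed 10%) to reduce truncation.
function suggestMaxTokens(completionTokens: number[], margin = 1.1): number {
  return Math.ceil(percentile(completionTokens, 95) * margin);
}
```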
GPT-4o vs GPT-4o mini: Choosing the Right Model
Not every feature needs the most capable model. Choosing the right model for each use case can cut your API bill by 60–80% without users noticing a quality difference. GPT-4o, the flagship model at the time of writing, is multimodal (images, audio, text), fast, and cost-effective at $2.50/M input tokens. It is the right choice for user-facing chat, complex reasoning, and tasks where output quality is directly visible to the user.
GPT-4o mini runs at $0.15/M input tokens, roughly 17× cheaper. It handles simple classification, short-form content generation, FAQ answering, and form pre-fill tasks with quality that most users will not distinguish from GPT-4o. For any feature where the task is well-defined and simple, start with GPT-4o mini and only upgrade to GPT-4o if quality testing shows a meaningful difference.
For embeddings and semantic search, use text-embedding-3-small ($0.02/M tokens). It is fast, cheap, and performs comparably to larger embedding models for most document retrieval tasks. Reserve text-embedding-3-large for search features where precision is critical and query volume is low. The rule of thumb: start cheap, benchmark quality, and only upgrade the model when you can measure the improvement in user outcomes.
Calling the Edge Function from WeWeb
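In WeWeb, create a REST API data source pointing to your Edge Function URL (found in Supabase dashboard → Edge Functions). Configure it as a POST request with a JSON body. Edge Functions verify a JWT by default, so include an `Authorization: Bearer` header carrying the signed-in user's access token.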
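Create two variables, `chatMessages` (array) and `aiResponse` (string). On button click:
1. Append the user's message to `chatMessages`.
2. Call the Edge Function action with `messages: chatMessages`.
3. Bind `aiResponse` to the assistant's text in the response body (`choices[0].message.content`), and append it to `chatMessages` as an `assistant` message so follow-up requests include the conversation.
4. Display `aiResponse` in a text element.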
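WeWeb workflows are configured visually, but the same four steps can be sketched in TypeScript, for example inside a custom JavaScript action. The function names, the `EDGE_FUNCTION_URL` placeholder and the `accessToken` parameter are assumptions. The request body matches the Edge Function above (`messages` and `systemPrompt`), and the response follows the non-streaming Chat Completions shape.

```typescript
type Role = "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Pull the assistant's text out of a Chat Completions response body.
function extractReply(body: unknown): string {
  const content = (body as { choices?: { message?: { content?: string } }[] })
    ?.choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("Unexpected response shape");
  return content;
}

// Hypothetical placeholder: replace with your project's function URL.
const EDGE_FUNCTION_URL = "https://YOUR-PROJECT.supabase.co/functions/v1/ai-chat";

// Append the user message, call the Edge Function, then append and return the reply.
async function sendMessage(
  history: ChatMessage[],
  userText: string,
  accessToken: string,
): Promise<ChatMessage[]> {
  const messages: ChatMessage[] = [...history, { role: "user", content: userText }];
  const res = await fetch(EDGE_FUNCTION_URL, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${accessToken}`, // signed-in user's JWT
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages, systemPrompt: "You are a helpful assistant." }),
  });
  if (!res.ok) throw new Error(`Edge Function returned ${res.status}`);
  const reply = extractReply(await res.json());
  return [...messages, { role: "assistant", content: reply }];
}
```

Returning a new array instead of mutating `history` fits reactive variables, which typically update when they are reassigned.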
For streaming responses (text appearing word by word), set `stream: true` in the OpenAI call and handle the SSE stream in the Edge Function. This is more complex but creates a much better UX.
Streaming Responses in No-Code Apps
Streaming is the technique that makes ChatGPT display text word by word as it is generated, rather than waiting for the full response. It dramatically improves perceived performance: users see the first words in under a second instead of waiting 3–8 seconds for the full response to arrive. Implementing streaming in a no-code stack requires a small amount of custom code, but it is achievable with Supabase Edge Functions and WeWeb's custom JavaScript actions.
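In the Edge Function, add `stream: true` to the OpenAI request and pipe the response body back to the client with a `Content-Type: text/event-stream` header. The function then returns a Server-Sent Events (SSE) stream: a series of text chunks, each prefixed with `data:`, ending with `data: [DONE]`. In WeWeb, handle this with a custom JavaScript action that calls the function with `fetch`, reads the response body incrementally, and appends each chunk's text to a reactive variable bound to your output text element. The browser's `EventSource` API is not suitable here. It only issues GET requests and cannot send the JSON body or `Authorization` header that the function expects.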
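The client side of this can be sketched as follows. The code reads the POST response with `fetch` and a stream reader, splits the buffered text into SSE events, and extracts the text fragment in each OpenAI chunk's `choices[0].delta.content`. The function names are hypothetical, and the parser assumes `\n` line endings, which is what OpenAI's stream uses in practice.

```typescript
// Parse complete SSE events from a text buffer of an OpenAI chat stream.
// Returns the content deltas found and any incomplete trailing text to keep for the next chunk.
function parseSseBuffer(buffer: string): { deltas: string[]; rest: string; done: boolean } {
  const events = buffer.split("\n\n");
  const rest = events.pop() ?? ""; // the last piece may be an incomplete event
  const deltas: string[] = [];
  let done = false;
  for (const event of events) {
    for (const line of event.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const data = line.slice(5).trim();
      if (data === "[DONE]") {
        done = true;
        continue;
      }
      const delta = JSON.parse(data).choices?.[0]?.delta?.content;
      if (typeof delta === "string") deltas.push(delta);
    }
  }
  return { deltas, rest, done };
}

// Read a streamed POST response with fetch (EventSource cannot send a POST body),
// calling onDelta for each text fragment, e.g. to append it to a WeWeb variable.
async function streamChat(url: string, init: RequestInit, onDelta: (text: string) => void): Promise<void> {
  const res = await fetch(url, init);
  if (!res.ok || !res.body) throw new Error(`Stream request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    parsed.deltas.forEach((d) => onDelta(d));
    buffer = parsed.rest;
    if (parsed.done) break;
  }
}
```

A network chunk can end in the middle of an event, so the parser holds any incomplete trailing text in `rest` until the next chunk arrives.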
In FlutterFlow, streaming is handled via a Custom Action written in Dart that uses the `http` package to read the response as a stream and update app state incrementally. The implementation typically takes 2–4 hours but is worth the investment for any chatbot or content generation feature: the UX improvement is significant, and users consistently prefer it.
Rate Limiting and Retry Logic
OpenAI's API returns HTTP 429 (Too Many Requests) errors when you exceed your rate limits. Without retry logic, these surface as errors to your users. Proper rate limit handling is non-negotiable for production AI features.
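In your Supabase Edge Function, implement exponential backoff. On a 429 response, wait 1 second and retry; if it fails again, wait 2 seconds, then 4 seconds, and then fail with a user-friendly error message. Most 429 errors are transient and resolve within the first or second retry. The exception is a 429 with the error code `insufficient_quota`, which means your account has run out of credit. Retrying will not help in that case, so surface the error immediately.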
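Here is a sketch of that retry loop. The fetch and sleep functions are injected so the logic can be tested without network access; `fetchWithRetry` and `backoffDelays` are hypothetical names. The `insufficient_quota` check reads a clone of the body so the original response stays readable for the caller.

```typescript
// Delays before each retry: 1 s, 2 s, 4 s (doubling each time).
function backoffDelays(maxRetries = 3, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// True if a 429 body carries OpenAI's insufficient_quota error code.
async function isQuotaError(res: Response): Promise<boolean> {
  try {
    const body = await res.clone().json();
    return body?.error?.code === "insufficient_quota";
  } catch {
    return false;
  }
}

// Call OpenAI, retrying only on transient HTTP 429 responses with exponential backoff.
// After the last retry, the final response is returned so the caller can map it to a friendly error.
async function fetchWithRetry(
  doFetch: () => Promise<Response>,
  delays: number[] = backoffDelays(),
  wait: (ms: number) => Promise<void> = sleep,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (res.status !== 429 || attempt >= delays.length || (await isQuotaError(res))) return res;
    await wait(delays[attempt]);
  }
}
```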
User-level rate limiting prevents any single user from consuming a disproportionate share of your API budget. To implement it, keep a per-user request counter in Supabase: each API call increments a counter in a `usage_tracking` table. Check the counter at the start of each request, and return a 429 with a clear message if the user has exceeded their daily or monthly limit. This protects your API budget and provides the data you need for usage-based billing. Log the model, tokens used, latency, and success or failure for every request; this data becomes essential when debugging cost spikes and optimising prompts.
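Here is a sketch of the daily check. Instead of a separate counter, this variant counts logged rows, so the same `usage_tracking` data serves both the limit and the logs. The column names, the `UsageStore` interface and the in-memory store are hypothetical stand-ins for Supabase queries. Under concurrent requests, an atomic increment in the database is safer than counting and then inserting.

```typescript
// One row of the usage_tracking table (column names are assumptions).
interface UsageRow {
  userId: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  success: boolean;
  createdAt: Date;
}

// Hypothetical storage interface; in production this would query Supabase.
interface UsageStore {
  countSince(userId: string, since: Date): number;
  insert(row: UsageRow): void;
}

// In-memory implementation for illustration and testing.
class MemoryUsageStore implements UsageStore {
  rows: UsageRow[] = [];
  countSince(userId: string, since: Date): number {
    return this.rows.filter((r) => r.userId === userId && r.createdAt.getTime() >= since.getTime()).length;
  }
  insert(row: UsageRow): void {
    this.rows.push(row);
  }
}

// Returns a 429 Response if the user has used up today's (UTC) allowance, otherwise null.
function checkDailyLimit(store: UsageStore, userId: string, dailyLimit: number, now = new Date()): Response | null {
  const startOfDay = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  if (store.countSince(userId, startOfDay) >= dailyLimit) {
    return new Response(JSON.stringify({ error: "Daily AI limit reached. Try again tomorrow." }), {
      status: 429,
      headers: { "Content-Type": "application/json" },
    });
  }
  return null;
}
```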
Prompt Engineering for Product Features
The quality of your AI feature depends heavily on your system prompt. Generic prompts produce generic output.
**For a content generator**: "You are a copywriter for [Company Name], a [description] SaaS. Generate [output type] that matches this brand voice: [examples]. Always output in this format: [structure]. Never include [exclusions]."
**For a data analyst**: "You are an expert data analyst. The user will provide you with structured data. Analyse it and return insights as: 1) A one-sentence summary, 2) Three key findings as bullet points, 3) One recommended action. Always respond in valid JSON."
**Practical rules**: Be specific about output format (JSON when the frontend needs to parse it), include negative constraints ("never mention competitors"), and test with 20+ inputs before shipping.
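When the frontend needs to parse the output, as with the data-analyst prompt, you can also enable OpenAI's JSON mode with `response_format: { type: "json_object" }` and validate the result before displaying it. The key names `summary`, `findings` and `recommendedAction` are hypothetical. The prompt names its keys because JSON mode does not enforce them.

```typescript
// Hypothetical output shape for the data-analyst prompt.
interface AnalystInsights {
  summary: string;
  findings: string[];
  recommendedAction: string;
}

const ANALYST_PROMPT =
  "You are an expert data analyst. Analyse the user's data and respond in valid JSON " +
  'with keys "summary" (one sentence), "findings" (three strings) and "recommendedAction" (one string).';

// Chat Completions request body with JSON mode enabled.
function buildAnalystRequest(data: string) {
  return {
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: ANALYST_PROMPT },
      { role: "user", content: data },
    ],
    max_tokens: 500,
  };
}

// JSON mode makes the output parse as JSON (unless max_tokens cuts it off),
// but it does not guarantee these keys, so check the shape explicitly.
function parseInsights(raw: string): AnalystInsights {
  const parsed = JSON.parse(raw);
  const ok =
    typeof parsed?.summary === "string" &&
    Array.isArray(parsed?.findings) &&
    parsed.findings.length === 3 &&
    parsed.findings.every((f: unknown) => typeof f === "string") &&
    typeof parsed?.recommendedAction === "string";
  if (!ok) throw new Error("Model output did not match the expected shape");
  return parsed as AnalystInsights;
}
```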
Cost Control and Rate Limiting
OpenAI API costs add up fast in production. Three controls to implement before launch:
1. **Token limits**: Set `max_tokens` on every request. For most features, 500 tokens is enough. GPT-4o charges $2.50/M input tokens + $10/M output tokens, so a 500-token limit caps the output cost at $0.005 per request (500 × $10 / 1,000,000); input tokens are billed on top.
2. **User rate limiting**: In your Edge Function, check how many calls the user has made in the last hour (stored in Supabase). Return 429 if they are over the limit.
3. **Caching**: For deterministic outputs (same input → same output), cache responses in Supabase. An `ai_response_cache` table keyed by a hash of the prompt eliminates redundant API calls.
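For a SaaS app with 500 MAU each making 20 AI requests/day, budget €150–300/month for GPT-4o. That comes to about 300,000 requests a month, so the range assumes an average of roughly €0.0005–0.001 per request. At GPT-4o prices, that means short prompts and short outputs; longer prompts or chat histories push the cost up quickly.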
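To check an estimate like this against your own token counts, a small calculator helps. The prices are the per-million-token USD figures quoted in this article, plus gpt-4o-mini's $0.60/M output price. Check OpenAI's pricing page before relying on them. The 30-day month is an assumption.

```typescript
// USD per million tokens; verify against OpenAI's current pricing page.
const PRICES = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
} as const;

type Model = keyof typeof PRICES;

// Cost in USD of one request with the given token counts.
function requestCost(model: Model, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// Monthly cost in USD for a usage pattern, assuming a 30-day month.
function monthlyCost(
  model: Model,
  users: number,
  requestsPerUserPerDay: number,
  inputTokens: number,
  outputTokens: number,
): number {
  return users * requestsPerUserPerDay * 30 * requestCost(model, inputTokens, outputTokens);
}
```

For example, 500 users making 20 requests a day, each with 200 input and 100 output tokens, comes to about $450/month on gpt-4o and about $27/month on gpt-4o-mini.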
Common No-Code AI Features We Build
**AI content generation**: Blog posts, product descriptions, email subject lines. WeWeb form → Edge Function → GPT-4o → display output.
**Intelligent search**: Embed user content with OpenAI text-embedding-3-small, store vectors in pgvector (Supabase), and run similarity search. Returns semantically relevant results instead of keyword matches.
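In production, pgvector runs the similarity search inside Postgres; its `<=>` operator returns cosine distance (1 minus cosine similarity). This in-memory sketch shows the same ranking for illustration. The document type and `topK` default are assumptions.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored items by similarity to the query embedding, most similar first.
function rankBySimilarity<T extends { embedding: number[] }>(query: number[], docs: T[], topK = 5): T[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.doc);
}
```

Once documents number more than a few thousand, keeping this computation in the database with a pgvector index avoids loading every embedding into the Edge Function.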
**Document summarisation**: Upload PDF → extract text via Edge Function → summarise with GPT-4o → store summary in Supabase.
**AI-powered onboarding**: Ask 5 questions during signup, generate a personalised setup checklist with GPT-4o, store in user profile.
All of these run on the WeWeb + Supabase + OpenAI stack with zero custom code in the frontend.