The Architecture
A production AI chatbot has four components:
1. **Chat UI**: Input field, message history display, loading state, error handling
2. **Conversation state**: An array of messages (role + content) stored in frontend state
3. **Backend proxy**: A Supabase Edge Function or Xano endpoint that calls OpenAI
4. **System prompt**: The instruction set that defines your chatbot's persona, knowledge, and constraints
The conversation state is the most important concept. OpenAI's API is stateless: every request must include the full conversation history. Your frontend maintains this history and sends it with each message.
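A minimal sketch of that state, assuming a hypothetical `ChatMessage` type and two illustrative helpers (`appendMessage`, `buildRequestBody`) that are not part of WeWeb or the OpenAI SDK. It only shows that each request carries the whole history:

```typescript
// Hypothetical types for illustration; the role + content shape follows the description above.
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Returns a new array instead of mutating, which suits reactive frontend state.
function appendMessage(history: ChatMessage[], role: Role, content: string): ChatMessage[] {
  return [...history, { role, content }];
}

// The API keeps no memory between calls, so the request body carries the full history.
function buildRequestBody(history: ChatMessage[]): string {
  return JSON.stringify({ messages: history });
}

let messages: ChatMessage[] = [];
messages = appendMessage(messages, "user", "How do I invite a teammate?");
messages = appendMessage(messages, "assistant", "Open Settings, then Team.");
messages = appendMessage(messages, "user", "Can I set their role?");

// All three messages are sent, not just the latest one.
const body = buildRequestBody(messages);
```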
Chatbot Architecture Choices: Retrieval vs Generation
Before building, decide whether your chatbot is primarily retrieval-based or generation-based. This choice determines the architecture, cost, and quality characteristics of the finished product.
A retrieval-based chatbot finds the best matching answer from a pre-defined knowledge base. The user asks a question, your system finds the most similar Q&A in the database, and presents it, optionally reformatted by the LLM. This is fast, cheap (few tokens consumed), and highly accurate within the knowledge domain. It fails when users ask questions not covered by the knowledge base, producing unhelpful "I don't have information on that" responses.
A generation-based chatbot sends the user's question to an LLM with context and lets the model compose a novel answer. This handles questions that were never explicitly written, can synthesise across multiple sources, and produces natural conversational responses. The cost is higher (more tokens per message), and the model can occasionally generate plausible-sounding but incorrect information.
For most customer-facing support chatbots, the right architecture is hybrid: start with retrieval (pull the top 3 most relevant knowledge base articles), then use generation to compose a coherent answer from those sources, with a hard constraint that the model must not answer from outside them. This is Retrieval Augmented Generation (RAG), and it combines the accuracy of retrieval with the flexibility of generation.
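One way to picture the hybrid approach is a function that takes already-scored articles and builds a prompt restricted to the top three. This is a sketch under assumptions: the `Article` shape, the `score` field, and the instruction wording are hypothetical, and a real system would get the scores from a vector search.

```typescript
// Hypothetical article shape; `score` stands in for a similarity score from retrieval.
interface Article {
  title: string;
  body: string;
  score: number;
}

// Keeps the topK highest-scoring articles and instructs the model to stay within them.
function buildGroundedPrompt(articles: Article[], topK = 3): string {
  const top = [...articles].sort((a, b) => b.score - a.score).slice(0, topK);
  const sources = top.map((a, i) => `[${i + 1}] ${a.title}\n${a.body}`).join("\n\n");
  return [
    "Answer using ONLY the sources below.",
    "If the sources do not cover the question, say you do not have that information.",
    "",
    sources,
  ].join("\n");
}

const groundedPrompt = buildGroundedPrompt([
  { title: "Billing", body: "Invoices are issued monthly.", score: 0.9 },
  { title: "Changelog", body: "Version notes.", score: 0.2 },
  { title: "Integrations", body: "Connect Slack from Settings.", score: 0.8 },
  { title: "Teams", body: "Invite members by email.", score: 0.7 },
]);
```

The hard constraint lives in the prompt text, so it is a strong nudge rather than a guarantee; the model can still drift, which is why the escalation and measurement steps later in this article matter.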
Building the Chat UI in WeWeb
In WeWeb, create a page-level variable `messages` (array, default empty). Add two components:
**Message list**: A Repeating Group bound to `messages`. Each item has a conditional style: user messages are right-aligned with a primary colour background, and assistant messages are left-aligned with a neutral background. Bind the text to `item.content`.
**Input area**: A text input bound to a `userInput` variable, plus a "Send" button. On button click: (1) append `{role: "user", content: userInput}` to `messages`, (2) clear `userInput`, (3) call the API action, (4) append the response as `{role: "assistant", content: response}`.
Add a loading spinner that shows while the API call is in progress.
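In WeWeb this is configured as a workflow rather than written by hand, but the logic is easier to check as code. The sketch below assumes a hypothetical `ChatState` object and a `callApi` stand-in for whatever action invokes your backend; the `error` field is an addition for illustration.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical state object mirroring the page-level variables described above.
interface ChatState {
  messages: ChatMessage[];
  userInput: string;
  isLoading: boolean;
  error: string | null;
}

// Stand-in for the action that calls the backend proxy and resolves with the reply text.
type CallApi = (messages: ChatMessage[]) => Promise<string>;

async function sendMessage(state: ChatState, callApi: CallApi): Promise<void> {
  const text = state.userInput.trim();
  if (text === "" || state.isLoading) return; // ignore empty input and double clicks

  state.messages.push({ role: "user", content: text }); // (1) append the user message
  state.userInput = ""; // (2) clear the input
  state.isLoading = true; // show the spinner
  state.error = null;
  try {
    const response = await callApi(state.messages); // (3) call the API with full history
    state.messages.push({ role: "assistant", content: response }); // (4) append the reply
  } catch {
    // Keep errors out of the message history so they are not sent back to the model.
    state.error = "Something went wrong. Please try again.";
  } finally {
    state.isLoading = false; // hide the spinner
  }
}
```

Storing the error separately from `messages` is a design choice: anything in `messages` is resent to the model on the next request.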
Training the Chatbot on Your Documentation
A chatbot that only knows what GPT-4o was trained on will not know your product's specific features, pricing, policies, or procedures. To make the chatbot genuinely useful for your users, you need to inject your knowledge base into the conversation. There are two approaches: static injection (paste documentation directly into the system prompt) and dynamic injection via RAG.
Static injection works for small knowledge bases under 5,000 words. Write your documentation as structured text, add it to the system prompt, and the model uses it as its primary reference. Disadvantage: the static approach is expensive at scale (sending the entire knowledge base with every message), and keeping the system prompt updated as documentation changes is manual work.
For larger knowledge bases, RAG is the right approach. Process your documentation into chunks of 200–500 words, embed each chunk using OpenAI's text-embedding-3-small model, and store the embeddings in Supabase with the pgvector extension. At query time, embed the user's message, find the top 3–5 most similar chunks via cosine similarity search, and inject only those chunks into the system prompt. This costs dramatically less per message and scales to thousands of documentation pages. When you update documentation, re-embed the changed chunks and update the database; the chatbot picks up the changes automatically on the next query.
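The chunking and ranking steps can be illustrated in memory. This is a sketch, not the production path: in the setup described above pgvector runs the similarity search inside the database, and the embeddings come from the embedding model rather than the toy vectors used here. The function names and the `EmbeddedChunk` type are hypothetical.

```typescript
// Splits text into chunks of at most maxWords words (simple word-count chunking).
function chunkByWords(text: string, maxWords = 300): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

// Ranks chunks by similarity to the query embedding and returns the best k.
function topKChunks(query: number[], chunks: EmbeddedChunk[], k = 3): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}
```

Fixed word-count chunking can split a sentence or section in half; splitting on headings or paragraphs first is a common refinement.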
The Supabase Edge Function
Your Edge Function receives the message array and system prompt, calls OpenAI, and returns the response:
```typescript
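// Assumes the Supabase Edge Functions (Deno) runtime and the official openai npm package.
import OpenAI from "npm:openai";

// OPENAI_API_KEY is an assumed secret name; set it in the function's environment,
// never in frontend code.
const openai = new OpenAI({ apiKey: Deno.env.get("OPENAI_API_KEY") });

Deno.serve(async (req) => {
  const { messages, systemPrompt } = await req.json();

  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    // The system prompt goes first, followed by the full conversation history.
    messages: [{ role: "system", content: systemPrompt }, ...messages],
    max_tokens: 500,
    temperature: 0.7,
  });

  // Return only the assistant's text to the frontend.
  return new Response(
    JSON.stringify({ content: completion.choices[0].message.content }),
    { headers: { "Content-Type": "application/json" } },
  );
});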
```
The `systemPrompt` can be passed from the frontend (useful for multi-persona apps) or hardcoded in the function (more secure).
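A possible middle ground, sketched here as an assumption rather than something the setup above requires: the frontend sends only a persona key, and the function looks up the prompt text server-side. The persona names and the `resolveSystemPrompt` helper are hypothetical.

```typescript
// Prompts live on the server; the client can only choose between them by key.
const PERSONA_PROMPTS: Record<string, string> = {
  support: "You are a customer support agent for Acme SaaS.",
  onboarding: "You are an onboarding guide for new Acme SaaS users.",
};

const DEFAULT_PERSONA = "support";

// Falls back to the default persona for unknown, missing, or non-string keys.
function resolveSystemPrompt(persona: unknown): string {
  if (
    typeof persona === "string" &&
    Object.prototype.hasOwnProperty.call(PERSONA_PROMPTS, persona)
  ) {
    return PERSONA_PROMPTS[persona];
  }
  return PERSONA_PROMPTS[DEFAULT_PERSONA];
}
```

This keeps multi-persona support while preventing a user from submitting their own system prompt through the browser.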
Writing an Effective System Prompt
The system prompt determines everything about your chatbot's behaviour. A good production system prompt includes:
- **Role**: "You are a customer support agent for Acme SaaS, a project management tool."
- **Knowledge**: "You help users with: creating projects, inviting team members, setting up integrations, and billing questions."
- **Constraints**: "Only answer questions about Acme SaaS. For unrelated questions, politely redirect. Never discuss competitor products. Never make up features that don't exist."
- **Format**: "Keep responses under 100 words. Use bullet points for steps. Always end support answers with: 'Let me know if this helps!'"
- **Escalation**: "If the user expresses frustration or mentions a billing error, say: 'I'll connect you with our team' and trigger the escalation flow."
Handling Escalation to Human Support
Every production chatbot needs a clear escalation path for questions the AI cannot handle confidently, emotionally charged conversations, and situations that require account-level actions a bot should not take (issuing refunds, deleting accounts, making billing exceptions). Design this into the system prompt and the UI from the beginning, not as an afterthought.
In the system prompt, define escalation triggers explicitly: "If the user mentions a billing dispute, payment failure, or account suspension, do not attempt to resolve it. Instead, respond exactly with: ESCALATE: [brief reason], and nothing else." Your Edge Function detects the ESCALATE prefix and creates a support ticket in your helpdesk (Intercom, Zendesk, or even a Supabase table) rather than displaying the message to the user.
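The prefix check itself is small. A sketch, assuming the model follows the `ESCALATE: [brief reason]` format exactly; `parseEscalation` and `EscalationResult` are hypothetical names, and ticket creation is left out:

```typescript
interface EscalationResult {
  escalate: boolean;
  reason: string | null;
}

const ESCALATE_PREFIX = "ESCALATE:";

// Detects the escalation marker at the start of the model output and extracts the reason.
function parseEscalation(modelOutput: string): EscalationResult {
  const trimmed = modelOutput.trim();
  if (!trimmed.startsWith(ESCALATE_PREFIX)) {
    return { escalate: false, reason: null };
  }
  const reason = trimmed.slice(ESCALATE_PREFIX.length).trim();
  return { escalate: true, reason: reason === "" ? null : reason };
}
```

Models do not always follow format instructions to the letter, so it is worth logging any output that contains the word ESCALATE without matching the prefix.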
In the UI, when an escalation is detected, replace the chatbot interface with a message: "I'm connecting you with our support team. They'll follow up within [SLA]. You can also email support@yourcompany.com." Email the support team with the full conversation transcript. This approach keeps the handoff smooth for the user while giving the support agent full context. Measure escalation rate as a core chatbot metric: a rate above 20% suggests the knowledge base needs expansion.
Adding Persistent Context
The basic chatbot forgets everything when the page refreshes. To make it smarter:
**User context injection**: When the chatbot session starts, fetch the user's account data (plan, usage, recent activity) and append it to the system prompt: "The user's current plan is Pro. Their last activity was 3 days ago. They have 2 active projects."
**Conversation persistence**: Save messages to a Supabase table (chatbot_sessions) with user_id and session_id. On page load, fetch the last N messages and pre-populate the messages array.
**Knowledge base**: For product documentation, store articles in Supabase with embeddings (using pgvector). Before calling GPT-4o, run a similarity search and inject the most relevant articles into the system prompt. This is called RAG (Retrieval Augmented Generation) and dramatically improves answer accuracy.
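The user context injection step above amounts to string composition. A sketch, assuming a hypothetical `UserContext` shape with fields matching the example sentence; the actual fields depend on what your account data contains:

```typescript
// Hypothetical account fields, chosen to match the example sentence above.
interface UserContext {
  plan: string;
  daysSinceLastActivity: number;
  activeProjects: number;
}

// Appends a short block of account facts to the base system prompt.
function withUserContext(basePrompt: string, user: UserContext): string {
  const facts = [
    `The user's current plan is ${user.plan}.`,
    `Their last activity was ${user.daysSinceLastActivity} days ago.`,
    `They have ${user.activeProjects} active projects.`,
  ];
  return `${basePrompt}\n\nUser context:\n${facts.join(" ")}`;
}

const personalisedPrompt = withUserContext(
  "You are a customer support agent for Acme SaaS.",
  { plan: "Pro", daysSinceLastActivity: 3, activeProjects: 2 },
);
```

Build this on the server from data you fetched yourself, so users cannot edit the facts the model is told about their account.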
Measuring Chatbot Success
A chatbot without measurement is a feature with no feedback loop. The metrics that matter most for a support or product chatbot are: resolution rate (the percentage of conversations where the user did not escalate or submit a separate support ticket); session length (average messages per conversation; too short suggests the bot is failing early, too long suggests it is not resolving efficiently); and post-chat rating (a simple thumbs up/down after each conversation, stored in Supabase and reviewable in a dashboard).
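These three metrics can be computed from one record per conversation. A sketch, assuming a hypothetical `ConversationRecord` shape; here `escalated` is taken to cover both in-chat escalation and a separate support ticket:

```typescript
// Hypothetical per-conversation record; adapt the fields to your own table.
interface ConversationRecord {
  messageCount: number;
  escalated: boolean; // true if the user escalated or filed a separate ticket
  rating: "up" | "down" | null; // null when the user gave no rating
}

interface ChatbotMetrics {
  resolutionRate: number; // share of conversations that were not escalated
  averageMessages: number; // session length
  positiveRatingShare: number | null; // null when nobody rated
}

function summarise(records: ConversationRecord[]): ChatbotMetrics {
  if (records.length === 0) {
    return { resolutionRate: 0, averageMessages: 0, positiveRatingShare: null };
  }
  const resolved = records.filter((r) => !r.escalated).length;
  const totalMessages = records.reduce((sum, r) => sum + r.messageCount, 0);
  const rated = records.filter((r) => r.rating !== null);
  const positive = rated.filter((r) => r.rating === "up").length;
  return {
    resolutionRate: resolved / records.length,
    averageMessages: totalMessages / records.length,
    positiveRatingShare: rated.length === 0 ? null : positive / rated.length,
  };
}
```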
Beyond these user-facing metrics, instrument your Edge Function to log: which knowledge base articles were retrieved most often (tells you what users ask about most), which queries had no strong match in the vector database (tells you knowledge base gaps), and model latency and token counts per session (tells you your cost per conversation). Review these metrics weekly for the first month after launch.
The most actionable metric is the list of queries with no knowledge base match. Export it weekly, write answers for the top 20 unanswered queries, add them to the knowledge base, and re-embed. A chatbot that improves its resolution rate by 5% per week for the first month will often reach 80%+ resolution within 6 weeks of launch, which compares well with most human-first support workflows at early-stage SaaS companies.
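The weekly export can be a simple aggregation over retrieval logs. A sketch under assumptions: the `RetrievalLog` shape is hypothetical, and the similarity threshold that counts as "no match" is something you would tune against your own data.

```typescript
// Hypothetical log entry written by the Edge Function after each retrieval.
interface RetrievalLog {
  query: string;
  bestSimilarity: number; // highest similarity score among retrieved chunks
}

// Counts queries whose best match fell below the threshold and returns the most frequent.
function topUnmatchedQueries(
  logs: RetrievalLog[],
  threshold: number,
  limit = 20,
): Array<{ query: string; count: number }> {
  const counts = new Map<string, number>();
  for (const log of logs) {
    if (log.bestSimilarity >= threshold) continue; // a good enough match exists
    const key = log.query.trim().toLowerCase(); // group trivially different spellings
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([query, count]) => ({ query, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}
```

Exact-string grouping misses paraphrases; clustering the unmatched queries by embedding is a natural next step once the list grows.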
Costs and Performance in Production
For a SaaS with 500 active users each sending 10 messages/day:
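- Average message: 50 tokens input + 100 tokens output
- GPT-4o pricing: $2.50/M input + $10/M output
- Daily volume: 500 × 10 = 5,000 messages, or 250,000 input tokens and 500,000 output tokens
- Daily cost: (0.25M × $2.50) + (0.5M × $10) = $0.63 + $5.00 = ~$5.63/day = ~$170/month
This counts only each new message. Because every request also resends the system prompt and the conversation history, real input-token usage will be higher.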
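The estimate is easy to re-run with your own numbers if input and output tokens are priced separately at their own rates. A small calculator sketch; the `UsageAssumptions` type is hypothetical, and the values are the ones quoted above:

```typescript
interface UsageAssumptions {
  users: number;
  messagesPerUserPerDay: number;
  inputTokensPerMessage: number;
  outputTokensPerMessage: number;
  inputPricePerMillion: number; // USD per 1M input tokens
  outputPricePerMillion: number; // USD per 1M output tokens
}

// Prices input and output tokens separately, since the rates differ.
function dailyCostUsd(u: UsageAssumptions): number {
  const messagesPerDay = u.users * u.messagesPerUserPerDay;
  const inputCost = ((messagesPerDay * u.inputTokensPerMessage) / 1e6) * u.inputPricePerMillion;
  const outputCost = ((messagesPerDay * u.outputTokensPerMessage) / 1e6) * u.outputPricePerMillion;
  return inputCost + outputCost;
}

const daily = dailyCostUsd({
  users: 500,
  messagesPerUserPerDay: 10,
  inputTokensPerMessage: 50,
  outputTokensPerMessage: 100,
  inputPricePerMillion: 2.5,
  outputPricePerMillion: 10,
});
const monthly = daily * 30; // daily is 5.625, so monthly is 168.75
```

To account for resent history and the system prompt, raise `inputTokensPerMessage` to the average size of the full request rather than the new message alone.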
To manage costs, implement a session token budget: once the conversation exceeds 2,000 tokens, stop sending the full history and summarise older messages instead. Use GPT-4o mini for simple queries ($0.15/M input) and reserve GPT-4o for complex ones.
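A sketch of the budget split, assuming a rough heuristic of about 4 characters per token (an approximation for English text, not an exact tokenizer count). `splitByBudget` is a hypothetical helper: it keeps the most recent messages that fit and returns the rest as candidates for summarising.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the newest messages within the budget; older ones go to `overflow` for summarising.
function splitByBudget(
  messages: ChatMessage[],
  budget = 2000,
): { kept: ChatMessage[]; overflow: ChatMessage[] } {
  let used = 0;
  let start = messages.length;
  // Walk backwards so the most recent messages are kept first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break;
    used += cost;
    start = i;
  }
  return { kept: messages.slice(start), overflow: messages.slice(0, start) };
}
```

For exact counts you would use a real tokenizer; the heuristic is only meant to decide roughly when to start trimming.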
Response time: GPT-4o typically returns a short reply in 1–3 seconds. Add a typing indicator to set expectations. To show the first words in under a second, stream the response so tokens appear as they are generated.