February 19, 2026 · 10 min
Lessons in Building Scalable AI SaaS Platforms
Key architectural and product lessons from building AutomateLanka — an all-in-one automation SaaS platform built on n8n workflow foundations with multi-tenant isolation and AI search.
TL;DR: Building AutomateLanka — a full automation SaaS platform similar to n8n — taught me more about AI SaaS architecture in 6 months than I could have learned from any course. Here are the lessons that matter most.
What "Scalable AI SaaS" Actually Means
Before diving in, let's define the term. A scalable AI SaaS platform needs to:
- Execute AI workflows reliably at volume, not just one-at-a-time
- Isolate tenant data and compute — one customer's heavy workflow shouldn't starve others
- Handle failure gracefully — AI calls fail; workflow execution must recover
- Stay cost-efficient — AI inference is expensive; you need smart optimization
- Operate transparently — customers need visibility into what's running
These requirements drive completely different architecture decisions than a standard CRUD SaaS.
Lesson 1: Queue Everything That Runs AI
The biggest mistake in AI SaaS: calling AI APIs directly from HTTP request handlers.
// ❌ WRONG: Direct AI API call in the request handler
router.post('/workflows/:id/run', async (req, res) => {
  const result = await runWorkflow(req.params.id); // Could take 10+ seconds
  return res.json(result); // HTTP timeout if the AI call is slow
});
// ✅ RIGHT: Queue the work, return immediately
router.post('/workflows/:id/run', async (req, res) => {
  const job = await workflowQueue.add('execute', {
    workflowId: req.params.id,
    tenantId: req.tenant.id,
    triggeredBy: req.user.id,
  });
  return res.json({
    jobId: job.id,
    status: 'queued',
    trackingUrl: `/api/jobs/${job.id}/status`,
  });
});
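The `trackingUrl` above implies a status endpoint. Here's one way it could look — the BullMQ-facing types are reduced to structural stand-ins so the sketch stays self-contained, and `getJobStatus` / `toPublicStatus` are illustrative names, not part of the original code:

```typescript
// Minimal structural types standing in for BullMQ's Queue/Job classes
interface JobLike {
  id: string;
  progress: number | object;
  failedReason?: string;
  getState(): Promise<string>;
}
interface QueueLike {
  getJob(id: string): Promise<JobLike | undefined>;
}

// Map BullMQ job states onto the API's public status vocabulary
export function toPublicStatus(
  state: string
): 'queued' | 'running' | 'completed' | 'failed' | 'unknown' {
  switch (state) {
    case 'completed': return 'completed';
    case 'failed':    return 'failed';
    case 'active':    return 'running';
    case 'waiting':
    case 'delayed':   return 'queued';
    default:          return 'unknown';
  }
}

// Handler body for GET /api/jobs/:id/status
export async function getJobStatus(queue: QueueLike, jobId: string) {
  const job = await queue.getJob(jobId);
  if (!job) return { error: 'job not found' as const };
  return {
    jobId: job.id,
    status: toPublicStatus(await job.getState()),
    progress: job.progress,
    failedReason: job.failedReason ?? null,
  };
}
```

Clients poll this endpoint (or subscribe over WebSocket) instead of holding an HTTP connection open for the whole workflow run.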
I use BullMQ on top of Redis for workflow execution queuing:
// Queue setup
import { Queue, Worker } from 'bullmq';

export const workflowQueue = new Queue('workflows', {
  connection: redisConnection,
  defaultJobOptions: {
    attempts: 3,
    backoff: { type: 'exponential', delay: 5000 },
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 5000 },
  },
});

// Worker: separate process, scales independently
const worker = new Worker('workflows', async (job) => {
  const { workflowId, tenantId } = job.data;
  return await executeWorkflow(workflowId, tenantId, {
    onProgress: (progress) => job.updateProgress(progress),
  });
}, {
  connection: redisConnection,
  concurrency: 10, // Process 10 workflows simultaneously per worker
  limiter: {
    max: 100,        // Max 100 jobs per 60 seconds
    duration: 60000, // (tenant rate limiting handled separately)
  },
});
Lesson 2: Tenant-Level Rate Limiting from Day One
Without per-tenant rate limiting, one power user can exhaust your AI API budget and degrade service for everyone.
// Tenant-aware rate limiter with BullMQ
async function addToTenantQueue(
  tenantId: string,
  workflowId: string,
  priority: number
) {
  // Rate limit: cap concurrent executions per tenant, by plan
  const tenant = await getTenant(tenantId);
  const tenantConcurrency = await getTenantConcurrency(tenantId);
  if (tenantConcurrency >= TENANT_LIMITS[tenant.plan].maxConcurrent) {
    throw new PlanLimitError('Maximum concurrent workflows reached');
  }
  return workflowQueue.add('execute', { workflowId, tenantId }, {
    priority, // Enterprise plans get higher priority
    jobId: `${tenantId}-${workflowId}-${Date.now()}`,
  });
}

// Plan limits (-1 = unlimited)
const TENANT_LIMITS = {
  FREE: { maxConcurrent: 1, monthlyRuns: 100, aiTokens: 50_000 },
  PRO: { maxConcurrent: 5, monthlyRuns: 5_000, aiTokens: 1_000_000 },
  ENTERPRISE: { maxConcurrent: 20, monthlyRuns: -1, aiTokens: -1 },
};
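`getTenantConcurrency` is left undefined above. One plausible implementation keeps a per-tenant counter in Redis that the worker increments when a job starts and decrements when it settles. This is a sketch — `RedisLike` is a structural stand-in for an ioredis client, and `markStarted` / `markSettled` are illustrative names:

```typescript
// Structural stand-in for the subset of a Redis client this sketch needs
interface RedisLike {
  incr(key: string): Promise<number>;
  decr(key: string): Promise<number>;
  get(key: string): Promise<string | null>;
}

const keyFor = (tenantId: string) => `tenant:${tenantId}:running`;

// How many of this tenant's workflows are executing right now
export async function getTenantConcurrency(
  redis: RedisLike,
  tenantId: string
): Promise<number> {
  return Number((await redis.get(keyFor(tenantId))) ?? 0);
}

// Call from the worker just before executeWorkflow()
export async function markStarted(redis: RedisLike, tenantId: string) {
  return redis.incr(keyFor(tenantId));
}

// Call in a finally block so crashes and failures still decrement
export async function markSettled(redis: RedisLike, tenantId: string) {
  const n = await redis.decr(keyFor(tenantId));
  return Math.max(n, 0);
}
```

In production the decrement belongs in a `finally` (and the key should carry a TTL) so a crashed worker can't permanently inflate a tenant's count.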
Lesson 3: Design for Partial Failure
AI workflows have many failure points. Each step can fail independently:
Node 1 (HTTP Request) → ✅ Success
Node 2 (AI Enrichment) → ❌ OpenAI Rate Limit
Node 3 (Database Write) → Never executed
Node 4 (Notification) → Never executed
The system must:
- Record exactly which step failed and why
- Allow resuming from the failure point after fixing the issue
- Not double-execute successfully completed steps
// Checkpoint-based execution
async function executeWorkflow(workflowId: string, tenantId: string) {
  const workflow = await getWorkflow(workflowId);
  const execution = await createExecution(workflowId);
  let previousOutput: unknown;

  for (const node of workflow.nodes) {
    // Skip already-completed nodes (supports resume)
    const checkpoint = await getCheckpoint(execution.id, node.id);
    if (checkpoint?.status === 'completed') {
      previousOutput = checkpoint.output;
      continue;
    }
    try {
      const output = await executeNode(node, previousOutput, tenantId);
      // Save checkpoint immediately after success
      await saveCheckpoint(execution.id, node.id, 'completed', output);
      previousOutput = output;
    } catch (error) {
      await saveCheckpoint(execution.id, node.id, 'failed', null, error.message);
      await updateExecution(execution.id, 'failed', { failedAt: node.id, error: error.message });
      throw error; // Let BullMQ handle the retry
    }
  }
  await updateExecution(execution.id, 'completed');
}
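To see why the checkpoint loop makes resume safe, here is a compact in-memory model of the same logic (the real version persists checkpoints via the database helpers above; `runWithCheckpoints` is an illustrative name):

```typescript
type Checkpoint = { status: 'completed' | 'failed'; output?: unknown };

// Same shape as the loop above, with a Map standing in for the checkpoint table
export async function runWithCheckpoints(
  nodes: { id: string; run: (input: unknown) => Promise<unknown> }[],
  store: Map<string, Checkpoint>,
) {
  let previousOutput: unknown;
  for (const node of nodes) {
    const cp = store.get(node.id);
    if (cp?.status === 'completed') {
      previousOutput = cp.output; // resume: skip finished work
      continue;
    }
    try {
      previousOutput = await node.run(previousOutput);
      store.set(node.id, { status: 'completed', output: previousOutput });
    } catch (err) {
      store.set(node.id, { status: 'failed' });
      throw err; // let the queue retry the whole job
    }
  }
  return previousOutput;
}
```

Run it twice against the same store — once where a node fails, once after the fault clears — and each successful node executes exactly once; only the failed node is retried.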
Lesson 4: AI Token Usage is Your COGS — Track It Religiously
Unlike compute costs that scale predictably, AI token costs are chaotic. One poorly controlled workflow can incur $50 in API costs overnight.
// Token usage tracking wrapper
async function trackableOpenAICall(
  tenantId: string,
  params: OpenAI.ChatCompletionCreateParams
): Promise<OpenAI.ChatCompletion> {
  // Pre-flight: check the tenant has remaining token budget
  const remaining = await getRemainingTokenBudget(tenantId);
  const estimated = estimateTokens(params);
  if (estimated > remaining) {
    throw new PlanLimitError('AI token budget exhausted for this billing period');
  }

  const start = Date.now();
  const response = await openai.chat.completions.create(params);
  const latency = Date.now() - start;

  // Track actual usage
  const { prompt_tokens, completion_tokens, total_tokens } = response.usage!;
  await prisma.aiUsageLog.create({
    data: {
      tenantId,
      model: params.model,
      promptTokens: prompt_tokens,
      completionTokens: completion_tokens,
      totalTokens: total_tokens,
      estimatedCost: calculateCost(params.model, total_tokens),
      latencyMs: latency,
      timestamp: new Date(),
    },
  });

  // Update tenant running total
  await incrementTokenUsage(tenantId, total_tokens);
  return response;
}
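`calculateCost` in the snippet takes only a total token count, but providers usually price prompt and completion tokens differently, so a more accurate variant tracks both. A sketch — the rates below are placeholders, not real OpenAI prices; in production they belong in config, since providers change pricing:

```typescript
// PLACEHOLDER rates per 1K tokens — illustrative only, load real prices from config
const PRICE_PER_1K: Record<string, { prompt: number; completion: number }> = {
  'gpt-4o':      { prompt: 0.0050, completion: 0.0150 },
  'gpt-4o-mini': { prompt: 0.0002, completion: 0.0006 },
};

// Cost in dollars for one call, pricing prompt and completion tokens separately
export function calculateCost(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const rate = PRICE_PER_1K[model];
  if (!rate) throw new Error(`No pricing configured for model: ${model}`);
  return (
    (promptTokens / 1000) * rate.prompt +
    (completionTokens / 1000) * rate.completion
  );
}
```

Failing loudly on an unknown model is deliberate: silently logging a cost of zero for a new model is exactly the kind of bug that makes COGS dashboards lie.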
Lesson 5: Semantic Search Changes How Users Interact
AutomateLanka has 200+ built-in workflow templates. With keyword search, users struggle to find what they need. Semantic search made this dramatically better:
User types: "when a customer pays, send them a welcome email and update spreadsheet"
Keyword search: 0 results matching all keywords
Semantic search: Finds "Payment Confirmation → CRM Update → Email Notification" template
The semantic search index refreshes incrementally as new templates are added:
// Index workflow templates for semantic search (Transformers.js / Xenova — runs locally in Node.js, no API needed)
import { pipeline } from '@xenova/transformers';

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function indexWorkflowTemplate(template: WorkflowTemplate) {
  const text = `${template.name} ${template.description} ${template.tags.join(' ')}
Use cases: ${template.useCases.join('. ')}`;

  const output = await embedder(text, { pooling: 'mean', normalize: true });
  const embedding = Array.from(output.data);

  // pgvector expects the '[x,y,z]' text form; a raw JS array won't cast cleanly
  await prisma.$executeRaw`
    UPDATE workflow_templates
    SET embedding = ${`[${embedding.join(',')}]`}::vector
    WHERE id = ${template.id}
  `;
}
Important: I used Xenova (a JS/Node.js port of Hugging Face Transformers) to run embeddings locally — zero API cost, no network latency, no API key needed. For a SaaS platform serving millions of embedding calls, this saves significant money.
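The query side is the mirror image: embed the user's search text with the same model, then rank templates by pgvector's cosine distance operator (`<=>`). This is a sketch, not the original code — `searchTemplates` takes the embedder and Prisma client as parameters to stay self-contained, and `toVectorLiteral` builds the `'[x,y,z]'` text form pgvector accepts:

```typescript
// Serialize an embedding into pgvector's text literal form, e.g. '[0.1,0.2]'
export function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}

// Hypothetical query function; embedder/prisma are the instances from the
// indexing code above, passed in here so the sketch compiles on its own
export async function searchTemplates(
  embedder: any,
  prisma: any,
  query: string,
  limit = 10
) {
  const output = await embedder(query, { pooling: 'mean', normalize: true });
  const literal = toVectorLiteral(Array.from(output.data as Float32Array));
  // <=> is cosine distance, so 1 - distance gives a similarity score
  return prisma.$queryRaw`
    SELECT id, name, description,
           1 - (embedding <=> ${literal}::vector) AS similarity
    FROM workflow_templates
    ORDER BY embedding <=> ${literal}::vector
    LIMIT ${limit}
  `;
}
```

Because `all-MiniLM-L6-v2` produces normalized vectors, cosine distance ordering here matches what an inner-product index would give, so either pgvector index type works.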
Lesson 6: Observability is Non-Negotiable
Without good observability, debugging production issues in AI workflows is a nightmare.
Everything goes to a structured log:
// Structured log for every workflow step
await logger.info('workflow.node.executed', {
  tenantId,
  workflowId,
  executionId,
  nodeId,
  nodeType,
  durationMs,
  tokenUsage: { prompt, completion, total },
  inputSize: JSON.stringify(input).length,
  outputSize: JSON.stringify(output).length,
  success: true,
});

await logger.error('workflow.node.failed', {
  tenantId,
  workflowId,
  nodeId,
  error: error.message,
  errorCode: error.code,
  willRetry: attempt < maxAttempts,
  attempt,
});
The tenant-facing execution dashboard shows exactly which nodes ran, how long each took, and what failed — giving customers visibility and reducing support tickets dramatically.
Lesson 7: Build the Admin Panel Early
As a multi-tenant platform, you need operational tooling for yourself:
- View all tenant executions and search by error
- Manually retry failed jobs
- Adjust plan limits without code changes
- View AI cost breakdown by tenant
- Kill runaway workflows
I spent two weeks on the admin panel early and recovered those two weeks within a month through faster debugging.
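As a sketch of the "kill runaway workflows" action (illustrative names, with structural stand-ins for the BullMQ types): an actively running BullMQ job can't simply be removed — the worker has to observe a cancellation signal and abort cooperatively — so the admin action should report that case honestly rather than pretend it succeeded.

```typescript
// Structural stand-ins for the BullMQ Job/Queue methods this sketch touches
interface AdminJobLike {
  getState(): Promise<string>;
  remove(): Promise<void>;
}
interface AdminQueueLike {
  getJob(id: string): Promise<AdminJobLike | undefined>;
}

// Admin "kill job" action; the return value drives the admin UI's message
export async function killJob(
  queue: AdminQueueLike,
  jobId: string
): Promise<'removed' | 'not-found' | 'active'> {
  const job = await queue.getJob(jobId);
  if (!job) return 'not-found';
  const state = await job.getState();
  if (state === 'active') {
    // Can't remove an in-flight job; set a cancellation flag (e.g. in Redis)
    // that the worker checks between nodes, then abort cooperatively
    return 'active';
  }
  await job.remove(); // queued/delayed/settled jobs can be removed directly
  return 'removed';
}
```

The checkpoint loop from Lesson 3 is a natural place for the worker to poll that cancellation flag: between nodes, where aborting is always safe.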
The Architecture That Emerged
Next.js Frontend (Vercel Edge)
            │
     REST + WebSocket
            │
Node.js API Server (Railway)
    │            │
    │            └── BullMQ + Redis (Upstash)
    │                        │
    │                 Worker Processes
    │                 (AI execution)
    │
PostgreSQL + pgvector (Supabase)
Critically: the API server and the workers are separate processes. Workers scale independently based on queue depth — you don't need to scale the API server when workflow volume peaks.
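That scale-by-queue-depth decision reduces to a small pure function. A sketch with illustrative thresholds — the waiting count would come from something like BullMQ's `getJobCounts()`, and the result would drive whatever autoscaler the platform uses:

```typescript
// Decide how many worker processes to run for a given backlog.
// All thresholds are illustrative, not values from the original system.
export function desiredWorkerCount(
  waitingJobs: number,
  jobsPerWorker = 50, // backlog one worker can drain at acceptable latency
  minWorkers = 1,     // keep one warm worker even when the queue is empty
  maxWorkers = 10,    // cap spend and downstream AI API pressure
): number {
  const needed = Math.ceil(waitingJobs / jobsPerWorker);
  return Math.max(minWorkers, Math.min(maxWorkers, needed));
}
```

The `maxWorkers` cap matters more than it looks: scaling workers without bound just moves the bottleneck to the AI provider's rate limits.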
Summary: The Key Decisions That Mattered
| Decision | Why It Mattered |
|----------|-----------------|
| Queue-first execution | Eliminated timeouts, enabled retry logic |
| Per-tenant rate limiting | Prevented cost overruns and abuse |
| Checkpoint-based execution | Enabled resume and prevented double-execution |
| Token usage tracking | Made costs visible and controllable |
| Local embedding model | Zero API cost for semantic search |
| Workers as separate processes | Independent scaling, better resource control |