Building a Modular AI Platform: What We Learned Designing Soliq

Soliq is our internal AI platform that powers every other Techliphant product. Here is what we learned about modularity, governance and model-agnosticism.

Techliphant Engineering
April 10, 2026 11 min

We started Soliq because we kept rebuilding the same five layers on every AI engagement: a model gateway, a retrieval system, an agent runtime, an eval harness and a governance layer.

The first three times we rebuilt them, we called it "project-specific implementation." The fourth time, we admitted we had a platform problem.

Here is what we got right, what we got wrong, and how Soliq's design turned out.

Principle 1: Model-agnostic from day one

The first version of what became Soliq was tightly coupled to a single model provider. We chose it because it was the best at the tasks we needed at the time. Three months later, a competitor released something materially better at one task at a third of the cost. We had a refactoring project instead of a configuration change.

Soliq's model gateway treats every foundation model as a pluggable backend. Each is registered with a capability profile (what it excels at, its context window, pricing per token, p95 latency and data-residency options), and the router selects a backend per call based on task requirements and the cost ceiling.

soliq/gateway/router.ts
import { modelRegistry, type ModelBackend, type RoutingContext } from '@soliq/gateway';
export async function route(ctx: RoutingContext): Promise<ModelBackend> {
  // Drop every backend that violates a hard constraint for this call
  const candidates = modelRegistry.filter((m) => {
    if (ctx.requiresDataResidency && !m.residencyOptions.includes(ctx.region)) return false;
    if (ctx.maxLatencyMs && m.p95LatencyMs > ctx.maxLatencyMs) return false;
    if (ctx.task === 'code-generation' && !m.capabilities.includes('code')) return false;
    return true;
  });
  // Fail loudly instead of returning undefined when nothing qualifies
  if (candidates.length === 0) {
    throw new Error(`No registered model satisfies the routing constraints for task "${ctx.task}"`);
  }
  // Sort by quality score for the task (descending), then by cost (ascending)
  return candidates.sort(
    (a, b) => b.qualityScore[ctx.task] - a.qualityScore[ctx.task] || a.costPerMToken - b.costPerMToken,
  )[0];
}

The practical consequence: when a better model ships, we run it through the eval set, update the capability profile, and promote. Applications on top do not change.
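To make the capability profile and the cost ceiling concrete, here is a minimal sketch. It is not Soliq's actual schema: the `CapabilityProfile` field names, the `withinCostCeiling` helper, and the sample models and prices are all assumptions made for illustration.

```typescript
// Hypothetical shape of a capability profile; field names are illustrative.
interface CapabilityProfile {
  id: string;
  capabilities: string[];        // what the model is good at, e.g. 'code'
  contextWindowTokens: number;
  costPerMToken: number;         // USD per million tokens
  p95LatencyMs: number;
  residencyOptions: string[];    // regions where data may be processed
}

// Estimate the cost of one call and compare it with a per-call ceiling.
function withinCostCeiling(
  model: CapabilityProfile,
  estimatedTokens: number,
  ceilingUsd: number,
): boolean {
  const estimatedCostUsd = (estimatedTokens / 1e6) * model.costPerMToken;
  return estimatedCostUsd <= ceilingUsd;
}

// Two made-up registry entries.
const registry: CapabilityProfile[] = [
  {
    id: 'model-a',
    capabilities: ['code', 'reasoning'],
    contextWindowTokens: 200000,
    costPerMToken: 15,
    p95LatencyMs: 4200,
    residencyOptions: ['eu', 'us'],
  },
  {
    id: 'model-b',
    capabilities: ['code'],
    contextWindowTokens: 128000,
    costPerMToken: 5,
    p95LatencyMs: 1800,
    residencyOptions: ['us'],
  },
];

// Which models can serve a 20,000-token call for at most $0.15?
const affordable = registry
  .filter((m) => withinCostCeiling(m, 20000, 0.15))
  .map((m) => m.id);
```

With these numbers only `model-b` survives the ceiling. Onboarding a new model under this scheme would mean adding one more registry entry, which is the "configuration change" the gateway is meant to enable.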

Principle 2: Retrieval is a platform concern, not an app concern

Every application that needed RAG was building its own retrieval stack: choosing a vector database, implementing chunking, wiring up hybrid search, managing index freshness. Choices made under deadline pressure varied from project to project, and so did retrieval quality.

Retrieval is now a first-class platform service. Applications register a corpus and call a single retrieval API:

TypeScript
// Application code: retrieval is one call
const results = await soliq.retrieve({
  corpusId: 'support-knowledge-base',
  query: userMessage,
  topK: 8,
  mode: 'hybrid',        // BM25 + vector + recency
  filter: { tenantId },  // row-level access control
  citationsRequired: true,
});

Under the hood, the platform handles chunking strategy, embedding model selection, index maintenance, hybrid ranking and citation tracking. Applications get consistent retrieval quality without knowing whether the index is backed by pgvector, OpenSearch or a future implementation.

The insight that took us longest to act on: query expansion matters as much as indexing. A user asking "where is my package" and a user asking about "shipment tracking" mean the same thing. We now expand every query through a lightweight model pass before retrieval, generating two or three semantic variants, and re-rank across all of them.
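The post does not say how the re-ranking across variants is done. One common option is reciprocal rank fusion (RRF), sketched below as a stand-in rather than as Soliq's method. The `fuseRankings` helper, the document IDs and the third query variant are hypothetical, and the expansion and per-variant search steps are assumed to have already run.

```typescript
// Reciprocal rank fusion: a document's score is the sum of 1 / (k + rank)
// over every ranked list it appears in, where rank is 1-based.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      scores[docId] = (scores[docId] || 0) + 1 / (k + index + 1);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Hypothetical search results for the original query and two expanded variants.
const perVariantResults: string[][] = [
  ['doc-tracking', 'doc-returns', 'doc-billing'], // "where is my package"
  ['doc-tracking', 'doc-delays'],                 // "shipment tracking"
  ['doc-delays', 'doc-tracking'],                 // "delivery status"
];

const fused = fuseRankings(perVariantResults);
```

Here `doc-tracking` ranks first because it appears near the top of all three lists, and `doc-delays` comes second because two variants surfaced it even though the original query did not. Documents that several phrasings agree on rise to the top, which is the point of expanding the query before retrieval.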

Principle 3: Agents need rails before they need features

The first agent framework we shipped had rich features: tool calling, memory, multi-step planning, sub-agents. What it did not have was uniform idempotency, a dry-run mode, or a structured handoff contract.

The first production incident was a refund tool called twice on the same conversation because the model retried mid-stream. The second was a nightly eval run that triggered real external API calls. Both were preventable with two decorators.

Every tool in Soliq's registry now must declare:

TypeScript
import { z } from 'zod';
const tool = defineTool({
  name: 'create_refund',
  description: 'Issue a refund for a completed order.',
  idempotent: true,          // safe to retry; key derived from input hash
  dryRunCapable: true,       // evals call this without side effects
  requiresApproval: (args) => args.amount > 500,  // human-in-the-loop above threshold
  timeout: 8000,
  schema: z.object({
    orderId: z.string(),
    amount: z.number().positive(),
    reason: z.enum(['defective', 'not-as-described', 'changed-mind', 'not-received']),
  }),
});

idempotent: true forces the runtime to generate and track an idempotency key on every invocation. dryRunCapable: true means evals get the full agent loop without real side effects. requiresApproval pauses execution and surfaces a structured approval request before high-stakes actions proceed.

These are boring. They are also load-bearing.
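How a runtime might honor the idempotent and dryRunCapable declarations can be sketched in a few lines. Everything here is an assumption for illustration: the `ToolRuntime` class, the `stableStringify` and `hashInput` helpers, and the choice to scope the key to the conversation. The FNV-1a hash stands in for a cryptographic hash such as SHA-256 so the example stays dependency-free.

```typescript
type ToolArgs = Record<string, string | number | boolean>;

interface ToolResult {
  status: 'executed' | 'replayed' | 'dry-run';
  value: string;
}

// Serialize args with sorted keys so equal inputs always yield equal strings.
function stableStringify(args: ToolArgs): string {
  return Object.keys(args)
    .sort()
    .map((key) => `${key}=${JSON.stringify(args[key])}`)
    .join('&');
}

// FNV-1a 32-bit hash; a stand-in for a cryptographic hash.
function hashInput(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

class ToolRuntime {
  private completed: Record<string, string> = {};
  public sideEffects = 0; // counts simulated external calls

  invoke(conversationId: string, toolName: string, args: ToolArgs, opts: { dryRun: boolean }): ToolResult {
    if (opts.dryRun) {
      // Evals take this path: describe the call, never execute it.
      return { status: 'dry-run', value: `would call ${toolName}` };
    }
    const key = hashInput(`${conversationId}:${toolName}:${stableStringify(args)}`);
    const previous = this.completed[key];
    if (previous !== undefined) {
      // A retry with identical input returns the stored result.
      return { status: 'replayed', value: previous };
    }
    this.sideEffects += 1; // stands in for the real external call
    const value = `${toolName} ok`;
    this.completed[key] = value;
    return { status: 'executed', value };
  }
}

const runtime = new ToolRuntime();
const refundArgs: ToolArgs = { orderId: 'ord_123', amount: 42, reason: 'defective' };
const first = runtime.invoke('conv_1', 'create_refund', refundArgs, { dryRun: false });
const retry = runtime.invoke('conv_1', 'create_refund', refundArgs, { dryRun: false });
const evalRun = runtime.invoke('conv_2', 'create_refund', refundArgs, { dryRun: true });
```

The retry is answered from the stored result and the eval run never reaches the external call, so `runtime.sideEffects` ends at 1. Both incidents described above reduce to a branch that the runtime, not the tool author, is responsible for taking.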

Principle 4: Evals are the deployment gate, not a quality suggestion

The original stance was that evals should be "strongly encouraged." Production taught us that "strongly encouraged" means "skipped under deadline pressure" and "discovered as an incident."

Soliq enforces an eval registration requirement at the deployment layer. A feature cannot be promoted to production without a registered eval set that has passed at a defined threshold. The check runs in CI, not as a warning but as a hard block.

.soliq/feature.yaml
feature: order-status-agent
evalSet: order-status.v1
passThreshold: 0.91
graders:
  - intent-classification
  - data-accuracy           # did the agent return the right order data?
  - no-hallucinated-status  # did it invent a delivery date?
  - appropriate-escalation
blockingGrades:
  - no-hallucinated-status  # any failure here blocks, regardless of overall score

blockingGrades is the important addition. Some failure categories are not averaged: a single hallucinated order status is a block regardless of how well the agent performed on everything else.
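The gate logic this implies can be sketched as follows. The `EvalCaseResult`, `GateConfig` and `evaluateGate` names are hypothetical, and the overall score is assumed to be the fraction of cases that pass every grader, since the post does not define how the score is computed.

```typescript
interface EvalCaseResult {
  caseId: string;
  grades: Record<string, boolean>; // grader name -> pass/fail for this case
}

interface GateConfig {
  passThreshold: number;
  blockingGrades: string[];
}

interface GateDecision {
  allowed: boolean;
  score: number;
  reason: string;
}

function evaluateGate(results: EvalCaseResult[], config: GateConfig): GateDecision {
  // Assumed scoring rule: a case passes only if every grader passed.
  const passed = results.filter((r) =>
    Object.keys(r.grades).every((grader) => r.grades[grader]),
  ).length;
  const score = results.length === 0 ? 0 : passed / results.length;

  // Blocking grades are checked on their own and are never averaged away.
  for (const grade of config.blockingGrades) {
    const failing = results.filter((r) => r.grades[grade] === false);
    if (failing.length > 0) {
      return { allowed: false, score, reason: `blocking grade failed: ${grade} (${failing[0].caseId})` };
    }
  }
  if (score < config.passThreshold) {
    return { allowed: false, score, reason: 'score below threshold' };
  }
  return { allowed: true, score, reason: 'passed' };
}

const gateConfig: GateConfig = { passThreshold: 0.91, blockingGrades: ['no-hallucinated-status'] };

// 100 hypothetical cases: 99 clean, one with a hallucinated status.
const evalResults: EvalCaseResult[] = [];
for (let i = 0; i < 100; i++) {
  evalResults.push({
    caseId: `case-${i}`,
    grades: { 'data-accuracy': true, 'no-hallucinated-status': i !== 7 },
  });
}

const decision = evaluateGate(evalResults, gateConfig);
```

In this run the score is 99 out of 100, comfortably above the 0.91 threshold, yet `decision.allowed` is false because `case-7` failed a blocking grade.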

Evals run in three cadences: a fast smoke set per pull request (~50 cases, under 3 minutes), a nightly set on sampled real traffic (~500 cases with PII redaction), and a weekly adversarial set targeting prompt injection, jailbreak patterns and known failure modes.

Principle 5: Governance cannot be retrofitted

We tried retrofitting governance onto a deployed system once. The client needed PII redaction added to audit logs after a compliance finding. The redactor had to be wired into 14 different logging paths across three services. It took six weeks and introduced two regressions.

Soliq's governance layer is not a feature; it is the pipe that everything flows through. PII redaction runs on every log entry before it reaches storage. Prompt versioning is automatic: every prompt is hashed and stored, so you can replay any conversation exactly as it was generated. Cost ceilings are enforced at the gateway level and cannot be bypassed by application code. Data-residency constraints are checked at corpus registration time, not at query time.

The practical effect: when an enterprise client comes to us with a "we need audit-ready logs" requirement, the answer is "already done", not "here is a six-week workstream."
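The difference between wiring a redactor into 14 logging paths and having one pipe can be shown with a minimal sketch. The `AuditLog` class and `redact` helper are hypothetical, and the two patterns are deliberately simplistic; real PII redaction needs far broader coverage than an email and a US-style SSN regex.

```typescript
interface LogEntry {
  conversationId: string;
  message: string;
}

// Illustrative patterns only; not a complete PII detector.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: 'EMAIL', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'SSN', pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Replace every match of every pattern with a bracketed label.
function redact(text: string): string {
  return PII_PATTERNS.reduce(
    (acc, p) => acc.replace(p.pattern, `[${p.label}]`),
    text,
  );
}

class AuditLog {
  private entries: LogEntry[] = [];

  // The only write path: callers have no way to store an unredacted entry.
  write(entry: LogEntry): void {
    this.entries.push({
      conversationId: entry.conversationId,
      message: redact(entry.message),
    });
  }

  // Return a copy so callers cannot mutate stored entries.
  read(): LogEntry[] {
    return this.entries.slice();
  }
}

const auditLog = new AuditLog();
auditLog.write({ conversationId: 'conv_1', message: 'Refund sent to jane.doe@example.com' });
auditLog.write({ conversationId: 'conv_1', message: 'Customer gave SSN 123-45-6789' });
const storedMessages = auditLog.read().map((e) => e.message);
```

Because redaction sits inside the single write method, a new service that logs through `AuditLog` inherits it without any extra wiring.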


What we would do differently

The main thing we underinvested in early: observability tooling for non-engineers. The platform had rich logs and metrics, but reading them required knowing which queue to tail and which dashboard panel to look at. Product managers and customer success teams, the people who care most about whether the AI is performing, could not answer "is quality up or down this week?" without engineering support.

We have been rebuilding the observability layer around that audience specifically: a quality dashboard that shows eval trends, intent distribution, handoff rates and cost per conversation, updated daily, readable without a data engineering background.

The model is not the product. The system you build around it is. Get the plumbing right and the model upgrades become free.
