This section highlights my work across multi-tenant retrieval systems, evaluation infrastructure, and enterprise AI orchestration — with a focus on reliability, scalability, and measurable engineering impact.
Mila Chat — Multi-Tenant Enterprise RAG Platform
TechnoMile | Production System
Core AI engineer on a multi-tenant GenAI platform supporting federal enterprise use cases across document intelligence and conversational AI.
Designed the production retrieval and ingestion architecture using Azure OpenAI, Azure Document Intelligence, PostgreSQL, Qdrant, and S3, with asynchronous orchestration via AWS SQS and Step Functions.
Implemented strict tenant isolation across database queries, vector search filters, and conversation context to ensure client-safe retrieval.
Architected the chat execution flow with parallel RAG + Text2SQL processing, singleton-cached clients for throughput, and per-interaction cost tracking for operational visibility.
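The parallel execution flow, singleton-cached clients, and per-interaction cost tracking described above can be sketched roughly as follows. This is a minimal illustration, not the production code: `FakeLLMClient`, `get_client`, `InteractionCost`, the token counting, and the price are all hypothetical stand-ins, and a real system would call the actual model SDK instead.

```python
import asyncio
from dataclasses import dataclass
from functools import lru_cache


class FakeLLMClient:
    """Hypothetical stand-in for a real SDK client that is costly to construct."""

    async def complete(self, prompt: str) -> tuple[str, int]:
        await asyncio.sleep(0.01)          # simulate network latency
        tokens = len(prompt.split())       # crude token estimate, for illustration only
        return f"answer for: {prompt}", tokens


@lru_cache(maxsize=1)
def get_client() -> FakeLLMClient:
    # Constructed once per process, then reused by every request.
    return FakeLLMClient()


@dataclass
class InteractionCost:
    price_per_1k_tokens: float             # assumed rate, not a real price
    tokens: int = 0

    def add(self, tokens: int) -> None:
        self.tokens += tokens

    @property
    def usd(self) -> float:
        return self.tokens / 1000 * self.price_per_1k_tokens


async def rag_branch(question: str, cost: InteractionCost) -> str:
    text, tokens = await get_client().complete(f"RAG {question}")
    cost.add(tokens)
    return text


async def text2sql_branch(question: str, cost: InteractionCost) -> str:
    text, tokens = await get_client().complete(f"SQL {question}")
    cost.add(tokens)
    return text


async def answer(question: str) -> dict:
    cost = InteractionCost(price_per_1k_tokens=0.01)
    # Both branches run concurrently; total latency is close to the slower branch.
    rag, sql = await asyncio.gather(
        rag_branch(question, cost),
        text2sql_branch(question, cost),
    )
    return {"rag": rag, "sql": sql, "tokens": cost.tokens, "usd": cost.usd}


if __name__ == "__main__":
    print(asyncio.run(answer("open contracts by agency")))
```

Caching the client with `lru_cache` avoids rebuilding connection state on every request, and keeping one `InteractionCost` per call makes the cost attributable to a single chat turn.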
Unified AI Evaluation Platform — CI/CD Integrated LLM Evaluation System
TechnoMile | Sole Architect
Designed and built a centralized evaluation platform that consolidated three separate evaluation systems into one YAML-configurable framework for multiple enterprise AI products.
Implemented LLM-as-Judge evaluation using Azure OpenAI across semantic correctness, completeness, faithfulness, relevancy, and clarity, alongside deterministic extraction metrics.
Integrated evaluation runs into CI/CD pipelines for automated regression checks, improving evaluation reliability and reducing new product onboarding from weeks to hours.
Built dashboard and storage workflows for evaluation tracking, comparison, and artifact management.
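As an illustration of what a deterministic extraction metric can look like next to judge-based scores, here is a small field-level precision/recall/F1 sketch. The function name, the normalization rule, and the sample fields are assumptions made for the example, not the platform's actual metric definitions.

```python
def extraction_scores(expected: dict, predicted: dict) -> dict:
    """Field-level precision, recall, and F1 for structured extraction output."""

    def norm(value) -> str:
        # Assumed normalization: compare case-insensitively, ignoring outer whitespace.
        return str(value).strip().lower()

    correct = sum(
        1
        for key, value in predicted.items()
        if key in expected and norm(expected[key]) == norm(value)
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    expected = {"agency": "GSA", "value": "1200000", "due_date": "2024-05-01"}
    predicted = {"agency": "gsa ", "value": "1250000", "due_date": "2024-05-01"}
    # Two of three fields match after normalization.
    print(extraction_scores(expected, predicted))
```

Because the score depends only on the inputs, it gives the same result on every run, which makes it a useful anchor alongside the less repeatable judge-based metrics.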
Creator GraphRAG
Multilingual Knowledge + Vector Retrieval System
Built a multilingual knowledge system that ingests books in Marathi, Hindi, and English via Sarvam AI OCR and extracts structured concepts into a Neo4j knowledge graph. Retrieval combines graph traversal with Qdrant vector similarity search over 4,096-dimensional embeddings to support citation-aware content generation.
Designed as a production-style system with FastAPI, PostgreSQL, Qdrant, Neo4j, async processing via Celery, and comprehensive integration testing.
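The idea of combining vector similarity with graph traversal can be sketched in plain Python. This is a toy illustration under stated assumptions: in-memory dictionaries stand in for Qdrant and Neo4j, the vectors are 3-dimensional rather than 4,096-dimensional, and `hybrid_retrieve` and the sample concepts are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(query_vec, embeddings, graph, top_k=2, hops=1):
    """Pick seed concepts by vector similarity, then expand along graph edges."""
    # Step 1: vector search selects the top_k most similar concepts as seeds.
    ranked = sorted(
        embeddings,
        key=lambda concept: cosine(query_vec, embeddings[concept]),
        reverse=True,
    )
    results = ranked[:top_k]

    # Step 2: graph traversal adds concepts linked to the seeds, hop by hop.
    frontier = list(results)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in results:
                    results.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return results


if __name__ == "__main__":
    embeddings = {
        "karma": [1.0, 0.0, 0.0],
        "dharma": [0.9, 0.1, 0.0],
        "yoga": [0.0, 1.0, 0.0],
        "moksha": [0.0, 0.0, 1.0],
    }
    graph = {"karma": ["moksha"], "dharma": ["karma"]}
    print(hybrid_retrieve([1.0, 0.0, 0.0], embeddings, graph))
```

In this sketch "moksha" is returned even though its embedding is orthogonal to the query, because it is linked to a seed concept in the graph. That is the kind of related context a pure vector search would miss.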
LLM Fine-Tuning Pipeline
LoRA/PEFT for Llama2 & Llama3
Built an end-to-end LoRA fine-tuning pipeline for Llama2 (7B) and Llama3 using HuggingFace Transformers and PEFT. Delivered a 20% improvement in domain-specific accuracy at substantially lower compute cost than full fine-tuning. The pipeline includes configurable training datasets, domain evaluation scripts, and GGUF export for local inference.
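To make the compute saving concrete, the arithmetic below compares the trainable parameters of one full weight matrix with its LoRA adapter. The rank of 8 is an assumed value chosen for illustration, not the configuration used in this pipeline; 4096 is the hidden size of the Llama 2 7B attention projections.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one weight matrix.

    Full fine-tuning updates the whole d_in x d_out matrix. LoRA instead trains
    two low-rank factors of shapes (d_in x rank) and (rank x d_out).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


if __name__ == "__main__":
    full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)  # rank is hypothetical
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For a single 4096 x 4096 projection this comes to 65,536 trainable parameters instead of 16,777,216, or about 0.39% of the original matrix.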
Intelligent Document Query Bot
Microsoft Teams RAG Bot
Built a document retrieval bot for Microsoft Teams using OpenAI and LangChain for context-aware question answering over enterprise documents. Implemented a FastAPI backend with ChromaDB vector storage and deployed it via Azure Bot Services.
Commercial LLM Fine-Tuning
OpenAI Davinci & GPT-3.5 Turbo
Fine-tuned OpenAI models (Davinci-002, GPT-3.5 Turbo) for domain-specific conversational support. Built multi-turn conversation data preparation pipelines and evaluation harnesses using scikit-learn metrics, and deployed interactive Streamlit chat applications for inference.
Architecture Diagrams
System-level design thinking behind the featured projects
Mila Chat — Multi-Tenant RAG Architecture
Multi-tenant enterprise retrieval architecture with tenant-scoped query execution, vector search, relational metadata, and async ingestion workflows.
Diagram stages: Query Path, Parallel Execution, Data Layer, Ingestion Path
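Tenant-scoped query execution can be illustrated with a small helper that refuses to build a search filter or SQL query without a tenant ID. This is a sketch under assumptions: the `TenantContext` type, the `tenant_id` payload key, and the `documents` table and its columns are hypothetical. The filter dictionary follows the general `must` / `key` / `match` shape of a Qdrant payload filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str


def scoped_vector_filter(ctx: TenantContext, extra: Optional[list] = None) -> dict:
    """Build a vector-search filter that always includes the tenant condition."""
    if not ctx.tenant_id:
        raise ValueError("tenant_id is required for every retrieval call")
    must = [{"key": "tenant_id", "match": {"value": ctx.tenant_id}}]
    must.extend(extra or [])  # caller conditions are added, never substituted
    return {"must": must}


def scoped_document_query(ctx: TenantContext, doc_type: str) -> tuple[str, tuple]:
    """Return a parameterized SQL query restricted to the caller's tenant."""
    if not ctx.tenant_id:
        raise ValueError("tenant_id is required for every database query")
    # Hypothetical table and columns; values are passed as parameters, not interpolated.
    sql = "SELECT id, title FROM documents WHERE tenant_id = %s AND doc_type = %s"
    return sql, (ctx.tenant_id, doc_type)


if __name__ == "__main__":
    ctx = TenantContext(tenant_id="acme")
    print(scoped_vector_filter(ctx, [{"key": "doc_type", "match": {"value": "rfp"}}]))
    print(scoped_document_query(ctx, "rfp"))
```

Routing every query through helpers like these means the tenant condition cannot be forgotten at an individual call site.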
Unified AI Evaluation Platform — Evaluation Flow
Centralized LLM evaluation workflow with YAML-configured product adapters, judge-based metrics, deterministic metrics, CI/CD integration, and evaluation dashboards.
Diagram stages: Trigger, Evaluation Engine, Metrics, Storage & Output
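One way a CI/CD regression check could consume evaluation results is a simple gate that compares the current scores with a stored baseline. The sketch below is illustrative only: the function names, the 0.02 tolerance, and the sample scores are assumptions, not the platform's actual thresholds or data.

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.02) -> dict:
    """Return metrics whose current score is missing or below baseline - tolerance."""
    regressions = {}
    for metric, base_score in baseline.items():
        score = current.get(metric)
        if score is None or score < base_score - tolerance:
            regressions[metric] = (base_score, score)
    return regressions


def gate_exit_code(baseline: dict, current: dict, tolerance: float = 0.02) -> int:
    """Exit code for a CI step: 0 when no metric regressed, 1 otherwise."""
    return 1 if find_regressions(baseline, current, tolerance) else 0


if __name__ == "__main__":
    baseline = {"faithfulness": 0.90, "relevancy": 0.85}   # hypothetical stored scores
    current = {"faithfulness": 0.91, "relevancy": 0.80}    # hypothetical new run
    print(find_regressions(baseline, current))
    raise SystemExit(gate_exit_code(baseline, current))
```

A small tolerance keeps the gate from failing on run-to-run noise in judge-based scores while still catching meaningful drops.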