AI & GenAI Projects

This section highlights my work across multi-tenant retrieval systems, evaluation infrastructure, and enterprise AI orchestration — with a focus on reliability, scalability, and measurable engineering impact.

Core AI engineer on a multi-tenant GenAI platform supporting federal enterprise use cases across document intelligence and conversational AI.

Designed the production retrieval and ingestion architecture using Azure OpenAI, Azure Document Intelligence, PostgreSQL, Qdrant, and S3, with asynchronous orchestration via AWS SQS and Step Functions.

Implemented strict tenant isolation across database queries, vector search filters, and conversation context to ensure client-safe retrieval.

Architected the chat execution flow with parallel RAG + Text2SQL processing, singleton-cached clients for throughput, and per-interaction cost tracking for operational visibility.

3s p95 Latency

$0.015 Per Question

Azure OpenAI Qdrant PostgreSQL S3 SQS Step Functions Multi-Tenant RAG

Designed and built a centralized evaluation platform that consolidated three separate evaluation systems into one YAML-configurable framework for multiple enterprise AI products.

Implemented LLM-as-Judge evaluation using Azure OpenAI across semantic correctness, completeness, faithfulness, relevancy, and clarity, alongside deterministic extraction metrics.

Integrated evaluation runs into CI/CD pipelines for automated regression checks, improving evaluation reliability and reducing new product onboarding from weeks to hours.

Built dashboard and storage workflows for evaluation tracking, comparison, and artifact management.

3 Products Unified

80%+ Test Coverage

3s p95 Latency

$0.015 Per Evaluation

Azure OpenAI FastAPI PostgreSQL Streamlit CI/CD YAML Evaluation

Built a multilingual knowledge system that ingests books across Marathi, Hindi, and English via Sarvam AI OCR, extracts structured concepts into a Neo4j knowledge graph, and combines graph traversal with Qdrant vector similarity search using 4,096-dimensional embeddings for citation-aware content generation.

Designed as a production-style system with FastAPI, PostgreSQL, Qdrant, Neo4j, async processing via Celery, and comprehensive integration testing.

71/71 Tests Passing

5,654 Concept Nodes

973 Indexed Chunks

FastAPI Neo4j 5 Qdrant React 19 Celery PostgreSQL MinIO OpenTelemetry JWT RS256

View on GitHub

Built an end-to-end LoRA fine-tuning pipeline for Llama2 (7B) and Llama3 using HuggingFace Transformers and PEFT. Delivered 20% domain-specific accuracy improvement with substantially reduced compute cost versus full fine-tuning. Includes configurable training datasets, domain evaluation scripts, and GGUF export for local inference.

20% Accuracy Improvement

Llama2/3 LoRA PEFT HuggingFace GGUF Cloud GPU

Built a document retrieval bot for Microsoft Teams using OpenAI and LangChain for context-aware question answering over enterprise documents. Implemented FastAPI backend with ChromaDB vector storage and deployed via Azure Bot Services.

OpenAI LangChain ChromaDB FastAPI Azure Bot Services MS Teams

Fine-tuned OpenAI models (Davinci-002, GPT-3.5 Turbo) for domain-specific conversational support. Built multi-turn conversation data preparation pipelines, evaluation harnesses using scikit-learn metrics, and deployed interactive Streamlit chat applications for inference.

GPT-3.5 Turbo Davinci-002 OpenAI Fine-Tuning API Streamlit scikit-learn

Architecture Diagrams

System-level design thinking behind the featured projects

Mila Chat — Multi-Tenant RAG Architecture

Multi-tenant enterprise retrieval architecture with tenant-scoped query execution, vector search, relational metadata, and async ingestion workflows.

Query Path

User / Client → FastAPI Chat Service → Tenant Isolation Layer

Parallel Execution

RAG Pipeline | Text2SQL Path

Data Layer

Qdrant PostgreSQL S3 Azure OpenAI

Ingestion Path

Documents → Azure Doc Intelligence → AWS SQS → Step Functions → Qdrant + S3

Unified AI Evaluation Platform — Evaluation Flow

Centralized LLM evaluation workflow with YAML-configured product adapters, judge-based metrics, deterministic metrics, CI/CD integration, and evaluation dashboards.

Trigger

CI/CD Pipeline → YAML Config → Product Adapter

Evaluation Engine

Evaluation Runner →

Metrics

Azure OpenAI Judge | Deterministic Metrics

Storage & Output

PostgreSQL S3 Artifacts → Streamlit Dashboard → Reports / Comparison

System Design Focus

Multi-Tenant AI Architecture Retrieval-Augmented Generation Vector Search & Filtering Async Ingestion Pipelines LLM Cost Optimization Evaluation Infrastructure

AI & GenAI Projects

Mila Chat — Multi-Tenant Enterprise RAG Platform

Unified AI Evaluation Platform — CI/CD Integrated LLM Evaluation System

Creator GraphRAG

LLM Fine-Tuning Pipeline

Intelligent Document Query Bot

Commercial LLM Fine-Tuning

Architecture Diagrams

Mila Chat — Multi-Tenant RAG Architecture

Unified AI Evaluation Platform — Evaluation Flow

System Design Focus