Senior AI Full-Stack Engineer at Turing, specializing in Python and GCP. Architected a distributed LLM evaluation platform and eliminated a scoring contamination bug, recovering ~18% of previously misclassified benchmark results. Experienced in leading cross-functional teams and optimizing complex systems to improve the efficiency of AI solutions.
Overview
7 years of professional experience
1 Certification
Work History
Senior AI Full-Stack Engineer
Turing
07.2025 - Current
Architected and deployed a distributed LLM evaluation platform on GCP, orchestrating 200+ concurrent VM workloads via GCP Batch to benchmark frontier AI agents against complex desktop automation tasks at scale.
Owned the backend evaluation stack end to end (FastAPI services, PostgreSQL data layer, and Dockerized execution containers), serving 10K+ daily evaluation runs with 99.9% task completion reliability.
Diagnosed and eliminated a systemic scoring contamination bug spanning 4 microservices, recovering ~18% of previously misclassified benchmark results and restoring leaderboard integrity.
Designed a real-time streaming architecture (Pub/Sub listeners, Redis queues, and WebSocket fan-out) to surface live agent steps and scoring events to annotators during refinement, cutting evaluation feedback loops from minutes to under one second.
Led a full architectural migration from REST polling to GraphQL subscriptions, reducing API payload size by ~40% and enabling real-time data synchronization across the entire frontend.
Shipped a fully automated CI/CD pipeline via GitHub Actions and GCP, eliminating all manual deployment steps and cutting release cycle time by 60%.
Led cross-functional optimization with the infrastructure team, re-architecting VM provisioning and job scheduling to reduce per-evaluation compute cost by 25% and increase cluster throughput by 3x.
AI Principal Backend Calibrator
Amazon BigCode Project
06.2024 - 07.2025
Developed the backend and evaluation pipelines using Django/DRF and Celery, supporting both offline and online A/B testing with monitored win rates.
Managed PostgreSQL data models and integrated ClickHouse for rapid validation of large datasets (class balance, drift, duplicates).
Implemented a CI quality gate enforcing full test coverage and blocking merges until all pipelines passed; ensured zero unnoticed failures.
Collaborated with a team of 3 senior engineers and 15 engineers to deliver labeling pipelines on schedule, with IRR and QA gates maintained throughout.
Stabilized production under heavy load by fixing queue starvation, schema drift, and idempotency issues; introduced rate limiting and tracing to keep p95 latency and error rates within thresholds.
Conducted large-scale LLM calibration (hard negatives, preference tuning, multilingual evaluation) and merged RLHF data into a unified schema, increasing benchmark win rate by 7-12%.
AI Team Lead Engineer
Microsoft Project
04.2025 - 06.2025
Built the taxonomy-to-dataset pipeline with validation layers and versioned exports, splitting ~120K items for training and evaluation.
Designed and maintained L1/L2/L3 metadata taxonomies, integrating automated validators to ensure data consistency across iterations.
Conducted daily reviews in the labeling tool, enforced IRR and annotation guidelines, and fed edge-case corrections back into taxonomy rules.
Collaborated with 8 senior engineers and coached 7+ trainers, delivering over 200 curated datasets end to end, fully on schedule.
Operated LLM calibration pipelines with PostgreSQL and ClickHouse validation, improving online win rate and task completion metrics on targeted evaluation flows.
AI Full-Stack Engineer & LLM Trainer
Apple & ServiceNow projects
06.2024 - 04.2025
Apple (Python): Developed training and evaluation pipelines for algorithmic data curation and hard negative probing, completing 400+ training cycles with measurable gains in task completion.
ServiceNow (JavaScript + Function Calling): Built LLM-driven UI flows in Agent Workspace, UI Builder, and Now Experience, ensuring safe parallelism, idempotency, a structured error taxonomy, and full audit logging.
Improved platform performance through GlideQuery tuning, caching strategies, and async worker orchestration.
Created reusable UI patterns and a lightweight evaluation harness, enhancing development speed and testing efficiency across multiple modules.
Senior Software Engineer
Mindrift
04.2023 - 04.2025
Rebuilt the labeling tool with a Django/DRF backend, Flask microservices, and PostgreSQL + Redis for scalable evaluations and rubric-driven workflows.
Integrated LangChain and LangGraph with function calling to enable multi-tool flows (menu lookup, order tracking, refunds, delivery ETA).
Deployed both local and hosted LLMs with custom guardrails and prompt validation, achieving ~99% guideline conformance on final evaluations.
Designed training and evaluation pipelines with Celery schedules, offline/live checks, and win-rate dashboards; improved system latency through async I/O, caching, and query optimization.
Delivered robust APIs and admin UIs supporting idempotent writes, real-time metrics, and alerting, ensuring reliability in production environments.
Full-Stack Developer
Freelance
12.2018 - 12.2024
Delivered 12+ projects spanning AI applications, e-commerce, and SaaS platforms from concept to deployment.
Built scalable Next.js + Django applications with MongoDB and PostgreSQL, significantly improving load times and conversion rates.
Created DeepScreen AI, Proliferate, and KIC Data Privacy tools, driving innovation and adoption across multiple client use cases.
SENIOR SOFTWARE ENGINEER - FULL-STACK | CLOUD | AI INTEGRATION at DIJITAL TEAM / VARIGENCE