
Keshav Saraogi

Mumbai

Summary

Results-driven IT professional with hands-on experience building enterprise AI solutions and scalable backend architectures. Proficient in developing RAG systems, fine-tuning LLMs, and designing data pipelines with tools such as LangChain, Airflow, and FastAPI. Demonstrated success in microservices deployment, CI/CD optimization, and API engineering. Adept at solving complex data and system integration challenges in fast-paced environments.

Overview

3 years of professional experience

Work History

Information Technology Intern

Indorama Ventures
Bangkok
04.2025 - Current
  • Engineered a scalable Retrieval-Augmented Generation (RAG) platform, leveraging LangChain, OpenAI Embeddings, and FastAPI, backed by Pinecone VectorDB, to deliver real-time, context-aware LLM outputs across 10GB+ of SAP ERP and Excel data, enhancing semantic search and decision support for enterprise users.
  • Developed robust, production-ready ETL pipelines using Apache Airflow, seamlessly integrating SAP connectors and Excel data sources; optimized data transformation with Pandas and OpenPyXL, achieving a 70% reduction in manual data prep time, and accelerating analytics workflows.
  • Customized OpenAI LLMs with procurement-specific datasets using few-shot learning, advanced prompt engineering, and LangChain prompt templates; implemented role-sensitive response conditioning and dynamic context management, boosting response accuracy and task alignment in internal QA tests by over 60%.
  • Architected a multi-agent reasoning framework using LangChain’s AgentExecutor, combining SQL and Pandas agents with custom insight-generation tools to process hybrid data queries; ensured consistent multi-turn interactions while mitigating hallucinations through context-aware agent orchestration.
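The retrieval step at the core of a RAG platform like the one above can be sketched in a few lines. This is an illustrative toy, not code from the actual system: a hash-seeded vector stands in for OpenAI Embeddings, an in-memory list stands in for Pinecone, and all function names (`embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash-seeded random unit vector.
    A real pipeline would call an embedding API here."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus chunks most similar to the query (cosine similarity)."""
    q = embed(query)
    sims = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(sims)[::-1][:k]
    return [corpus[i] for i in top]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble the context-augmented prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The production version swaps the toy pieces for real ones (embedding API, vector database, LLM call) but keeps the same shape: embed the query, fetch the most similar chunks, and prepend them as context.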

Software Development Intern

Patton Labs
Jacksonville
01.2023 - 04.2023
  • Contributed to the design and deployment of a scalable microservices ecosystem for a cross-functional team of five, leveraging Docker, Kubernetes, Linux, and TypeScript to ensure modularity, fault tolerance, and ease of maintenance.
  • Streamlined the CI/CD pipeline by integrating automated testing, build, and deployment workflows with Docker and Kubernetes, reducing build and release cycle times by 20%, and accelerating feature delivery.
  • Enhanced microservice communication efficiency by architecting RESTful APIs and implementing asynchronous messaging with RabbitMQ, resulting in a 17% increase in inter-service data throughput, and reduced latency.
  • Boosted database performance by 22% through schema optimization and the development of high-efficiency SQL and NoSQL queries, eliminating key backend bottlenecks, and improving overall system responsiveness.
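The asynchronous messaging described above follows the classic producer/consumer pattern. The sketch below uses `asyncio.Queue` as a self-contained stand-in for RabbitMQ; the event names and the `None` sentinel are illustrative choices, not details of the actual services.

```python
import asyncio

async def producer(queue: asyncio.Queue, events: list[str]) -> None:
    """Publish events to the queue, then a sentinel marking the end of the stream."""
    for e in events:
        await queue.put(e)   # fire-and-forget publish
    await queue.put(None)    # sentinel: no more messages

async def consumer(queue: asyncio.Queue, handled: list[str]) -> None:
    """Drain the queue, processing each message until the sentinel arrives."""
    while True:
        msg = await queue.get()
        if msg is None:
            break
        handled.append(f"processed:{msg}")

async def run(events: list[str]) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    handled: list[str] = []
    await asyncio.gather(producer(queue, events), consumer(queue, handled))
    return handled
```

With a real broker the queue lives outside both processes, which is what decouples the services: the producer never blocks on a slow consumer, and either side can be restarted independently.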

Education

Master of Science - Computer Science

Boston University
Boston, Massachusetts
12.2024

Bachelor of Science - Computer Science

Temple University
Philadelphia, Pennsylvania
12.2022

Skills

  • Docker, Kubernetes, AWS, Azure Data Lake Storage
  • LangChain, GitHub, Linux, Apache Spark
  • Python, JavaScript, TypeScript, R, Java
  • HTML, CSS, Tableau, MySQL, PostgreSQL
  • MongoDB, SQL, NoSQL, REST APIs, Cloud Services
  • TensorFlow, Keras, PyTorch, NumPy, Pandas
  • Matplotlib, PyTest, CI/CD, GitHub Actions, Bash
  • LLMs, prompt engineering, RAG systems, OpenAI APIs

Technical Projects

Artist classification and recognition using big data and deep learning

  • Designed a scalable audio ingestion and preprocessing pipeline to handle 10GB+ of raw audio data using the Hadoop Distributed File System (HDFS) and Apache Spark on AWS EMR, optimizing for the efficient preparation of high-volume datasets for ML tasks.
  • Developed modular ETL workflows with PySpark and Spark MLlib to extract audio features (MFCCs, Chroma, Spectral Contrast) from WAV/MP3 files, transforming them into time-frequency spectrogram matrices, resulting in a 30% reduction in data preparation latency.
  • Automated the end-to-end data processing lifecycle using AWS Step Functions, integrating raw data retrieval, feature extraction, and S3-based storage with prefix-based partitioning and versioning to enhance auditability and scalability.
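The spectrogram transformation at the heart of the feature-extraction step can be illustrated without Spark. This NumPy sketch shows the idea only (frame the signal, window each frame, FFT); the project itself ran the equivalent at scale with PySpark and extracted richer features such as MFCCs, and the frame/hop sizes below are arbitrary example values.

```python
import numpy as np

def spectrogram(signal, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split a 1-D signal into overlapping Hann-windowed frames and
    FFT each one, yielding a (n_frames, frame_len//2 + 1) time-frequency matrix."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))
```

Each row is one time slice; the column with the largest magnitude indicates the dominant frequency in that slice, which is exactly the representation downstream classifiers consume.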

Object detection pipeline on AWS

  • Engineered a complete image data pipeline for object detection using AWS S3, SageMaker Processing Jobs, and OpenCV, supporting real-time ingestion and preprocessing of large-scale annotated datasets.
  • Implemented robust data augmentation and annotation standardization workflows using Pandas and Boto3, reducing preprocessing time by 40% and ensuring consistency across diverse metadata formats.
  • Built ETL components for bounding box extraction, format conversion (Pascal VOC ↔ YOLO ↔ COCO), and dataset partitioning (train/validation/test) using Python and Dockerized SageMaker scripts for deployment at scale.
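The Pascal VOC ↔ YOLO leg of the format conversion reduces to coordinate arithmetic: VOC stores absolute pixel corners `(xmin, ymin, xmax, ymax)`, while YOLO stores a normalized center and size. A minimal sketch (function names are illustrative, not from the project's codebase):

```python
def voc_to_yolo(box, img_w: int, img_h: int):
    """(xmin, ymin, xmax, ymax) in pixels -> normalized (x_center, y_center, w, h)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h)

def yolo_to_voc(box, img_w: int, img_h: int):
    """Normalized (x_center, y_center, w, h) -> pixel corners, rounded to ints."""
    xc, yc, w, h = box
    return (round((xc - w / 2) * img_w), round((yc - h / 2) * img_h),
            round((xc + w / 2) * img_w), round((yc + h / 2) * img_h))
```

Because YOLO coordinates are normalized by image size, the same label file stays valid after resizing, which is why the conversion also has to round-trip cleanly when partitioning datasets.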
