Available for opportunities

Sandadi Vithin Reddy

Dallas, TX

Building intelligent systems at the intersection of large language models, retrieval-augmented generation, and production-grade ML infrastructure.

About Me

Building the Future with AI

I'm Sandadi Vithin Reddy, an AI/ML Engineer based in Dallas, TX, passionate about transforming raw research into production-ready intelligent systems.

At Accenture, I architected and deployed LLM-powered applications that reduced manual processing time by over 60%, integrating retrieval-augmented generation pipelines for enterprise knowledge management.

My focus areas include LLM fine-tuning, vector database design, agentic workflows, and ML system architecture. I thrive in environments where research meets scale.

2+ Years Experience
10+ AI/ML Projects
5+ LLM Deployments
99% Uptime Achieved

LLM Expertise

Deep hands-on experience with GPT-4, Claude, LLaMA, and fine-tuning workflows.

RAG Systems

End-to-end RAG pipelines with vector stores, hybrid search, and re-ranking.

Production ML

Shipping models to production with MLflow, FastAPI, Docker, and cloud platforms.

Collaboration

Cross-functional work with data engineers, product teams, and enterprise clients.

Tech Stack

Skills & Expertise

AI / Machine Learning

PyTorch / TensorFlow: 90%
Transformers / HuggingFace: 92%
LLM Fine-tuning (LoRA/QLoRA): 85%
LangChain / LlamaIndex: 90%
Scikit-learn: 88%

RAG & Vector Systems

Pinecone / Weaviate: 88%
FAISS / ChromaDB: 85%
Semantic Search: 87%
Embeddings (OpenAI, BGE): 90%
Hybrid Retrieval & Re-ranking: 83%

Engineering & Infra

Python: 95%
FastAPI / Flask: 88%
Docker / Kubernetes: 80%
AWS / Azure: 82%
MLflow / Weights & Biases: 84%

Data & Databases

SQL / PostgreSQL: 86%
Spark / Pandas: 84%
MongoDB: 80%
Data Pipelines (Airflow): 78%
Tableau / Power BI: 75%

Also proficient in

Python · PyTorch · TensorFlow · Transformers · LangChain · LlamaIndex · OpenAI API · FastAPI · Docker · AWS · Pinecone · FAISS · MLflow · Git · Linux · SQL · Kubernetes · Azure · Spark · Redis
Career

Work Experience

Accenture · Full-time

AI/ML Engineer

Jul 2022 – Present
Dallas, TX
  • Architected and deployed an enterprise RAG pipeline for internal knowledge management, reducing document retrieval time by 65% and increasing answer accuracy to 91%.
  • Fine-tuned LLaMA-2 and Mistral models using QLoRA on domain-specific datasets, achieving a 40% improvement over baseline on client benchmarks.
  • Built multi-agent orchestration workflows using LangChain and AutoGen for automated report generation, saving 200+ analyst hours per month.
  • Designed a real-time ML inference API using FastAPI + Docker that handles 5k+ requests/min with 99.9% uptime on AWS ECS.
  • Led end-to-end MLOps modernization, implementing MLflow experiment tracking and CI/CD pipelines that cut model deployment time from 3 days to 4 hours.
  • Mentored junior engineers and delivered internal LLM workshops to 50+ consultants, accelerating AI adoption across practice areas.
Python · LangChain · PyTorch · FastAPI · AWS · MLflow · Pinecone · Docker
Academic Research · Research

ML Research Assistant

Jan 2021 – Jun 2022
Remote
  • Researched transformer architectures for NLP tasks including text classification, summarization, and question answering.
  • Implemented and evaluated BERT, RoBERTa, and T5 variants on benchmark datasets, publishing findings to the department repository.
  • Built data collection and preprocessing pipelines for large-scale text corpora using Python and Spark.
  • Collaborated with professors on grant proposal for NSF-funded NLP research initiative.
Python · HuggingFace · PyTorch · Spark · NLTK · scikit-learn
Work

Featured Projects

★ Featured

Enterprise RAG System

Production-Grade Retrieval-Augmented Generation

A full-stack RAG pipeline designed for enterprise knowledge bases. Ingests PDFs, Word docs, and web pages; chunks and embeds with custom strategies; retrieves via hybrid search (BM25 + dense vectors) and re-ranks with cross-encoders before generating grounded answers.

  • Hybrid retrieval: BM25 sparse + dense vector search with late fusion
  • Context-aware chunking with metadata-aware overlap strategy
  • LLM re-ranking pipeline using cross-encoder models for precision boost
  • Streaming response API with source citation and confidence scores
  • Evaluation harness using RAGAS (faithfulness, relevancy, correctness)
  • 91% answer accuracy on internal enterprise benchmark dataset
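The late-fusion step above (merging sparse BM25 hits with dense vector hits) can be sketched with Reciprocal Rank Fusion. This is an illustrative sketch, not the production code: the document IDs, rankings, and the `k` constant are assumptions.

```python
# Hypothetical sketch of hybrid-retrieval late fusion via
# Reciprocal Rank Fusion (RRF). Each input is a ranked list of
# document IDs from one retriever (sparse or dense).

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists into one.

    RRF score per document: sum over lists of 1 / (k + rank),
    where rank is 1-based. Documents appearing high in any list
    (or moderately high in several) float to the top.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["doc_a", "doc_b", "doc_c"]   # keyword (sparse) ranking
dense_hits = ["doc_b", "doc_d", "doc_a"]   # embedding (dense) ranking

fused = rrf_fuse([bm25_hits, dense_hits])  # doc_b wins: top-3 in both
```

In a full pipeline the fused list would then go to the cross-encoder re-ranker for the final precision pass.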
Python · LangChain · Pinecone · OpenAI · FastAPI · FAISS · RAGAS · Docker
91% Accuracy · <800ms Latency · 50K+ Docs Indexed

LLM Fine-Tuning Pipeline

QLoRA Fine-tuning Framework

Modular fine-tuning framework for instruction-following LLMs using QLoRA on consumer and cloud GPUs. Includes data prep, training, evaluation, and GGUF export for local deployment.

40% Perf Gain · 12GB VRAM · 5+ Models
Python · HuggingFace · PEFT · bitsandbytes · PyTorch
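The QLoRA recipe above (4-bit quantized base model plus low-rank adapters) can be sketched with HuggingFace PEFT and bitsandbytes. The rank, alpha, and target modules here are illustrative defaults, not the exact hyperparameters used in the project.

```python
# Illustrative QLoRA configuration fragment (requires transformers,
# peft, bitsandbytes). Values are examples, not the project's settings.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # base weights quantized to 4-bit
    bnb_4bit_quant_type="nf4",            # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,       # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

This is why 12GB of VRAM suffices: only the small adapter matrices are trained in full precision while the frozen base stays in 4-bit.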

AI Document Intelligence

Intelligent Document Processing

Automated document processing pipeline that extracts structured data from unstructured PDFs using vision models and NLP. Handles invoices, contracts, and research papers with high accuracy.

94% Accuracy · 3s/doc Speed · 10K Docs/day
Python · PyMuPDF · spaCy · GPT-4V · Celery

Multi-Agent Chatbot

Agentic AI Assistant

Conversational multi-agent system with tool use, memory, and routing. Uses AutoGen to orchestrate specialized agents for research, analysis, and action planning with persistent conversation history.

87% Task Success · 6 Agents · 12+ Tools
Python · AutoGen · LangChain · OpenAI · ChromaDB
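The routing idea behind the assistant can be sketched in plain Python: a router inspects the request and dispatches it to a specialist agent. The agent names, keyword rules, and handlers are illustrative; the actual system routes via AutoGen orchestration rather than keyword matching.

```python
# Minimal sketch of multi-agent routing. Agents and keywords are
# hypothetical; a production router would use an LLM classifier.

def route(query, agents, fallback="general"):
    """Dispatch to the first agent whose keywords match, else fallback."""
    q = query.lower()
    for name, spec in agents.items():
        if any(kw in q for kw in spec["keywords"]):
            return name, spec["handle"](query)
    return fallback, agents[fallback]["handle"](query)

agents = {
    "research": {"keywords": ["find", "search"],
                 "handle": lambda q: f"researching: {q}"},
    "analysis": {"keywords": ["compare", "analyze"],
                 "handle": lambda q: f"analyzing: {q}"},
    "general":  {"keywords": [],
                 "handle": lambda q: f"answering: {q}"},
}

name, result = route("Compare Q3 and Q4 revenue", agents)  # -> "analysis"
```

Persistent memory would sit behind each handler (e.g. a ChromaDB-backed conversation store), so agents share context across turns.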
Contact

Let's Connect

Open to AI/ML engineering roles, research collaborations, and consulting opportunities.

Get in touch

Email

vithinreddy0@gmail.com

Location

Dallas, Texas, USA

Response Time

Within 24 hours

Available for work

Currently open to full-time AI/ML engineering roles, research positions, and select consulting projects.

Send a message