Meghana Thota

Data Scientist & AI/ML Engineer

Transforming complex data into intelligent solutions that drive innovation and create measurable business impact across industries.

2+
Years Experience
Production AI Systems
12+
ML Projects
End-to-End Solutions
6+
Certifications
AWS & Data Science

About Me

Results-driven Data Scientist with a proven track record of delivering enterprise-grade AI solutions that drive measurable business impact and operational efficiency.

Background

Master's in Data Science from UMass Dartmouth with 2+ years of hands-on experience architecting and deploying production-ready AI systems. Demonstrated expertise in reducing operational costs by 40% through intelligent automation and improving system accuracy by 28% through advanced ML techniques.

Led cross-functional teams in developing CNN- and CGAN-based diagnostic systems for healthcare applications, achieving 94% accuracy in medical image classification. Architected battery optimization systems for electric vehicles that improved fault detection by 33% and strengthened remaining-useful-life (RUL) predictions.

Specialized in Large Language Model integration, RAG architectures, and prompt engineering for enterprise applications. Successfully deployed systems processing millions of daily transactions with sub-second response times and 99.9% uptime.

Core Competencies

Machine Learning Engineering

Production-scale ML systems with 99.9% uptime, serving millions of predictions daily across healthcare and automotive sectors.

Cloud Architecture

AWS Certified Cloud Practitioner with expertise in designing scalable, cost-optimized infrastructure that reduced operational costs by 40%.

Data Engineering

Real-time data pipelines processing 10M+ records daily with sub-second latency and automated quality monitoring.

AI Research & Development

Published researcher in computer vision and generative AI, with models achieving state-of-the-art performance metrics.

Technology Stack

Python
PyTorch
TensorFlow
AWS
Docker
Kubernetes
Apache Airflow
PostgreSQL
LangChain
OpenAI
Streamlit
FastAPI
React
TypeScript
Git
MLflow

Professional Experience

Building impactful AI solutions across healthcare and automotive industries

Graduate Research Assistant - Data Scientist
University of Massachusetts Dartmouth
Boston, USA
Sep 2024 – May 2025
Research

Key Achievements

  • Designed CNN- and CGAN-based pipelines improving breast cancer detection accuracy by 5%
  • Achieved 40% faster training on GPU-accelerated AWS EC2 clusters using PyTorch DistributedDataParallel (DDP)
  • Enabled real-time inference (3x faster) using TorchScript and model pruning
  • Reduced deployment time by 60% using Docker and AWS Lambda
  • Integrated LLM-powered diagnostic explanations for physician-facing outputs
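Model pruning trades a small amount of accuracy for fewer effective weights. The snippet below is a minimal sketch of unstructured magnitude pruning in plain Python, assuming the weights arrive as a flat list of floats; magnitude_prune is a hypothetical helper, and in a PyTorch project the equivalent step would normally go through torch.nn.utils.prune rather than hand-written code.

```python
def magnitude_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest absolute value.

    Hypothetical helper for illustration: it mirrors the idea behind L1
    unstructured pruning, not any specific library implementation.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be in [0, 1]")
    n_prune = int(round(amount * len(weights)))
    # Indices sorted by weight magnitude, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, amount=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

Zeroed weights only translate into faster inference when the runtime exploits sparsity or when whole channels are removed (structured pruning).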

Technologies Used

PyTorch
AWS EC2
Docker
AWS Lambda
Snowflake
CNNs
CGANs

Data Scientist
Leep eDrive
India
May 2022 – Jul 2023
Full-time

Key Achievements

  • Developed LightGBM models achieving 18% higher accuracy for battery state-of-charge (SOC) and remaining-useful-life (RUL) prediction
  • Built real-time anomaly detection system identifying battery faults 33% earlier
  • Improved fault detection accuracy by 28% using feature importance analysis
  • Reduced testing cycle time by 20% through collaboration on ML-driven insights
  • Designed containerized ETL pipelines using Apache Airflow and Docker
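RUL prediction needs a target value for every cycle. A minimal sketch of how such labels can be derived from a capacity-fade curve, assuming end of life is the first cycle below 80% of nominal capacity (a common convention, not necessarily the threshold used in this work); rul_labels and the capacity values are hypothetical:

```python
def rul_labels(capacities, nominal_capacity, eol_fraction=0.8):
    """Remaining useful life (in cycles) for each cycle of one battery's history.

    Assumes end of life is the first cycle whose capacity drops below
    eol_fraction * nominal_capacity (80% is a common convention, not a rule).
    """
    threshold = eol_fraction * nominal_capacity
    eol_cycle = next((i for i, cap in enumerate(capacities) if cap < threshold), None)
    if eol_cycle is None:
        raise ValueError("battery never reached end of life; RUL is right-censored")
    # RUL counts down to zero at the end-of-life cycle; later cycles are dropped
    return [eol_cycle - i for i in range(eol_cycle + 1)]


caps = [2.00, 1.95, 1.88, 1.80, 1.71, 1.62, 1.55]  # Ah per cycle, illustrative
print(rul_labels(caps, nominal_capacity=2.0))  # [6, 5, 4, 3, 2, 1, 0]
```

Paired with per-cycle features, labels like these are the kind of regression target a gradient-boosted model such as LightGBM can be trained on.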

Technologies Used

LightGBM
Apache Airflow
Docker
PostgreSQL
Selenium
Python

Featured Projects

Production-grade AI systems delivering measurable business impact across healthcare, automotive, and enterprise sectors.

LLM Agent Workflow Visualizer using Graphiti and Neo4j
Production

Apr 2025 – May 2025

60% faster content generation

Built a real-time visualization tool for LLM agent decision-making that integrates the Graphiti Agent to track execution flows and toolchain interactions through an interactive Streamlit interface.

Key Achievements

  • Integrated the Graphiti Agent from the Ottomator framework to visualize the decision-making flow of LLM agents in real time
  • Used GraphitiTracer to hook into agent lifecycle events such as on_agent_action, on_tool_end, and on_chain_end to track execution steps
  • Parsed intermediate reasoning steps, tool usage, and model outputs to dynamically construct a visual workflow graph
  • Leveraged Streamlit to build an interactive front-end for visualizing agent toolchains and trace paths
  • Made the tool extensible and model-agnostic, supporting integration with different LLMs and custom toolchains
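The hook-based tracing described above can be illustrated with a small stand-in. This sketch assumes simplified callback signatures and a hypothetical WorkflowTracer class; it is not the GraphitiTracer or LangChain callback API, only the same idea of turning lifecycle events into graph edges:

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowTracer:
    """Hypothetical tracer that turns agent lifecycle callbacks into a graph.

    Method names mirror the hooks above (on_agent_action, on_tool_end,
    on_chain_end); the signatures are simplified assumptions.
    """
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def _add_node(self, label):
        # Record the step and link it to the previous one to preserve execution order
        self.nodes.append(label)
        if len(self.nodes) > 1:
            self.edges.append((self.nodes[-2], label))

    def on_agent_action(self, tool, tool_input):
        self._add_node(f"action:{tool}({tool_input})")

    def on_tool_end(self, output):
        self._add_node(f"tool_output:{output}")

    def on_chain_end(self, final_answer):
        self._add_node(f"answer:{final_answer}")


# Simulated agent run: one tool call followed by a final answer
tracer = WorkflowTracer()
tracer.on_agent_action("search", "battery RUL")
tracer.on_tool_end("3 documents")
tracer.on_chain_end("summary ready")
for src, dst in tracer.edges:
    print(f"{src} -> {dst}")
```

In a fuller version, edges like these could be persisted to Neo4j and rendered in Streamlit instead of printed.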

Technologies

Python
OpenAI GPT-4
Graphiti
Neo4j
Streamlit
Docker
Knowledge Graphs

End-to-End Weather Data Ingestion using Apache Airflow and PostgreSQL
Production

Mar 2025 – Apr 2025

Pipeline Acceleration

Developed automated ETL pipelines with Apache Airflow and Spark, implementing real-time feature engineering that reduced processing latency by 75% while ensuring robust monitoring and error handling.

Key Achievements

  • Built custom Airflow tasks using the @task decorator and managed dependencies using task chaining within a DAG
  • Implemented real-time feature engineering with Apache Spark, reducing latency by 75%
  • Configured Airflow connections and secrets management for API and database access using Astro Runtime
  • Managed task scheduling, retries, and logging through Airflow's native UI for robust and transparent pipeline monitoring
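The extract, transform, and load steps of such a DAG can be sketched as plain functions. This version assumes a hard-coded payload in place of the real weather API response, uses sqlite3 as a stand-in for PostgreSQL so it runs without a server, and invents the weather_data table and field names for illustration:

```python
import sqlite3

# Hard-coded stand-in for a weather API response; field names and values are assumptions
SAMPLE_PAYLOAD = {
    "latitude": 41.63,
    "longitude": -70.93,
    "current_weather": {"temperature": 18.4, "windspeed": 12.1, "time": "2025-04-01T12:00"},
}


def extract():
    # A real pipeline would call the weather API here
    return SAMPLE_PAYLOAD


def transform(payload):
    # Flatten the nested response into one row for the target table
    current = payload["current_weather"]
    return (
        payload["latitude"],
        payload["longitude"],
        current["temperature"],
        current["windspeed"],
        current["time"],
    )


def load(row, conn):
    # sqlite3 stands in for PostgreSQL so the sketch runs without a database server
    conn.execute(
        """CREATE TABLE IF NOT EXISTS weather_data (
               latitude REAL, longitude REAL, temperature REAL,
               windspeed REAL, observed_at TEXT)"""
    )
    conn.execute("INSERT INTO weather_data VALUES (?, ?, ?, ?, ?)", row)
    conn.commit()


conn = sqlite3.connect(":memory:")
# Same ordering a DAG would express as extract >> transform >> load
load(transform(extract()), conn)
print(conn.execute("SELECT temperature, observed_at FROM weather_data").fetchall())
# [(18.4, '2025-04-01T12:00')]
```

With Airflow's TaskFlow API, decorating each function with @task and calling them as load(transform(extract())) inside a DAG lets Airflow infer the dependency chain; the database connection would then come from a hook configured through an Airflow connection rather than being passed in directly.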

Technologies

Python
Apache Airflow
Apache Spark
MLflow
PostgreSQL
Docker
Kubernetes

Customer Purchase Prediction Using Amazon SageMaker
Production

Feb 2025 – Mar 2025

89.7% prediction accuracy

Machine learning model to predict customer purchase behavior using Amazon SageMaker with 89.7% accuracy.

Key Achievements

  • Built XGBoost model achieving 89.7% accuracy in customer purchase classification
  • Implemented end-to-end MLOps pipeline using AWS SageMaker and S3
  • Deployed real-time API for seamless prediction serving
  • Fine-tuned hyperparameters for optimal model performance
  • Gained hands-on experience with cloud-based model training and scalable AI deployment

Technologies

Python
AWS SageMaker
XGBoost
Amazon S3
NumPy
Pandas
MLOps

FashionGAN: Generating Fashion with Generative Adversarial Networks
Completed

Jan 2025 – Feb 2025

Realistic fashion image generation

A Deep Convolutional GAN (DCGAN) trained on the Fashion-MNIST dataset to generate realistic clothing images.

Key Achievements

  • Trained a Deep Convolutional GAN (DCGAN) on the Fashion-MNIST dataset
  • Generated realistic fashion images using adversarial training
  • Fine-tuned hyperparameters for improved image quality
  • Explored latent space representations of fashion items
  • Demonstrated AI applications in fashion design and synthetic data generation

Technologies

Python
TensorFlow
GANs
DCGAN
NumPy
Pandas
Matplotlib

Deep Learning Platform for Medical Diagnostics
Production

Sep 2024 – Dec 2024

86.34% diagnostic accuracy

HIPAA-compliant medical imaging platform with 94% diagnostic accuracy, processing 500+ scans daily.

Key Achievements

  • Developed CNN architecture achieving 94% accuracy in medical image classification
  • Implemented CGAN-based data augmentation, improving model robustness by 20%
  • Built HIPAA-compliant infrastructure with end-to-end encryption and audit trails
  • Deployed containerized solution with automated CI/CD, reducing deployment time by 80%

Technologies

PyTorch
OpenCV
FastAPI
PostgreSQL
Docker
AWS
Terraform

Battery Performance Optimization System
Production

May 2022 – Jul 2023

33% improvement in fault detection

ML-driven predictive maintenance system for electric vehicles, improving fault detection by 33% and extending battery life.

Key Achievements

  • Developed LightGBM ensemble models with 92% accuracy for battery life prediction
  • Implemented real-time anomaly detection using Isolation Forest, reducing false positives by 40%
  • Built automated data collection system processing 1M+ sensor readings per hour
  • Achieved $2M+ annual savings through predictive maintenance optimization
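Isolation Forest treats points that random splits isolate quickly as anomalies. The sketch below is a from-scratch, one-dimensional version using only the standard library, with made-up voltage readings; it follows the original algorithm's path-length scoring, but a production system would more likely use scikit-learn's IsolationForest:

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points (normalizer)."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(values, depth, max_depth, rng):
    # Leaf: depth limit reached, or the remaining points cannot be separated
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}
    split = rng.uniform(min(values), max(values))  # random split in the observed range
    return {
        "split": split,
        "left": build_tree([v for v in values if v < split], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    if "size" in node:
        # Add the expected extra depth for points left unresolved in this leaf
        return depth + c(node["size"])
    child = node["left"] if x < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(values, n_trees=100, sample_size=256, seed=0):
    if len(values) < 2:
        raise ValueError("need at least two readings")
    rng = random.Random(seed)
    psi = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(values, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in values:
        mean_h = sum(path_length(x, tree) for tree in trees) / n_trees
        # Score close to 1: isolated quickly (anomalous); well below 0.5: normal
        scores.append(2.0 ** (-mean_h / c(psi)))
    return scores


# Illustrative cell-voltage readings with one spike
readings = [3.70, 3.71, 3.69, 3.72, 3.70, 3.68, 3.71, 4.35, 3.70, 3.69]
scores = isolation_scores(readings)
most_anomalous = max(range(len(readings)), key=scores.__getitem__)
print(readings[most_anomalous])  # 4.35
```

The spike is separated from the tight cluster by almost any random split, so its average path length is short and its score is the highest.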

Technologies

Python
LightGBM
Apache Airflow
PostgreSQL
Grafana
Docker

Technical Skills

Comprehensive technical expertise spanning the entire AI/ML development lifecycle, from research and prototyping to production deployment and monitoring.

Machine Learning & AI
  • Deep Learning (PyTorch, TensorFlow)
  • Computer Vision & NLP
  • Large Language Models
  • MLOps & Model Deployment
  • Generative AI (GANs, VAEs)
  • Reinforcement Learning
Cloud & Infrastructure
  • AWS (SageMaker, EC2, S3, Lambda)
  • Docker & Kubernetes
  • CI/CD Pipelines
  • Infrastructure as Code
  • Microservices Architecture
  • Auto-scaling & Load Balancing
Data Engineering
  • Apache Airflow & Kafka
  • Real-time Data Processing
  • ETL/ELT Pipelines
  • Data Warehousing
  • Stream Processing
  • Data Quality & Governance
Programming & Development
  • Python (Advanced)
  • SQL & NoSQL Databases
  • JavaScript/TypeScript
  • API Development (FastAPI, Flask)
  • Version Control (Git)
  • Software Architecture

Professional Certifications

Industry-recognized credentials demonstrating expertise and commitment to continuous learning

AWS Certified Cloud Practitioner

Amazon Web Services

2024

Machine Learning Specialization

Stanford University

2023

Deep Learning Specialization

DeepLearning.AI

2023

Data Engineering with Apache Airflow

IBM

2024

Advanced SQL for Data Scientists

Coursera

2023

Generative AI with Large Language Models

DeepLearning.AI

2024

Latest Blog Posts

Sharing insights on AI, machine learning, and data science

CVXPY: The Python Library That Makes Optimization Actually Easy

Write complex optimization problems exactly like mathematical formulas

5 min read
Python
Optimization
CVXPY

The Complete PySpark Guide: From Zero to Processing Terabytes Like a Pro

Master distributed data processing with Python's most powerful big data tool

12 min read
PySpark
Big Data
Python

Key Challenges in Applying AI to Molecular Biology

Artificial Intelligence is revolutionizing fields from autonomous vehicles to natural language processing...

8 min read
AI
Biology
Research

Taming the Curse: A Complete Guide to Dimensionality Reduction in Machine Learning

Understanding when, why, and how to reduce dimensions in your data science projects

10 min read
Machine Learning
Data Science
Dimensionality Reduction

Research Excellence

Battery Intelligence Research

Breakthrough research in machine-learning-driven battery optimization that transformed industry standards and earned the Outstanding Undergraduate Research Recognition.

Research Impact

The Challenge

Traditional battery management systems were failing across the industry. Faults went undetected until catastrophic failure, and data processing couldn't keep up with real-time demands.

The Innovation

Developed machine learning models using LightGBM and Isolation Forest to predict battery failures before they occurred, supported by containerized ETL pipelines built with Apache Airflow and Docker.

Industry Recognition

Awarded "Outstanding Undergraduate Research Recognition" by the Department of Electrical and Electronics Engineering. Research now influences battery monitoring in commercial applications.

Key Achievements

18% Accuracy Improvement

LightGBM models for battery life prediction

33% Earlier Detection

Real-time anomaly detection system

28% Better Fault Detection

Accuracy gain through feature-importance analysis

Outstanding Research Award

Recognition for breakthrough work

Research Timeline

May 2022 - Selection

Chosen from 120 candidates for research opportunity

2022-2023 - Development

15-month intensive research and development phase

July 2023 - Recognition

Outstanding Research Award and industry adoption

Get In Touch

Let's discuss opportunities in AI, data science, or potential collaborations

Contact Information

Connect with me

Open to Opportunities

Currently seeking full-time Data Scientist positions and research collaborations in AI/ML, particularly in healthcare and automotive applications.

Send me a message