Srikar Reddy Nelavetla

I'm a Data Scientist, Data Engineer, Machine Learning Engineer, and Data Analyst

Data Scientist & AI Enthusiast with 3+ years of experience developing and deploying scalable ML systems in production environments. Proficient in Python, TensorFlow, PyTorch, and AWS, with a strong foundation in data engineering using Spark, SQL, and Airflow. Experienced in designing deep learning architectures and LLMs, optimizing model performance, and integrating MLOps workflows for continuous delivery. Passionate about building robust, efficient AI solutions that bridge the gap between research and real-world applications.

My Services

Data Engineering & Cloud Infrastructure:

Develops scalable ETL pipelines using Apache Spark, SQL, and AWS Glue. Manages distributed data systems on AWS (S3, Redshift, Athena), orchestrated with Airflow and deployed via Docker and Kubernetes for performance and scalability.

Machine Learning Systems:

Trains and deploys supervised and unsupervised models using TensorFlow, PyTorch, and Scikit-learn. Applies statistical modeling and time-series forecasting for mission-critical tasks such as churn prediction, anomaly detection, and demand estimation.

LLM & Generative AI Workflows:

Builds modular NLP and AI applications using LangChain, LangGraph, and OpenAI APIs. Fine-tunes transformer-based models (e.g., BERT, RoBERTa, XLM-R) and integrates vector stores, embeddings, and prompt engineering for retrieval-augmented generation (RAG) and semantic reasoning tasks.

MLOps & CI/CD:

Automates ML lifecycles using Git, Docker, Kubernetes, and CI/CD pipelines. Deploys models on AWS Lambda and SageMaker with continuous monitoring and versioning for reproducibility, resilience, and low-latency inference.

Analytics & Visualization:

Extracts and communicates insights using Python (Pandas, NumPy, Matplotlib), R, and BI tools like Tableau, Streamlit, and Looker Studio. Enables stakeholder-driven decisions through intuitive, real-time dashboards.

System Design & Integration:

Connects backend APIs, cloud-native infrastructure, and AI components into unified solutions. Focuses on modularity, maintainability, and end-to-end performance across batch and real-time environments.

Why Hire Me?

Results-driven Data and AI Engineer with 3+ years of experience delivering end-to-end solutions across data pipelines, machine learning models, and production AI systems. Proficient in Python, Spark, AWS, and Kubernetes, with expertise in deploying scalable workflows and automating large-scale analytics. Experienced in building intelligent LLM systems using LangChain, LangGraph, and OpenAI. Combines deep technical skills with a focus on business impact to deliver efficient, reliable, and insight-driven solutions. 🚀

My Experience

Experienced in developing scalable data and AI solutions with a strong foundation in software engineering. Skilled in building data pipelines, deploying machine learning models, and leveraging cloud infrastructure and automation to enhance system performance and efficiency.

June 2024 - Present

Machine Learning Engineer

Netflix

I engineered scalable machine learning infrastructure with a focus on real-time personalization and efficient model deployment. This included building a serverless feature store on AWS (Lambda, DynamoDB, S3) to centralize user embeddings, cutting feature duplication by 35%, and enabling EventBridge-driven updates. I optimized large-scale ML pipelines using PySpark and Ray, reducing training runtime by 45% on 1TB+ datasets. My work in natural language processing involved BERT and spaCy for multilingual content, improving title relevance by 15%. I also implemented LLM-powered content enrichment using GPT-4 and Claude, driving an 18% increase in CTR for long-tail content. To enhance experimentation and scalability, I built a multi-armed bandit framework for A/B testing and deployed containerized models with Docker, Kubernetes, and CI/CD via Jenkins. Additionally, I developed real-time feature pipelines with Apache Flink and Kafka, delivering sub-500ms latency personalization for 10M+ users.

June 2021 - July 2023

Data Scientist

IBM

I designed and implemented ETL pipelines using Python, SQL, and Apache Spark, improving data processing efficiency by 25% and accelerating model training across enterprise-scale ML workflows. I built and deployed scalable machine learning models on IBM Cloud using Watson Studio, Watson Machine Learning, and Cloud Functions, enabling real-time inference for high-demand applications. By automating the ML lifecycle with Apache Airflow, I reduced manual intervention by 60% through custom DAGs for ingestion, feature engineering, and retraining. Additionally, I streamlined CI/CD for ML deployment with Git, Docker, Kubernetes, and Jenkins, cutting deployment time by 40% and boosting experimentation velocity. My work also included developing supervised (XGBoost, LightGBM) and unsupervised models (K-Means, DBSCAN) for use cases like fraud detection, customer segmentation, and predictive maintenance—achieving a 30% improvement in anomaly detection accuracy and a 22% drop in false positives. I led the end-to-end SDLC for ML systems, integrating big data tools like Hadoop, Apache Spark, and IBM Cloud Pak to deliver robust solutions in production.

March 2024 - May 2025

Course Facilitator

University of Colorado Boulder

Facilitated three advanced courses in Statistical Methods and Applications, mentoring over 200 students in key areas such as exploratory data analysis, probability theory, statistical modeling, and ethical considerations in data science. Emphasized the development of reproducible statistical workflows and the application of data science techniques to real-world domains like business and climate science. Instruction focused on hypothesis testing, p-value interpretation, and the critical evaluation of statistical methods. Additionally, promoted effective collaboration through self-reflection, peer feedback, and video analysis, while training students to communicate technical results to non-technical audiences and uphold ethical standards in professional practice.

June 2020 - March 2021

Machine Learning Research Assistant

Aurora’s Degree College

As a Machine Learning Research Assistant at Aurora’s Degree College, I developed a deep learning model using a two-stage convolutional neural network (CNN) in PyTorch to classify melanoma tissue images, achieving 85% accuracy across six mutation types. I built an image processing pipeline to handle over 25,000 images and trained models on Google Cloud, improving AUC by 5% through test-time augmentation. I analyzed misclassifications using color-coded tessellations to better understand prediction errors. Based on these findings, I extended the model’s capability to classify human melanoma tissue, enhancing its relevance for real-world medical imaging applications.

My Education

Currently pursuing a Master's degree in Data Science at the University of Colorado Boulder (GPA: 3.94), specializing in Machine Learning, Artificial Intelligence, and Big Data. Previously, I earned a Bachelor of Science with double majors in Mathematics and Statistics and a minor in Computer Science from Osmania University (GPA: 3.9), focusing on data structures, algorithms, DBMS, and core statistics. This academic foundation has strengthened my technical expertise and problem-solving skills for building data-driven solutions.

August 2023 - May 2025

Master's in Data Science

University of Colorado Boulder 3.94/4

Machine Learning, Deep Learning, NLP, Statistical Methods & Applications, Data Center Scale Computing

June 2019 - July 2022

Bachelor of Science

Osmania University 3.94/4

Data Structures and Algorithms, DBMS, Linear Algebra, Statistics

My Skills

Specializing in backend development, cloud computing, and automation, I build scalable, high-performance data and AI applications. With a strong foundation in AI, machine learning, deep learning, and MLOps, I deploy end-to-end solutions using tools like AWS, Kubernetes, and modern LLM frameworks.

Python
R
SQL
C++
Apache Spark
Apache Airflow
Hadoop
Apache Flink
TensorFlow
Keras
Hugging Face
PyTorch
AWS
Power BI
Tableau
VS Code
MongoDB
Jupyter Notebook
Kubernetes
CI/CD
Git
Docker
Jenkins
Terraform

About Me

I combine expertise in data engineering and machine learning with strong programming and infrastructure skills to deliver end-to-end AI solutions that are efficient, reliable, and impactful.

Name Srikar Reddy Nelavetla

Gender Male

Age 25 Years Old

Marital Status Single

Address Woodinville, WA 98072

Nationality Indian

Experience 3+ Years

Full Time Available

Freelance Available

Phone (+1) 720 234 7493

Email srikarreddyy3@gmail.com

Languages English, Hindi, Telugu

Latest Project

01

New York City Taxi Demand Predictor

Developed an end-to-end pipeline on AWS to process 9.3 million taxi trip records from a three-month period in 2023, integrating weather data to enhance predictive modeling. Trained machine learning models including XGBoost, CatBoost, LightGBM, and AdaBoost, achieving a 15% improvement in accuracy with CatBoost (MAE: 1.5315). Deployed the model using AWS Lambda and API Gateway for real-time predictions. Performance was monitored with AWS CloudWatch, uncovering key demand patterns such as a 20% increase in usage during rush hours and heightened demand in extreme weather conditions, enabling data-driven fleet optimization.

Web Scraping, Scikit-learn, AWS

02

AI-Powered Multilingual Detection with Neural Networks

Developed and optimized neural network architectures, including CNN, BiLSTM, and GRU, for multilingual text classification. Achieved 93.91% accuracy and a 0.94 F1-score using a CNN+BiLSTM hybrid model, while reducing training time by 66%, completing training in just 10 epochs. Additionally, applied Principal Component Analysis (PCA) to compress 768-dimensional XLM-RoBERTa embeddings down to 350 dimensions, preserving 98.6% of the original variance. This dimensionality reduction enabled efficient processing of 1.8 million text samples across 20 different languages.

TensorFlow, Hugging Face, Scikit-learn

03

Cloud-Native YouTube Data Insights

Architected a serverless YouTube trend analysis pipeline using AWS services including S3, Glue, Lambda, Athena, and QuickSight to process over 100,000 daily records and analyze more than 1TB of data for real-time insights. Streamlined ETL workflows by integrating AWS Glue and Lambda, which accelerated data transformation by 70%. Delivered dynamic QuickSight dashboards to visualize trends across 10,000+ videos, enabling actionable insights and efficient data exploration.

Python, PySpark, AWS

04

The LangChain Chronicles

Developed a Retrieval-Augmented Generation (RAG) system using LangChain to answer questions over large document collections, including 2 GB of PDFs on data science, statistics, and machine learning. Implemented a Conversational Retrieval Chain architecture with FAISS as the vector store, Llama-3 served through Ollama for language modeling, and Instructor-XL for embedding generation. Engineered document chunking and preprocessing pipelines using RecursiveCharacterTextSplitter to improve retrieval quality and the accuracy and relevance of query responses.

LangChain, Ollama, Python
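
The chunking step in this project can be illustrated with a minimal, dependency-free sketch. This is not LangChain's RecursiveCharacterTextSplitter and not the project's actual code; it is a simplified fixed-size splitter with overlap, and the function name `chunk_text` and the default sizes are hypothetical choices made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the neighbouring chunks.
    (Hypothetical helper; sizes are illustrative, not the project's values.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

A real recursive splitter would first try to break on paragraph and sentence separators before falling back to raw character windows; the overlap idea shown here is the same.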

05

Song Lyric Generator Using NLP

Developed an NLP-based song lyric generation model by evaluating and comparing Naive Bayes N-Gram, RNN-LSTM, and GPT-2 Transformer architectures. Trained the models on a dataset of 762 songs, achieving 96.16% accuracy with the RNN-LSTM model over 20 epochs, while the Naive Bayes approach produced an average perplexity score of 200. Leveraged GPT-2 with its 1.5 billion parameters to generate coherent and stylistically consistent lyrics with rhyme schemes, showcasing the model’s potential for creative applications in the music industry.

LSTM, Transformers
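
For readers unfamiliar with the perplexity metric reported for the N-gram model, here is a minimal sketch of how it can be computed for a bigram model. It assumes add-one (Laplace) smoothing and whitespace-tokenized input, neither of which is confirmed by the project description; the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams from a token list (hypothetical helper)."""
    contexts = Counter(tokens[:-1])             # how often each word is a context
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each word pair occurs
    vocab = set(tokens)
    return contexts, bigrams, vocab


def perplexity(tokens, contexts, bigrams, vocab):
    """Perplexity of `tokens` under an add-one smoothed bigram model.

    Perplexity is exp of the average negative log-probability per predicted
    token; lower values mean the model is less "surprised" by the text.
    Assumes evaluation words come from the training vocabulary.
    """
    v = len(vocab)
    log_prob = 0.0
    n = 0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        log_prob += math.log(p)
        n += 1
    if n == 0:
        raise ValueError("need at least two tokens to score")
    return math.exp(-log_prob / n)


if __name__ == "__main__":
    train = "la la la oh la la".split()
    model = train_bigram(train)
    print(perplexity("la la oh".split(), *model))
```

Under this definition, a perplexity of 200 means the model is on average as uncertain as if it were choosing uniformly among 200 words at each step.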

06

PCOS Detection Using Deep Learning

Developed a deep learning-based diagnostic system to detect Polycystic Ovary Syndrome (PCOS) from pelvic ultrasound images using DenseNet121, Vision Transformer, and a custom lightweight CNN (IustNet). Implemented a complete pipeline covering data preprocessing, image augmentation, model training, and evaluation using classification metrics like accuracy, F1-score, and AUC. IustNet demonstrated superior generalization and efficiency, making it ideal for real-time clinical use. Deployed the model through a Gradio web interface and Hugging Face Spaces for accessible browser-based predictions, delivering a reproducible, deployable solution aimed at improving PCOS diagnosis in under-resourced settings.

Deep Learning, GANs, Gradio

Let's Connect

Great things happen when ideas meet execution. If you're looking to bring a data-driven solution to life, I’d love to be part of it.

Phone

(+1) 720 234 7493

Email

srikarreddyy3@gmail.com

City

Seattle, WA
