Gouranga Jha

Software Engineer by Role, Data Scientist by Impact

After 7+ years in the civil construction industry, I transitioned into data science and software development, driven by a passion for AI, machine learning, and intelligent systems.

About Me

Software Engineer | Data Scientist
AI/ML
Machine Learning
Data Analytics
Deep Learning
Generative AI
Agentic AI
LangChain
LangGraph
RAG
MCP
Deployment
Business Analytics
Software Development
Client Engagement

Professional Journey

Successfully transitioned from 7+ years in civil construction management to a thriving career in data science, leveraging analytical thinking and project management skills in the tech industry.

Technical Expertise

4+ years of experience in Data Science, Business Analytics, Software Development, and Client Engagement

Core Skills

Expert in Python, SQL, and modern AI tooling, including the OpenAI Agents SDK, LangChain, LangGraph, RAG, and MCP, with proven experience in end-to-end ML pipeline development and production deployment.

Vision & Passion

Committed to leveraging data science and AI to solve complex business problems, with a focus on delivering actionable insights and intelligent automation solutions that drive measurable business value.

Projects

IoT Support Chatbot

A Streamlit-based IoT support assistant with RAG over product PDFs, FAISS vector store, and MySQL session storage. Enforces selected language (English/Malay), collects feedback, and provides expert contact details. Includes Docker and CI configs.

Agentic AI System: LLMOps Focused

A production-ready agentic AI system with smart single-tool selection (Tavily or Wikipedia), powered by Groq Llama 3. Integrates a FastAPI backend and a Streamlit UI with CI/CD (Jenkins), Docker, and AWS deployment.

E-Commerce AI Recommendation System

Production-ready AI recommender with a modern chat UI built on Flask. Uses RAG with AstraDB vector search and Groq LLM to generate contextual product recommendations. Includes Docker/Kubernetes deployment and Prometheus/Grafana monitoring.

Agentic Data Cleaning Pipeline (LLM-powered, Orchestrator-Worker)

A modular, agentic data cleaning pipeline using the orchestrator-worker pattern, powered by LLMs (ChatGroq) and managed via LangChain and Streamlit. Upload your CSV or use the Titanic sample, and the app will profile, plan, and execute data cleaning steps with LLMs, showing a cleaning plan and before/after results. Features modular agents, dynamic planning, and a simple Streamlit UI.
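
As a rough illustration of the orchestrator-worker pattern described above, here is a minimal sketch in plain Python. It is not the project's code: the LLM planner is replaced by a hypothetical rule-based `plan_cleaning` function, and the worker names, the `age` column, and the sample rows are all assumptions made for the example.

```python
from statistics import median

# Hypothetical workers: each takes a list of row dicts and returns a cleaned list.
def drop_duplicates(rows):
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing_age(rows):
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = median(known)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in rows]

WORKERS = {"drop_duplicates": drop_duplicates, "fill_missing_age": fill_missing_age}

def profile(rows):
    """Summarise data quality issues for the planner."""
    keys = [tuple(sorted(r.items())) for r in rows]
    return {
        "duplicates": len(keys) - len(set(keys)),
        "missing_age": sum(r["age"] is None for r in rows),
    }

def plan_cleaning(report):
    """Rule-based stand-in for the LLM planner (assumption, not the real planner)."""
    plan = []
    if report["duplicates"]:
        plan.append("drop_duplicates")
    if report["missing_age"]:
        plan.append("fill_missing_age")
    return plan

def orchestrate(rows):
    """Profile the data, build a plan, then dispatch each step to its worker."""
    plan = plan_cleaning(profile(rows))
    for step in plan:
        rows = WORKERS[step](rows)
    return plan, rows

sample = [
    {"name": "A", "age": 22},
    {"name": "A", "age": 22},
    {"name": "B", "age": None},
    {"name": "C", "age": 30},
]
plan, cleaned = orchestrate(sample)
```

The orchestrator only decides which steps to run and in what order; each worker stays small and independently testable, which is the main appeal of the pattern.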

Bedtime Story Generator

A web application that generates personalized bedtime stories for children based on their preferences. Built with Python, Streamlit, and LangChain, this app provides a fun, interactive way to create magical stories for kids.

RAG-Based Chatbot with Document Upload and Chat History

A modular, conversational RAG (Retrieval-Augmented Generation) system built with Streamlit that allows users to upload PDF documents and have interactive conversations about their content.

Coding Assistant

A coding assistant powered by LangChain and Groq, with a modern Gradio UI. This assistant helps you with coding questions, code generation, and detailed explanations.

Movie Review Generator: Before vs After Fine-Tuning

This project demonstrates the effect of fine-tuning a small language model (DistilGPT-2) on a tiny movie review dataset using Hugging Face Transformers and Datasets. The app provides a Streamlit UI to compare the model's output before and after fine-tuning.

Career Compass AI

An AI-powered job search application that helps you find your dream opportunities using SerpAPI and Groq LLM, with a beautiful, modern Streamlit interface.

AI News Reporter

A conversational Python application that uses AI, real-time APIs, and the Model Context Protocol (MCP) to provide current weather information and the latest news headlines for any topic, with a modern Streamlit UI.

SQL AI Explorer

A Gradio-powered web app that lets you interact with your MySQL database using natural language queries, powered by LangChain and Groq LLM. Visualize your data instantly with tables.

YouTube to Content Creator

Transforms YouTube videos into Instagram or Medium content using AI agents. The app uses CrewAI, Streamlit, and external APIs to generate high-quality, platform-optimized content from any YouTube video link.

Academic Achievement Metric Estimator

The project predicts student scores through a web interface built with Python, Flask, and machine learning. It is deployed on AWS Elastic Beanstalk with CD via AWS CodePipeline and follows a modular coding approach.

Text Summarization

The project uses FastAPI, Transformers, and Hugging Face models to create a text summarization system with pipelines for data handling, model training, and evaluation. It is deployed on AWS with CI/CD via GitHub Actions and Docker, and follows a modular coding approach.

Wine Quality Estimator

The project features an end-to-end machine learning pipeline with MLflow for experiment tracking, covering data handling, model training, and evaluation, plus a Flask app for wine quality prediction. It is deployed on AWS with CI/CD via GitHub Actions and Docker, and follows a modular coding approach.

MyBay Shopping App - Product Recommendation System

The project implements a content-based recommendation system for the MyBay shopping app using Flask, SQLAlchemy, and machine learning, with user authentication and product exploration features. It also includes other experimental recommendation systems.
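
To make the idea of content-based recommendation concrete, the sketch below ranks items by cosine similarity between bag-of-words vectors built from their descriptions. It is an illustrative assumption, not the MyBay implementation: the `CATALOG` entries and the `vectorize`, `cosine`, and `recommend` helpers are hypothetical names invented for this example.

```python
import math
from collections import Counter

# Hypothetical catalog: product name -> short text description.
CATALOG = {
    "running shoes": "sports footwear running lightweight",
    "trail shoes": "sports footwear running outdoor",
    "office chair": "furniture office ergonomic",
}

def vectorize(text):
    """Turn a description into a bag-of-words count vector."""
    return Counter(text.split())

def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def recommend(item, top_n=1):
    """Return the top_n catalog items most similar to `item` by description."""
    target = vectorize(CATALOG[item])
    scores = {
        name: cosine(target, vectorize(desc))
        for name, desc in CATALOG.items()
        if name != item
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("running shoes"))
```

Because the ranking depends only on item attributes, this style of recommender works even for users with no purchase history.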

O2Tel Customer Churn Analysis

The project analyzes telecom customer churn using SQL, Power BI, and machine learning, featuring data preprocessing, EDA, predictive analysis and re-visualisation, and an interactive Power BI dashboard to gather actionable insights.

Twiggy Instamart Sales Analysis

The project conducts a comprehensive sales performance analysis for Twiggy Instamart using Power BI, tracking KPIs and identifying areas for improvement.

Global Mart: Retail Orders E-commerce Analysis

The project offers an end-to-end analysis of a global e-commerce dataset (2022-2023) using Python and SQL, providing actionable insights for optimizing business strategies.

OTT Content Analysis

The project analyzes an OTT platform's content catalog to identify trends, popular genres, and opportunities for future content development.

Target's Operations Analysis

The project uses Target Brazil's operational data to generate insights that support Target's strategic decision-making, covering customer orders, payments, shipping, product attributes, and demographics.

PyPI Publishing: Synthetic Data Generator

The package generates synthetic datasets with continuous, categorical, and time-series features plus noise, suited to data scientists and engineers who need data for testing or modeling, or who cannot use real data because of privacy concerns.
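
For a sense of what such a dataset looks like, here is a small standard-library sketch that produces rows with one continuous, one categorical, and one time-series feature with added noise. It does not use or reproduce the published package's API; `make_synthetic_rows`, its parameters, and the column names are hypothetical.

```python
import math
import random
from datetime import date, timedelta

def make_synthetic_rows(n_rows, seed=0, noise_std=0.5):
    """Generate n_rows of synthetic daily records (illustrative schema only)."""
    rng = random.Random(seed)  # local generator so results are reproducible
    start = date(2024, 1, 1)
    rows = []
    for i in range(n_rows):
        rows.append({
            "day": start + timedelta(days=i),          # time index
            "segment": rng.choice(["A", "B", "C"]),    # categorical feature
            "score": rng.gauss(50.0, 10.0),            # continuous feature
            # time-series feature: weekly seasonality plus Gaussian noise
            "demand": 100.0
            + 10.0 * math.sin(2 * math.pi * i / 7)
            + rng.gauss(0.0, noise_std),
        })
    return rows

rows = make_synthetic_rows(14, seed=42)
```

Seeding a local `random.Random` instance keeps the output repeatable, which matters when synthetic data backs automated tests.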

Starducks Coffee Sales Dashboard Using MS-Excel

The project analyzes monthly sales trends, product category performance, and store performance for Starducks Coffee to identify improvement areas and growth opportunities.

Ecommerce Web Scraping

The project involves two web scraping tasks to extract product and review data from an eCommerce platform, performed in separate Jupyter notebooks and saved to CSV files for analysis.

Statistics-In-Action

The project aims to solve real-world business problems using ML and applied statistics, providing domain experience with industry datasets and developing data-driven insights for decision-making.

Supervised Learning - Use cases

The project aims to build predictive machine learning models using supervised learning techniques on industry datasets from healthcare and banking.

Prediction via Ensemble Methods

The project predicts customer churn for a telecom company from historical data, so that the company can design effective retention strategies. The predictions are implemented using ensemble methods.

Unsupervised Learning - Use Cases

The project focuses on solving industry problems using unsupervised learning techniques, covering synthetic data, clustering, dimensionality reduction, and practical implementations.

Manufacturing Yield Prediction: Utilising Feature Engineering

The project develops a classifier to predict production outcomes in semiconductor manufacturing by identifying the most relevant signals from noisy data.

Neural Network Approach to Classification and Regression

The project solves two distinct industry-related challenges, one regression and one classification, using basic neural networks.

NLP Text Classification & Semi-Rule Based Chatbot

The project develops a multi-label text classifier to predict blogger attributes and an interactive chatbot for customer support automation.

NLP Sequential Models

The project develops two sequential NLP models: one for sentiment analysis of IMDB reviews and another for sarcasm detection in news headlines using a bidirectional LSTM network.

Image Classification Using ML and Neural Networks

The project develops an image classifier to accurately identify plant species from photographs for botanical research.

Face Detection and Recognition

The project develops face detection and recognition models to automate the display of cast and crew information in a movie streaming application.

Face Detection Live

The project develops a real-time face detection system that captures webcam video, detects faces with a pre-trained model, and displays the results with bounding boxes.

Anomaly Detection: Manufacturing

The project applies unsupervised learning models to anomaly detection on a custom dataset, complemented by both visual and numerical analyses.

For more projects

Check out my GitHub account via the link below.

GitHub Link

Skills

Programming Languages

Python (via Jupyter Notebook, Google Colab, PyCharm, VS Code, Cursor) and SQL/MS SQL (via MySQL Workbench, SSMS) for efficient data handling, analysis, modelling, and database querying.

Data Manipulation and Analysis

Expert proficiency in Pandas, NumPy, SciPy, scikit-learn, MS-Excel, and SQL for comprehensive data transformation, statistical analysis, and deriving actionable insights from complex datasets. Experience with data profiling, feature engineering, and exploratory data analysis.

Data Visualization

Expertise in Python visualization libraries (Matplotlib, Seaborn, Plotly) and Power BI for creating interactive dashboards, compelling data stories, and executive-level presentations that drive strategic decision-making.

Storytelling with Data

Combining data visualization with narrative techniques helps communicate insights clearly and drive decision-making, turning numbers into impactful stories.

Machine Learning & Statistical Modeling

Comprehensive expertise in supervised/unsupervised learning, ensemble methods, statistical modeling, and advanced ML frameworks. Specialized in building production-ready models for classification, regression, clustering, and recommendation systems with focus on model interpretability and business impact.

Data Wrangling & Preprocessing

Strong data cleaning and preprocessing skills ensure high-quality data for building effective models.

Deep Learning

Experience in neural networks, transformers, and frameworks (TensorFlow, PyTorch, Keras, Hugging Face) for computer vision, NLP, and time-series analysis. Experience with transfer learning, fine-tuning, and deploying deep learning models in production environments.

Model Deployment

Skilled in Flask, FastAPI, Docker, Kubernetes, AWS, GCP, Hugging Face Spaces, Streamlit Cloud, Render, Jenkins, CircleCI, Prometheus, Grafana, SonarQube, Postman, and Insomnia.

Applied AI: CV, NLP, and Time Series

Experience in computer vision (OpenCV, YOLO, CNN architectures), natural language processing (BERT, GPT, transformers), and time-series forecasting (ARIMA, LSTM, Prophet) for solving complex real-world problems across diverse industry domains.

Generative & Agentic AI

Proficiency in generative AI (LLMs) and agentic AI frameworks (LangChain, LangGraph, OpenAI Agents SDK) for building autonomous, context-aware systems. Specialized in prompt engineering, RAG implementation, multi-agent orchestration, and MCP (Model Context Protocol) for advanced automation and intelligent decision-making.

UI Skills

Proficient in front-end development (HTML, CSS, Bootstrap, JavaScript) and modern UI frameworks for creating responsive, user-friendly interfaces. Experience with Streamlit, Gradio, and web application development for data-driven applications, interactive dashboards, and AI-powered tools.

Structured Thinking, Frameworks and Modular Approach

Expert ability to break down complex problems using structured frameworks and write modular, reusable code following software engineering best practices. Specialized in designing scalable architectures, implementing design patterns, and ensuring code quality through testing and documentation.

Emotional Intelligence (EQ) and Soft Skills

High emotional intelligence to navigate teamwork, manage stress, and communicate effectively, vital for collaborating with cross-functional teams, influencing stakeholders and presenting data-driven insights clearly and empathetically.

Prompt Engineering

Advanced expertise in crafting precise, context-aware prompts for large language models ensuring high-quality responses and optimal model performance. Specialized in prompt optimization, few-shot learning, chain-of-thought prompting, and developing robust prompt templates for production applications.

Project Management and Other Tools

Expert proficiency in project management tools (Jira, Confluence) and collaboration platforms (MS Teams, Miro) for agile development and team coordination. Advanced skills in productivity software (MS-Office), design tools (Canva), and cross-platform development across Ubuntu and Windows environments.

Code Repository

Expert proficiency in Git version control, GitHub/GitLab workflows, and maintaining clean, well-documented code repositories. Specialized in collaborative development, CI/CD pipelines, code review processes, and implementing best practices for scalable, maintainable codebases across multiple projects and teams.

Contact Details

Email ID: post.gourang@gmail.com

Contact No.: +91-8971709672