Pune, India
I’m Anuj Daphale, an aspiring Data Analyst and Data Scientist with a strong interest in uncovering patterns, predicting outcomes, and supporting data-driven decisions. I enjoy working with datasets, applying statistical concepts, and understanding machine learning fundamentals. I’m continuously developing my skills in Python, data preprocessing, and visualization to grow into a confident data professional.
May 2025 - July 2025
Diabetic Readmission Analytics & Prediction System
Diabetic Readmission Analytics & Prediction System is an end-to-end data science project covering EDA, data cleaning, feature engineering, and predictive modeling, with insights presented through Power BI dashboards and an interactive Streamlit web application. The project analyzes imbalanced healthcare data, trains ML models, tunes the classification threshold used for predictions, and delivers both business intelligence insights and real-time readmission predictions.
Python | Google BigQuery | Power BI | Streamlit
GitHub | Live
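The threshold tuning mentioned for the readmission project can be illustrated with a minimal sketch. It assumes a model that outputs a probability per patient and picks the cut-off that maximizes F1 on the positive (readmitted) class; the function names, the candidate grid, and the toy data are hypothetical and are not taken from the project itself, which may optimize a different metric.

```python
from typing import Optional, Sequence


def f1_at_threshold(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """F1 for the positive class when predicting 1 if prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                   candidates: Optional[Sequence[float]] = None) -> float:
    """Return the candidate threshold with the highest F1 (grid is an assumption)."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96, 5)]  # 0.05, 0.10, ..., 0.95
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy, made-up data: 3 positives out of 10 to mimic class imbalance.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.40, 0.38, 0.42, 0.60]

chosen = best_threshold(y_true, y_prob)
print("default 0.5 F1:", round(f1_at_threshold(y_true, y_prob, 0.5), 3))
print("tuned threshold:", chosen, "F1:", round(f1_at_threshold(y_true, y_prob, chosen), 3))
```

On imbalanced data the default 0.5 cut-off tends to miss positives, so a lower threshold can trade a little precision for much better recall. In practice the threshold would be chosen on a validation split rather than on the data used for the final evaluation.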
End-to-End Sales Analysis
Developed an automated sales analytics project using Snowflake SQL and Power BI. Created analytics-ready SQL views with engineered time-based fields to support efficient reporting and visualization. Performed data exploration and validation to ensure data accuracy and consistency. Implemented a SQL-driven reporting layer that reduced manual transformations in Power BI and improved dashboard maintainability.
SQL | Snowflake | Power BI
GitHub
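The engineered time-based fields in the sales project live in Snowflake SQL views; as a rough Python analogue only, the sketch below derives the kind of fields such a view might expose from an order date. The field names and the choice of fields are assumptions for illustration, not the project's actual schema.

```python
from datetime import date


def time_fields(order_date: date) -> dict:
    """Derive reporting-friendly time fields from a single order date."""
    return {
        "year": order_date.year,
        "quarter": (order_date.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
        "month": order_date.month,
        "year_month": order_date.strftime("%Y-%m"),  # sortable label for monthly charts
        "iso_week": order_date.isocalendar()[1],
        "is_weekend": order_date.weekday() >= 5,     # Monday is 0, Sunday is 6
    }


# Hypothetical order date used only for the example.
print(time_fields(date(2024, 8, 15)))
```

Precomputing fields like these in the data layer keeps the dashboard free of per-visual date calculations, which is the maintainability benefit the description refers to.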
Car Sales Analytics
Cleaned and engineered features on a dataset of over 500,000 records using Python to prepare it for analysis. Stored the processed data in SQL databases for efficient querying and scalability. Developed a multi-section Power BI dashboard with custom visualizations that highlight key sales trends and performance insights.
Python | SQL | Power BI
GitHub
Valorant DataScope
This project delivers a complete data analytics solution for Valorant gameplay, spanning data cleaning, feature engineering, and statistical analysis with Python. Processed data is stored and queried efficiently within Snowflake, enabling scalable cloud-based data management. Power BI dashboards present interactive visualizations that reveal insights into player performance.
Python | SQL | Snowflake | Power BI
GitHub
EduPipeline Automation
Developed an automated course sales pipeline using n8n that integrates Google Drive, Python, and Supabase, reducing manual data-handling effort by 70%. Implemented scheduled workflows for daily data extraction, transformation, and syncing with Supabase. Created a Power BI dashboard for interactive visualization of sales, trends, and automation insights.
Python | n8n | Power BI
GitHub
Retail Sales Analysis
This project is a dynamic Retail Sales Dashboard built in Microsoft Excel to analyze sales, profit, margins, and return rates across regions, categories, and time. The dashboard provides interactive KPI cards, charts, and slicer-based filtering for exploring business performance. VBA macros automate data updates and reporting, so the workbook can be refreshed without manual rework.
Excel
GitHub
VidStamp Assistant
VidStamp Assistant is a RAG-based video teaching assistant designed to help users quickly find accurate answers in video content. The system uses Whisper for transcription and LLMs served through Ollama and Groq, achieving over 90% query accuracy. An automated multi-step AI pipeline processes videos and retrieves the relevant segments, significantly reducing manual video search time. Retrieval ranks segments by cosine similarity to the query, so responses draw on the most contextually relevant parts of the video.
Python | Whisper | Ollama | Groq LLM
GitHub
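The cosine-similarity retrieval step described for VidStamp Assistant can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional vectors stand in for real embeddings, and the segment structure (`start`, `text`, `embedding`) and function names are hypothetical rather than the project's actual code.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_segments(query_vec: Sequence[float], segments: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k segments whose embeddings are most similar to the query."""
    ranked = sorted(
        segments,
        key=lambda seg: cosine_similarity(query_vec, seg["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy, made-up transcript segments with placeholder embeddings.
segments = [
    {"start": "00:00", "text": "Course introduction", "embedding": [0.0, 1.0, 0.0]},
    {"start": "04:30", "text": "What a DataFrame is", "embedding": [0.9, 0.1, 0.1]},
    {"start": "12:10", "text": "Filtering DataFrame rows", "embedding": [0.7, 0.0, 0.7]},
]
query = [1.0, 0.0, 0.2]  # placeholder for the embedded user question

for seg in top_k_segments(query, segments):
    print(seg["start"], seg["text"])
```

The retrieved segments, along with their timestamps, would then be passed to the LLM as context for answering the question. Cosine similarity compares direction rather than magnitude, which makes it a common choice for comparing text embeddings.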