
Prince Kumar Jha

India **********

Updated on : 18-Sep-2023

Resume Headline

Geekster | Trainee at AlmaBetter | Python | SQL | Excel | Tableau | Power BI | Google Looker Studio | Machine learning | Data analytics | Data scientist | Algorithms

Skill Set : Communication, Leadership, Tableau, Power BI, Google Analytics, Data Visualization, Team Leadership, Statistics For Data Science, Analytical Skills, Datasets, Python (Programming Language), NumPy, Pandas, Maths For Data Science, Microsoft Excel, Machine Learning, SQL, Data Analysis, MySQL, Time Management, Problem Solving

Preferred Job Type : Full-Time, Remote

Employment Details

Data science trainee

AlmaBetter
May 2022 - Feb 2023

Apprenticeship 

 Skills: Datasets · Statistics For Data Science · Power BI · Tableau · Leadership · Communication · Problem Solving · Time Management · MySQL · Data Analysis · SQL · Machine Learning · Microsoft Excel · Maths For Data Science · Pandas · NumPy · Python (Programming Language)


Data Science Curriculum Engineer

Geekster
Jun 2023 - Present

Internship

Salary : 12,000 Monthly
Notice Period : 1 Month

Data Science/Analyst trainee

AlmaBetter
Feb 2023 - Jun 2023

 Skills: Datasets · Analytical Skills · Team Leadership · Data Visualization · Google Analytics · Power BI · Tableau · Leadership · Communication · Problem Solving · Time Management · MySQL · Data Analysis · SQL · Microsoft Excel · Maths For Data Science · Pandas · NumPy · Python (Programming Language)


Education Details

Graduation in Accounting and Finance

Bachelor's degree B.com From Lalit Narayan Mithila University, Darbhanga

Passout Year : 2021

Course Type : Full Time

Percentage/Grade : 8.9 Scale 10 Grading System


Project Details

Cardiovascular-Risk-Prediction

Associated with AlmaBetter
October 2022 - October 2022

 The dataset is publicly available on Kaggle and comes from an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. The classification goal is to predict whether a patient has a 10-year risk of future coronary heart disease (CHD). The dataset contains over 4,000 patient records and 15 attributes, each of which is a potential risk factor. The risk factors are demographic, behavioral, and medical.

1. The general approach that I employed
a) Data cleaning and preprocessing
b) Exploratory data analysis
c) Feature selection
d) Model development and comparison, evaluated with the accuracy score, the F1 score, and the area under the ROC curve (AUC)
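
As a rough illustration of the three evaluation metrics named above, the sketch below computes accuracy, F1, and AUC in plain Python. The labels and scores are made-up values, not results from the Framingham data, and the function names are hypothetical helpers rather than the code used in the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = CHD risk) and model scores, thresholded at 0.5
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores, which is one reason to report all three.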

Observation
a) Among the models compared (including XGBoost), the SVM gives the highest accuracy, recall, precision, and AUC score.
b) The highest recall is given by the SVM.
c) The highest AUC is given by the SVM.
Overall, the support vector machine was the best-performing model across all metrics. Its best parameters were a radial kernel, a C value of 10, and a gamma value of 1. Its high AUC and F1 score also show that the model has a high true positive rate and is therefore sensitive in predicting whether a person has a high risk of developing CHD, i.e., getting a heart attack within 10 years.
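
For context on the radial kernel and the gamma value mentioned above, here is a minimal sketch of the standard RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2), in plain Python. The function name and the sample points are illustrative assumptions. C does not appear in the kernel; it is the SVM's penalty on misclassified training points.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: similarity decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical points have similarity 1; similarity shrinks as points move apart.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=1.0)
near = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=1.0)
far = rbf_kernel([0.0, 0.0], [3.0, 0.0], gamma=1.0)
print(same, near, far)
```

A larger gamma makes the similarity fall off faster with distance, so each training point influences a smaller neighbourhood.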

2. CHALLENGES
a) Handling the missing values.
b) Making data more accurate.
c) Selection of important features.

3. CONCLUSION
a) The number of people who have coronary heart disease (CHD) is almost equal between smokers and non-smokers.
b) The top features in predicting the ten-year risk of developing CHD are 'age', 'totChol', 'sysBP', 'diaBP', 'BMI', 'heart rate', and 'glucose'.
c) The SVM with the radial kernel is the best-performing model in terms of accuracy and the F1 score.
d) Balancing the dataset by using the SMOTE technique helped in improving the models' sensitivity.
With more data (especially for the minority class), better models can be built.
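
The core idea behind SMOTE, as referenced in the conclusion, is to create synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that idea in plain Python under simplifying assumptions: the function name, parameters, and sample points are hypothetical, and a real project would more likely use an existing library implementation.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up minority-class samples with two features each
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0]]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling this way adds variety without simply duplicating rows.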


NETFLIX-MOVIES-AND-TV-SHOWS-CLUSTERING

Associated with AlmaBetter
November 2022 - November 2022

 Problem Statement
The goal of this project is to find similarities within groups of people in order to build a movie recommendation system for users. We analyze a Netflix dataset to explore the characteristics that people share in the movies they watch. Most of us have experienced the endless scrolling to choose what to watch, either ourselves or by watching someone else do it. Users can spend more time deciding what to watch than actually watching.

Steps involved:
1. Data cleaning and pre-processing: I checked for and handled missing and duplicate values in the dataset, as these can seriously degrade the performance of machine learning algorithms (many algorithms do not tolerate missing data).
2. Exploratory Data Analysis: To gain statistical insights from the data, I checked the distributions of the different attributes and their correlations with each other and with the target variable, and I calculated odds and proportions for the categorical attributes.
3. Clustering: Clustering, or cluster analysis, is an unsupervised machine learning technique that groups an unlabelled dataset into clusters of similar data points, so that points within a cluster are more similar to each other than to points in other clusters. It works by finding shared patterns in the data (such as shape, size, colour, or behavior) and dividing the points according to the presence or absence of those patterns. Because the method is unsupervised, the algorithm receives no labels. After clustering, each cluster is assigned a cluster ID, which ML systems can use to simplify the processing of large and complex datasets.
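
To make the clustering step concrete, here is a minimal k-means sketch in plain Python. The write-up above does not say which clustering algorithm was used, so k-means is an assumption here, and the two-dimensional points are made-up stand-ins for whatever numeric features were derived from the Netflix titles.

```python
def kmeans(points, k, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then recompute centroids."""
    # Naive initialisation (an assumption for this sketch): use the first k points
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point gets the ID of its closest centroid
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up feature vectors forming two visibly separate groups
points = [[1.0, 1.0], [8.0, 8.0], [1.2, 0.8], [8.2, 7.9], [0.9, 1.1], [7.8, 8.1]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The returned labels play the role of the cluster IDs described in step 3: each point carries the ID of the group it was assigned to.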


EDA - On- Global Terrorism

Associated with AlmaBetter
June 2022 - Present

 In this EDA project, we were provided with the Global Terrorism Database (GTD), a database of terrorist events around the world from 1970 through 2017. Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that occurred during this period, and it now includes more than 181,000 cases.

  • Skills: Analytical Skills · Data Visualization

Nyc-Taxi-Trip-Time-Prediction

Associated with AlmaBetter
August 2022 - August 2022

 - The data is trip information for New York City taxis. The problem is to use regression to predict trip duration from the given variables, which include the pickup and drop-off locations (as latitude and longitude), the pickup date/time, the number of passengers, etc.
- The objective of this project is to predict the total ride duration of taxi trips in New York City.
The analysis of NYC taxi trip duration involves the following steps:
1. Loading the data into the data frame
2. Cleaning the data
3. Extracting statistics from the dataset
4. Exploratory analysis and visualizations
5. Train Test Split
6. Linear Regression
7. Decision tree
8. Random forest
9. Decision tree with GridSearchCV
10. Conclusion 
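
Steps 5 and 6 above (train/test split and linear regression) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the helper names are hypothetical, the data is synthetic (duration is generated as exactly 3 * distance + 2), and RMSE is used as the error measure although the write-up does not name one.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root mean squared error of the fitted line on the given rows."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(rows))


# Synthetic (distance_km, duration_min) pairs, not real taxi data
trips = [(float(d), 3.0 * d + 2.0) for d in range(1, 9)]
train, test = train_test_split(trips)
slope, intercept = fit_line(train)
test_error = rmse(test, slope, intercept)
print(slope, intercept, test_error)
```

Evaluating on the held-out rows rather than the training rows is what makes the error estimate meaningful when comparing linear regression against the tree-based models in steps 7 to 9.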


Profile Summary

 As a fresh graduate with a solid background in commerce, I have honed my problem-solving skills around growing a business and tackling business-related challenges. I am a driven and motivated data science enthusiast with a passion for transforming complex data into meaningful insights. I have a robust foundation in statistics, programming, and data analysis, and I am keen to continue developing my skills in this thrilling and rapidly evolving field.

I was captivated by the power of data to tell stories, solve problems, and drive decision-making. I am always eager to apply what I have learned to real-world scenarios and to continue learning and growing as a data scientist. I am also a fast learner, always ready to broaden my knowledge and skills.

I am thrilled to start a career in data science and to work with a team of professionals who are passionate about using data to make an impact. I am keen to contribute my skills and enthusiasm to a company that values data-driven decision-making and innovation.

Personal Details

Full Name Prince Kumar Jha
Gender Male
Marital Status Single
Email ID **********
Mobile No. **********
Date of Birth **********
Languages Known English (Proficient)
Nationality India