Prediction of Heart Failure Using Machine Learning, with a Project

Sunil Nagar
8 min read · Jun 11, 2023

In today’s fast-paced world, people often prioritize their daily responsibilities and neglect their health, leading to a rise in various illnesses. Among them, heart disease has emerged as a major concern, with approximately 31% of global deaths attributed to heart-related conditions, according to the World Health Organization (WHO). Therefore, it becomes crucial to predict the occurrence of heart disease and take proactive measures for prevention and timely intervention.

The medical sector and hospitals generate a vast amount of data, which can sometimes be challenging to analyze manually. However, leveraging the power of machine learning techniques for predictive analysis and data handling can significantly enhance efficiency for healthcare professionals. This study aims to explore heart disease and its risk factors while utilizing machine learning techniques to predict its occurrence. Furthermore, a comparative analysis of different machine learning algorithms will be conducted to evaluate their performance in heart disease prediction.

The primary objective of this research is to develop an accurate prediction model for heart disease using machine learning. By harnessing the capabilities of advanced algorithms, this study seeks to provide medical practitioners with a valuable tool for early detection and proactive management of heart disease. Comparative analysis of machine learning algorithms will offer insights into the strengths and weaknesses of each approach, enabling informed decision-making regarding the most effective prediction model.

By combining medical expertise with machine learning techniques, we aim to contribute to the field of healthcare by improving heart disease prediction and ultimately reducing the impact of this prevalent condition on individuals and society as a whole.

Objective:

The main objective of this project is to build a robust predictive model that can effectively identify individuals at high risk of heart failure. By analyzing historical data and leveraging machine learning algorithms, the model will be trained to predict the likelihood of heart failure based on input features such as age, gender, medical history, blood pressure, cholesterol levels, and lifestyle factors. The ultimate goal is to improve patient outcomes by enabling early intervention and personalized healthcare management.

Outline:

1. Introduction

  • Overview of Heart Failure and its Impact on Public Health
  • Importance of early detection and proactive management
  • Role of machine learning in predicting heart failure

2. Data Collection

  • Identify and gather a comprehensive dataset containing relevant features
  • Explore medical records, clinical databases, and publicly available datasets
  • Ensure data privacy and ethical considerations

3. Data Preprocessing

  • Handle missing values, duplicates, and inconsistencies in the dataset
  • Perform feature engineering to extract meaningful information
  • Normalize or scale numerical features
  • Encode categorical variables

4. Exploratory Data Analysis

  • Perform statistical analysis and visualization to gain insights into the data
  • Examine correlations between features and heart failure
  • Identify any patterns or trends in the data

5. Feature Selection

  • Select the most relevant features for heart failure prediction
  • Utilize techniques like correlation analysis, feature importance, and domain knowledge
  • Remove irrelevant or redundant features

6. Model Selection and Training

  • Choose suitable machine learning algorithms for prediction
  • Split the dataset into training and testing sets
  • Train the models using various algorithms (e.g., logistic regression, random forests, support vector machines)
  • Use appropriate evaluation metrics to assess model performance

7. Model Evaluation and Comparison

  • Evaluate trained models using metrics such as accuracy, precision, recall, and AUC-ROC curve
  • Compare the performance of different models to identify the most accurate and reliable one
  • Consider factors like interpretability, computational complexity, and scalability
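The metrics named in step 7 can all be derived from the four cells of a confusion matrix. The sketch below is only an illustration: the counts are hypothetical and the helper name `classification_metrics` is an assumption, not part of the project code.

```python
# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fp, fn, tn = 20, 5, 10, 65


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a model predicts no positives
    # or the data contains no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


metrics = classification_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy looks high while recall is noticeably lower, which is why accuracy alone can be misleading when the positive class is the one that matters clinically.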

8. Model Deployment

  • Deploy the selected model into a production environment
  • Develop an API or integrate the model into a web application or healthcare system
  • Ensure real-time predictions and handle input data securely

9. Performance Monitoring and Improvement

  • Continuously monitor the model’s performance in the production environment
  • Collect feedback from healthcare professionals and end-users
  • Update and retrain the model as new data becomes available
  • Fine-tune hyperparameters and consider ensemble techniques for further improvements

10. Conclusion

  • Recap project objectives and achievements
  • Highlight the significance of the predictive model for the early detection of heart failure
  • Discuss potential future enhancements and extensions of the project

Here are some of the most commonly used machine learning algorithms for predicting heart failure:

  • Logistic regression is a simple but effective algorithm for predicting binary outcomes, such as whether or not a patient will develop heart failure.
  • Support vector machines (SVMs) are powerful algorithms that can be used for classification (binary outcomes) and, in their regression form, for continuous outcomes.
  • Neural networks are more flexible models that can learn nonlinear relationships between features and outcomes, at the cost of more data, tuning, and compute.
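As a rough illustration of how these three families could be compared, here is a minimal sketch using scikit-learn. It runs on synthetic data from `make_classification`, not on the clinical records, and the hyperparameters (hidden layer size, RBF kernel, 5 folds) are assumptions picked for brevity rather than tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (299 rows, 12 features); NOT the real clinical records.
X, y = make_classification(
    n_samples=299, n_features=12, n_informative=5, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm_rbf": SVC(kernel="rbf"),
    "neural_network": MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=2000, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    # Scale inside the pipeline so each fold is standardized on its own training part.
    pipeline = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {scores[name]:.3f}")
```

The numbers printed here say nothing about the real dataset; the point is only the shape of a like-for-like comparison, with the same folds and the same metric for every model.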

Introduction

The prediction of heart failure using machine learning techniques is a crucial application in healthcare. This project develops a predictive model that estimates the likelihood of an individual experiencing heart failure based on various risk factors and medical indicators. By applying machine learning algorithms to a dataset containing relevant features, it aims to assist healthcare professionals in the early detection and proactive management of heart failure.

The heart, a vital organ of the human body, pumps blood to every part of our anatomy. Its proper functioning is essential for the brain and other organs to operate effectively. When the heart fails to function correctly, the person's life is at immediate risk, as the brain and organs will cease to work within minutes. Unfortunately, changes in lifestyle, work-related stress, and poor dietary habits have contributed to an alarming increase in heart-related diseases.

Heart diseases have emerged as one of the leading causes of death worldwide. According to the World Health Organization (WHO), these diseases claim the lives of 17.7 million people annually, accounting for 31% of all global deaths. In India, heart-related diseases have become the primary cause of mortality, with 1.7 million deaths recorded in 2016, as reported by the 2016 Global Burden of Disease Report.

The impact of heart diseases extends beyond loss of life. They significantly burden healthcare systems and reduce individuals' productivity. The WHO estimates that India incurred losses of up to $237 billion between 2005 and 2015 due to cardiovascular diseases. Therefore, accurate and feasible prediction of heart-related diseases is of utmost importance.

Medical organizations worldwide collect vast amounts of data on various health-related issues, including heart diseases. These datasets hold immense potential for gaining valuable insights when analyzed using machine learning techniques. However, the sheer size and complexity of these datasets can be overwhelming for human minds to comprehend. This is where machine learning algorithms come into play, providing effective tools to explore and extract meaningful patterns and predictions from the data.

By leveraging machine learning techniques, medical professionals can accurately predict the presence or absence of heart-related diseases. These algorithms have proven to be invaluable in handling large, noisy datasets, enabling more accurate diagnoses and timely interventions. With the ability to process vast amounts of data, machine learning algorithms have become indispensable tools in the quest for better understanding and prediction of heart diseases.

In conclusion, the prediction of heart-related diseases is paramount in the field of healthcare. The use of machine learning techniques allows medical organizations to harness the power of data and gain valuable insights for improved diagnoses, prevention, and treatment. By combining medical expertise with advanced technology, we can strive to reduce the impact of heart disease, enhance patient outcomes, and promote overall well-being in communities worldwide.

References:

[1] World Health Organization (WHO)

[2] Global Burden of Disease Report (2016)

Prediction of the Occurrence of Heart Failure

This project aims to predict the occurrence of heart failure using multiple classification algorithms.

Data Import and Exploration

Dataset

We will use the Heart Failure Prediction dataset available on Kaggle (https://www.kaggle.com/andrewmvd/heart-failure-clinical-data). The dataset contains 299 records and 13 columns: 12 features, including age, sex, and various medical indicators, plus the target. The target variable, DEATH_EVENT, is binary and records whether the patient died during the follow-up period.

# import what we need here
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st
import os
import time

# the data source
# the corresponding file is available at https://www.kaggle.com/datasets/ineubytes/
# If you use google colab, PLEASE put the corresponding csv dataset into the root d
# The file will be deleted everytime in google colab!!! And you might use additiona
# If you use jupyter lab, make sure that you set the directory to the place where t
# os.getcwd()
# os.chdir('your directory goes here')
df = pd.read_csv('heart.csv')
# explore data
df.head()
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal
0   52    1   0       125   212    0        1      168      0      1.0      2   2     3
1   53    1   0       140   203    1        0      155      1      3.1      0   0     3
2   70    1   0       145   174    0        1      125      1      2.6      0   0     3
3   61    1   0       148   203    0        1      161      0      0.0      2   1     3
4   62    0   0       138   294    1        1      106      0      1.9      1   3     2

Heart Failure Prediction

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")


import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv

In [2]:

data=pd.read_csv("/kaggle/input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv")

In [3]:

data.head()

Out[3]:

    age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  high_blood_pressure  platelets
0  75.0        0                       582         0                 20                    1  265000.00
1  55.0        0                      7861         0                 38                    0  263358.03
2  65.0        0                       146         0                 20                    0  162000.00
3  50.0        1                       111         0                 20                    0  210000.00
4  65.0        1                       160         1                 20                    0  327000.00

   serum_creatinine  serum_sodium  sex  smoking  time  DEATH_EVENT
0               1.9           130    1        0     4            1
1               1.1           136    1        0     6            1
2               1.3           129    1        1     7            1
3               1.9           137    1        0     7            1
4               2.7           116    0        0     8            1

In [4]:

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 299 non-null float64
1 anaemia 299 non-null int64
2 creatinine_phosphokinase 299 non-null int64
3 diabetes 299 non-null int64
4 ejection_fraction 299 non-null int64
5 high_blood_pressure 299 non-null int64
6 platelets 299 non-null float64
7 serum_creatinine 299 non-null float64
8 serum_sodium 299 non-null int64
9 sex 299 non-null int64
10 smoking 299 non-null int64
11 time 299 non-null int64
12 DEATH_EVENT 299 non-null int64
dtypes: float64(3), int64(10)
memory usage: 30.5 KB

In [5]:

data.describe()

Out[5]:

              age     anaemia  creatinine_phosphokinase    diabetes  ejection_fraction  high_blood_pressure      platelets
count  299.000000  299.000000                299.000000  299.000000         299.000000           299.000000     299.000000
mean    60.833893    0.431438                581.839465    0.418060          38.083612             0.351171  263358.029264
std     11.894809    0.496107                970.287881    0.494067          11.834841             0.478136   97804.236869
min     40.000000    0.000000                 23.000000    0.000000          14.000000             0.000000   25100.000000
25%     51.000000    0.000000                116.500000    0.000000          30.000000             0.000000  212500.000000
50%     60.000000    0.000000                250.000000    0.000000          38.000000             0.000000  262000.000000
75%     70.000000    1.000000                582.000000    1.000000          45.000000             1.000000  303500.000000
max     95.000000    1.000000               7861.000000    1.000000          80.000000             1.000000  850000.000000

       serum_creatinine  serum_sodium         sex    smoking        time  DEATH_EVENT
count         299.00000    299.000000  299.000000  299.00000  299.000000    299.00000
mean            1.39388    136.625418    0.648829    0.32107  130.260870      0.32107
std             1.03451      4.412477    0.478136    0.46767   77.614208      0.46767
min             0.50000    113.000000    0.000000    0.00000    4.000000      0.00000
25%             0.90000    134.000000    0.000000    0.00000   73.000000      0.00000
50%             1.10000    137.000000    1.000000    0.00000  115.000000      0.00000
75%             1.40000    140.000000    1.000000    1.00000  203.000000      1.00000
max             9.40000    148.000000    1.000000    1.00000  285.000000      1.00000

In [6]:

data.columns

Out[6]:

Index(['age', 'anaemia', 'creatinine_phosphokinase', 'diabetes',
'ejection_fraction', 'high_blood_pressure', 'platelets',
'serum_creatinine', 'serum_sodium', 'sex', 'smoking', 'time',
'DEATH_EVENT'],
dtype='object')

In [7]:

#missing value

data.isnull().sum()

Out[7]:

age                         0
anaemia                     0
creatinine_phosphokinase    0
diabetes                    0
ejection_fraction           0
high_blood_pressure         0
platelets                   0
serum_creatinine            0
serum_sodium                0
sex                         0
smoking                     0
time                        0
DEATH_EVENT                 0
dtype: int64

In [8]:

data["DEATH_EVENT"].value_counts()

Out[8]:

0    203
1     96
Name: DEATH_EVENT, dtype: int64
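The counts above show that the classes are imbalanced, which matters when reading accuracy figures later. A small sketch of the arithmetic, using only the two counts printed by `value_counts()`:

```python
# Class counts copied from the value_counts() output above.
counts = {0: 203, 1: 96}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A model that always predicts the majority class reaches this accuracy
# without learning anything, so it is a useful floor for comparison.
majority_baseline = max(shares.values())

print(f"share of DEATH_EVENT=1: {shares[1]:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any classifier evaluated on this data should therefore be judged against roughly 68% accuracy, not against 50%.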

In [9]:

#unique value analysis

for i in list(data.columns):
    print("{}->{}".format(i,data[i].value_counts().shape[0]))
age->47
anaemia->2
creatinine_phosphokinase->208
diabetes->2
ejection_fraction->17
high_blood_pressure->2
platelets->176
serum_creatinine->40
serum_sodium->27
sex->2
smoking->2
time->148
DEATH_EVENT->2

In [10]:

#categorical feature analysis

categorical_list=["anaemia","diabetes","high_blood_pressure","sex","smoking","DEATH_EVENT"]

In [11]:

import matplotlib.pyplot as plt
import seaborn as sns

data_categoric = data.loc[:, categorical_list]
fig, axs = plt.subplots(ncols=len(categorical_list), figsize=(20,5))
for i, col in enumerate(categorical_list):
    sns.countplot(x=col, data=data_categoric, hue="DEATH_EVENT", ax=axs[i])
    axs[i].set_title(col)
plt.tight_layout()
plt.show()

In [12]:

#numeric feature analysis

numeric_list=["age", "creatinine_phosphokinase",
"ejection_fraction", "platelets",
"serum_creatinine", "serum_sodium", "time","DEATH_EVENT"]

In [13]:

data_numeric = data.loc[:, numeric_list]
sns.pairplot(data_numeric, hue = "DEATH_EVENT", diag_kind = "kde")
plt.show()

In [14]:

#standardization

from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
scaled_array=scaler.fit_transform(data[numeric_list[:-1]])

In [15]:

pd.DataFrame(scaled_array).describe()

Out[15]:

                   0           1              2              3              4              5              6
count   2.990000e+02  299.000000   2.990000e+02   2.990000e+02   2.990000e+02   2.990000e+02   2.990000e+02
mean    5.703353e-16    0.000000  -3.267546e-17   7.723291e-17   1.425838e-16  -8.673849e-16  -1.901118e-16
std     1.001676e+00    1.001676   1.001676e+00   1.001676e+00   1.001676e+00   1.001676e+00   1.001676e+00
min    -1.754448e+00   -0.576918  -2.038387e+00  -2.440155e+00  -8.655094e-01  -5.363206e+00  -1.629502e+00
25%    -8.281242e-01   -0.480393  -6.841802e-01  -5.208700e-01  -4.782047e-01  -5.959961e-01  -7.389995e-01
50%    -7.022315e-02   -0.342574  -7.076750e-03  -1.390846e-02  -2.845524e-01   8.503384e-02  -1.969543e-01
75%     7.718891e-01    0.000166   5.853888e-01   4.111199e-01   5.926150e-03   7.660638e-01   9.387595e-01
max     2.877170e+00    7.514640   3.547716e+00   6.008180e+00   7.752020e+00   2.582144e+00   1.997038e+00
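One detail in this output is easy to misread: the std row shows 1.001676 rather than exactly 1. This is expected. StandardScaler divides by the population standard deviation (ddof=0), while describe() reports the sample standard deviation (ddof=1), so the two differ by a factor of sqrt(n / (n - 1)). A minimal sketch of that arithmetic, using only the standard library and a tiny hypothetical column:

```python
import math
import statistics

n = 299  # number of rows in the dataset

# Ratio between sample std (ddof=1) and population std (ddof=0) for n rows.
expected_std = math.sqrt(n / (n - 1))
print(round(expected_std, 6))

# The same effect on a tiny hypothetical column of five values.
values = [40.0, 51.0, 60.0, 70.0, 95.0]
mean = statistics.fmean(values)
pop_std = statistics.pstdev(values)  # population std, as StandardScaler uses
z_scores = [(v - mean) / pop_std for v in values]

print(statistics.pstdev(z_scores))  # population std of the z-scores, close to 1
print(statistics.stdev(z_scores))   # sample std of the z-scores, sqrt(5 / 4)
```

So the scaled columns are standardized as intended; the small excess over 1 comes from how describe() computes its summary.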

In [16]:

#correlation analysis

f,ax=plt.subplots(figsize = (8,5))
sns.heatmap(data.corr(), annot = True, fmt = ".1f", linewidths = .5,ax=ax)
plt.show()
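The heatmap shows every pairwise correlation at once. To read off which features relate most strongly to the target, one option is to take the DEATH_EVENT column of the correlation matrix and sort it by absolute value. The sketch below does this on a synthetic stand-in frame: the column names mirror the dataset, but the values and the rule generating the target are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 299

# Hypothetical stand-in frame; values are synthetic, not the clinical records.
demo = pd.DataFrame({
    "age": rng.normal(60, 12, n),
    "ejection_fraction": rng.normal(38, 12, n),
    "serum_sodium": rng.normal(137, 4, n),
})

# Synthetic target that depends on two of the columns, so the ranking is non-trivial.
risk = 0.05 * (demo["age"] - 60) - 0.08 * (demo["ejection_fraction"] - 38)
demo["DEATH_EVENT"] = (risk + rng.normal(0, 1, n) > 0.5).astype(int)

# Correlation of each feature with the target, strongest first (by absolute value).
target_corr = (
    demo.corr()["DEATH_EVENT"]
    .drop("DEATH_EVENT")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```

On the real data the same three lines would apply to `data.corr()`. Pearson correlation only captures linear association, so a low value does not by itself mean a feature is useless.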

In [17]:

#encoding categorical columns

data1=data.copy()
data1= pd.get_dummies(data1, columns = categorical_list[:-1], drop_first = True)
data1.head()

Out[17]:

    age  creatinine_phosphokinase  ejection_fraction  platelets  serum_creatinine  serum_sodium  time
0  75.0                       582                 20  265000.00               1.9           130     4
1  55.0                      7861                 38  263358.03               1.1           136     6
2  65.0                       146                 20  162000.00               1.3           129     7
3  50.0                       111                 20  210000.00               1.9           137     7
4  65.0                       160                 20  327000.00               2.7           116     8

   DEATH_EVENT  anaemia_1  diabetes_1  high_blood_pressure_1  sex_1  smoking_1
0            1          0           0                      1      1          0
1            1          0           0                      0      1          0
2            1          0           0                      0      1          1
3            1          1           0                      0      1          0
4            1          1           1                      0      0          0
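Because the categorical columns here are already coded as 0/1, `get_dummies` with `drop_first=True` effectively renames them (for example `sex` becomes `sex_1`) and moves them to the end of the frame; the values are unchanged. A small sketch on a hypothetical four-row frame makes this visible. Passing `dtype=int` is an assumption added here so the result is 0/1 integers regardless of pandas version.

```python
import pandas as pd

# Tiny hypothetical frame with two already-binary columns.
demo = pd.DataFrame({
    "sex": [1, 1, 0, 1],
    "smoking": [0, 1, 0, 0],
    "age": [75.0, 55.0, 65.0, 50.0],
})

encoded = pd.get_dummies(
    demo, columns=["sex", "smoking"], drop_first=True, dtype=int
)
print(encoded.columns.tolist())

# The encoded indicator columns carry the same values as the originals.
same = bool(
    (encoded["sex_1"] == demo["sex"]).all()
    and (encoded["smoking_1"] == demo["smoking"]).all()
)
print(same)
```

One-hot encoding becomes meaningful only for columns with more than two categories; for this dataset the step is harmless but not strictly needed.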

In [18]:

x_data = data1.drop(["DEATH_EVENT"], axis = 1)
y = data1.DEATH_EVENT.values
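From here, the steps listed in the outline (split, train, evaluate) could look roughly like the sketch below. It is not the author's code: it uses synthetic stand-in arrays named `x_demo` and `y_demo` instead of `x_data` and `y`, and logistic regression with a 80/20 stratified split is an assumed starting point rather than a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for x_data and y (299 rows, 12 features, about 32% positives).
x_demo, y_demo = make_classification(
    n_samples=299,
    n_features=12,
    n_informative=5,
    weights=[0.68, 0.32],
    random_state=42,
)

# Stratify so both splits keep roughly the same share of positive cases.
x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Fitting the scaler inside the pipeline keeps test-set statistics out of training.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(x_train, y_train)

y_pred = model.predict(x_test)
y_prob = model.predict_proba(x_test)[:, 1]
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)
print(f"accuracy: {accuracy:.3f}")
print(f"ROC AUC:  {auc:.3f}")
```

To run this on the real data, `x_demo` and `y_demo` would be replaced by `x_data` and `y`. Note that scaling is done inside the pipeline here, whereas the standalone StandardScaler call earlier was fitted on all rows.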
