Pre-Existing Health Conditions

Introduction

This project builds on Geoffrey’s Milestone I project, which focused primarily on cardiovascular mortality. In this extended analysis, we explore how pre-existing conditions correlate with cardiovascular mortality and add a focus on COVID-19 mortality. Both heart disease and COVID-19 remain leading causes of death in the United States, and pre-existing health conditions significantly influence the severity and outcomes of both diseases.

Our goal is to uncover patterns that can inform public health strategies, targeting high-risk groups and identifying modifiable risk factors that may reduce preventable deaths. This analysis has the potential to impact healthcare policies and practices by highlighting the specific vulnerabilities that contribute to mortality.

Methodology

We utilized the CDC's Behavioral Risk Factor Surveillance System (BRFSS) as our primary dataset, supplementing it with data on cardiovascular and COVID-19 mortality. Our methodology integrates both supervised and unsupervised learning techniques. For the supervised component, we employed regression, random forest, and K-Nearest Neighbors models to predict mortality rates by state and year based on selected features. The unsupervised analysis involved Principal Component Analysis (PCA) and k-means clustering to identify natural groupings of conditions associated with higher mortality risks.
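To make the supervised step more concrete, here is a minimal, dependency-free sketch of K-Nearest Neighbors regression: a prediction is the mean target of the k closest training rows. The feature names, the numbers, and the `knn_predict` helper are all hypothetical and chosen only for illustration; they are not taken from the project's data or code, which would more likely rely on a library implementation.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training rows."""
    # Euclidean distance from the query to every training row
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    ]
    # Sort by distance only; ties keep their original order
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical state-year rows: [diabetes %, obesity %, smoking %]
train_X = [
    [8.0, 25.0, 12.0],
    [9.0, 27.0, 14.0],
    [12.0, 33.0, 18.0],
    [13.0, 35.0, 20.0],
    [10.0, 30.0, 15.0],
]
# Hypothetical mortality rates per 100,000 (illustrative, not real data)
train_y = [150.0, 160.0, 210.0, 220.0, 180.0]

# Predict for an unseen state-year using its 2 nearest neighbors
prediction = knn_predict(train_X, train_y, [12.5, 34.0, 19.0], k=2)
print(prediction)
```

Because the method depends on distances, features measured on different scales would normally be normalized first so that no single feature dominates the neighbor search.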

Findings

Our supervised models indicate that K-Nearest Neighbors regression can be effective in predicting COVID-19 mortality based on pre-existing conditions. Additionally, the unsupervised clustering reveals patterns of co-occurring conditions that naturally group into clusters of varying mortality risks, offering valuable insights for public health interventions.
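The clustering idea can be sketched with a small, dependency-free k-means loop (Lloyd's algorithm). This is only an illustration under stated assumptions: the prevalence values are invented, the PCA step is omitted for brevity, and the starting centroids are hand-picked rows rather than a randomized initialization. The `kmeans` helper is hypothetical and is not the project's implementation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then update."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])
        # Stop early once the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical rows: [diabetes %, hypertension %] (illustrative, not real data)
points = [
    [8.0, 28.0], [9.0, 29.0], [8.5, 30.0],
    [13.0, 38.0], [14.0, 40.0], [13.5, 39.0],
]
# Assumption: start from two hand-picked rows instead of random seeds
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels, centroids)
```

Each resulting cluster could then be compared against observed mortality rates to see whether groups of co-occurring conditions differ in risk.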

Related Works

We compared our findings with existing studies on mortality prediction using machine learning, highlighting the differences and improvements in our approach.

Data Sources

Behavioral Risk Factor Surveillance System (BRFSS)
Cardiovascular Mortality Dataset
COVID-19 Mortality Dataset
Population Dataset from the U.S. Census Bureau

Feature Engineering

We performed extensive feature engineering to enhance our datasets, including imputation strategies and data normalization techniques.
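The specific imputation and normalization choices are not detailed here, so the sketch below assumes two common ones: median imputation for missing values and z-score standardization. The column name, the values, and both helper functions are hypothetical and serve only to show the mechanics.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical survey column with missing responses (illustrative values)
obesity_pct = [25.0, None, 31.0, 28.0, None, 36.0]

imputed = impute_median(obesity_pct)
scaled = standardize(imputed)
print(imputed)
print(scaled)
```

When a model is evaluated on held-out data, the fill value, mean, and standard deviation would normally be computed on the training portion only and then reused on the test portion, to avoid leaking information.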

Discussion

We learned how various pre-existing conditions and behavioral factors can influence mortality rates. Our analysis provided insights into the effectiveness of different machine learning models and clustering techniques.

Ethical Considerations

We addressed potential ethical issues such as bias in training data and the interpretation of clusters, ensuring our findings are responsibly communicated.

Contributions

Natalie Larowe: Led feature selection and supervised learning models.
Geoffrey Gin: Managed data preprocessing and integration.
Denesh Chandrahasan: Developed unsupervised learning models and clustering analysis.

GitHub Repository

geoffgin/Pre-Existing-Factors

Exploring the Impact of Pre-Existing Health Conditions on Mortality Rates