An in-depth analysis of the role of pre-existing conditions in cardiovascular and COVID-19 mortality.
This project builds on Geoffrey’s Milestone I project, which focused primarily on cardiovascular mortality. In this extended analysis, we examine how pre-existing conditions correlate with cardiovascular mortality and add a second focus on COVID-19 mortality. Both heart disease and COVID-19 remain leading causes of death in the United States, and pre-existing health conditions significantly influence the severity and outcomes of both diseases.
Our goal is to uncover patterns that can inform public health strategies, targeting high-risk groups and identifying modifiable risk factors that may reduce preventable deaths. This analysis has the potential to impact healthcare policies and practices by highlighting the specific vulnerabilities that contribute to mortality.
We utilized the CDC's Behavioral Risk Factor Surveillance System (BRFSS) as our primary dataset, supplementing it with data on cardiovascular and COVID-19 mortality. Our methodology integrates both supervised and unsupervised learning techniques. For the supervised component, we employed regression, random forest, and K-Nearest Neighbors models to predict mortality rates by state and year based on selected features. The unsupervised analysis involved Principal Component Analysis (PCA) and k-means clustering to identify natural groupings of conditions associated with higher mortality risks.
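As a rough illustration of the supervised component, the sketch below compares the three model families with 5-fold cross-validation in scikit-learn. It is not the project's actual pipeline: the data are synthetic stand-ins rather than BRFSS records, the four feature columns and the mortality target are hypothetical, "regression" is assumed here to mean ordinary linear regression, and all hyperparameters are placeholder choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a state-by-year table (NOT real BRFSS data).
rng = np.random.default_rng(0)
n_rows = 200
# Hypothetical features: prevalence of four conditions, as fractions.
X = rng.uniform(0.05, 0.40, size=(n_rows, 4))
# Hypothetical target: mortality per 100,000, driven by two features plus noise.
y = 150 + 400 * X[:, 0] + 250 * X[:, 1] + rng.normal(0, 10, n_rows)

# Distance-based and linear models get standardized inputs;
# the random forest does not need scaling.
models = {
    "linear_regression": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

# Mean cross-validated R^2 for each model.
cv_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_r2[name] = scores.mean()
    print(f"{name}: mean R^2 = {cv_r2[name]:.3f}")
```

With real state-by-year rows, a grouped split such as scikit-learn's `GroupKFold` (grouping by state) may be a better fit, since it keeps one state's years from appearing in both the training and test folds.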
Our supervised models indicate that K-Nearest Neighbors regression can be effective at predicting COVID-19 mortality from pre-existing conditions. In addition, the unsupervised clustering reveals patterns of co-occurring conditions that group naturally into clusters with differing mortality risk, offering valuable insights for public health interventions.
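The clustering step can be sketched along the following lines: standardize the condition features, project them with PCA, run k-means on the components, and then compare mortality across the resulting clusters. Everything here is an assumption made for illustration, including the synthetic two-group data, the three hypothetical condition columns, the choice of two components and two clusters, and the made-up mortality values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data (NOT real BRFSS data): two groups of rows with different
# prevalence profiles for three hypothetical conditions.
low = rng.normal(loc=[0.08, 0.25, 0.28], scale=0.01, size=(50, 3))
high = rng.normal(loc=[0.13, 0.35, 0.38], scale=0.01, size=(50, 3))
X = np.vstack([low, high])
# Hypothetical mortality per 100,000 for each row.
mortality = np.concatenate([rng.normal(150, 10, 50), rng.normal(220, 10, 50)])

# Standardize so that no single condition dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two components, then cluster in the reduced space.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(components)

print("explained variance ratio:", pca.explained_variance_ratio_)
# Compare mortality across clusters to see whether they differ in risk.
for k in range(kmeans.n_clusters):
    print(f"cluster {k}: n={np.sum(labels == k)}, "
          f"mean mortality={mortality[labels == k].mean():.1f}")
```

Mortality is deliberately kept out of the clustering inputs in this sketch, so any difference in mortality between clusters reflects the condition profiles rather than the target itself.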
We compared our findings with existing studies on mortality prediction using machine learning, highlighting the differences and improvements in our approach.
We performed extensive feature engineering to enhance our datasets, including imputation strategies and data normalization techniques.
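One common way to combine imputation and normalization is a scikit-learn pipeline, sketched below. The summary above does not say which strategies were used, so the median imputation and z-score standardization shown here are assumptions, and the small feature matrix is invented for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented feature matrix with missing survey values (NOT real BRFSS data).
# Columns are three hypothetical condition prevalence rates.
X = np.array([
    [0.10, 0.30, np.nan],
    [0.12, np.nan, 0.25],
    [0.09, 0.28, 0.22],
    [np.nan, 0.35, 0.30],
])

# Assumed strategy: fill gaps with the column median, then rescale each
# column to zero mean and unit variance.
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_clean = preprocess.fit_transform(X)

print(X_clean.round(2))
```

Fitting the pipeline on training rows only and then calling `preprocess.transform` on held-out rows keeps test-set statistics from leaking into the imputed and scaled values.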
We learned how various pre-existing conditions and behavioral factors can influence mortality rates. Our analysis provided insights into the effectiveness of different machine learning models and clustering techniques.
We addressed potential ethical issues, such as bias in the training data and the interpretation of clusters, to ensure that our findings are communicated responsibly.