Lifestyle Factors and What They Tell Us About Sleep Health

Authors

Vivian Sui, Ryan Orth, Dwayne Heraclio

Dataset Description

In this analysis, we examine the “Sleep Health and Lifestyle” dataset from Kaggle. This is a simulated dataset that contains 400 rows and 13 variables, where each row represents one individual.

The following variables are included in the dataset:

  • Person ID: a unique identifier for each subject.
  • Gender: the gender of the subject (male/female).
  • Age: the age of the subject in years.
  • Occupation: the occupation of the subject.
  • Sleep Duration: the number of hours the subject sleeps per day.
  • Quality of Sleep: a self-reported rating of the quality of sleep the subject experiences, on a scale of 1 to 10.
  • Physical Activity Level: the number of minutes per day that the subject engages in physical activity.
  • Stress Level: a self-reported rating of the stress level the subject experiences, on a scale of 1 to 10.
  • BMI Category: the BMI category of the subject (e.g. normal, overweight, obese).
  • Blood Pressure: the blood pressure of the subject, measured as systolic pressure over diastolic pressure.
  • Heart Rate: the resting heart rate of the subject, measured in beats per minute.
  • Daily Steps: the number of steps the subject takes per day.
  • Sleep Disorder: the presence or absence of a sleep disorder in the subject (None, Sleep Apnea, Insomnia).

Research Questions

We will be investigating three research questions, all aiming to understand what factors influence an individual’s sleep experience:

  1. How does physical health impact sleep health?
  2. How do age and occupation affect sleep health?
  3. How does gender influence sleep health?

How does physical health impact sleep health?

In order to explore the relationship between physical health and sleep health, we looked for the attributes most strongly related to the sleep health metrics Sleep.Duration and Quality.of.Sleep. The variables of interest from the dataset are Age, Physical.Activity.Level, BMI.Category, Blood.Pressure (split into Systolic and Diastolic), Heart.Rate, and Daily.Steps. Each of these variables captures one aspect of physical health: age, exercise, weight, or cardiovascular health.
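The Blood.Pressure split mentioned above can be sketched in a few lines of Python. This is only an illustration, not the code used for the report: it assumes each reading is stored as a string of the form systolic/diastolic, and the helper name `split_blood_pressure` and the sample readings are hypothetical.

```python
def split_blood_pressure(reading: str) -> tuple[int, int]:
    """Split a 'systolic/diastolic' string such as '120/80' into two integers."""
    systolic, diastolic = reading.strip().split("/")
    return int(systolic), int(diastolic)


# Made-up readings for illustration only; NOT taken from the dataset.
readings = ["120/80", "135/90", "118/76"]

pairs = [split_blood_pressure(r) for r in readings]
systolic = [s for s, _ in pairs]    # first number of each reading
diastolic = [d for _, d in pairs]   # second number of each reading

print("Systolic:", systolic)
print("Diastolic:", diastolic)
```

Storing the two pressures as separate numeric columns is what allows them to enter quantitative methods such as PCA alongside the other health variables.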

How does BMI Category relate to Quality of Sleep?

Since we wanted to see how physical health relates to sleep health, we chose to examine the relationship between BMI.Category and Quality.of.Sleep, as weight is one of the attributes the health and lifestyle industry focuses on most. If we could establish an association between BMI.Category and Quality.of.Sleep, it would suggest that weight is tied to one’s quality of sleep. Therefore, we created a ridgeline plot to show the distribution of Quality.of.Sleep within each BMI category.

The density ridge plot shows clear differences in the distribution of sleep quality across BMI categories. Normal BMI subjects have a mode of 8 out of 10 quality of sleep, Overweight BMI subjects have a mode of 6 out of 10, and Obese BMI subjects have a mode of 7 out of 10. Interestingly, the Overweight density curve is bimodal, with a second large concentration of subjects at a 9 out of 10 quality of sleep. The Obese density curve is trimodal, with a large concentration of subjects at a 4 out of 10 quality of sleep and a smaller one at a 9 out of 10. It is important to note that there are far fewer Obese subjects than subjects in the other two categories, so the shape of the Obese curve should be read with caution. Even so, it is notable that Normal subjects have a unimodal distribution centered at a high quality of sleep, while Overweight and Obese subjects have bimodal and trimodal distributions, respectively. This indicates large variation within the unhealthy BMI categories, with a non-trivial portion of their subjects reporting higher quality of sleep than the typical Normal subject.

In order to further explore this relationship, we also ran KS tests on the three distributions, performing three pairwise comparisons to determine whether the distributions of Quality.of.Sleep for each BMI category differ from each other to a statistically significant degree. After performing each pairwise comparison and adjusting the alpha level for multiple comparisons, we concluded that there is a statistically significant difference between the distribution of Quality.of.Sleep for Normal BMI subjects and the distributions for both Overweight and Obese BMI subjects, but not between the Overweight and Obese distributions. This supports what the plot itself indicates: the Obese and Overweight distributions differ significantly from the Normal BMI distribution. Our primary takeaway from this plot is that the distributions for Overweight BMI and Obese BMI subjects are centered around lower Quality.of.Sleep values than the distribution for Normal BMI subjects.
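The pairwise testing procedure can be sketched as follows. This is a hedged illustration in plain Python, not the report's own code: the ratings are made up, the p-value uses the large-sample Kolmogorov approximation (which is only approximate for small samples and tied ratings), and the Bonferroni correction is an assumption, since the report does not name the adjustment it used.

```python
import math
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m):
    """Approximate p-value from the large-sample Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:  # the series below is numerically equal to 1 in this range
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, 101))
    return min(1.0, max(0.0, 2 * series))


# Made-up sleep quality ratings for illustration only; NOT the dataset's values.
groups = {
    "Normal": [8, 8, 7, 8, 9, 8, 7, 8, 8, 9],
    "Overweight": [6, 6, 5, 6, 9, 6, 7, 6, 5, 6],
    "Obese": [4, 7, 7, 4, 9, 7],
}

ALPHA = 0.05
pairs = list(combinations(groups, 2))   # three pairwise comparisons
adjusted_alpha = ALPHA / len(pairs)     # Bonferroni correction (assumed)

results = {}
for g1, g2 in pairs:
    d = ks_statistic(groups[g1], groups[g2])
    p = ks_pvalue(d, len(groups[g1]), len(groups[g2]))
    results[(g1, g2)] = (d, p, p < adjusted_alpha)
    print(f"{g1} vs {g2}: D={d:.3f}, p={p:.4f}, reject={p < adjusted_alpha}")
```

Dividing the alpha level by the number of comparisons keeps the overall chance of a false positive across the three tests at or below the original level.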

Which physical health attribute is the best predictor of sleep health?

After examining the relationship between BMI.Category and Quality.of.Sleep, we next sought to identify which physical health measures best correlate with sleep health metrics (Quality.of.Sleep and Sleep.Duration). To do this, we performed a principal component analysis on all of the quantitative physical health variables and created a PCA Biplot, coloring each point by BMI category so we could see how the original health attributes load onto those dimensions and which combinations of attributes tend to characterize each BMI group.

The PCA Biplot displays the first two principal components and how they relate to the individual points and variables. As indicated by the plot, Normal BMI subjects cluster in the direction of the Sleep.Duration and Quality.of.Sleep arrows, suggesting that these subjects tend to have the highest values of these metrics. Although somewhat scattered in the plot, Overweight BMI subjects are associated with higher blood pressure, physical activity, daily steps, and age. Lastly, Obese BMI subjects lie in the direction of Heart.Rate, suggesting that these subjects have high heart rates. Of the quantitative variables, the plot shows little to no correlation between the sleep metrics and Age, Physical.Activity.Level, Daily.Steps, and Blood.Pressure (as indicated by the near 90 degree angles between the sleep metrics and these physical health variables). Interestingly, the plot shows that Sleep.Duration and Quality.of.Sleep are negatively correlated with Heart.Rate, especially Quality.of.Sleep. The main takeaway of this graph is therefore that a higher resting heart rate is indicative of poor sleep quality.
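To make the idea of variable loadings concrete, the sketch below computes the first two principal components of a small correlation matrix using power iteration. It is a minimal illustration under stated assumptions rather than the analysis behind the biplot: the data values are made up, only four variables are included, and the helper names are hypothetical. A statistics library would normally be used instead.

```python
import math


def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation_matrix(columns):
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    vec = [float(i + 1) for i in range(len(matrix))]  # arbitrary start vector
    for _ in range(iterations):
        new = mat_vec(matrix, vec)
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    value = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return value, vec


def deflate(matrix, value, vec):
    """Remove a found component so the next iteration finds the following one."""
    size = len(vec)
    return [[matrix[i][j] - value * vec[i] * vec[j] for j in range(size)]
            for i in range(size)]


# Made-up values for illustration only; NOT taken from the dataset.
names = ["Sleep.Duration", "Quality.of.Sleep", "Heart.Rate", "Daily.Steps"]
columns = [
    [7.8, 7.5, 6.1, 6.3, 8.0, 5.9, 7.2, 6.5],
    [8, 8, 6, 6, 9, 4, 7, 6],
    [68, 70, 75, 77, 65, 85, 70, 72],
    [8000, 5000, 10000, 3500, 7000, 3000, 8000, 6000],
]

corr = correlation_matrix(columns)
var1, pc1 = leading_eigenpair(corr)
var2, pc2 = leading_eigenpair(deflate(corr, var1, pc1))

for name, l1, l2 in zip(names, pc1, pc2):
    print(f"{name:18s} PC1 loading={l1:+.3f}  PC2 loading={l2:+.3f}")
print(f"Share of variance: PC1={var1 / len(names):.2f}, PC2={var2 / len(names):.2f}")
```

In a biplot, variables whose loadings point in opposite directions on the leading components are negatively correlated, which is how the Heart.Rate arrow can be read against the sleep metrics. The overall sign of each component is arbitrary; only the relative signs of the loadings carry meaning.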

Heart Rate is the strongest predictor of sleep duration

Following the findings from the PCA Biplot, we wanted a more precise measurement of the relationship between Heart.Rate and Sleep.Duration. To do this, we created a scatterplot of Sleep.Duration vs. Heart.Rate with a linear smoother to gain a better understanding of the predictive power of Heart.Rate.

The scatterplot and regression line show a negative relationship between Sleep.Duration and Heart.Rate, consistent with what the PCA Biplot indicated. There are some outliers toward the higher heart rate values, but the overall trend shows a clear negative association between the two variables. After fitting the model, we also examined the model output and found the reported Adjusted R-Squared to be 0.2648. Although this is a relatively low score, it indicates that Heart.Rate explains roughly 26% of the variation in Sleep.Duration within the sample. Furthermore, the associated p-value of the model is approximately zero, demonstrating that Heart.Rate is a statistically significant predictor of Sleep.Duration. Our primary takeaway here is that Heart.Rate is negatively correlated with sleep health metrics and serves as a significant predictor of sleep duration.
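For readers unfamiliar with how R-Squared and Adjusted R-Squared are computed, the following sketch fits a one-predictor least-squares line in plain Python. The heart rate and sleep duration values are made up for illustration, so the numbers it prints will not match the 0.2648 reported above, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, predictors=1):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


# Made-up values for illustration only; NOT taken from the dataset.
heart_rate = [68, 70, 75, 77, 65, 85, 70, 72]
sleep_duration = [7.8, 7.5, 6.1, 6.3, 8.0, 5.9, 7.2, 6.5]

slope, intercept = fit_line(heart_rate, sleep_duration)
r2 = r_squared(heart_rate, sleep_duration, slope, intercept)
adj_r2 = adjusted_r_squared(r2, len(heart_rate))

print(f"slope={slope:.4f}, intercept={intercept:.2f}")
print(f"R-squared={r2:.4f}, adjusted R-squared={adj_r2:.4f}")
```

A negative slope corresponds to the downward-sloping regression line in the scatterplot, and with a single predictor the adjusted value sits only slightly below the unadjusted one.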

Summary and Future Work

In order to explore how physical health relates to sleep health, we first plotted Quality.of.Sleep distributions by BMI.Category and conducted pairwise KS tests, then performed PCA on the quantitative health metrics to examine variable loadings, and finally fitted a linear regression model between Heart.Rate and Sleep.Duration to determine predictive strength. Our primary takeaways were that Overweight and Obese BMI individuals display more spread out and generally lower sleep quality than Normal BMI subjects, and that resting heart rate emerged as the strongest predictor of sleep duration. For future work, more causation-focused questions should be explored, such as whether elevated heart rate causes poor sleep quality or increases the risk of sleep disorders. We would also be interested in exploring how Physical.Activity.Level relates to other sleep health metrics, a link that is well documented in the scientific literature. However, our current dataset is too limited to support simulation or row-resampling methods, since resampling could exclude the small group of Obese subjects entirely.

How do age and occupation affect sleep health?

To explore how age and occupation affect sleep health, we investigated whether certain occupations are associated with specific sleep disorders and how age influences sleep quality.

How does occupation relate to sleeping disorders?

To investigate the distribution of Sleep.Disorder across occupations, we created a rose diagram that displays the frequency of each sleep disorder type (None, Insomnia, and Sleep Apnea) within each Occupation. This type of plot allows for an intuitive visual comparison of categorical proportions in a circular layout, which highlights disparities in sleep health by professional role.

From the rose diagram, we observe that the majority of individuals across most occupations report no sleep disorder (gray segments), but there are several notable deviations. Nurses appear to have the highest concentration of sleep apnea (red), while salespeople and teachers show relatively high proportions of insomnia (blue). In contrast, occupations such as engineer, doctor, and lawyer have a larger share of individuals without sleep disorders. This graph therefore supports the conclusion that Occupation is associated with the prevalence of sleep disorders, suggesting that factors tied to one’s profession, such as work stress, shift schedules, or physical demands, may influence sleep health outcomes. While this graph doesn’t include Age or other variables, it sets the stage for further analysis by showing which professions may need further investigation to find the underlying cause of these disorders.
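The within-occupation proportions that a rose diagram encodes can be tabulated directly. The sketch below is an illustration only: the records are made up rather than drawn from the dataset, and the variable names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up (occupation, sleep disorder) records for illustration only;
# NOT taken from the dataset.
records = [
    ("Nurse", "Sleep Apnea"), ("Nurse", "Sleep Apnea"),
    ("Nurse", "Sleep Apnea"), ("Nurse", "None"),
    ("Teacher", "Insomnia"), ("Teacher", "Insomnia"),
    ("Teacher", "None"), ("Teacher", "None"),
    ("Engineer", "None"), ("Engineer", "None"),
    ("Engineer", "None"), ("Engineer", "Insomnia"),
]

# Count each disorder within each occupation.
counts = defaultdict(Counter)
for occupation, disorder in records:
    counts[occupation][disorder] += 1

# Convert counts to within-occupation proportions so that occupations
# with different numbers of subjects can be compared fairly.
proportions = {
    occupation: {d: c / sum(ctr.values()) for d, c in ctr.items()}
    for occupation, ctr in counts.items()
}

for occupation, shares in proportions.items():
    summary = ", ".join(f"{d}: {s:.0%}" for d, s in sorted(shares.items()))
    print(f"{occupation}: {summary}")
```

Comparing proportions rather than raw counts matters here because the number of subjects can differ widely between occupations.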

Summary and Future Work

Overall, these visualizations support the idea that age and sleep quality are intertwined with sleep disorder prevalence, and highlight how certain disorders are more prevalent within specific age ranges and occupations.

Further analysis including individuals from a broader range of professions could deepen our understanding of how occupation relates to sleep disorders. For instance, our current dataset lacks representation from careers involving extensive manual labor, whose workers may face different sleep health challenges due to physical demands. We were unable to include these professions in our current analysis because the dataset primarily covers white-collar occupations.

How does gender influence sleep health?

To investigate how gender influences sleep health, we explored the prevalence of diagnosed sleep disorders and differences in sleep-related behaviors.

How are sleep disorders distributed per gender?

First, we examined the distribution of sleep disorder by gender.

The bar plot displays the different types of sleep disorders along the x-axis and the count of individuals with each disorder along the y-axis, with bars stacked by gender. From this plot, we observed that more females than males suffer from Sleep Apnea, while Insomnia is more evenly distributed between genders. Additionally, a greater proportion of males than females reported no sleep disorder.

To formally assess whether Sleep.Disorder is independent of Gender, we performed a chi-squared test of independence. The test yielded a p-value of approximately zero, indicating that there is evidence of an association between Sleep.Disorder and Gender. Together with the bar plot, these results suggest that women are disproportionately affected by sleep disorders compared to men.
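The mechanics of the chi-squared test of independence can be sketched in plain Python. The counts below are made up for illustration and are not the dataset's values. The p-value helper relies on the fact that, for exactly 2 degrees of freedom (a 2 x 3 table such as Gender by Sleep.Disorder), the chi-squared upper tail probability is exp(-x/2); it is not valid for other table sizes.

```python
import math


def chi_squared_independence(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper tail probability of chi-squared; exact ONLY for 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Made-up counts for illustration only; NOT the dataset's values.
# Columns: None, Sleep Apnea, Insomnia
counts = [
    [30, 12, 8],   # Female
    [40, 3, 7],    # Male
]

statistic, dof = chi_squared_independence(counts)
p_value = p_value_two_dof(statistic)
print(f"chi-squared={statistic:.3f}, dof={dof}, p={p_value:.4f}")
```

A small p-value indicates that the observed counts depart from what independence between the two variables would predict; it does not by itself say which cells drive the difference.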

How does gender relate to stress, exercise, and sleep metrics?

We also examined how Sleep.Duration, Quality.of.Sleep, Stress.Level, and Physical.Activity.Level relate to each other by gender through a pairs plot.

The pairs plot shows all pairwise relationships between the variables of interest in one visualization, in the form of scatterplots, densities, and correlations. It revealed that females show a stronger positive correlation between Sleep.Duration and Quality.of.Sleep, suggesting that obtaining sufficient sleep may be more important for females to maintain a high quality of sleep. Furthermore, males exhibit a slightly stronger negative correlation between Stress.Level and Quality.of.Sleep than females. Notably, the relationship between Physical.Activity.Level and Quality.of.Sleep differs markedly by gender, with males showing a strong positive correlation while females show a slight negative correlation. This means that while increases in physical activity are typically associated with higher quality sleep for males, the association runs slightly in the opposite direction for females.
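The per-gender correlations shown in a pairs plot can be reproduced by splitting the data by gender and computing a Pearson correlation within each group. The following is a minimal sketch with made-up records, not the report's data or code, and the names used are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up (gender, Sleep.Duration, Quality.of.Sleep) records for
# illustration only; NOT taken from the dataset.
records = [
    ("Female", 6.0, 5), ("Female", 6.5, 6), ("Female", 7.0, 7),
    ("Female", 7.5, 8), ("Female", 8.0, 9),
    ("Male", 6.0, 6), ("Male", 6.5, 7), ("Male", 7.0, 6),
    ("Male", 7.5, 8), ("Male", 8.0, 7),
]

# Group the two variables by gender.
by_gender = {}
for gender, duration, quality in records:
    durations, qualities = by_gender.setdefault(gender, ([], []))
    durations.append(duration)
    qualities.append(quality)

# One correlation per gender, as in each panel of a grouped pairs plot.
correlations = {g: pearson(d, q) for g, (d, q) in by_gender.items()}
for gender, r in correlations.items():
    print(f"{gender}: r={r:+.3f}")
```

Computing the coefficient separately within each group is what allows the strength, and even the sign, of a relationship to differ between genders.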

Summary and Future Work

Overall, these analyses reveal significant gender differences in the distribution of sleep disorders and show that relationships between variables such as Quality.of.Sleep, Sleep.Duration, Physical.Activity.Level, and Stress.Level differ across genders. While these findings are correlational, they highlight important areas for further research and suggest that recognizing gender-specific patterns may be key to understanding individual sleep health. In the future, we recommend investigating how additional gender-related factors, such as caregiving responsibilities or hormonal differences, might impact sleep health. Due to data limitations, we were unable to assess their influence in the current analysis.

Conclusion

Our analyses revealed key findings about what factors influence an individual’s sleep experience:

  1. Physical health significantly impacts sleep health, as overweight and obese individuals tend to have lower sleep quality compared to those with normal BMI. Additionally, heart rate is one of the strongest predictors of sleep health, showing a negative correlation with both sleep duration and sleep quality.
  2. Certain occupations, like nursing and sales, have higher rates of sleep disorders, suggesting that job-related factors influence sleep health beyond just age. Additionally, older individuals are more likely to experience sleep disorders, particularly sleep apnea, suggesting that age can be a contributing factor to sleep health outcomes.
  3. Females are more likely to experience sleep disorders than males, and gender differences emerge in how sleep quality relates to sleep duration, stress levels, and physical activity.

While these findings offer a valuable starting point for understanding sleep health, it is important to recognize the limitations of this analysis, as the results are based on simulated data. In future work, we’d recommend investigating:

  • Are these results generalizable to the general population?

  • Can any causal structures be identified?

Replicating this study with real, randomized population samples would be necessary to answer these questions. Due to time and resource constraints, we were unable to pursue this approach and leave it as a recommendation for future research.

Acknowledgements

  1. The Sleep Health and Lifestyle Dataset was synthetic and created for illustrative purposes.
  2. BMI has many documented flaws as a measurement, but it is the only metric the dataset provides for attributes such as weight and body fat.