Using Digital Data to Predict CHD
Study Details
Study Description
Brief Summary
This project seeks to identify and characterize features derived from digital data (e.g. social media, online search, mobile media) which are associated with coronary heart disease (CHD) and related risk factors, and develop models that use digital data and conventional predictive models to predict CHD risk and health care utilization.
Detailed Description
Cardiovascular disease is the leading cause of death in the US. While secondary prevention approaches have improved longevity of patients, risk factors and adverse health behaviors (e.g., physical inactivity, smoking) are highly prevalent, and in most contemporary series, fewer than 1% of adults meet all factors of ideal CV health. The logistics and practicalities of meeting the goal of ideal CV health have not been clearly elucidated.
Practice guidelines recommend using the Framingham risk score (FRS) or other risk prediction tools to classify patients' risk of CV disease. These models, however, are imprecise, and there is increasing focus on identifying markers that provide better measures of risk. As digital platforms are increasingly used to document lifestyle and health behaviors, data from digital sources may provide a window into manifestations of novel risk factors and potentially a better characterization of existing risk factors. While it may seem like a cliche to mention the profound impact of digital data on everyday lives, these new media offer substantial opportunities for understanding behavioral, social, and environmental determinants of health.
This project seeks to identify and characterize features derived from digital data (e.g. social media, online search, mobile media) which are associated with coronary heart disease (CHD) and related risk factors, and develop models that use digital data and conventional predictive models to predict CHD risk and health care utilization.
Study Design
Arms and Interventions
Arm | Intervention/Treatment |
---|---|
Case: Patients ages 40-74 with and without CHD (ICD-10: I63, I20-I25) within the last 6 months who receive care in the University of Pennsylvania Health System (UPHS). | Other: Survey. Interested participants may complete the informed consent online. After informed consent, the participant will be asked to share the digital data types that they use (Facebook, Instagram, Twitter, Google search, step data) and then participants will complete a cross-sectional survey. |
Control: Patients aged 40-74 who have a non-cardiovascular-related chief complaint. | Other: Survey. Interested participants may complete the informed consent online. After informed consent, the participant will be asked to share the digital data types that they use (Facebook, Instagram, Twitter, Google search, step data) and then participants will complete a cross-sectional survey. |
Outcome Measures
Primary Outcome Measures
- Latent Dirichlet Allocation Topics - topics / themes discussed between patients with and without heart disease [Through study completion, an average of 3 years]
The primary outcome is topics and features (derived using the Latent Dirichlet Allocation [LDA] method for clustering language data).
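As a rough illustration of how LDA groups words into topics, the sketch below fits a small model by collapsed Gibbs sampling in plain Python. It is not the study's pipeline: the function name `fit_lda`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy posts are all assumptions made for this example, and a real analysis would more likely rely on an established topic-modeling library.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab),
    where doc_topic[d][k] and topic_word[k][w] are smoothed probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][wid] -= 1
                topic_totals[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][wid] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][wid] += 1
                topic_totals[k] += 1

    # Convert counts to smoothed probability estimates.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta) for c in row]
        for k, row in enumerate(topic_word_counts)
    ]
    return doc_topic, topic_word, vocab

# Hypothetical toy posts, already tokenized; not study data.
toy_posts = [
    "walk gym run steps gym".split(),
    "run steps walk gym".split(),
    "pizza soda fries pizza".split(),
    "soda fries pizza burger".split(),
]
doc_topic, topic_word, vocab = fit_lda(toy_posts, num_topics=2)
```

Each row of `doc_topic` is a per-document topic mixture that could serve as a feature vector, and each row of `topic_word` ranks vocabulary words within a topic.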
Other Outcome Measures
- CHD event [Through study completion, an average of 3 years]
Reliability in predicting CHD-related events in patients, as measured by the Framingham Risk Score. The Framingham Risk Score (FRS) is a validated means of predicting cardiovascular disease (CVD) risk. Input variables include age, cigarette smoking, total cholesterol, HDL cholesterol, systolic blood pressure measurement, and treatment for hypertension. Point values are calculated based on each of these risks. A 10-year risk score can be derived as a percentage. Risk scores range from 0-20%. Low risk: less than 10% risk that you will have a heart attack or die from coronary disease in the next 10 years. Intermediate risk: a 10 to 20% risk that you will have a heart attack or die from coronary disease in the next 10 years. High risk: a greater than 20% risk that you will have a heart attack or die from coronary disease in the next 10 years.
- Health care utilization [Through study completion, an average of 3 years]
Prediction of health care utilization costs for subjects with and without heart disease, as measured by insurance claims data
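The Framingham risk bands described under the CHD event outcome can be sketched as a small lookup. This is a minimal example that only maps an already computed 10-year risk percentage to a category using the cutoffs stated in this record; it does not calculate the point-based score itself, and the function name `framingham_risk_category` is an assumption for illustration.

```python
def framingham_risk_category(ten_year_risk_percent: float) -> str:
    """Map a 10-year risk percentage to the bands given in this record.

    Low: less than 10%. Intermediate: 10 to 20%. High: greater than 20%.
    """
    if not 0 <= ten_year_risk_percent <= 100:
        raise ValueError("risk must be a percentage between 0 and 100")
    if ten_year_risk_percent < 10:
        return "low"
    if ten_year_risk_percent <= 20:
        return "intermediate"
    return "high"

# Example inputs are arbitrary values chosen to hit each band.
categories = [framingham_risk_category(p) for p in (4.5, 10, 20, 27)]
```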
Eligibility Criteria
Criteria
Inclusion Criteria:
- 40 - 74 years of age
- Willing to sign informed consent
- Primarily English speaking (for language analysis)
- Has an account on any of the following digital data platforms (Facebook, Instagram, Twitter, Reddit, Google (Gmail)), or a smartphone or wearable device (such as Apple Health, Fitbit, Samsung Health, MapMyFitness, or Garmin), and is willing to share data
- If has a social media account (Instagram or Facebook), willing to share historical and prospective data (60 days)
- If has a Google (Gmail) account, willing to download and share a Google Takeout zip file
- If has a smartphone or wearable device, willing to share step data
- Willing to share access to medical health records
- Willing to share healthcare insurance information
Exclusion Criteria:
- Patient does not meet age inclusion criteria above
- Does not use and post on digital data sources we are studying or unwilling to donate data
- Patient is in severe distress, e.g. respiratory, physical, or emotional distress
- Patient is intoxicated, unconscious, or unable to appropriately respond to questions
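The inclusion and exclusion criteria above could be encoded as a simple screening check. The sketch below is one hypothetical way to do so: the `Candidate` fields, the `ELIGIBLE_SOURCES` set, and the function `screening_failures` are illustrative names rather than part of the study protocol, and the per-platform sharing conditions (60-day history, Google Takeout file, step data) are collapsed into a single willingness flag for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Platforms and devices named in the inclusion criteria.
ELIGIBLE_SOURCES = {
    "Facebook", "Instagram", "Twitter", "Reddit", "Google",
    "Apple Health", "Fitbit", "Samsung Health", "MapMyFitness", "Garmin",
}

@dataclass
class Candidate:
    age: int
    consents: bool
    primarily_english: bool
    willing_to_share_data: bool
    shares_medical_records: bool
    shares_insurance_info: bool
    data_sources: Set[str] = field(default_factory=set)
    in_severe_distress: bool = False
    able_to_respond: bool = True  # False if intoxicated or unconscious

def screening_failures(c: Candidate) -> List[str]:
    """Return the reasons a candidate fails screening (empty if eligible)."""
    failures = []
    if not 40 <= c.age <= 74:
        failures.append("age outside 40-74")
    if not c.consents:
        failures.append("no informed consent")
    if not c.primarily_english:
        failures.append("not primarily English speaking")
    if not (c.data_sources & ELIGIBLE_SOURCES):
        failures.append("no eligible digital data source")
    if not c.willing_to_share_data:
        failures.append("unwilling to share digital data")
    if not c.shares_medical_records:
        failures.append("unwilling to share medical records")
    if not c.shares_insurance_info:
        failures.append("unwilling to share insurance information")
    if c.in_severe_distress:
        failures.append("in severe distress")
    if not c.able_to_respond:
        failures.append("unable to respond to questions")
    return failures

# Hypothetical candidate for illustration only.
example = Candidate(
    age=52, consents=True, primarily_english=True,
    willing_to_share_data=True, shares_medical_records=True,
    shares_insurance_info=True, data_sources={"Facebook", "Fitbit"},
)
example_failures = screening_failures(example)
```

Returning a list of reasons rather than a single boolean makes it easier to report why a candidate was screened out.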
Contacts and Locations
Locations
Site | City | State | Country | Postal Code |
---|---|---|---|---|
University of Pennsylvania Health System | Philadelphia | Pennsylvania | United States | 19101 |
Sponsors and Collaborators
- University of Pennsylvania
Investigators
None specified.
Study Documents (Full-Text)
None provided.
More Information
Publications
None provided.
- 833699