Project ideas

Published

September 23, 2024

Note

These ideas were originally generated by GenAI and then curated for appropriateness.

Final Project Ideas:

Review Paper ideas:

  1. Critical analysis of a recent paper on biomarker discovery for a specific disease.
  2. Evaluate the strengths and weaknesses of a published study using machine learning for disease prediction.
  3. Compare and contrast two different approaches to causal inference in epidemiology.
  4. Analyze the ethical considerations of using AI in healthcare decision-making.
  5. Review a research paper exploring the explainability of deep learning models in bioinformatics.

Data-Based Project ideas:

  1. Analyze a large dataset (e.g., the National Health and Nutrition Examination Survey (NHANES)) to identify risk factors for a specific disease.
  2. Develop a machine learning model for predicting patient survival using clinical trial data.
  3. Compare gene expression patterns across different disease states (e.g., using The Cancer Genome Atlas (TCGA)).
  4. Model the spread of an infectious disease using publicly available data.
  5. Investigate the relationship between environmental factors and disease prevalence.
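
As one illustration of the infectious-disease modeling idea above, here is a minimal sketch of a compartmental SIR (susceptible, infected, recovered) model. It is only a starting point under stated assumptions: the function name `simulate_sir`, the forward-Euler integration, and every parameter value are illustrative choices, not fitted to any dataset. A real project would estimate the rates from publicly available case counts.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward-Euler steps.

    beta  : transmission rate per day (assumed value, not fitted)
    gamma : recovery rate per day (assumed value, not fitted)
    Returns a list with one (S, I, R) tuple per day, starting at day 0.
    """
    n = s0 + i0 + r0                      # total population, held constant
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: 1,000 people, one initial case.
    curve = simulate_sir(beta=0.3, gamma=0.1, s0=999, i0=1, r0=0, days=160)
    peak_day = max(range(len(curve)), key=lambda d: curve[d][1])
    print(f"Infections peak on day {peak_day}")
```

Fitting `beta` and `gamma` to observed counts, or replacing the Euler loop with a proper ODE solver, would be natural next steps for an actual project.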

Integrative Project (Combine Review & Data Analysis) ideas:

  1. Replicate the methods of a published paper on a different dataset and evaluate how well they hold up.
  2. Develop a proposal for a data-driven intervention to improve public health in a specific area.
  3. Compare and contrast the strengths and weaknesses of different statistical methods for analyzing survival data.
  4. Investigate the role of specific genes in disease progression using both literature review and pathway analysis tools.
  5. Design a study to validate the effectiveness of a proposed biomarker for disease diagnosis.

Note: This is just a starting point. Students are encouraged to discuss their project ideas with the instructor for guidance and approval.

Project ideas (for data-based projects)

This is a non-exclusive and non-exhaustive set of ideas for projects. Your project proposal will need to give more specifics about the actual project you want to do. These lists were generated by various GenAI tools and then curated.

If you limit yourself to publicly accessible data sources, you can use open datasets that are freely available for research purposes. Here are ten project topics with corresponding public data sources. Verify the terms of use for each data source to ensure compliance with any restrictions on use. Some datasets (including ones not listed here) may require you to create a free account before you can access them.

  1. Project Topic: Comparative Analysis of Obesity Rates Across Different Countries
  • Data Source: WHO Global Health Observatory Data Repository (http://www.who.int/gho/database/en/)
  2. Project Topic: Predicting Heart Disease from Clinical and Socioeconomic Features
  • Data Source: UCI Machine Learning Repository, specifically the Heart Disease dataset (https://archive.ics.uci.edu/ml/datasets/heart+disease)
  3. Project Topic: Analysis of Microbiome Data to Understand Gut Health
  • Data Source: Human Microbiome Project Data (https://www.hmpdacc.org/)
  4. Project Topic: Genomic Data Analysis to Identify Variants Associated with Diabetes
  • Data Source: 1000 Genomes Project (http://www.internationalgenome.org/)
  5. Project Topic: Machine Learning for Skin Cancer Classification from Image Data
  • Data Source: ISIC Archive - Dermoscopic Images (https://www.isic-archive.com/)
  6. Project Topic: Evaluating Public Health Interventions on Seasonal Influenza Spread
  • Data Source: FluNet provided by WHO (https://www.who.int/influenza/gisrs_laboratory/flunet/en/)
  7. Project Topic: Assessing the Impact of Lifestyle Choices on Longevity Using Survey Data
  • Data Source: NHANES (National Health and Nutrition Examination Survey) (https://www.cdc.gov/nchs/nhanes/index.htm)
  8. Project Topic: Identifying Trends in Drug Prescription Practices Over Time
  • Data Source: Medicare Part D Prescriber Public Use File (PUF) (https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/PartD2017)
  9. Project Topic: Network Analysis of Gene Expression Data in Breast Cancer
  • Data Source: Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/)
  10. Project Topic: Developing a Predictive Model for Zika Virus Outbreaks
  • Data Source: PAHO (Pan American Health Organization) Zika Cumulative Cases Data (https://www.paho.org/hq/index.php?option=com_topics&view=article&id=427&Itemid=41484&lang=en)

  1. Predicting Hospital Readmission Risk (Supervised Learning, Survival Analysis)
  2. Analyzing Gene Expression Patterns in Cancer (Unsupervised Learning, Network Analysis)
  3. Exploring Factors Affecting Patient Satisfaction with Telemedicine (Supervised Learning, Visualization)
  4. Predicting Drug-Target Interactions for Drug Discovery (Supervised Learning)
  5. Analyzing Trends in Global Healthcare Expenditures (Time Series Analysis, Visualization):
  • Goal: Identify long-term trends and factors influencing global healthcare expenditures.
  • Data: World Health Organization (WHO) Global Health Expenditure Database (https://apps.who.int/nha/database)
  6. Analyzing Social Media Data to Understand Public Perception of Vaccines (Natural Language Processing, Sentiment Analysis)
  7. Identifying Risk Factors for Adverse Events in Clinical Trials (Supervised Learning, Survival Analysis):
  • Goal: Develop a model to predict patients at higher risk of adverse events during clinical trials.
  • Data: ClinicalTrials.gov database (https://clinicaltrials.gov/)
  8. Identifying Outliers and Data Quality Issues in Large Biomedical Datasets (Unsupervised Learning):
  • Goal: Develop a data cleaning pipeline to detect and address outliers and inconsistencies in biomedical data.
  • Data: Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/)
  9. Analyzing Public Health Surveillance Data for Disease Outbreaks (Unsupervised Learning, Time Series Analysis)
  10. Identifying Risk Factors for Adverse Drug Reactions (Supervised Learning)
  11. Image Analysis of Medical Scans for Early Disease Detection (Deep Learning, Image Analysis)
  12. Comparing Treatment Effectiveness for Different Cancer Subtypes (Survival Analysis):
  • Goal: Analyze and compare the effectiveness of different treatment options for various cancer subtypes.
  • Data: SEER (Surveillance, Epidemiology, and End Results) Program Data (https://seer.cancer.gov/statfacts/)
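
Several of the ideas above list survival analysis as a method. As a rough orientation, the sketch below computes a Kaplan-Meier survival curve in plain Python. The function name `kaplan_meier` and the toy follow-up data are hypothetical and are not drawn from any of the listed sources. For a real project you would more likely use an established survival-analysis package.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function.

    durations : follow-up time for each subject
    events    : True if the event was observed, False if censored
    Returns a list of (time, survival_probability) pairs, one per
    distinct time at which at least one event was observed.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        observed = 0   # events at time t
        leaving = 0    # subjects leaving the risk set at time t
        while idx < len(data) and data[idx][0] == t:
            observed += 1 if data[idx][1] else 0
            leaving += 1
            idx += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data (hypothetical): five subjects, two of them censored.
    times = [1, 2, 2, 3, 4]
    observed = [True, True, False, True, False]
    for t, s in kaplan_meier(times, observed):
        print(f"t={t}: S(t)={s:.2f}")
```

Comparing such curves across treatment groups, for example with a log-rank test, is a common next step for the treatment-effectiveness and readmission ideas.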