Project ideas
These ideas were originally generated by GenAI and then curated for appropriateness.
Final Project Ideas:
Review Paper ideas:
- Critical analysis of a recent paper on biomarker discovery for a specific disease.
- Evaluate the strengths and weaknesses of a published study using machine learning for disease prediction.
- Compare and contrast two different approaches to causal inference in epidemiology.
- Analyze the ethical considerations of using AI in healthcare decision-making.
- Review a research paper exploring the explainability of deep learning models in bioinformatics.
Data-Based Project ideas:
- Analyze a large dataset to identify risk factors for a specific disease (e.g., the National Health and Nutrition Examination Survey (NHANES)).
- Develop a machine learning model for predicting patient survival using clinical trial data.
- Compare gene expression patterns across different disease states (e.g., using The Cancer Genome Atlas (TCGA)).
- Model the spread of an infectious disease using publicly available data.
- Investigate the relationship between environmental factors and disease prevalence.
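For the infectious-disease modeling idea above, one common starting point is a compartmental SIR (susceptible, infected, recovered) model. The following is a minimal sketch, not a recommended method for any particular dataset: the function name `simulate_sir`, the fixed-step Euler integration, and all parameter values are assumptions made for illustration. A real project would estimate `beta` and `gamma` from the chosen public data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Integrate the SIR equations with a fixed-step Euler scheme.

    beta  : transmission rate per day (assumed constant)
    gamma : recovery rate per day (assumed constant)
    Returns a list of (day, susceptible, infected, recovered) tuples, one per day.
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration (beta / gamma = 3).
    trajectory = simulate_sir(
        population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=160
    )
    peak_day, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
    print(f"Peak of about {peak_infected:.0f} infections on day {peak_day}")
```

Comparing a simulated curve like this against reported case counts is one simple way to frame a project question, such as how well a constant transmission rate explains the observed data.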
Integrative Project (Combine Review & Data Analysis) ideas:
- Replicate the methods of a published paper on a different dataset and evaluate how well they hold up.
- Develop a proposal for a data-driven intervention to improve public health in a specific area.
- Compare and contrast the strengths and weaknesses of different statistical methods for analyzing survival data.
- Investigate the role of specific genes in disease progression using both literature review and pathway analysis tools.
- Design a study to validate the effectiveness of a proposed biomarker for disease diagnosis.
Note: This is just a starting point. Students are encouraged to discuss their project ideas with the instructor for guidance and approval.
Project ideas (for data-based projects)
This is a non-exclusive and non-exhaustive set of project ideas. Your project proposal will need to give more specifics about the project you actually want to do. These lists were generated by various GenAI tools and then curated.
Students who limit themselves to publicly accessible data can use open datasets that are freely available for research purposes. Here are ten project topics with corresponding public data sources. Verify the terms of use for each data source to ensure compliance with any restrictions on use. Some datasets (including ones not listed here) may require creating a free account to meet access requirements.
- Project Topic: Comparative Analysis of Obesity Rates Across Different Countries
  - Data Source: WHO Global Health Observatory Data Repository (http://www.who.int/gho/database/en/)
- Project Topic: Predicting Heart Disease from Clinical and Socioeconomic Features
  - Data Source: UCI Machine Learning Repository, specifically the Heart Disease dataset (https://archive.ics.uci.edu/ml/datasets/heart+disease)
- Project Topic: Analysis of Microbiome Data to Understand Gut Health
  - Data Source: Human Microbiome Project Data (https://www.hmpdacc.org/)
- Project Topic: Genomic Data Analysis to Identify Variants Associated with Diabetes
  - Data Source: 1000 Genomes Project (http://www.internationalgenome.org/)
- Project Topic: Machine Learning for Skin Cancer Classification from Image Data
  - Data Source: ISIC Archive - Dermoscopic Images (https://www.isic-archive.com/)
- Project Topic: Evaluating Public Health Interventions on Seasonal Influenza Spread
  - Data Source: FluNet provided by WHO (https://www.who.int/influenza/gisrs_laboratory/flunet/en/)
- Project Topic: Assessing the Impact of Lifestyle Choices on Longevity Using Survey Data
  - Data Source: NHANES (National Health and Nutrition Examination Survey) (https://www.cdc.gov/nchs/nhanes/index.htm)
- Project Topic: Identifying Trends in Drug Prescription Practices Over Time
  - Data Source: Medicare Part D Prescriber Public Use File (PUF) (https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/PartD2017)
- Project Topic: Network Analysis of Gene Expression Data in Breast Cancer
  - Data Source: Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/)
- Project Topic: Developing a Predictive Model for Zika Virus Outbreaks
  - Data Source: PAHO (Pan American Health Organization) Zika Cumulative Cases Data (https://www.paho.org/hq/index.php?option=com_topics&view=article&id=427&Itemid=41484&lang=en)
- Predicting Hospital Readmission Risk (Supervised Learning, Survival Analysis):
  - Goal: Develop a model to predict the risk of hospital readmission for patients with chronic diseases.
  - Data: MIMIC-III Clinical Database (https://physionet.org/content/mimiciii/)
- Analyzing Gene Expression Patterns in Cancer (Unsupervised Learning, Network Analysis):
  - Goal: Identify clusters of genes with co-regulated expression patterns in different cancer types.
  - Data: The Cancer Genome Atlas (TCGA) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4322527/)
- Exploring Factors Affecting Patient Satisfaction with Telemedicine (Supervised Learning, Visualization):
  - Goal: Identify factors influencing patient satisfaction with telemedicine services.
  - Data: National Health Interview Survey (NHIS) (https://www.cdc.gov/nchs/nhis/index.htm)
- Predicting Drug-Target Interactions for Drug Discovery (Supervised Learning):
  - Goal: Develop a model to predict potential drug-target interactions for novel drug development.
  - Data: DrugBank (https://go.drugbank.com/) & STRING database (https://string-db.org/)
- Analyzing Trends in Global Healthcare Expenditures (Time Series Analysis, Visualization):
  - Goal: Identify long-term trends and factors influencing global healthcare expenditures.
  - Data: World Health Organization (WHO) Global Health Expenditure Database (https://apps.who.int/nha/database)
- Analyzing Social Media Data to Understand Public Perception of Vaccines (Natural Language Processing, Sentiment Analysis):
  - Goal: Understand public sentiment and concerns surrounding vaccines using social media data.
  - Data: Twitter API (https://developer.twitter.com/en)
- Identifying Risk Factors for Adverse Events in Clinical Trials (Supervised Learning, Survival Analysis):
  - Goal: Develop a model to predict patients at higher risk of adverse events during clinical trials.
  - Data: ClinicalTrials.gov database (https://clinicaltrials.gov/)
- Identifying Outliers and Data Quality Issues in Large Biomedical Datasets (Unsupervised Learning):
  - Goal: Develop a data cleaning pipeline to detect and address outliers and inconsistencies in biomedical data.
  - Data: Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/)
- Analyzing Public Health Surveillance Data for Disease Outbreaks (Unsupervised Learning, Time Series Analysis):
  - Goal: Develop a system for early detection of disease outbreaks using public health surveillance data.
  - Data: World Health Organization (WHO) Disease Outbreak Database (https://www.who.int/emergencies/disease-outbreak-news)
- Identifying Risk Factors for Adverse Drug Reactions (Supervised Learning):
  - Goal: Develop a model to predict patients at higher risk of adverse reactions to specific medications.
  - Data: The FDA Adverse Event Reporting System (FAERS) (https://www.fda.gov/drugs/questions-and-answers-fdas-adverse-event-reporting-system-faers/fda-adverse-event-reporting-system-faers-public-dashboard)
- Image Analysis of Medical Scans for Early Disease Detection (Deep Learning, Image Analysis):
  - Goal: Develop a deep learning model to detect early signs of disease in medical images like chest X-rays.
  - Data: ChestX-ray8 (https://www.kaggle.com/datasets/homayoonkhadivi/chest-xray-worldwide-datasets)
- Comparing Treatment Effectiveness for Different Cancer Subtypes (Survival Analysis):
  - Goal: Analyze and compare the effectiveness of different treatment options for various cancer subtypes.
  - Data: SEER (Surveillance, Epidemiology, and End Results) Program Data (https://seer.cancer.gov/statfacts/)
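Several of the ideas above involve survival analysis, for example comparing treatment effectiveness across cancer subtypes. As a rough illustration of what that involves, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. The function name `kaplan_meier` and the toy follow-up times are hypothetical and do not come from any of the listed datasets; a real project would more likely rely on an established survival analysis library and on confidence intervals.

```python
from collections import Counter


def kaplan_meier(durations, event_observed):
    """Estimate the survival function with the Kaplan-Meier product-limit formula.

    durations      : follow-up time for each subject
    event_observed : 1/True if the event occurred, 0/False if the subject was censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    if len(durations) != len(event_observed):
        raise ValueError("durations and event_observed must have the same length")

    events = Counter(t for t, e in zip(durations, event_observed) if e)
    exits = Counter(durations)  # events and censorings both leave the risk set

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects surviving past time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data for illustration only: times in months, 0 marks a censored subject.
    times = [1, 2, 2, 3, 4]
    observed = [1, 1, 0, 1, 0]
    for time, prob in kaplan_meier(times, observed):
        print(f"t={time}: S(t)={prob:.2f}")
```

Computing one such curve per treatment group and comparing them is a typical first step before fitting a regression model for survival data.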