<- readRDS('lab-02/pop.rds')
pop dim(pop)
[1] 1000000 5
Lab 2
Github classroom link: https://classroom.github.com/a/2KPYXw8h
In this lab we will look at different experimental designs, and how they work with inference and decisions. We’ll also start looking at how experimental design can help with dealing with confounders, a topic we will continue into next week.
We will start with the same population that we provided in the first homework
Let’s first create a cohort study from this population
Let’s get an estimate of the odds ratio and it’s confidence interval from this cohort.
library(rsample)
OR <- cohort |> tabyl(gene, sick) |>
select(-1) |>
odds_ratio()
b <- bootstraps(cohort, times = 1000) |>
mutate(OR = map_dbl(splits, \(sp) analysis(sp) |>
tabyl(gene, sick) |>
select(-1) |>
odds_ratio()))
quantile(b$OR, c(0.025, 0.975))
2.5% 97.5%
1.899254 2.557906
We can estimate the odds ratio between gene
and sick
using logistic regression
model1 <- glm(sick ~ gene, data = cohort, family = binomial)
tidy(model1, exponentiate = TRUE, conf.int=TRUE)
# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0585 0.0199 -142. 0 0.0562 0.0608
2 gene 2.20 0.0774 10.2 1.70e-24 1.89 2.56
What happens if we adjust by age?
# A tibble: 8 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0275 0.0731 -49.2 0 0.0237 0.0316
2 gene 2.23 0.0782 10.2 1.35e-24 1.91 2.59
3 age_grp(25,35] 1.36 0.0912 3.34 8.28e- 4 1.14 1.62
4 age_grp(35,45] 1.57 0.0898 5.05 4.31e- 7 1.32 1.88
5 age_grp(45,55] 2.21 0.0869 9.14 6.37e-20 1.87 2.63
6 age_grp(55,65] 2.68 0.0846 11.6 2.30e-31 2.27 3.17
7 age_grp(65,75] 3.70 0.0840 15.6 1.01e-54 3.15 4.37
8 age_grp(75,85] 4.74 0.0992 15.7 1.87e-55 3.90 5.76
What happens if we adjust by gender?
# A tibble: 3 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0703 0.0261 -102. 0 0.0668 0.0740
2 gene 1.92 0.0784 8.34 7.42e-17 1.65 2.24
3 genderM 0.673 0.0399 -9.94 2.73e-23 0.622 0.727
Let’s do a stratified analysis, by gender.
cohort |>
nest(data = -gender) |>
mutate(OR = map_dbl(data, \(d) d |> tabyl(gene, sick) |>select(-1) |> odds_ratio()))
# A tibble: 2 × 3
gender data OR
<chr> <list> <dbl>
1 M <tibble [24,767 × 4]> 1.77
2 F <tibble [25,233 × 4]> 1.94
gender n percent
F 142 0.568
M 108 0.432
controls1 <- pop |> filter(sick == 0) |>sample_n(250)
controls2_male <- pop |>filter(sick==0, gender == 'M') |>sample_n(sum(cases$gender=="M"))
controls2_female <- pop |>filter(sick == 0, gender == "F") |>
sample_n(sum(cases$gender == "F"))
controls2 <- rbind(controls2_male, controls2_female)
cc_unmatched <- rbind(cases, controls1)
cc_matched <- rbind(cases, controls2)
Now compute the odds ratio for both the unmatched and matched case control studies, stratified by gender.