library(CDMConnector)
library(dplyr)
library(PatientProfiles)
library(here)

# For this example we will use the GiBleed data set
downloadEunomiaData(datasetName = "GiBleed")
db <- DBI::dbConnect(duckdb::duckdb(), eunomiaDir())

cdm <- cdmFromCon(db, cdmSchema = "main", writeSchema = "main")

# Create the two cohorts used throughout this chapter
cdm <- cdm |>
  generateConceptCohortSet(
    conceptSet = list("gi_bleed" = 192671),
    limit = "all",
    end = 30,
    name = "gi_bleed",
    overwrite = TRUE
  ) |>
  generateConceptCohortSet(
    conceptSet = list(
      "acetaminophen" = c(
        1125315,
        1127078,
        1127433,
        40229134,
        40231925,
        40162522,
        19133768
      )
    ),
    limit = "all",
    # end = "event_end_date",
    name = "acetaminophen",
    overwrite = TRUE
  )
9 Working with cohorts
9.1 Cohort intersections
Research questions often concern patients who meet several clinical criteria at once. For example, we may be interested in analysing outcomes among patients who have both diabetes and hypertension. Using the OMOP Common Data Model, this typically involves first creating two separate cohorts, one for patients with diabetes and another for those with hypertension, and then computing the intersection of these cohorts so that the final study population includes only individuals who satisfy all specified criteria. Finding cohort intersections is therefore a common task when working with the OMOP Common Data Model, and it lets researchers define target populations that match their research objectives.
Depending on the research question, the definition of a cohort intersection may vary. For instance, you might require that hypertension is diagnosed before diabetes, or that both diagnoses occur within a specific time window. These additional temporal or clinical criteria can make cohort intersection more complex. The PatientProfiles R package addresses these challenges by providing a suite of flexible functions to support the calculation of cohort intersections under various scenarios.
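To make the idea of a time window concrete, the following is a minimal base R sketch on toy data. It is not how PatientProfiles implements its functions; the data frames, the helper intersect_in_window, and all values are hypothetical and only illustrate what "a target record falls inside a window relative to the index date" means.

```r
# Toy data (hypothetical): an index cohort and a target cohort.
index_cohort <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2020-01-10", "2020-03-01", "2020-06-15"))
)
target_cohort <- data.frame(
  subject_id = c(1, 1, 2),
  cohort_start_date = as.Date(c("2019-12-01", "2020-01-10", "2020-04-01"))
)

# Pair each index record with the same person's target records.
# all.x = TRUE keeps people without any target record (subject 3).
pairs <- merge(index_cohort, target_cohort, by = "subject_id",
               all.x = TRUE, suffixes = c("_index", "_target"))

# Days from the index date to the target date (negative means before index).
pairs$days <- as.numeric(
  pairs$cohort_start_date_target - pairs$cohort_start_date_index
)

# Hypothetical helper: count and flag target records inside [lower, upper].
intersect_in_window <- function(pairs, lower, upper) {
  hit <- !is.na(pairs$days) & pairs$days >= lower & pairs$days <= upper
  count <- tapply(hit, pairs$subject_id, sum)
  data.frame(
    subject_id = as.numeric(names(count)),
    count = as.integer(count),
    flag = as.integer(count > 0)
  )
}

prior <- intersect_in_window(pairs, -Inf, -1)       # any time before index
up_to_index <- intersect_in_window(pairs, -Inf, 0)  # before or on index
post <- intersect_in_window(pairs, 1, Inf)          # any time after index
```

In this toy example subject 1 has two target records on or before the index date, so the count for the window c(-Inf, 0) is 2 while the flag is 1. This is the distinction between a flag and a count that the rest of the chapter relies on.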
9.2 Intersection between two cohorts
Suppose we are interested in studying patients with gastrointestinal (GI) bleeding who have also been exposed to acetaminophen. First, we need two separate cohorts: one for patients with GI bleeding and another for patients with exposure to acetaminophen. The setup code shown earlier creates these cohorts within the Eunomia synthetic database.
The PatientProfiles package contains functions to obtain the intersection flag, count, date, or number of days between cohorts. To get a binary indicator showing the presence of an intersection between the cohorts within a given time window, we can use addCohortIntersectFlag.
9.2.1 Flag
x <- cdm$gi_bleed |>
  addCohortIntersectFlag(
    targetCohortTable = "acetaminophen",
    window = list(c(-Inf, -1), c(0, 0), c(1, Inf))
  )

x |>
  summarise(
    acetaminophen_prior = sum(acetaminophen_minf_to_m1),
    acetaminophen_index = sum(acetaminophen_0_to_0),
    acetaminophen_post = sum(acetaminophen_1_to_inf)
  ) |>
  collect()

# A tibble: 1 × 3
  acetaminophen_prior acetaminophen_index acetaminophen_post
                <dbl>               <dbl>              <dbl>
1                 467                 467                476
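The columns summarised above (acetaminophen_minf_to_m1, acetaminophen_0_to_0, acetaminophen_1_to_inf) appear to follow a pattern of cohort name plus a label derived from the window. The sketch below reproduces that pattern in base R. The helper window_label is hypothetical and is inferred only from the column names used in this chapter; it is not a function exported by PatientProfiles.

```r
# Hypothetical helper: rebuilds the window part of the column names
# used in this chapter from a window such as c(-Inf, -1).
window_label <- function(window) {
  fmt <- function(v) {
    if (is.infinite(v)) return(if (v < 0) "minf" else "inf")
    # Negative day offsets are written with an "m" prefix, e.g. -1 becomes "m1".
    if (v < 0) paste0("m", abs(v)) else as.character(v)
  }
  paste0(fmt(window[1]), "_to_", fmt(window[2]))
}

windows <- list(c(-Inf, -1), c(0, 0), c(1, Inf))
column_names <- paste0(
  "acetaminophen_",
  vapply(windows, window_label, character(1))
)
print(column_names)
```

Knowing the pattern makes it easier to predict which columns a call will add before referring to them in summarise().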
To get the number of intersections between two cohorts within each time window, we can use addCohortIntersectCount.
9.2.2 Count
x <- cdm$gi_bleed |>
  addCohortIntersectCount(
    targetCohortTable = "acetaminophen",
    window = list(c(-Inf, -1), c(0, 0), c(1, Inf))
  )

x |>
  summarise(
    acetaminophen_prior = sum(acetaminophen_minf_to_m1),
    acetaminophen_index = sum(acetaminophen_0_to_0),
    acetaminophen_post = sum(acetaminophen_1_to_inf)
  ) |>
  collect()

# A tibble: 1 × 3
  acetaminophen_prior acetaminophen_index acetaminophen_post
                <dbl>               <dbl>              <dbl>
1                 467                 467                476
9.2.3 Date and times
To get the date of the intersection with a cohort within a given time window, we can use addCohortIntersectDate. To get the number of days between the index date and the intersection, we can use addCohortIntersectDays.
Both functions accept an order argument that specifies which value to return:
first: returns the first date/days that satisfy the window
last: returns the last date/days that satisfy the window
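The effect of order can be sketched in base R on a single person's records. This is an illustration under stated assumptions, not the package's implementation: the helper pick_in_window and the day offsets are hypothetical, and the sketch assumes that "first" means the earliest record in calendar time within the window and "last" the latest.

```r
# Hypothetical days-from-index values for one person's target records.
days_from_index <- c(-400, -35, -2, 10, 90)

# Hypothetical helper: pick the first or last offset that falls in the window.
pick_in_window <- function(days, window, order = c("first", "last")) {
  order <- match.arg(order)
  in_window <- days[days >= window[1] & days <= window[2]]
  # No record in the window: return a missing value.
  if (length(in_window) == 0) return(NA_real_)
  if (order == "first") min(in_window) else max(in_window)
}

prior_first <- pick_in_window(days_from_index, c(-Inf, -1), "first")  # -400
prior_last <- pick_in_window(days_from_index, c(-Inf, -1), "last")    # -2
post_first <- pick_in_window(days_from_index, c(1, Inf), "first")     # 10
on_index <- pick_in_window(days_from_index, c(0, 0), "first")         # NA
```

Under this assumption, "first" in a prior window picks the record furthest from the index date, while "last" picks the one closest to it, so the choice matters most for wide windows such as c(-Inf, -1).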
x <- cdm$gi_bleed |>
  addCohortIntersectDate(
    targetCohortTable = "acetaminophen",
    window = list(c(-Inf, -1), c(1, Inf)),
    order = "first"
  )

x |>
  summarise(
    acetaminophen_prior = median(acetaminophen_minf_to_m1),
    acetaminophen_post = median(acetaminophen_1_to_inf)
  ) |>
  collect()

# A tibble: 1 × 2
  acetaminophen_prior acetaminophen_post
  <dttm>              <dttm>
1 1967-08-07 00:00:00 1981-09-14 00:00:00
x <- cdm$gi_bleed |>
  addCohortIntersectDays(
    targetCohortTable = "acetaminophen",
    window = list(c(-Inf, -1), c(1, Inf)),
    order = "first"
  )

x |>
  summarise(
    acetaminophen_prior = median(acetaminophen_minf_to_m1),
    acetaminophen_post = median(acetaminophen_1_to_inf)
  ) |>
  collect()

# A tibble: 1 × 2
  acetaminophen_prior acetaminophen_post
                <dbl>              <dbl>
1              -12329               3580
9.3 Intersection between a cohort and tables with patient data
Sometimes we might want to get the intersection between a cohort and another OMOP table. PatientProfiles also includes several addTableIntersect* functions to obtain intersection flags, counts, days, or dates between a cohort and clinical tables.
For example, if we want to get the number of general practitioner (GP) visits for individuals in the cohort, we can use the visit_occurrence table:
x <- cdm$gi_bleed |>
  addTableIntersectCount(
    tableName = "visit_occurrence",
    window = list(c(-Inf, -1))
  )

x |>
  summarise(visit_occurrence_prior = median(visit_occurrence_minf_to_m1)) |>
  collect()

# A tibble: 1 × 1
  visit_occurrence_prior
                   <dbl>
1                      0
10 Further reading
Full details on the intersection functions in PatientProfiles can be found on the package website: https://darwin-eu.github.io/PatientProfiles/