9  Working with cohorts

9.1 Cohort intersections

Research questions often require studying patients who meet several clinical criteria at once. For example, we may be interested in analysing outcomes among patients who have both diabetes and hypertension. Using the OMOP Common Data Model, this typically involves first creating two separate cohorts: one for patients with diabetes and another for those with hypertension. The patients who meet both conditions are then found by computing the intersection of these cohorts, so that the final study population includes only individuals who satisfy all specified criteria. Finding cohort intersections is therefore a common and essential task when working with the OMOP Common Data Model, as it lets researchers define target populations that match their research objectives.

Depending on the research question, the definition of a cohort intersection may vary. For instance, you might require patients to have a diagnosis of hypertension before developing diabetes, or that both diagnoses occur within a specific time window. These additional temporal or clinical criteria can make cohort intersection more complex. The PatientProfiles R package addresses these challenges by providing a suite of flexible functions to support the calculation of cohort intersections under various scenarios.
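To make the difference between a plain intersection and a temporally constrained one concrete, here is a minimal base R sketch. The two data frames are hypothetical toy cohorts (not taken from any real database), and the 365-day limit is an arbitrary choice for illustration; it does not use PatientProfiles.

```r
# Toy cohorts (hypothetical data, for illustration only)
hypertension <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2020-01-10", "2020-06-01", "2019-03-15"))
)
diabetes <- data.frame(
  subject_id = c(1, 2, 4),
  cohort_start_date = as.Date(c("2020-03-01", "2020-02-01", "2021-01-01"))
)

# Plain intersection: subjects who appear in both cohorts
both <- merge(hypertension, diabetes, by = "subject_id",
              suffixes = c("_htn", "_dm"))

# Days from the hypertension start date to the diabetes start date
both$days_htn_to_dm <- as.integer(
  both$cohort_start_date_dm - both$cohort_start_date_htn
)

# Temporal criterion: hypertension at least 1 day before diabetes,
# and no more than 365 days before it
htn_then_dm <- both[both$days_htn_to_dm >= 1 & both$days_htn_to_dm <= 365, ]

both$subject_id        # 1 2
htn_then_dm$subject_id # 1
```

Subject 2 is in both cohorts but is dropped by the temporal criterion, because their diabetes record precedes their hypertension record.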

9.2 Intersection between two cohorts

Suppose we are interested in studying patients with gastrointestinal (GI) bleeding who have also been exposed to acetaminophen. First, we would create two separate cohorts: one for patients with GI bleeding and another for patients with exposure to acetaminophen. Below is an example of the code used to create these cohorts within the Eunomia synthetic database.

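library(CDMConnector)    # connect to the database and build cohorts
library(dplyr)           # summarise() and collect()
library(PatientProfiles) # cohort and table intersection functions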

# For this example we will use the GiBleed data set
downloadEunomiaData(datasetName = "GiBleed")
db <- DBI::dbConnect(duckdb::duckdb(), eunomiaDir())

cdm <- cdmFromCon(db, cdmSchema = "main", writeSchema = "main")

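# Build both cohorts from concept sets, keeping all qualifying events per person.
# GI bleed entries end 30 days after the event start; the acetaminophen cohort
# uses the default end date.
cdm <- cdm |>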
  generateConceptCohortSet(
    conceptSet = list("gi_bleed" = 192671),
    limit = "all",
    end = 30,
    name = "gi_bleed",
    overwrite = TRUE
  ) |>
  generateConceptCohortSet(
    conceptSet = list(
      "acetaminophen" = c(
        1125315,
        1127078,
        1127433,
        40229134,
        40231925,
        40162522,
        19133768
      )
    ),
    limit = "all",
    # end = "event_end_date",
    name = "acetaminophen",
    overwrite = TRUE
  )

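The PatientProfiles package contains functions to obtain the intersection flag, count, date, or number of days between cohorts.

9.2.1 Flag

To get a binary indicator showing the presence of an intersection between the cohorts within a given time window, we can use addCohortIntersectFlag. Each window is given in days relative to the index date, so c(-Inf, -1) covers any time before the index date, c(0, 0) the index date itself, and c(1, Inf) any time after it.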

x <- cdm$gi_bleed |>
  addCohortIntersectFlag(targetCohortTable = "acetaminophen",
                         window = list(c(-Inf, -1), c(0, 0), c(1, Inf)))

x |>
  summarise(acetaminophen_prior = sum(acetaminophen_minf_to_m1),
            acetaminophen_index = sum(acetaminophen_0_to_0),
            acetaminophen_post = sum(acetaminophen_1_to_inf)) |>
  collect()
# A tibble: 1 × 3
  acetaminophen_prior acetaminophen_index acetaminophen_post
                <dbl>               <dbl>              <dbl>
1                 467                 467                476
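The column names used in summarise() above combine the target cohort name with a label for each window. As a way to read those labels, here is a small sketch of a hypothetical helper, window_label(), that reproduces the three suffixes seen in this example. It is not part of PatientProfiles and is only meant to show the naming pattern.

```r
# Hypothetical helper (not a PatientProfiles function): turns a window such as
# c(-Inf, -1) into the suffix seen in the column names, e.g. "minf_to_m1"
window_label <- function(window) {
  label_bound <- function(b) {
    if (is.infinite(b)) {
      if (b < 0) "minf" else "inf"   # -Inf -> "minf", Inf -> "inf"
    } else if (b < 0) {
      paste0("m", abs(b))            # negative days get an "m" prefix
    } else {
      as.character(b)                # zero and positive days are kept as-is
    }
  }
  paste0(label_bound(window[1]), "_to_", label_bound(window[2]))
}

windows <- list(c(-Inf, -1), c(0, 0), c(1, Inf))
column_names <- paste0("acetaminophen_",
                       vapply(windows, window_label, character(1)))
column_names
# "acetaminophen_minf_to_m1" "acetaminophen_0_to_0" "acetaminophen_1_to_inf"
```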

9.2.2 Count

To get the count of occurrences of intersection between two cohorts, we can use addCohortIntersectCount.

x <- cdm$gi_bleed |>
  addCohortIntersectCount(targetCohortTable = "acetaminophen",
                          window = list(c(-Inf, -1), c(0, 0), c(1, Inf)))

x |>
  summarise(acetaminophen_prior = sum(acetaminophen_minf_to_m1),
            acetaminophen_index = sum(acetaminophen_0_to_0),
            acetaminophen_post = sum(acetaminophen_1_to_inf)) |>
  collect()
# A tibble: 1 × 3
  acetaminophen_prior acetaminophen_index acetaminophen_post
                <dbl>               <dbl>              <dbl>
1                 467                 467                476
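The distinction between a flag and a count can be illustrated for a single person with a few lines of base R. This is a simplified sketch on hypothetical dates: it treats each target record as a single day (its start date) and does not reproduce the package's internal logic.

```r
# Hypothetical data for one subject: an index date and four target records
index_date <- as.Date("2020-06-01")
target_starts <- as.Date(c("2019-01-15", "2020-02-20",
                           "2020-06-01", "2021-03-10"))

# Days from the index date to each target record (negative = before index)
days_from_index <- as.integer(target_starts - index_date)

# The same three windows as in the examples above
windows <- list(prior = c(-Inf, -1), index = c(0, 0), post = c(1, Inf))

# Count: how many target records fall inside each window
counts <- vapply(
  windows,
  function(w) sum(days_from_index >= w[1] & days_from_index <= w[2]),
  integer(1)
)

# Flag: 1 if at least one target record falls inside the window, else 0
flags <- as.integer(counts > 0)
names(flags) <- names(counts)

counts # prior = 2, index = 1, post = 1
flags  # prior = 1, index = 1, post = 1
```

For this subject the count and the flag differ only in the prior window, where two target records are collapsed into a single flag of 1.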

9.2.3 Date and days

To get the date of the intersection with a cohort within a given time window, we can use addCohortIntersectDate. To get the number of days between the index date and intersection, we can use addCohortIntersectDays.

Both functions allow the order argument to specify which value to return:

  • first returns the first date/days that satisfy the window

  • last returns the last date/days that satisfy the window

x <- cdm$gi_bleed |>
  addCohortIntersectDate(targetCohortTable = "acetaminophen",
                         window = list(c(-Inf, -1), c(1, Inf)),
                         order = "first")

x |>
  summarise(acetaminophen_prior = median(acetaminophen_minf_to_m1),
            acetaminophen_post = median(acetaminophen_1_to_inf)) |>
  collect()
# A tibble: 1 × 2
  acetaminophen_prior acetaminophen_post 
  <dttm>              <dttm>             
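1 1967-08-07 00:00:00 1981-09-14 00:00:00

The same call with addCohortIntersectDays returns the number of days from the index date to the intersection instead of the date itself; negative values indicate days before the index date.

x <- cdm$gi_bleed |>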
  addCohortIntersectDays(targetCohortTable = "acetaminophen",
                         window = list(c(-Inf, -1), c(1, Inf)),
                         order = "first")

x |>
  summarise(acetaminophen_prior = median(acetaminophen_minf_to_m1),
            acetaminophen_post = median(acetaminophen_1_to_inf)) |>
  collect()
# A tibble: 1 × 2
  acetaminophen_prior acetaminophen_post
                <dbl>              <dbl>
1              -12329               3580
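To show what order = "first" and order = "last" select, and how dates and days relate, here is a base R sketch for a single hypothetical subject. As a simplifying assumption it uses only the start date of each target record and does not call PatientProfiles.

```r
# Hypothetical data for one subject
index_date <- as.Date("2020-06-01")
target_starts <- as.Date(c("2019-01-15", "2020-02-20",
                           "2020-06-01", "2021-03-10"))

# Days relative to the index date (negative = before index)
days_from_index <- as.integer(target_starts - index_date)

# Keep only records inside the prior window c(-Inf, -1)
prior_days <- days_from_index[days_from_index <= -1]

# "first" picks the earliest record in the window, "last" the latest
first_prior_days <- min(prior_days)
last_prior_days  <- max(prior_days)

# The matching dates are the index date shifted by those day offsets
first_prior_date <- index_date + first_prior_days # 2019-01-15
last_prior_date  <- index_date + last_prior_days  # 2020-02-20
```

In a prior window, "first" therefore gives the record furthest from the index date (the most negative number of days), while "last" gives the one closest to it.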

9.3 Intersection between a cohort and tables with patient data

Sometimes we might want to get the intersection between a cohort and another OMOP table. PatientProfiles also includes several addTableIntersect* functions to obtain intersection flags, counts, days, or dates between a cohort and clinical tables.

For example, to get the number of healthcare visits (such as general practitioner (GP) visits) recorded before the index date for individuals in the cohort, we can use the visit_occurrence table:

x <- cdm$gi_bleed |>
  addTableIntersectCount(tableName = "visit_occurrence",
                         window = list(c(-Inf, -1)))

x |>
  summarise(visit_occurrence_prior = median(visit_occurrence_minf_to_m1)) |>
  collect()
# A tibble: 1 × 1
  visit_occurrence_prior
                   <dbl>
1                      0

10 Further reading

Full details on the intersection functions in PatientProfiles can be found on the package website:

https://darwin-eu.github.io/PatientProfiles/