CohortSurvival

Conduct survival analyses

2025-06-26

Introduction

  • CohortSurvival is a package designed to support descriptive survival studies in R, using data mapped to the OMOP Common Data Model.
  • The code is publicly available in DARWIN EU's GitHub repository, CohortSurvival.
  • CohortSurvival v1.0.1 is available on CRAN.
  • Vignettes with further information can be found on the package website.

How do you use CohortSurvival?

1) Create a cdm object with the cohorts of interest

You only need to specify the exposure and the outcome cohort names.

2) Run the survival estimation

Add any additional strata or input parameters necessary for your study.

3) Plot and tabulate the results

Use the in-built visualisation functions from the package to plot the survival estimates and display the survival summary in a neat table.

CohortSurvival’s main functionality

  • Estimate single event survival: specify exposure and outcome cohorts.
  • Estimate competing risk survival: specify exposure, outcome and competing outcome cohorts.
  • Do further survival analyses: add survival information to the cohort of interest to run survival models using other well-known packages (e.g. survival).
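The three functionalities can be sketched on the small example CDM that ships with the package, rather than on the GiBleed data used in the rest of this deck. This is a hedged sketch: the cohort table names ("mgus_diagnosis", "progression", "death_cohort") follow the package vignettes, and the time and status column names passed to survival::Surv() are assumed to be the ones that addCohortSurvival() adds.

```r
library(CohortSurvival)

# Example CDM shipped with CohortSurvival (not the GiBleed data used in this deck)
cdm <- mockMGUS2cdm()

# 1) Single event survival: target = MGUS diagnosis, outcome = death
singleEvent <- estimateSingleEventSurvival(
  cdm = cdm,
  targetCohortTable = "mgus_diagnosis",
  outcomeCohortTable = "death_cohort"
)

# 2) Competing risk survival: progression as outcome, death as competing outcome
competingRisk <- estimateCompetingRiskSurvival(
  cdm = cdm,
  targetCohortTable = "mgus_diagnosis",
  outcomeCohortTable = "progression",
  competingOutcomeCohortTable = "death_cohort"
)

# 3) Further analyses: add survival columns (assumed to be `time` and `status`),
#    bring the data into R and fit a model with the survival package
survivalData <- cdm$mgus_diagnosis |>
  addCohortSurvival(cdm = cdm, outcomeCohortTable = "death_cohort") |>
  dplyr::collect()

fit <- survival::survfit(survival::Surv(time, status) ~ 1, data = survivalData)
```

The first two calls return summarised results that can be passed to the plotting and tabulating functions; the third leaves you with an ordinary data frame for any model you like.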

Estimate survival

Libraries we are going to use
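The code chunk for this slide is not shown here. Judging only from the functions called in the rest of the deck, a plausible set of libraries is the one below. The mapping of functions to packages is an assumption, and the original slide may have loaded a different set.

```r
# Assumed set of packages, inferred from the functions used later in this deck
library(here)              # here()
library(DBI)               # dbConnect(), dbDisconnect()
library(duckdb)            # duckdb()
library(dplyr)             # glimpse(), select()
library(omock)             # mockCdmFromDataset()
library(omopgenerics)      # insertCdmTo(), bind()
library(CDMConnector)      # cdmFromCon(), dbSource()
library(CodelistGenerator) # getDrugIngredientCodes(), getCandidateCodes()
library(CohortConstructor) # conceptCohort(), collapseCohorts()
library(PatientProfiles)   # addAge(), addDemographics()
library(CohortSurvival)    # estimateSingleEventSurvival(), plotSurvival(), tableSurvival(), riskTable()
```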

Let’s create the database

This should have been done on the first day, but in case anyone has not created it yet, here is a reminder of the code:

datasetName <- "GiBleed"
dbdir <- here(paste0(datasetName, ".duckdb"))
con <- dbConnect(drv = duckdb(dbdir = dbdir))

cdm <- mockCdmFromDataset(datasetName = datasetName)
insertCdmTo(cdm = cdm, to = dbSource(con = con, writeSchema = "main"))
dbDisconnect(conn = con)

Create cdm and get cohorts

datasetName <- "GiBleed"
dbdir <- here(paste0(datasetName, ".duckdb"))
con <- dbConnect(drv = duckdb(dbdir = dbdir))

cdm <- cdmFromCon(
  con = con, 
  cdmSchema = "main",
  writeSchema = "main",
  writePrefix = "my_study_", 
  cdmName = datasetName
)

Use estimateSingleEventSurvival() for treatment adherence

Single event survival estimation function, with all its input parameters.

estimateSingleEventSurvival(
  cdm,
  targetCohortTable,
  outcomeCohortTable,
  outcomeDateVariable = "cohort_start_date",
  outcomeWashout = Inf,
  censorOnCohortExit = FALSE,
  censorOnDate = NULL,
  followUpDays = Inf,
  strata = NULL,
  eventGap = 30,
  estimateGap = 1,
  restrictedMeanFollowUp = NULL,
  minimumSurvivalDays = 1
)

Use estimateSingleEventSurvival() for treatment adherence

Plotting function, with all its input parameters.

plotSurvival(
  result,
  ribbon = TRUE,
  facet = NULL,
  colour = NULL,
  cumulativeFailure = FALSE,
  riskTable = FALSE,
  riskInterval = 30
)

Use estimateSingleEventSurvival() for treatment adherence

Main tabulating function, with all its input parameters.

tableSurvival(
  x,
  times = NULL,
  timeScale = "days",
  header = c("estimate"),
  type = "gt",
  groupColumn = NULL,
  .options = list()
)

Use estimateSingleEventSurvival() for treatment adherence

In our case, we will study discontinuation of ibuprofen. Therefore, we will use the same cohort as both target and outcome. Additionally, we will need to change outcomeDateVariable to cohort_end_date.

codelist <- getDrugIngredientCodes(cdm = cdm, name = "ibuprofen")
cdm$ibuprofen <- conceptCohort(
  cdm = cdm, 
  conceptSet = codelist,
  name = "ibuprofen"
) |>
  collapseCohorts(gap = 7)

survivalResult <- estimateSingleEventSurvival(
  cdm = cdm, 
  targetCohortTable = "ibuprofen",
  outcomeCohortTable = "ibuprofen",
  outcomeDateVariable = "cohort_end_date"
)
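To see why pointing outcomeDateVariable at cohort_end_date turns the estimate into a discontinuation curve, it can help to reproduce the idea on a toy data frame outside CohortSurvival. The sketch below uses the survival package and made-up treatment episodes. It illustrates the concept only and is not how CohortSurvival computes its estimates internally.

```r
library(survival)

# Hypothetical treatment episodes (made-up dates, one per person)
episodes <- data.frame(
  subject_id        = 1:4,
  cohort_start_date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")),
  cohort_end_date   = as.Date(c("2020-01-11", "2020-02-21", "2020-03-31", "2020-05-11"))
)

# Time to discontinuation: days from treatment start to treatment end
episodes$time <- as.numeric(episodes$cohort_end_date - episodes$cohort_start_date)

# Every episode ends, so every record is an event in this toy setting
episodes$status <- 1

# Kaplan-Meier estimate of remaining on treatment
fit <- survfit(Surv(time, status) ~ 1, data = episodes)

# Proportion still on treatment at 20 days
stillOnTreatment <- summary(fit, times = 20)$surv
print(stillOnTreatment)
```

Because the target and outcome are the same cohort, the "event" is simply the end of each exposure record, which is why the number of events equals the number of records in the results for this analysis.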

Use estimateSingleEventSurvival() for treatment adherence

glimpse(survivalResult, width = 100)
Rows: 345
Columns: 13
$ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ cdm_name         <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBlee…
$ group_name       <chr> "target_cohort", "target_cohort", "target_cohort", "target_cohort", "targ…
$ group_level      <chr> "5640_ibuprofen", "5640_ibuprofen", "5640_ibuprofen", "5640_ibuprofen", "…
$ strata_name      <chr> "overall", "overall", "overall", "overall", "overall", "overall", "overal…
$ strata_level     <chr> "overall", "overall", "overall", "overall", "overall", "overall", "overal…
$ variable_name    <chr> "outcome", "outcome", "outcome", "outcome", "outcome", "outcome", "outcom…
$ variable_level   <chr> "5640_ibuprofen", "5640_ibuprofen", "5640_ibuprofen", "5640_ibuprofen", "…
$ estimate_name    <chr> "estimate", "estimate_95CI_lower", "estimate_95CI_upper", "estimate", "es…
$ estimate_type    <chr> "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeri…
$ estimate_value   <chr> "1", "1", "1", "0.9979", "0.9956", "1", "0.9979", "0.9956", "1", "0.9979"…
$ additional_name  <chr> "time", "time", "time", "time", "time", "time", "time", "time", "time", "…
$ additional_level <chr> "0", "0", "0", "1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "4"…

Use estimateSingleEventSurvival() for treatment adherence

plotSurvival(survivalResult)
tableSurvival(x = survivalResult, times = c(7, 30, 75))
CDM name: GiBleed
Target cohort: 5640_ibuprofen
Outcome name: 5640_ibuprofen

Estimate name                        Value
Number records                       1,448
Number events                        1,448
Median survival (95% CI)             21.00 (21.00, 21.00)
Restricted mean survival (95% CI)    29.00 (28.00, 31.00)
7 days survival estimate             99.79 (99.56, 100.00)
30 days survival estimate            24.65 (22.53, 26.98)
75 days survival estimate            7.18 (5.97, 8.64)

Use estimateSingleEventSurvival() for treatment adherence

You can also visualise the risk table, with information on number of people at risk, number of events and number of people censored by event timepoint (defined by eventGap).

riskTable(survivalResult)
CDM name  Target cohort   Outcome name    Time  Event gap  Number at risk  Number events  Number censored
GiBleed   5640_ibuprofen  5640_ibuprofen     0         30           1,448              0                0
GiBleed   5640_ibuprofen  5640_ibuprofen    30         30             456          1,091                0
GiBleed   5640_ibuprofen  5640_ibuprofen    60         30             222            236                0
GiBleed   5640_ibuprofen  5640_ibuprofen    90         30             104            121                0
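How these columns relate to eventGap can be illustrated with a toy calculation in base R. The sketch assumes, based on the output above, that the number at risk counts people still under follow-up at each timepoint and that events are counted over the preceding interval of length eventGap. The discontinuation times are made up, and this is not the package's implementation.

```r
# Hypothetical times to discontinuation in days (all events, no censoring)
time <- c(5, 12, 30, 30, 45, 60)
eventGap <- 30

# Timepoints at which the risk table is reported
timepoints <- seq(0, max(time), by = eventGap)

riskTableToy <- data.frame(
  time = timepoints,
  # people whose follow-up has not ended before day t
  n_risk = sapply(timepoints, function(t) sum(time >= t)),
  # events falling in the interval (t - eventGap, t]
  n_events = sapply(timepoints, function(t) sum(time > t - eventGap & time <= t))
)
print(riskTableToy)
```

Under this reading, someone whose event falls exactly on a timepoint is counted both as at risk and as an event at that timepoint, which would explain why the at-risk and event counts in the real table do not simply subtract from one row to the next.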

Your turn!

Exercise 1 - Estimate treatment adherence of aspirin users

Create a cohort of aspirin use and estimate treatment adherence.

  • How many people are at risk at cohort entry, and how many are left at risk after a month?
  • Plot the drug discontinuation curve and compare it to the ibuprofen one. If you can, plot both Kaplan-Meier curves in the same survival plot.

Hint: use bind() and the colour option in plotSurvival() for the two outcomes.

Exercise 1 - Estimate treatment adherence of aspirin users

Solution:
codelist <- getDrugIngredientCodes(cdm = cdm, name = "aspirin")
cdm$aspirin <- conceptCohort(
  cdm = cdm, 
  conceptSet = codelist,
  name = "aspirin"
) |>
  collapseCohorts(gap = 7)

survivalResultAspirin <- estimateSingleEventSurvival(
  cdm = cdm, 
  targetCohortTable = "aspirin", 
  outcomeCohortTable = "aspirin",
  outcomeDateVariable = "cohort_end_date"
)

survivalResultAll <- bind(survivalResult, survivalResultAspirin)

Exercise 1 - Estimate treatment adherence of aspirin users

Solution:
riskTable(survivalResultAll)
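CDM name  Target cohort   Outcome name    Time  Event gap  Number at risk  Number events  Number censored
GiBleed   5640_ibuprofen  5640_ibuprofen     0         30           1,448              0                0
GiBleed   5640_ibuprofen  5640_ibuprofen    30         30             456          1,091                0
GiBleed   5640_ibuprofen  5640_ibuprofen    60         30             222            236                0
GiBleed   5640_ibuprofen  5640_ibuprofen    90         30             104            121                0
GiBleed   1191_aspirin    1191_aspirin       0         30           1,927              0                0
GiBleed   1191_aspirin    1191_aspirin      30         30             349          1,651                0
GiBleed   1191_aspirin    1191_aspirin      60         30             180            176                0
GiBleed   1191_aspirin    1191_aspirin      90         30              87            100                0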

Exercise 1 - Estimate treatment adherence of aspirin users

Solution:
plotSurvival(result = survivalResultAll, colour = "outcome")

Stratification

Adding strata - Example

  • We can stratify our survival study by any variables available in the target cohort provided. We can use PatientProfiles to add, for instance, demographic information.
cdm$ibuprofen_strata <- cdm$ibuprofen |> 
  addAge(
    ageGroup = list(c(0,40),c(41,70),c(71,150)),
    name = "ibuprofen_strata"
  )

cdm$ibuprofen_strata |>
  select(-"age") |>
  head()
# Source:   SQL [?? x 5]
# Database: DuckDB v1.3.1 [unknown@Linux 6.11.0-1015-azure:R 4.5.1//home/runner/work/RealWorldEvidenceSummerSchool2025/RealWorldEvidenceSummerSchool2025/GiBleed.duckdb]
  cohort_definition_id subject_id cohort_start_date cohort_end_date age_group
                 <int>      <int> <date>            <date>          <chr>    
1                    1          6 1976-05-01        1976-06-30      0 to 40  
2                    1         16 2004-01-21        2004-03-21      0 to 40  
3                    1        114 1988-08-14        1988-08-28      0 to 40  
4                    1         40 2018-06-14        2018-07-14      41 to 70 
5                    1         72 2008-06-19        2008-07-17      41 to 70 
6                    1         53 1974-05-05        1974-06-04      0 to 40  
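How the ageGroup list turns into the age_group labels seen above can be mimicked in base R. This is a toy sketch with made-up ages, assuming both bounds of each band are inclusive. It is not the PatientProfiles implementation.

```r
# Same age bands as in the addAge() call
ageGroup <- list(c(0, 40), c(41, 70), c(71, 150))

# Hypothetical ages at cohort start
age <- c(12, 40, 41, 70, 88)

# Build "lower to upper" labels and collect the lower bound of each band
labels <- sapply(ageGroup, function(g) paste(g[1], "to", g[2]))
lower  <- sapply(ageGroup, function(g) g[1])

# findInterval() returns the index of the last band whose lower bound is <= age
age_group <- labels[findInterval(age, lower)]
print(data.frame(age, age_group))
```

Each distinct label then becomes one stratum when the column name is passed to the strata argument.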

Adding strata - Example

survivalResultStrata <- estimateSingleEventSurvival(
  cdm = cdm,
  targetCohortTable = "ibuprofen_strata", 
  outcomeCohortTable = "ibuprofen_strata",
  outcomeDateVariable = "cohort_end_date",
  strata = list("age_group")
)

Adding strata - Example

plotSurvival(result = survivalResultStrata, colour = "age_group", riskTable = TRUE)

Your turn!

Exercise 2 - Stratification

Use the same aspirin cohort as before.

  • Add both sex and age_group information.

  • Estimate treatment adherence and plot all Kaplan-Meier curves in one plot.

Exercise 2 - Stratification

Solution:
cdm$aspirin_strata <- cdm$aspirin |> 
  addDemographics(
    ageGroup = list("kids" = c(0,18), "adults" = c(19,150)),
    name = "aspirin_strata"
  )

survivalResultStrata <- estimateSingleEventSurvival(
  cdm = cdm,
  targetCohortTable = "aspirin_strata", 
  outcomeCohortTable = "aspirin_strata",
  outcomeDateVariable = "cohort_end_date",
  strata = list("age_group", "sex", c("age_group", "sex"))
)

Exercise 2 - Stratification

Solution:
plotSurvival(result = survivalResultStrata, colour = "age_group", facet = "sex")

Additional input choices

Let’s play with the parameters of the survival estimation function

Reminder of all possible ways we can tweak our analysis:

# don't run
estimateSingleEventSurvival(
  cdm = cdm,
  targetCohortTable = "ibuprofen",
  outcomeCohortTable = "ibuprofen",
  outcomeDateVariable = "cohort_end_date",
  outcomeWashout = Inf,
  censorOnCohortExit = FALSE,
  censorOnDate = NULL,
  followUpDays = Inf,
  strata = NULL,
  eventGap = 30,
  estimateGap = 1,
  restrictedMeanFollowUp = NULL,
  minimumSurvivalDays = 1
)

Changing inputs - Example

Let’s change the event gap to a weekly aggregation and display the risk table under the plot.

survivalResultEventGap <- estimateSingleEventSurvival(
  cdm = cdm, 
  targetCohortTable = "ibuprofen", 
  outcomeCohortTable = "ibuprofen",
  outcomeDateVariable = "cohort_end_date",
  eventGap = 7
)

Changing inputs - Example

plotSurvival(result = survivalResultEventGap, riskTable = TRUE, riskInterval = 7)

Your turn!

Exercise 3 - Changing inputs

Keep working with the cohort of individuals starting aspirin prescription.

  • Now estimate survival with a 5-year washout period instead of considering only first events.

Exercise 3 - Changing inputs

Solution:
survivalResultAspirinTuned <- estimateSingleEventSurvival(
  cdm = cdm,
  targetCohortTable = "aspirin",
  outcomeCohortTable = "aspirin",
  outcomeDateVariable = "cohort_end_date",
  outcomeWashout = 1825
)

tableSurvival(x = bind(survivalResultAspirin, survivalResultAspirinTuned))
CDM name  Target cohort  Outcome name  Outcome washout  Number records  Number events  Median survival (95% CI)  Restricted mean survival (95% CI)
GiBleed   1191_aspirin   1191_aspirin  Inf                       1,927          1,927  14.00 (14.00, 14.00)      23.00 (22.00, 23.00)
GiBleed   1191_aspirin   1191_aspirin  1825                      3,020          3,020  14.00 (14.00, 14.00)      25.00 (25.00, 26.00)

Exercise 3 - Changing inputs

Solution:
plotSurvival(result = bind(survivalResultAspirin, survivalResultAspirinTuned), 
             colour = "outcome_washout")
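Why the number of records rises from 1,927 to 3,020 when the washout drops from Inf to 1825 days can be illustrated with a toy filter in base R. The sketch assumes that the washout excludes target entries with an outcome in the given number of days before the index date. The data are made up, and the package's handling of edge cases (for example, an outcome exactly on the boundary day) may differ.

```r
# Hypothetical target entries with the number of days since each person's
# most recent prior outcome (NA = no prior outcome at all)
entries <- data.frame(
  subject_id = 1:5,
  days_since_prior_outcome = c(NA, 400, 2000, 3650, 90)
)

keepAfterWashout <- function(daysSince, washout) {
  # keep an entry if there is no prior outcome, or if the prior outcome
  # lies further back than the washout window
  is.na(daysSince) | daysSince > washout
}

# Inf: any prior outcome excludes the entry, so only first events remain
nInf <- sum(keepAfterWashout(entries$days_since_prior_outcome, Inf))

# 5 years expressed in days, as in the solution (5 * 365 = 1825)
n1825 <- sum(keepAfterWashout(entries$days_since_prior_outcome, 5 * 365))

print(c(washout_Inf = nInf, washout_1825 = n1825))
```

A finite washout lets entries back in once enough time has passed since the previous outcome, so the analysis is no longer restricted to first events.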

Exercise 4 - Survival with different exposure and outcome

What is the time to first myocardial infarction after starting simvastatin or aspirin treatment, and how does it differ between the two drugs?

Exercise 4 - MI after start of simvastatin or aspirin

Solution:
conceptSetMI <- getCandidateCodes(cdm = cdm, keywords = "myocardial")
mi <- list("mi" = conceptSetMI$concept_id)
cdm$mi <- conceptCohort(cdm, conceptSet = mi, name = "mi")

codelist <- getDrugIngredientCodes(cdm = cdm, name = c("simvastatin", "aspirin"))
cdm$statins <- conceptCohort(
  cdm = cdm, 
  conceptSet = codelist,
  name = "statins"
) |>
  collapseCohorts(gap = 7)

survivalStatin <- estimateSingleEventSurvival(
  cdm = cdm, 
  targetCohortTable = "statins", 
  outcomeCohortTable = "mi"
)

Exercise 4 - MI after start of simvastatin or aspirin

Solution:
plotSurvival(result = survivalStatin, colour = "target_cohort")

Thank you!

Questions?
