CohortCharacteristics

An R package to characterise cohorts

CohortCharacteristics

CohortCharacteristics is on CRAN:

install.packages("CohortCharacteristics")

You can also install the development version from our GitHub repository:

pak::pkg_install("darwin-eu/CohortCharacteristics")

The documentation and vignettes for the package can be found on our website: https://darwin-eu.github.io/CohortCharacteristics/

Let’s get started

For this presentation we will use the GiBleed dataset from omock.

library(omock)
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
cdm

Person table

Let’s see the person table:

cdm$person
# Source:   table<person> [?? x 18]
# Database: DuckDB 1.4.0 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpxnP7A5/file69f72fc425d4.duckdb]
   person_id gender_concept_id year_of_birth month_of_birth day_of_birth birth_datetime      race_concept_id
       <int>             <int>         <int>          <int>        <int> <dttm>                        <int>
 1         6              8532          1963             12           31 1963-12-31 00:00:00            8516
 2       123              8507          1950              4           12 1950-04-12 00:00:00            8527
 3       129              8507          1974             10            7 1974-10-07 00:00:00            8527
 4        16              8532          1971             10           13 1971-10-13 00:00:00            8527
 5        65              8532          1967              3           31 1967-03-31 00:00:00            8516
 6        74              8532          1972              1            5 1972-01-05 00:00:00            8527
 7        42              8532          1909             11            2 1909-11-02 00:00:00            8527
 8       187              8507          1945              7           23 1945-07-23 00:00:00            8527
 9        18              8532          1965             11           17 1965-11-17 00:00:00            8527
10       111              8532          1975              5            2 1975-05-02 00:00:00            8527
# ℹ more rows
# ℹ 11 more variables: ethnicity_concept_id <int>, location_id <int>, provider_id <int>, care_site_id <int>,
#   person_source_value <chr>, gender_source_value <chr>, gender_source_concept_id <int>, race_source_value <chr>,
#   race_source_concept_id <int>, ethnicity_source_value <chr>, ethnicity_source_concept_id <int>

Snapshot

Estimate                   Database name: GiBleed

General
  Snapshot date            2025-09-22
  Person count             2,694
  Vocabulary version       v5.0 18-JAN-19
Observation period
  N                        5,343
  Start date               1908-09-22
  End date                 2019-07-03
Cdm
  Source name              Synthea synthetic health database
  Version                  v5.3.1
  Holder name              OHDSI Community
  Release date             2019-05-25
  Description              SyntheaTM is a Synthetic Patient Population Simulator. The goal is to output synthetic, realistic (but not real), patient data and associated health records in a variety of formats.
  Documentation reference  https://synthetichealth.github.io/synthea/
  Source type              duckdb
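
The slide shows the snapshot table but not the code behind it. A minimal sketch of how a table like this could be produced, assuming it comes from the OmopSketch package (an assumption; the slides do not name the package used), and recreating the same GiBleed mock CDM so the snippet stands on its own:

```r
library(omock)
library(OmopSketch)  # assumption: not named in the slides

# Recreate the GiBleed mock CDM used earlier in the presentation
# (the dataset is downloaded the first time it is requested).
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# Summarise database-level metadata: person count, observation period
# range, vocabulary version and the contents of the cdm_source table.
snapshot <- summariseOmopSnapshot(cdm)

# Render the summary as a formatted table.
tableOmopSnapshot(snapshot)
```

The snapshot date in the output is the date the code is run, so it will differ from the one shown above.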

CohortCharacteristics

👉 Package website: https://darwin-eu.github.io/CohortCharacteristics/
👉 CRAN link
👉 Manual

📧 marti.catalasabate@ndorms.ox.ac.uk