Tidy R programming with the OMOP Common Data Model

Authors

Edward Burn

Adam Black

Berta Raventós

Yuchen Guo

Mike Du

Kim López-Güell

Núria Mercadé-Besora

Martí Català

Published

October 5, 2025

Preface

Is this book for me?

We’ve written this book for anyone interested in working with Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) instances using a tidyverse-style approach. That is, an approach that is human-centered, consistent, composable, and inclusive (see Tidy design principles for more details on these principles).

New to R? We recommend taking a look at R for Data Science before reading this book. We assume that you have R installed, together with a suitable Integrated Development Environment (IDE) such as RStudio or Positron. See this tutorial if you need guidance on how to get started. The book uses multiple packages that you will need to install; see the list in the R Packages section.

New to databases? We recommend you take a look at some web tutorials on SQL, such as SQLBolt or SQLZoo, to gain a basic understanding of how databases work.

New to the OMOP CDM? We’d recommend you pair this book with The Book of OHDSI.

How is the book organised?

The book is divided into two parts. The first half of the book is focused on the general principles for working with databases from R. In these chapters you will see how you can use familiar tidyverse-style code to build up analytic pipelines that start with data held in a database and end with your analytic results. The second half of the book is focused on working with data in the OMOP CDM format, a widely used data format for health care data. In these chapters you will see how to work with this data format using the general principles from the first half of the book along with a set of R packages that have been built for the OMOP CDM.

Citation

If you found this book useful, please help us by citing it:

Burn E, Black A, Raventós B, Guo Y, Du M, López-Güell K, Mercadé-Besora N, 
Català M. Tidy R programming with the OMOP Common Data Model. GitHub; 2025.
https://github.com/oxford-pharmacoepi/Tidy-R-programming-with-OMOP

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Code

The source code for the book can be found in this GitHub repository; please star it if you find it useful.

R Packages

This book is rendered automatically through GitHub Actions using the following package versions:

Package                Version
CDMConnector           2.2.0
CodelistGenerator      3.5.0
CohortCharacteristics  1.0.1
CohortConstructor      0.5.0
DBI                    1.2.3
Lahman                 13.0-0
PatientProfiles        1.4.3
bit64                  4.6.0-1
cli                    3.6.5
clock                  0.7.3
dbplyr                 2.5.1
dm                     1.0.12
dplyr                  1.1.4
duckdb                 1.4.0
ggplot2                4.0.0
nycflights13           1.0.2
omock                  0.5.0
omopgenerics           1.3.1
palmerpenguins         0.1.1
purrr                  1.1.0
sloop                  1.0.1
stringr                1.5.2
tidyr                  1.3.1

Note that we have only included the packages that are called explicitly in the book.
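As a minimal sketch (not part of the book's own tooling), the R code below compares the package versions installed on your machine with those used to render the book. The vector book_versions simply copies the table above, and check_book_packages() is a hypothetical helper written for this example using only base R functions (system.file(), utils::packageVersion(), utils::compareVersion()). Missing packages are installed with install.packages() only in an interactive session; this assumes they are available from your configured CRAN repository, and it installs the current release rather than the exact version listed.

```r
# Package versions used to render the book (copied from the table above)
book_versions <- c(
  CDMConnector = "2.2.0", CodelistGenerator = "3.5.0",
  CohortCharacteristics = "1.0.1", CohortConstructor = "0.5.0",
  DBI = "1.2.3", Lahman = "13.0-0", PatientProfiles = "1.4.3",
  bit64 = "4.6.0-1", cli = "3.6.5", clock = "0.7.3", dbplyr = "2.5.1",
  dm = "1.0.12", dplyr = "1.1.4", duckdb = "1.4.0", ggplot2 = "4.0.0",
  nycflights13 = "1.0.2", omock = "0.5.0", omopgenerics = "1.3.1",
  palmerpenguins = "0.1.1", purrr = "1.1.0", sloop = "1.0.1",
  stringr = "1.5.2", tidyr = "1.3.1"
)

# Hypothetical helper: for each package, report the installed version
# (NA if not installed) and whether it is older than the book's version
check_book_packages <- function(versions) {
  installed <- vapply(names(versions), function(pkg) {
    # system.file() returns "" when a package is not installed
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1), USE.NAMES = FALSE)

  # compareVersion() treats "." and "-" as separators, so "13.0.0" == "13.0-0"
  older <- mapply(function(have, want) {
    !is.na(have) && utils::compareVersion(have, want) < 0
  }, installed, unname(versions), USE.NAMES = FALSE)

  data.frame(
    package = names(versions),
    book_version = unname(versions),
    installed = installed,
    older_than_book = older,
    stringsAsFactors = FALSE
  )
}

status <- check_book_packages(book_versions)
print(status)

# Install any missing packages, but only when running interactively
missing <- status$package[is.na(status$installed)]
if (length(missing) > 0 && interactive()) {
  install.packages(missing)
}
```

Differences in package versions are one possible reason why the output you see may not match what is printed in the book, so the installed and older_than_book columns are a reasonable first place to look.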