8 Error messages and input validation

Giving users informative error messages is one of the most important things you can do to make a package pleasant to use. A cryptic error from deep inside a dplyr pipeline (or worse, a silently wrong result) is much harder to debug than a clear message at the top of the function that says exactly what was wrong with the input.

The rule of thumb is: validate at the boundary, trust inside. Check all inputs at the very start of each exported function, before any work is done. Once those checks pass, do not re-check the same things in internal helpers.

That said, do not overdo it. Avoid checking things that are guaranteed by earlier checks, and avoid checks that cost more than the function itself. In general, check that each argument has the expected type, that it has the expected length, and that any logical constraints between arguments are satisfied.
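As a minimal sketch of "validate at the boundary, trust inside", assume a hypothetical exported function daysInWindow() and a hypothetical internal helper windowLength(). Neither is part of omopgenerics; only the assertNumeric() and cli::cli_abort() calls are real. The exported function checks type, length, and the constraint between the two arguments, and the helper does no checking at all.

```r
# Hypothetical internal helper: no checks, it trusts its caller.
windowLength <- function(start, end) {
  end - start + 1
}

# Hypothetical exported function: all checks happen here, before any work.
daysInWindow <- function(start, end) {
  # type and length of each argument
  omopgenerics::assertNumeric(start, integerish = TRUE, length = 1)
  omopgenerics::assertNumeric(end, integerish = TRUE, length = 1)

  # logical constraint between arguments
  if (start > end) {
    cli::cli_abort(c(
      "x" = "{.var start} must be less than or equal to {.var end}."
    ))
  }

  # inputs are known to be valid from here on
  windowLength(start, end)
}

daysInWindow(0, 30)
```

The helper stays short because it never has to defend itself against bad input; that is only safe as long as it is never exported.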

8.1 The two kinds of validation helpers in omopgenerics

omopgenerics provides two families of validation helpers that together cover almost everything you need.

validate* functions are for arguments that appear consistently across many packages — cdm, cohort, cohortId, conceptSet, window, ageGroup, strata, name, nameStyle, and result. These functions do more than just check: they also coerce and normalise the input to a canonical form. You must assign their return value back to the variable, because the cleaned-up version is what the rest of your function should use.

assert* functions are for simple type and constraint checks. They do not transform the input — they either pass silently or throw an error. Use these for arguments that do not have a dedicated validate* function.
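The difference in how the two families are called can be sketched as follows. describeAge() is a hypothetical function invented for this illustration; the validate* result is reassigned, while the assert* call stands alone as a statement.

```r
library(omopgenerics, warn.conflicts = FALSE)

# Hypothetical function used only to contrast the two calling styles.
describeAge <- function(ageGroup, verbose = FALSE) {
  # validate*: keep the normalised return value
  ageGroup <- validateAgeGroupArgument(ageGroup = ageGroup)

  # assert*: called as a statement, nothing is reassigned
  assertLogical(verbose, length = 1)

  names(ageGroup)
}

describeAge(list(c(0, 39), c(40, Inf)))
```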

8.2 validate* functions

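The following validate* functions are available in omopgenerics. In package code, import them with @importFrom omopgenerics validateCdmArgument (and so on) rather than prefixing every call with omopgenerics::. Where the examples in this chapter use the explicit omopgenerics:: prefix, it is only to make clear where each function comes from.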

Function                                     Validates and normalises
validateCdmArgument(cdm)                     CDM reference: checks class, optionally checks required tables
validateCohortArgument(cohort)               Cohort table: checks class, required columns, attributes
validateCohortIdArgument(cohortId, cohort)   Cohort IDs: resolves NULL to all IDs, checks IDs exist in cohort
validateConceptSetArgument(conceptSet)       Concept set: accepts codelist, codelist_with_details, or concept_set_expression
validateWindowArgument(window)               Window: accepts vector or named list, always returns named list
validateAgeGroupArgument(ageGroup)           Age groups: accepts several input forms, always returns named list of named intervals
validateStrataArgument(strata, cohort)       Strata: checks that all columns named in strata exist in the cohort
validateNameArgument(name, cdm)              Table name: checks it is a valid identifier, optionally checks it doesn’t already exist
validateNameStyle(nameStyle, ...)            Name style template: checks it is a valid glue template
validateResultArgument(result)               Summarised result: checks class and required columns

8.2.1 Normalisation is the point

The most important property of validate* functions is that they accept several reasonable input forms and always return a single canonical form. This means the rest of your function only has to handle one form, which keeps the logic clean.

The ageGroup argument illustrates this well. Users can supply it in several ways:

library(omopgenerics, warn.conflicts = FALSE)

# Simple unnamed intervals
ageGroup <- validateAgeGroupArgument(ageGroup = list(c(0, 17), c(18, 64)))
ageGroup
$age_group
$age_group$`0 to 17`
[1]  0 17

$age_group$`18 to 64`
[1] 18 64

# Complex named multi-column form
ageGroup <- validateAgeGroupArgument(ageGroup = list(
  age_at_start = list("child" = c(0, 17), "adult" = c(18, 64)),
  age_at_end   = list(c(0, 49), c(50, Inf))
))
ageGroup
$age_at_start
$age_at_start$child
[1]  0 17

$age_at_start$adult
[1] 18 64


$age_at_end
$age_at_end$`0 to 49`
[1]  0 49

$age_at_end$`50 or above`
[1]  50 Inf

Whatever you pass in, the output is always a named list of named intervals. The function also throws clear errors if the input is malformed:

# Multiple age group columns when only one is allowed
validateAgeGroupArgument(
  ageGroup = list(group1 = list(c(0, 19), c(20, Inf)), group2 = list(c(0, Inf))),
  multipleAgeGroup = FALSE
)
Error:
! Multiple age group are not allowed

# Negative ages
validateAgeGroupArgument(ageGroup = list(c(-5, 19), c(20, Inf)))
Error in `purrr::map()`:
ℹ In index: 1.
Caused by error:
! Elements of `ageGroup` argument must be greater or equal to "0".

# NULL not allowed
validateAgeGroupArgument(ageGroup = NULL, null = FALSE)
Error:
! `ageGroup` argument can not be NULL.

The same principle applies to validateWindowArgument(), which accepts either a plain c(0, Inf) vector or a named list of windows, and always returns a named list.
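A short sketch of the two input forms for window follows. The generated element names are not shown here because they depend on the installed omopgenerics version; the point is only that both calls return a list that can be iterated the same way.

```r
library(omopgenerics, warn.conflicts = FALSE)

# A single window given as a plain vector
window1 <- validateWindowArgument(window = c(0, Inf))

# Several windows given as a list
window2 <- validateWindowArgument(window = list(c(-365, -1), c(0, 365)))

# Both results are lists, so downstream code needs only one code path
for (w in c(window1, window2)) {
  message("from ", w[1], " to ", w[2])
}
```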

8.2.2 The {{ syntax for cohortId

validateCohortIdArgument() supports tidyselect semantics, so users can pass things like starts_with("fracture") to select cohort IDs by name. To make this work, you must wrap the argument in double curly braces, {{ }}, when you pass it on to the validator:

cohortId <- validateCohortIdArgument(cohortId = {{cohortId}}, cohort = cohort)

Without {{ }}, the validator receives the bare symbol cohortId rather than the expression the user typed, so tidyselect expressions will not be evaluated correctly.
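The forwarding mechanism can be illustrated without a CDM. In the sketch below, pickCohortIds() is a hypothetical stand-in for a validator with tidyselect semantics, and settings is a made-up data frame; this is not the omopgenerics implementation, only a demonstration of why the wrapper has to embrace its argument.

```r
library(tidyselect)

# Hypothetical stand-in for a validator with tidyselect semantics.
# NOT the omopgenerics implementation; it only shows the mechanics.
pickCohortIds <- function(cohortId, settings) {
  # names are cohort names, values are cohort ids
  ids <- stats::setNames(settings$cohort_definition_id, settings$cohort_name)
  # enquo() captures the expression unevaluated;
  # eval_select() resolves it against the cohort names
  pos <- eval_select(rlang::enquo(cohortId), data = ids)
  unname(ids[pos])
}

# A user-facing function forwards its argument with {{ }}
myFunction <- function(settings, cohortId) {
  pickCohortIds({{ cohortId }}, settings = settings)
}

# Made-up cohort settings for the example
settings <- data.frame(
  cohort_definition_id = c(10L, 20L, 30L),
  cohort_name = c("fracture_hip", "fracture_wrist", "asthma")
)

myFunction(settings, starts_with("fracture"))
```

With the embrace in place, the expression typed by the user travels intact through myFunction() and is only evaluated inside the selection context, where starts_with() is meaningful.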

8.3 assert* functions

Use assert* functions for arguments that do not have a dedicated validate* function. Unlike validate* functions, they do not modify the input: they return it invisibly and unchanged, and are used purely for the side effect of throwing an error when a check fails.

Function                        Checks
assertCharacter(x, ...)         x is a character vector
assertNumeric(x, ...)           x is numeric
assertLogical(x, ...)           x is logical
assertDate(x, ...)              x is a Date
assertList(x, ...)              x is a list
assertClass(x, class, ...)      x has the specified class(es)
assertChoice(x, choices, ...)   x is one of the allowed values
assertTable(x, ...)             x is a table (data frame or tbl_sql)
assertTrue(expr, ...)           The expression evaluates to TRUE

Most of these accept a common set of modifiers as arguments. Not every modifier applies to every function, so check the help page of the function you are using:

Modifier          Type              Meaning
length            integer or NULL   Required length; NULL skips the check
null              logical           Whether NULL is a valid input (default FALSE)
na                logical           Whether NA values are allowed (default FALSE)
named             logical           Whether elements must be named
unique            logical           Whether elements must be unique
min               numeric           Minimum value (for assertNumeric)
max               numeric           Maximum value (for assertNumeric)
integerish        logical           Whether numeric must be whole numbers (for assertNumeric)
minNumCharacter   integer           Minimum number of characters per element (for assertCharacter)
call              call              Passed to the cli error message for better call context
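
A few of these modifiers in use, as a sketch. The values are made up for illustration, and tryCatch() is used only so that the failing check does not stop the script.

```r
library(omopgenerics, warn.conflicts = FALSE)

# passes silently: a single non-negative whole number
assertNumeric(30, integerish = TRUE, min = 0, length = 1)

# passes silently: NULL is explicitly allowed
assertCharacter(NULL, null = TRUE)

# passes silently: the value is one of the allowed choices
assertChoice("median", choices = c("mean", "median"), length = 1)

# fails: the value is below `min`
result <- tryCatch(
  assertNumeric(-1, min = 0),
  error = function(e) "error"
)
result
```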

8.4 Putting it together: a validation block

Here is the recommended pattern. Validation comes first, before any computation. validate* results are always reassigned. assert* calls are written as statements.

myFunction <- function(cohort, cohortId = NULL, window = c(0, Inf), overlap = FALSE) {
  # validate and normalise
  cohort   <- omopgenerics::validateCohortArgument(cohort = cohort)
  cohortId <- omopgenerics::validateCohortIdArgument(cohortId = {{cohortId}}, cohort = cohort)
  window   <- omopgenerics::validateWindowArgument(window = window)
  
  # simple type checks
  omopgenerics::assertLogical(overlap, length = 1)
  
  # business logic checks
  if (overlap && any(sapply(window, function(w) diff(w) > 365))) {
    cli::cli_abort(c(
      "x" = "Windows longer than 365 days are not supported when {.var overlap} is {.val TRUE}."
    ))
  }
  
  # ... function body
}

A second example with a CDM reference:

myFunction2 <- function(cdm, conceptSet, days = 180L, startDate = NULL, overlap = TRUE) {
  # validate and normalise
  cdm        <- omopgenerics::validateCdmArgument(cdm = cdm)
  conceptSet <- omopgenerics::validateConceptSetArgument(conceptSet = conceptSet)
  
  # simple type checks
  omopgenerics::assertNumeric(days, integerish = TRUE, min = 0, length = 1)
  omopgenerics::assertDate(startDate, length = 1, null = TRUE)
  omopgenerics::assertLogical(overlap, length = 1)
  
  # business logic checks
  if (overlap && days > 365) {
    cli::cli_abort(c(
      "x" = "{.var days} cannot be greater than 365 when {.var overlap} is {.val TRUE}."
    ))
  }
  
  # ... function body
}
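
Both functions above need a cohort or a CDM reference to run. To try the pattern without one, the assert* and business-logic parts can be isolated in a cut-down sketch; checkDaysOverlap() is a hypothetical name used only for this illustration.

```r
# Hypothetical cut-down version of the checks in myFunction2(); needs no CDM.
checkDaysOverlap <- function(days = 180L, overlap = TRUE) {
  # simple type checks
  omopgenerics::assertNumeric(days, integerish = TRUE, min = 0, length = 1)
  omopgenerics::assertLogical(overlap, length = 1)

  # business logic check
  if (overlap && days > 365) {
    cli::cli_abort(c(
      "x" = "{.var days} cannot be greater than 365 when {.var overlap} is {.val TRUE}."
    ))
  }

  invisible(TRUE)
}

checkDaysOverlap(days = 180L)                   # passes
checkDaysOverlap(days = 400L, overlap = FALSE)  # passes: constraint not active

# fails on the business logic check; the error is captured for inspection
err <- tryCatch(checkDaysOverlap(days = 400L), error = function(e) e)
inherits(err, "error")
```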

8.5 Writing good error messages with cli

Custom errors and warnings should use the cli package, which is already a dependency of omopgenerics. Do not use stop() or warning().

# Error
cli::cli_abort(c(
  "x" = "Argument {.var days} must be a positive integer.",
  "i" = "You supplied {.val {days}}."
))

# Warning  
cli::cli_warn(c(
  "!" = "Cohort {.val {name}} already exists in the CDM and will be overwritten."
))

# Informational message
cli::cli_inform(c(
  "i" = "Computing {length(cohortId)} cohort{?s}."
))

A few conventions for error messages:

  • Use {.var argument_name} to refer to an argument by name.
  • Use {.val {value}} to show the actual value that was supplied.
  • Use {.fn function_name} to refer to a function.
  • Use {.cls class_name} to refer to a class.
  • Use named elements "x" for errors, "!" for warnings, "i" for hints or additional information.
  • Write error messages in the present tense from the user’s perspective: “must be”, “cannot be”, “is not” rather than “expected”, “got”.
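
The conventions above can be combined in a small custom check, sketched below. checkDays() and myFunction3() are hypothetical names. Passing call = parent.frame() is one common way to make the error refer to the user-facing function rather than to the helper.

```r
# Hypothetical helper; `call` lets the error point at the calling function.
checkDays <- function(days, call = parent.frame()) {
  ok <- is.numeric(days) && length(days) == 1 && !is.na(days) && days > 0
  if (!ok) {
    cli::cli_abort(
      c(
        "x" = "{.var days} must be a positive number.",
        "i" = "You supplied {.val {days}} of class {.cls {class(days)}}."
      ),
      call = call
    )
  }
  invisible(days)
}

# Hypothetical exported function that delegates the check
myFunction3 <- function(days) {
  checkDays(days)
  days * 2
}

# capture the error instead of stopping, then print its message
err <- tryCatch(myFunction3("ten"), error = function(e) e)
cat(conditionMessage(err), "\n")
```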

8.6 What to validate and what to skip

Validate arguments that users are likely to pass in many ways or get wrong:

  • Arguments with a standard validate* function — always validate these.
  • Type-sensitive arguments: anything the rest of the function passes directly to dplyr or database operations.
  • Arguments with constraints relative to each other (e.g. days must not exceed 365 when overlap is TRUE).

Do not validate:

  • Internal function arguments that are never user-facing.
  • Things already guaranteed by a validate* call (e.g. do not re-check that cohortId is numeric after calling validateCohortIdArgument).
  • Constants you define yourself inside the function.

8.7 Summary

Validating arguments is one of the most impactful things you can do for users: it turns opaque database errors into clear messages that point directly to the problem. omopgenerics makes this straightforward: use validate* for the standard OMOP arguments (and always reassign the result), use assert* for simple type checks, and use cli for any custom messages. A complete validation block for a typical function takes four or five lines, and custom logic is needed only for constraints between arguments.