9 Error messages and input validation

Giving users good and informative error messages is key to a good user experience. To achieve this, it is important to validate the input at the beginning of each function. On the other hand, we do not want to spend more time checking the input than executing the function, so don't overdo it. In general, we check that each input argument has the desired type and length, and that the requested computation looks feasible with the given input parameters.
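As a minimal illustration of these three kinds of check (type, length and feasibility), the sketch below uses only base R. The function topN() is hypothetical and not part of any package: it returns the n largest values of x and stops with an informative message, before doing any work, if the input is not valid.

```r
# Hypothetical example function (not part of any package).
topN <- function(x, n = 1) {
  # type check: `x` must be numeric and must not contain missing values
  if (!is.numeric(x) || anyNA(x)) {
    stop("`x` must be a numeric vector without missing values.", call. = FALSE)
  }
  # type and length check: `n` must be a single whole number >= 1
  if (!is.numeric(n) || length(n) != 1 || is.na(n) || n < 1 || n != round(n)) {
    stop("`n` must be a single whole number greater than or equal to 1.", call. = FALSE)
  }
  # feasibility check: we can not return more values than `x` contains
  if (n > length(x)) {
    stop("`n` (", n, ") can not be larger than the length of `x` (", length(x), ").", call. = FALSE)
  }
  # the actual computation only runs once the input is known to be valid
  sort(x, decreasing = TRUE)[seq_len(n)]
}
```

For example, topN(c(3, 1, 2), n = 2) returns c(3, 2), while topN(1:3, n = 5) stops with a message explaining that n is too large.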

9.1 omopgenerics validation functions

Some arguments that are consistent across different functions and packages have their own validation functions in the omopgenerics package:

validateAgeGroupArgument() validates the ageGroup argument. The output ageGroup is always formatted as a named list (the names are the names of the age groups), and each age group is defined by named intervals.
validateCdmArgument() validates the cdm argument. By default only the class is checked, as further checks can take time, but specific checks can be triggered if needed.
validateCohortArgument() validates the cohort argument, which is used in many packages; it checks that the input is a properly formatted cohort. Extra checks can be triggered if needed.
validateCohortIdArgument() validates the cohortId argument.
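validateConceptSetArgument() validates the conceptSet argument.
validateNameArgument() validates the name argument.
validateNameStyle() validates the nameStyle argument.
validateResultArgument() validates the result argument.
validateStrataArgument() validates the strata argument.
validateWindowArgument() validates the window argument.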

It is important to assign the output of the validation function back to the variable. The object might be modified during validation so that different input formats are allowed, but the output of the validation process always has the same format. This simplifies the code, as you do not have to think about the different allowed inputs after the validation step.
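The reassignment pattern can be sketched in base R. normaliseIntervals() below is a hypothetical validator, not the omopgenerics implementation: it accepts either a single interval or a list of intervals and always returns a named list of intervals, so the code that follows the call only has to handle one format.

```r
# Hypothetical validator: accepts a single interval c(lower, upper) or a list
# of intervals, and always returns a named list of intervals.
normaliseIntervals <- function(x) {
  # allow a single numeric interval by wrapping it in a list
  if (is.numeric(x)) x <- list(x)
  if (!is.list(x)) stop("`x` must be a numeric vector or a list.", call. = FALSE)
  for (int in x) {
    if (!is.numeric(int) || length(int) != 2 || anyNA(int) || int[1] > int[2]) {
      stop("Each interval must be c(lower, upper) with lower <= upper.", call. = FALSE)
    }
  }
  # give unnamed intervals a default name of the form "lower to upper"
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  auto <- vapply(x, function(int) paste(int[1], "to", int[2]), character(1), USE.NAMES = FALSE)
  nms[nms == ""] <- auto[nms == ""]
  names(x) <- nms
  x
}

# Hypothetical caller: after reassignment `interval` is always a named list.
intervalNames <- function(interval) {
  interval <- normaliseIntervals(interval)
  names(interval)
}
```

Both intervalNames(c(0, 10)) and intervalNames(list(young = c(0, 19), c(20, 30))) go through the same code path after the first line of the function.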

The ageGroup argument is a good example of this behavior:

library(omopgenerics, warn.conflicts = FALSE)
ageGroup <- validateAgeGroupArgument(ageGroup = list(c(0, 1), c(10, 20)))
ageGroup

$age_group
$age_group$`0 to 1`
[1] 0 1

$age_group$`10 to 20`
[1] 10 20

ageGroup <- validateAgeGroupArgument(ageGroup = list(
  my_column = list("young" = c(0, 19), 20, c(21, Inf)), 
  list(c(0, 9), c(10, 19), c(20, 29), c(30, Inf))
))
ageGroup

$my_column
$my_column$young
[1]  0 19

$my_column$`20 to 20`
[1] 20 20

$my_column$`21 or above`
[1]  21 Inf


$age_group_2
$age_group_2$`0 to 9`
[1] 0 9

$age_group_2$`10 to 19`
[1] 10 19

$age_group_2$`20 to 29`
[1] 20 29

$age_group_2$`30 or above`
[1]  30 Inf

As you can see, the output is always a named list that contains named intervals. The function also throws explanatory errors if the input is not properly formatted:

validateAgeGroupArgument(
  ageGroup = list(age_group1 = list(c(0, 19), c(20, Inf)), age_group2 = list(c(0, Inf))),
  multipleAgeGroup = FALSE
)

Error:
! Multiple age group are not allowed

validateAgeGroupArgument(
  ageGroup = list(age_group1 = list(c(-5, 19), c(20, Inf)))
)

Error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: age_group1.
Caused by error:
! Elements of `ageGroup` argument must be greater or equal to "0".

validateAgeGroupArgument(
  ageGroup = NULL, null = FALSE 
)

Error:
! `ageGroup` argument can not be NULL.

9.2 omopgenerics assert functions

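The omopgenerics package also contains assert functions for simple validation steps, such as assertLogical(), assertNumeric() and assertDate(). They are useful helpers to validate an input with a single line of code, and they have arguments to check, for example, the expected length of the input or whether it is allowed to be NULL.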
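A short sketch of how several assert functions can be combined at the top of a function. summariseCounts() and its arguments are hypothetical; assertLogical() and assertNumeric() are used elsewhere in this chapter, while assertCharacter() and assertChoice() are assumed to be available with the signatures used here in the installed omopgenerics version.

```r
library(omopgenerics, warn.conflicts = FALSE)

# Hypothetical function: name and arguments are for illustration only.
summariseCounts <- function(name, method = "mean", minCellCount = 5, verbose = TRUE) {
  # a single character string
  assertCharacter(name, length = 1)
  # one value out of a fixed set of allowed choices
  assertChoice(method, choices = c("mean", "median"), length = 1)
  # a single non-negative whole number
  assertNumeric(minCellCount, integerish = TRUE, min = 0, length = 1)
  # a single TRUE/FALSE value
  assertLogical(verbose, length = 1)

  # code ...
  invisible(TRUE)
}
```

Each assert call either returns silently or stops with an error that names the offending argument, so the body of the function only runs with valid input.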

9.3 Examples

Let’s say we have a function with four arguments (cohort, cohortId, window and overlap); we can validate its input arguments with four lines of code:

myFunction <- function(cohort, cohortId = NULL, window = c(0, Inf), overlap = FALSE) {
  # input check
  cohort <- omopgenerics::validateCohortArgument(cohort = cohort)
  cohortId <- omopgenerics::validateCohortIdArgument(cohortId = {{cohortId}}, cohort = cohort)
  window <- omopgenerics::validateWindowArgument(window = window)
  omopgenerics::assertLogical(overlap, length = 1)
  
  # code ...
  
}

Note that the embrace operator {{ }} is needed so that tidyselect helpers such as starts_with() or contains() can be passed to the cohortId argument.
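To illustrate what the embrace operator does when forwarding an argument, the sketch below reproduces the mechanism with rlang and tidyselect directly. selectNames() and withEmbrace() are hypothetical helpers, not how omopgenerics implements validateCohortIdArgument(); the idea is that {{ cols }} passes the caller's unevaluated expression (for example starts_with("asthma")) on to the function that performs the selection.

```r
library(tidyselect)

# Hypothetical inner function: evaluates a tidyselect expression against the
# names of `x` and returns the names that were selected.
selectNames <- function(x, cols) {
  pos <- tidyselect::eval_select(rlang::enquo(cols), data = x)
  names(pos)
}

# Hypothetical wrapper: {{ cols }} forwards the caller's expression unevaluated,
# so selection helpers are evaluated inside eval_select().
withEmbrace <- function(x, cols) {
  selectNames(x, {{ cols }})
}

# The names of a plain list stand in for cohort names here.
cohorts <- list(asthma_adults = 1, asthma_children = 2, copd = 3)
withEmbrace(cohorts, starts_with("asthma"))
```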

A second example, which needs some extra custom validation code:

myFunction <- function(cdm, conceptSet, days = 180L, startDate = NULL, overlap = TRUE) {
  # input check
  cdm <- omopgenerics::validateCdmArgument(cdm = cdm)
  conceptSet <- omopgenerics::validateConceptSetArgument(conceptSet = conceptSet)
  omopgenerics::assertNumeric(days, integerish = TRUE, min = 0, length = 1)
  omopgenerics::assertDate(startDate, length = 1, null = TRUE)
  omopgenerics::assertLogical(overlap, length = 1)
  # custom check: this combination of arguments is not supported
  if (overlap && days > 365) {
    cli::cli_abort(c(x = "{.var days} can not be greater than 365 if {.var overlap} is TRUE."))
  }

  # code ...
  
}

You can throw custom error and warning messages using the cli package.
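A minimal sketch of a custom error and a custom warning built with cli::cli_abort() and cli::cli_warn(). checkDays() and its rules are hypothetical; the inline markup {.arg ...} and {.val ...} formats argument names and values in the message.

```r
# Hypothetical helper: the function name, arguments and rules are
# illustrative only.
checkDays <- function(days, overlap) {
  if (overlap && days > 365) {
    # cli_abort() formats the message and signals an error
    cli::cli_abort(c(
      "{.arg days} can not be greater than 365 if {.arg overlap} is TRUE.",
      "i" = "You supplied {.val {days}}."
    ))
  }
  if (days == 0) {
    # cli_warn() signals a warning but lets execution continue
    cli::cli_warn(c("!" = "{.arg days} is 0."))
  }
  invisible(days)
}
```

Errors stop the function, whereas warnings are better suited to inputs that are valid but possibly unintended.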

9.4 Conclusions

Validating arguments is a very important step to give users a good experience and to avoid running undesired code on big datasets. The omopgenerics package provides functionality to keep the validation step short and consistent with other packages.