Published

September 8, 2024

Connect to the database

Getting started

To connect to databases we will use DBI package and CDMConnector, you can find more information about both packages in their websites:

Connect to database (standard way) and set up the environment

Load libraries

The following libraries will be used in this chapter: DBI, RPostgres, dplyr, dbplyr, usethis and here. If you do not have them installed you can install them with the following command:

install.packages(c("DBI", "RPostgres", "dplyr", "dbplyr", "usethis", "here"))
library(DBI)
library(RPostgres)
library(dplyr)
library(dbplyr)
library(usethis)
library(here)

Credentials and database details

To create a connection to a database we need some parameters:

  • host: it is the IP of the computer that contains the database in our case it will be always the same and you can check it here

  • port: port to connect, in our case it will be always the same and you can check it here

  • dbname: name of the database we want to connect, each database has a different name, link to the names of the databases hosted by our server: link

  • user: each individual has a user to connect to the database this is unique and nontransferable.

  • password: associated to each user.

To get a user and password or if you are not sure of what are the parameters of a certain database you can ask Hez.

Create a connection

There are several ways to create a connection as seen here and this depends on the Database Management System (DBMS) of your back-end.

In our case for the moment all our databases are in PostgreSQL (also refereed as Postgres) one of the most popular free dbms that exist. To connect to a Postgres we have to populate with the following information the connection details:

con <- dbConnect(drv = Postgres(),
                 dbname = "...",
                 host = "...",
                 port = "...",
                 user = "...",
                 password = "...")

To test that the connection that you created works you can test the following command:

tbl(con, sql("SELECT * FROM public.person limit 1"))
Creating CDM database /home/runner/work/Oxinfer/Oxinfer/GiBleed_5.3.zip
# Source:   SQL [1 x 18]
# Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1025-azure:R 4.4.1//tmp/RtmpwcvSWM/file15591befac34.duckdb]
  person_id gender_concept_id year_of_birth month_of_birth day_of_birth
      <int>             <int>         <int>          <int>        <int>
1         6              8532          1963             12           31
# ℹ 13 more variables: birth_datetime <dttm>, race_concept_id <int>,
#   ethnicity_concept_id <int>, location_id <int>, provider_id <int>,
#   care_site_id <int>, person_source_value <chr>, gender_source_value <chr>,
#   gender_source_concept_id <int>, race_source_value <chr>,
#   race_source_concept_id <int>, ethnicity_source_value <chr>,
#   ethnicity_source_concept_id <int>

Note that the result will be different as this one is from a synthetic database.

We can have multiple connections open at the same time, but it is recommended to not open more than one connection simultaneously, and close connections when we finish using them.

Once checked that we are able to connect to the database we can disconnect:

dbDisconnect(conn = con)

Database schemas

Databases are organised in schemas. Schemas are the internal structure of a database, they are like “folders”. In general when we work with an OMOP database we will have to schemas:

  • cdmSchema this schema contains all the OMOP Standard tables. Typically we would only have reading permissions to this schema, because these tables are not meant to be edited.

  • writeSchema this schema is usually empty and is for the user to save their tables (usually cohorts) of intermediate results that you want to keep or reuse later. Typically you would have writing and reading permissions for this schema. In our database we can only read and edit the tables that we created, so we can not edit or read other people tables. It is a good practice to write your own tables with a starting prefix so it is easier to avoid name conflicts.

For our databases the cdmSchema =public and the writeSchema =results.

Create the cdm object

The cdm object is a structure on top of our database connection to access all the tables in a user-friendly way, see more information in the CDMConnector website

To create our first cdm object we need first a connection to our back-end:

con <- dbConnect(drv = Postgres(),
                 dbname = "...",
                 host = "...",
                 port = "...",
                 user = "...",
                 password = "...")

And then we can create the cdm object:

library(CDMConnector)
cdm <- cdmFromCon(con = con, cdmSchema = "public", writeSchema = "results")

The cdm object has a print that shows all the tables that you are connected to:

cdm
── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
• omop tables: person, observation_period, visit_occurrence, visit_detail,
condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
measurement, observation, death, note, note_nlp, specimen, fact_relationship,
location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
condition_era, metadata, cdm_source, concept, vocabulary, domain,
concept_class, concept_relationship, relationship, concept_synonym,
concept_ancestor, source_to_concept_map, drug_strength
• cohort tables: -
• achilles tables: -
• other tables: -

And you can easily access to one of this tables with:

cdm$person
# Source:   table<main.person> [?? x 18]
# Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1025-azure:R 4.4.1//tmp/RtmpwcvSWM/file15596ba17b2b.duckdb]
   person_id gender_concept_id year_of_birth month_of_birth day_of_birth
       <int>             <int>         <int>          <int>        <int>
 1         6              8532          1963             12           31
 2       123              8507          1950              4           12
 3       129              8507          1974             10            7
 4        16              8532          1971             10           13
 5        65              8532          1967              3           31
 6        74              8532          1972              1            5
 7        42              8532          1909             11            2
 8       187              8507          1945              7           23
 9        18              8532          1965             11           17
10       111              8532          1975              5            2
# ℹ more rows
# ℹ 13 more variables: birth_datetime <dttm>, race_concept_id <int>,
#   ethnicity_concept_id <int>, location_id <int>, provider_id <int>,
#   care_site_id <int>, person_source_value <chr>, gender_source_value <chr>,
#   gender_source_concept_id <int>, race_source_value <chr>,
#   race_source_concept_id <int>, ethnicity_source_value <chr>,
#   ethnicity_source_concept_id <int>

In the other tutorials you will learn more how to program and use the cdm object.

Once you finished using it you can close the connection of the cdm object with:

cdmDisconnect(cdm)

This is equivalent to do:

dbDisconnect(con)

Change password

The database administrator will give us a password it is extremely important that you change it after you connect to the database for first time. To change your password you must:

  1. open a connection to any of our databases:
con <- dbConnect(drv = Postgres(),
                 dbname = "...", # here you can connect to any of our databases
                 host = "...",
                 port = "...",
                 user = "...",
                 password = "...")
  1. Change the password, to do so we have to run the following command:
dbGetQuery(con, "ALTER USER xxxxx WITH PASSWORD 'xxxxxxxxxxx'")

Example:

dbGetQuery(con, "ALTER USER martics WITH PASSWORD '12345678'")

Make sure that you chose an strong password, see the University guide for more information about strong and safe passwords.

  1. Disconnect from the database:
dbDisconnect(con)
  1. Connect again with the new password to check that the change was done correctly:
con <- dbConnect(drv = Postgres(),
                 dbname = "...",
                 host = "...",
                 port = "...",
                 user = "...",
                 password = "new_password")
  1. Disconnect from the database again:
dbDisconnect(con)

*NOTE: even you connect to a particular database to change the password, username and password are unique for all databases, so changing it only once changes for any dbname in our evironment.

Set your .Renviron

What are secrets?

Connection to the database it is a repetitive process that you will repeat every day, but at the same time you need to keep all your database credentials safely guarded. The R environ file will help use to do so.

.Renviron is a file linked to your R session to safely save secrets. It is loaded at the beginning of each session and secrets are kept till the session is terminated or restarted.

To access to a secret you can type:

Sys.getenv("MY_SECRET")
[1] ""

As we have just seen if a secret does not exist the output will be an empty string: "".

You can set a temporary secret using the following command:

Sys.setenv(MY_SECRET = "123456789")

Then now if you run the same command than before you will get the value that we have just set back:

Sys.getenv("MY_SECRET")
[1] "123456789"

This value can be assigned to a variable as we would do as usual:

x <- "123456789"
print(x)
[1] "123456789"
y <- Sys.getenv("MY_SECRET")
print(y)
[1] "123456789"

As you can see x and y have the same value but if you share the code in the second case the code would only work if you have the same secret stored in your environment. This is very useful to store the connection details so even if you share your code you would not be share sensitive information.

There are two type of secrets:

  • temporal (as we have just seen), these secrets only last for while the session is ongoing the moment that you restart or terminate it all the secrets will be gone, so it is not recommended to use temporal secrets.

  • permanent (we will see in the next step), these secrets are kept across different sessions and will always be in your environment unless you explicitly delete them. These secrets are stored in the .Renviron file. We will use permanent secrets to store our database credentials.

You can read more about secrets here.

*NOTE: you can see all the secrets of your R session running the following command:

Sys.getenv()

Set up your own .Renviron file

To open your .Renviron file you can run the following command:

edit_r_environ()

There you can write secrets that you want to store

MY_SECRET = "123456789"

It is like writing in the Sys.setenv() function, but these secrets will be permanent and always be loaded in your session.

Secrets are loaded every time our session starts, so if we modify the .Renviron file and we want the new secrets to be loaded we should restart the R session (Control+Shift+F10).

Try writing your username, password, port and host there and create a connection with the following code:

con <- dbConnect(drv = Postgres(),
                 dbname = "...",
                 host = Sys.getenv("HOST"),
                 port = Sys.getenv("PORT"),
                 user = Sys.getenv("USER"),
                 password = Sys.getenv("PASSWORD"))

Final remark

YOU MUST NEVER INCLUDE CONNECTION DETAILS IN A SCRIPT THAT IS PUBLIC OR YOU SHARE WITH SOMEONE, EVEN IF YOU TRUST THAT PERSON OR IS IN THE SAME ORGANIZATION.