Databases and the servers
Databases
The team has access to several databases (as of September 8, 2024). Access is granted according to the projects you are working on and completion of the required training. The databases we have access to are:
CPRD GOLD and AURUM (primary care data from the UK)
THIN (UK, France, Belgium, Spain, Romania, Italy)
UKBiobank (UK)
To get access to any of these databases, you will need to complete the information governance training (links on the intranet but also here). This consists of online training and face-to-face training. Lydia Underdown is the Governance Manager in the department.
For CPRD database access, you will need to complete additional training provided by CPRD and register at NDORMS as a user of CPRD. See here on the intranet for more information, or access it directly here. You will also need to create an account on CPRD eRap and then be added to a new or existing application before you can be granted access to CPRD on our servers. Things to note:
Antonella is the key fob holder to CPRD for the department.
Antonella needs to be included as a collaborator on ALL applications you make to CPRD.
Antonella and whoever mapped the database need to be added as co-authors for publications.
For THIN, you will need to be added to an approved protocol and/or write one for submission. Your line manager will be able to help with this. There is also a dedicated Teams channel for THIN applications. The main people for THIN are Danielle and Antonella.
For UKBiobank, you will need to register and complete the application (you will need to provide a CV). The main users of UKBiobank are Frank, Marta Jnr and Danielle. NOTE: for new applications for UKB data, you will have to do the analysis on UKBiobank’s dedicated servers.
Once you have completed the training, contact Hez and Antonella about getting access to the specific database on our servers, cc’ing your line manager. When requesting access, please specify which data cut you require and whether you want both RStudio and ATLAS access for the database in question. If you are unsure, speak to your line manager.
The servers
We have two servers with an RStudio interface: one for running your final code and one for running code that is under development. When we develop code for studies, we use only a subset of a database, and we do not run on the full database until the code has been tested. This subset is a random sample of 100k people and is provided for each database we have. Please use the 100k subset when developing code on the development server. Links to the servers can be found below:
NOTE: if you are on a laptop, you need to be connected to the University’s virtual private network (VPN).
Documents to help you configure RStudio and GitHub on the servers can be found in other chapters of this book or here.
Jobs on the servers
Sometimes we need to see what jobs are running on the servers. If you have a desktop, you can access a tool that shows which jobs are running on the servers and who is running them. NOTE: this tool does not work on laptops due to the University’s firewall restrictions. The link is here.
If you do not have access to this tool, you can use RStudio to check which of your jobs are running on the server and to kill them. You can also ask Hez to kill jobs.
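As a rough illustration of checking and stopping your own jobs from a terminal session on the server, here is a minimal sketch. It is not the team's documented procedure. It assumes the servers run Linux, that the standard ps command and Python 3 are available, and that you have terminal access. The function names parse_ps, my_jobs and stop_job are hypothetical, chosen only for this example.

```python
import os
import signal
import subprocess


def parse_ps(output):
    """Turn `ps` output with pid, elapsed time and command columns
    into a list of (pid, elapsed, command) tuples."""
    jobs = []
    for line in output.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        pid, elapsed, command = parts
        jobs.append((int(pid), elapsed, command))
    return jobs


def my_jobs():
    """List processes owned by the current user (assumes a Linux `ps`)."""
    result = subprocess.run(
        # One -o option per column; the trailing "=" suppresses the header.
        ["ps", "-u", str(os.getuid()),
         "-o", "pid=", "-o", "etime=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps(result.stdout)


def stop_job(pid):
    """Ask a process to stop by sending SIGTERM.
    This only works for processes that you own."""
    os.kill(pid, signal.SIGTERM)
```

Calling my_jobs() returns one tuple per process you own, so you can look for long-running R sessions by their elapsed time and command, then pass the relevant process ID to stop_job. If a job belongs to someone else, or does not stop, ask Hez as described above.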