Process and Add Assay Results to Database
process_and_add_assay_results.Rmd
Objective
This tutorial describes how to:
- connect to the genetics run identification database with credentials
- write plate run metadata to the database
- prepare Synergy H1 result data for upload to the database
- upload assay result data to the database
- generate threshold values from the plate run
- update assay detections for a plate run in the database
Database Credentials
For improved security, we recommend managing your database credential
information with environment variables. Information on how to set up
your config.yml file and pull information from that
file to establish a connection to the database can be found in the
article vignette("azure-authentication")
under
Authentication with a Config file.
Create Database Connection
To create a connection object, we run gr_db_connect
:
con <- gr_db_connect() # a config file will be searched for starting at the working directory.
dplyr::tbl(con, "agency")
You can also call gr_db_connect
with arguments
specifying your username, database name and host for the server for the
database. Note that this is a quick way to create a connection to the
database but if using source control its also an easy way to
accidentally share your credentials. For more information on this, and
to see code, please see the article
vignette("azure-authentication")
under
Authentication with username, dbname and host.
Every time we use a function to read or write to the database, we will pass this connection object to the function.
Add Results to Database
As of version 0.5, grunID
includes a function called
add_new_plate_results
, which allows you to pass all
arguments at once for uploading data to the database. This method offers
a more convenient and faster way of adding data, but it abstracts away
some of the details that may be useful to inspect visually. We recommend
starting with this function, as it is likely to work well. However, in
case of any errors, you may need to switch to the longer, more verbose
method of uploading. In this guide, we will walk through a few scenarios
to help you understand each of the methods for uploading data.
Single Function Call with add_new_plate_results
add_new_plate_results
offers a convenient way to quickly
add data to the database. However, it requires you to have all the
necessary information ready to pass as arguments to the function.
Additionally, it does not return data for further manipulation since it
essentially performs multiple steps at once. The following data is
required:
-
protocol_name
: The name of the plate run you are uploading data for. -
laboratory
: The code for the lab associated with this plate run data. -
genetic_method
: The genetic method code for the plate run you are uploading data for. -
lab_work_performed_by
: The person who performed the lab work. -
description
: A description to associate with the plate run. -
date_run
: The date when the plate was run through the Sherlock machine. -
filepath
: The filepath to the spreadsheet containing the results and plate map layout. -
sample_type
: The sample type. -
layout_type
: The layout type for the plate map. -
plate_size
: The size of the plate layout.
You can call add_new_plate_results
as follows:
add_new_plate_results(con, protocol_name = "new protocol", genetic_method = "SHLK",
laboratory = "DWR_GeM", lab_work_performed_by = "user",
description = "a test run", date_run = "2023-07-12",
filepath = "../misc/sherlock_results_part_1.xlsx",
sample_type = "mucus",
layout_type = "split_plate_early_late",
plate_size = 384)
A successful run of the function should return messages like this:
ℹ Adding plate run to database
✔ Plate run added to database with id = 65
ℹ Processing sherlock data
✔ Sherlock results processing complete
ℹ adding results to database
✔ Added 14883 results to the database
ℹ Generating thresholds for plate run
✔ Threshold done
inserting threshold result into database ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
identified 0 samples needing OTS16 spring/winter
updating status codes for uploaded samples ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
Assay records added Samples assigned run type
359 0
add_new_plate_results
also provides helpful messages
when something goes wrong. For example, when a plate run with the same
parameters already exists, it prompts you to verify if you want to
insert it anyway:
ℹ Adding plate run to database
Plate run with these values exists in database, do you wish to insert anyway?
1: Yes
2: No
Selection:
Or when there is a duplicate error from the database:
✔ Plate run added to database with id = 66
ℹ Processing sherlock data
✔ Sherlock results processing complete
ℹ adding results to database
✖ there was an error attempting to add new raw data, removing plate run associated with this from database, see the error below for more details:
Error in add_raw_assay_results(con, sherlock_results_event) :
the combination of: sample_id: 'F1722_3_A_1', assay_id: '1' already exists, to overwrite please delete previous assay run
Coming Soon An R Shiny UI for the
add_new_plate_results
function is currently in the making,
and should be avaialable soon. Check back on these docs for more
information on how to use once the new version is released.
Muli-step Approach
The multi-step approach can be used in cases where you want to each step and do some visual inspection of the data. The result is the same but broken up into steps.
Step 1: Add Plate Run Metadata
When a plate is run on the Synergy H1 reader, we want to relate
critical plate run metadata with the assay results from that particular
run in our database. We call add_plate_run
providing the
protocol ID, method ID, laboratory ID, and the person who ran the plate.
The function returns a unique plate run ID that will be associated with
the assay result records. Because our assay results reference the plate
run ID, we must add a plate run in our database before adding assay
result data.
If you do not know the ID for the protocol used, you can review all existing protocols stored in the database with the following code:
all_protocols <- grunID::get_protocols(con)
Select appropriate protocol ID with this code:
Similarly, you can call get_genetic_method
and
get_laboratory_id
to review the IDs associated with methods
and laboratories within the database.
genetic_method_id <- grunID::get_genetic_methods(con) |>
dplyr::filter(method_name == "SHERLOCK") |>
dplyr::pull(id) # SHERLOCK
laboratory_id <- grunID::get_laboratories(con) |>
dplyr::filter(stringr::str_detect(code, "DWR")) |> dplyr::pull(id) # DWR GeM
# store output of add_plate_run - contains unique plate run ID needed for later functions
plate_run <- grunID::add_plate_run(con, protocol_id, genetic_method_id,
laboratory_id, lab_work_performed_by = "Yogi Bear",
description = "Plate run for sampling event X",
date_run = "07-04-1776")
Note: save the output of add_plate_run
. This
contains the unique plate run identifier and will be needed to pass to
later functions
Step 2: Prepare Sample Details and Sherlock Output
The SHERLOCK results file must be formatted into the appropriate
Excel sheet format. You can view an example of this here.
To add results to the database, you must have a “plate_run” associated
with this upload. In this example, we will use the plate run we created
in the previous step. You can query for an existing plate run using the
get_plate_run
function.
To extract and transpose the assay result data from the Synergy H1
reader output, provide the process_sherlock
function the
filepath to the result data excel workbook, the sample details dataframe
described above, and the plate size run (either 96 or 384). The
function will return a list containing both the raw and final assay
results.
results <- grunID::process_sherlock(filepath = "templates/sherlock_results_template.xlsx",
sample_type = "mucus",
layout_type = "split_plate_early_late",
plate_run_id = plate_run
plate_size = 384)
Note: save the output of process_sherlock
. This
contains data that will be needed to to pass to
add_raw_assay_results
.
Step 4: Upload Assay Results
Provide add_raw_assay_results
a valid connection to the
database and the processed assay result data. The function returns the
number or records added.
grunID::add_raw_assay_results(con, results)
Note: you do not need to save the output of
add_raw_assay_results
.
Step 5: Generate Threshold Values
Provide generate_threshold
a valid connection to the
database and the plate run id. The function calculates the raw
fluorescence threshold values for each assay on a plate run. Currently,
this is calculated as two times the mean value of the last time step
from the control blank wells. The function returns a table containing
thresholds for a sampling event and needs to be passed to
update_assay_detections
to assign detections for
assays.
thresholds <- grunID::generate_threshold(con, plate_run_id)
Note: save the output of generate_threshold
.
This contains the threshold values and needs to be passed to
update_assay_detection
Step 6: Update Assay Detections
Provide update_assay_detection
a valid connection to the
database and the thresholds calculated for a plate run using
generate_threshold
. The function updates the assay result
table with positive detections and, if appicable, the genetic run type
identification.
grunID::update_assay_detection(con, thresholds_event)
Note: you do not need to save the output of
update_assay_detection
.
Summary of Multi-step Approach
Here is the code again to:
- connect to the genetics run identification database with credentials
- write plate run metadata to the database
- prepare Synergy H1 result data for upload to the database
- upload assay result data to the database
- generate threshold values from the plate run
- update assay detections for a plate run in the database
con <- grunID::gr_db_connect()
# S
plate_run_id <- grunID::add_plate_run(con, protocol_id, genetic_method_id,
laboratory_id, lab_work_performed_by = "Yogi Bear",
description = "Plate run for sampling event X",
date_run = "07-04-1776")
results <- grunID::process_sherlock(filepath = "templates/sherlock_results_template.xlsx",
sample_type = "mucus",
layout_type = "split_plate_early_late",
plate_run_id = plate_run_id
plate_size = 384)
# no need to store output
grunID::add_raw_assay_results(con, results)
thresholds <- grunID::generate_threshold(con, plate_run_id)
# no need to store output
grunID::update_assay_detection(con, thresholds)
# disconnect
DBI::dbDisconnect(con)