Standardizing Medical Devices Surveillance Data

2020-06-14

Why?

Medical device event data are messy.

Common challenges include:

How?

The mds package provides a standardized framework to address these challenges:

Purpose of This Vignette

Note on Statistical Algorithms

mds data and analysis standards allow for seamless application of various statistical trending algorithms via the mdsstat package (under development).

Data: MAUDE and Simulated Sales

Our example dataset maude was queried from the FDA MAUDE API and contains 535 reported events on bone cement in 2017. Furthermore, a simulated exposure dataset sales was generated to provide denominator data for our bone cement events.

library(mds)
dim(maude)
#> [1] 535  15
dim(sales)
#> [1] 360   4
head(maude, 3)
report_number event_type date_received product_problem_flag adverse_event_flag report_source_code lot_number model_number manufacturer_d_name manufacturer_d_country brand_name device_name medical_specialty_description device_class region
0002249697-2017-00023 Malfunction 20170103 Y N Manufacturer report MHX076 STRYKER ORTHOPAEDICS-MAHWAH US SIMPLEX P - US TOBRA FD 10-PK Bone Cement Orthopedic 2 Central
0002249697-2017-00028 Malfunction 20170103 Y N Manufacturer report MHX080 STRYKER ORTHOPAEDICS-MAHWAH US SIMPLEX P - US TOBRA FD 10-PK Bone Cement Orthopedic 2 West
0002249697-2017-00025 Malfunction 20170103 Y N Manufacturer report MHX076 STRYKER ORTHOPAEDICS-MAHWAH US SIMPLEX P - US TOBRA FD 10-PK Bone Cement Orthopedic 2 Central
head(sales, 3)
device_name region sales_month sales_volume
Arthroscope Central 2017-01-01 83
Arthroscope Central 2017-02-01 119
Arthroscope Central 2017-03-01 112

Raw Data to Trending in 4 Steps

The general workflow to go from data to trending over time is as follows:

  1. Use deviceevent() to standardize device-event data.
  2. Use exposure() to standardize exposure data (optional).
  3. Use define_analyses() to enumerate possible analysis combinations.
  4. Use time_series() to generate counts (and/or rates) by time based on your defined analyses.

Live Example

# Step 1 - Device Events
de <- deviceevent(
  maude,
  time="date_received",
  device_hierarchy=c("device_name", "device_class"),
  event_hierarchy=c("event_type", "medical_specialty_description"),
  key="report_number",
  covariates="region",
  descriptors="_all_")

# Step 2 - Exposures (Optional step)
ex <- exposure(
  sales,
  time="sales_month",
  device_hierarchy="device_name",
  match_levels="region",
  count="sales_volume")

# Step 3 - Define Analyses
da <- define_analyses(
  de,
  device_level="device_name",
  exposure=ex,
  covariates="region")

# Step 4 - Time Series
ts <- time_series(
  da,
  deviceevents=de,
  exposure=ex)

What to Do Next

You may:

Summarize Defined Analyses

summary(da)
#> $`Analyses Timestamp`
#> [1] "2020-06-14 21:45:56 EDT"
#> 
#> $`Analyses Counts`
#>         Total Analyses Analyses with Exposure          Device Levels 
#>                     27                     27                      6 
#>           Event Levels             Covariates 
#>                      1                      2 
#> 
#> $`Date Ranges`
#>           Data      Start        End
#> 1 Device-Event 2017-01-01 2017-12-01
#> 2     Exposure 2017-01-01 2017-12-01
#> 3         Both 2017-01-01 2017-12-01

Show All Analyses as a Data Frame

dadf <- define_analyses_dataframe(da)
head(dadf, 3)
id device_level_source device_level device_1up_source device_1up event_level_source event_level covariate covariate_level invivo date_range_de_start date_range_de_end exp_device_level exp_covariate_level date_range_exposure_start date_range_exposure_end date_range_de_exp_start date_range_de_exp_end
1 device_name Bone Cement device_class 2 event_type All region Central FALSE 2017-01-01 2017-12-01 Bone Cement Central 2017-01-01 2017-12-01 2017-01-01 2017-12-01
2 device_name Bone Cement device_class 2 event_type All region West FALSE 2017-01-01 2017-12-01 Bone Cement West 2017-01-01 2017-12-01 2017-01-01 2017-12-01
3 device_name Bone Cement device_class 2 event_type All region East FALSE 2017-01-01 2017-12-01 Bone Cement East 2017-01-01 2017-12-01 2017-01-01 2017-12-01

Plot Time Series of Counts and Rates

plot(ts[[1]])
plot(ts[[4]], "rate", type='l')

deviceevent() to Standardize Device-Event Data

Basic Usage

de <- deviceevent(maude, "date_received", c("device_name", "device_class"), c("event_type", "medical_specialty_description"))
head(de, 3)
key time device_1 device_2 event_1 event_2
1 2017-01-03 Bone Cement 2 Malfunction Orthopedic
2 2017-01-03 Bone Cement 2 Malfunction Orthopedic
3 2017-01-03 Bone Cement 2 Malfunction Orthopedic

Advanced Usage

de <- deviceevent(
  maude,
  time="date_received",
  device_hierarchy=c("device_name", "device_class"),
  event_hierarchy=c("event_type", "medical_specialty_description"),
  key="report_number",
  covariates="region",
  descriptors="_all_")
head(de, 3)
key time device_1 device_2 event_1 event_2 region product_problem_flag adverse_event_flag report_source_code lot_number model_number manufacturer_d_name manufacturer_d_country brand_name
0002249697-2017-00023 2017-01-03 Bone Cement 2 Malfunction Orthopedic Central Y N Manufacturer report MHX076 STRYKER ORTHOPAEDICS-MAHWAH US SIMPLEX P - US TOBRA FD 10-PK
0002249697-2017-00028 2017-01-03 Bone Cement 2 Malfunction Orthopedic West Y N Manufacturer report MHX080 STRYKER ORTHOPAEDICS-MAHWAH US SIMPLEX P - US TOBRA FD 10-PK
0002249697-2017-00025 2017-01-03 Bone Cement 2 Malfunction Orthopedic Central Y N Manufacturer report MHX076 STRYKER ORTHOPAEDICS-MAHWAH US SIMPLEX P - US TOBRA FD 10-PK

Required Arguments

data_frame
is the input device-event data frame. All remaining arguments refer to variables within this data frame.
time
is the name of the variable containing the time of the event. The input format of the time will be flexibly converted into Date format.
device_hierarchy
organizes all your device variables into a hierarchy. The hierarchical concept reflects how devices are often nested into progressively more general groups. Set the first variable as the lowest device level that you would like to trend at. mds remembers this hierarchy and allows trending at multiple levels as you specify.
event_hierarchy
organizes all your event variables into a hierarchy. Like devices, event variables should be categorical in nature. Free text descriptions should not be listed here, but rather in the descriptors argument. The hierarchical concept reflects how events are often nested into progressively more general groups. Set the first variable as the lowest event level that you would like to trend at. mds remembers this hierarchy and allows trending at multiple levels as you specify. If your data does not have an event variable, you will need to create a dummy variable.
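As a minimal sketch of the dummy-variable case described above, the base R snippet below adds a constant placeholder column to a raw data frame that has no event variable. The data frame, its column names, and the label "All Events" are hypothetical and chosen only for illustration. The commented-out deviceevent() call shows how the placeholder might then be supplied.

```r
# Hypothetical raw data with device and date columns but no event variable.
raw <- data.frame(
  report_id = c("A1", "A2", "A3"),
  received  = c("20170103", "20170103", "20170210"),
  device    = c("Bone Cement", "Bone Cement", "Bone Cement, Antibiotic"),
  stringsAsFactors = FALSE
)

# Add a constant placeholder column so that every row carries an event value.
# The column name and label are illustrative assumptions.
raw$event_dummy <- "All Events"

print(raw)

# The placeholder could then be passed as the event hierarchy, for example:
# de <- deviceevent(raw, time = "received", device_hierarchy = "device",
#                   event_hierarchy = "event_dummy")
```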

Optional Arguments

key
is a unique identifier for each unique event in data_frame. If your data pipeline carries over a key variable, it is recommended to specify it here. The key allows downstream aggregated analysis to be able to “look up” individual constituent events.
covariates
are a special group of variables that may be analyzed within device. For instance, declaring covariates="region" will allow analysis of each region within a device. These variables should be categorical in nature.
descriptors
are additional variables that should be retained for the purpose of describing individual events in downstream analysis.
implant_days
contains the age in days of an implantable device at the time of the event.

exposure() to Standardize Exposure Data

Exposure data is meant to support device-event data. As such, the general expectation is that variable values match between exposure and device-event data. For example, 10 exposures for ev3 Solitaire in France will be matched exactly to ev3 Solitaire events in France, and not to events for EV3 SOLITAIRE in FRANCE.
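Because matching is exact, it may help to check for mismatched values before standardizing. The following is a sketch in base R, not part of mds. The vectors are hypothetical, and the normalize() helper (upper-casing and trimming whitespace) is just one possible harmonization rule.

```r
# Hypothetical device names from the device-event data and the exposure data.
event_devices    <- c("ev3 Solitaire", "ev3 Solitaire", "Other Device")
exposure_devices <- c("EV3 SOLITAIRE", "Other Device")

# Exposure values with no exact counterpart in the device-event data.
unmatched <- setdiff(exposure_devices, event_devices)
print(unmatched)

# One possible harmonization rule (an assumption): compare in upper case
# after trimming surrounding whitespace.
normalize <- function(x) toupper(trimws(x))
unmatched_after <- setdiff(normalize(exposure_devices), normalize(event_devices))
print(unmatched_after)
```

If a rule like this is adopted, it would need to be applied to both datasets before they are passed to deviceevent() and exposure().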

Basic Usage

ex <- exposure(sales, "sales_month", "device_name")
head(ex, 3)
key time count device_1
1 2017-01-01 1 Arthroscope
2 2017-02-01 1 Arthroscope
3 2017-03-01 1 Arthroscope

Advanced Usage

ex <- exposure(
  sales,
  time="sales_month",
  device_hierarchy="device_name",
  match_levels="region",
  count="sales_volume")
head(ex, 3)
key time count device_1 region
1 2017-01-01 83 Arthroscope Central
2 2017-02-01 119 Arthroscope Central
3 2017-03-01 112 Arthroscope Central

Required Arguments

Note: Although not required, count will commonly be used as well.

data_frame
is the input exposure data frame. All remaining arguments refer to variables within this data frame.
time
is the name of the variable containing the time of the exposure. The input format of the time will be flexibly converted into Date format. If exposure will be used, it is critical to have sufficient time granularity. For example, if analysis will be done monthly, exposure data must be no less granular than monthly. mds does not make assumptions about filling in holes in time!
device_hierarchy
contains all exposure device variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data device_hierarchy parameter.
event_hierarchy
contains all exposure event variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data event_hierarchy parameter. Exposure at the event level is uncommon, and the Basic Usage call above omits this argument.
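Since holes in time are not filled in, the time requirement above suggests checking exposure data for missing periods before use. The sketch below does this in base R for monthly data. The dates are hypothetical and the check is independent of mds.

```r
# Hypothetical monthly exposure dates with one month missing.
exposure_months <- as.Date(c("2017-01-01", "2017-02-01",
                             "2017-04-01", "2017-05-01"))

# Every month expected between the first and last observed month.
expected <- seq(min(exposure_months), max(exposure_months), by = "month")

# Months with no exposure record at all.
missing_months <- expected[!expected %in% exposure_months]
print(missing_months)
```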

Optional Arguments

count
is the most commonly specified optional parameter. It contains the number of exposures. If not specified, the number of rows will be used as a proxy for count.
key
is a unique identifier for each unique exposure in data_frame. If your data pipeline carries over a key variable, it is recommended to specify it here. The key allows downstream aggregated analysis to be able to “look up” individual constituent exposure records.
match_levels
are variables aside from time, device, and event that specify an exposure. A common “match level” is country, if your exposure data is specific by country.
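To illustrate why count usually matters, the sketch below compares the two ways of tallying exposure for the first three sales rows shown earlier. It is plain base R arithmetic and assumes only what is stated above: without count, each row stands in for one exposure.

```r
# First three rows of the sales data shown earlier in this vignette.
sales_small <- data.frame(
  device_name  = "Arthroscope",
  region       = "Central",
  sales_month  = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01")),
  sales_volume = c(83, 119, 112)
)

# Without a count variable, each row stands in for a single exposure.
rows_as_count <- nrow(sales_small)

# With count = "sales_volume", the recorded volumes are used instead.
volume_as_count <- sum(sales_small$sales_volume)

print(c(rows = rows_as_count, volume = volume_as_count))
```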

define_analyses() to Enumerate Analysis Combinations

After standardizing device-event data using deviceevent() and, optionally, exposure data using exposure(), the next step is to discover what types of analyses are possible. This is separated from actually doing the analysis (counting, calculations, statistics, etc.) because:

Basic Usage

da <- define_analyses(de, "device_name")

Note that define_analyses() returns a list of individual analyses. Each individual analysis contains a set of instructions. You can view an analysis by submitting da[[1]], da[[2]], etc., but a less cumbersome overview is possible using summary() and define_analyses_dataframe().

summary(da)
#> $`Analyses Timestamp`
#> [1] "2020-06-14 21:45:58 EDT"
#> 
#> $`Analyses Counts`
#>         Total Analyses Analyses with Exposure          Device Levels 
#>                      7                      0                      6 
#>           Event Levels             Covariates 
#>                      1                      1 
#> 
#> $`Date Ranges`
#>           Data      Start        End
#> 1 Device-Event 2017-01-01 2017-12-01
#> 2     Exposure       <NA>       <NA>
#> 3         Both 2017-01-01 2017-12-01
head(define_analyses_dataframe(da), 3)
id device_level_source device_level device_1up_source device_1up event_level_source event_level covariate covariate_level invivo date_range_de_start date_range_de_end date_range_de_exp_start date_range_de_exp_end
1 device_name Bone Cement device_class 2 event_type All Data All FALSE 2017-01-01 2017-12-01 2017-01-01 2017-12-01
2 device_name Bone Cement, Antibiotic device_class 2 event_type All Data All FALSE 2017-01-01 2017-12-01 2017-01-01 2017-12-01
3 device_name Cement, Bone, Vertebroplasty device_class 2 event_type All Data All FALSE 2017-01-01 2017-12-01 2017-01-01 2017-12-01

Advanced Usage

da <- define_analyses(
  de,
  device_level="device_name",
  exposure=ex,
  covariates="region")
summary(da)
#> $`Analyses Timestamp`
#> [1] "2020-06-14 21:45:59 EDT"
#> 
#> $`Analyses Counts`
#>         Total Analyses Analyses with Exposure          Device Levels 
#>                     27                     27                      6 
#>           Event Levels             Covariates 
#>                      1                      2 
#> 
#> $`Date Ranges`
#>           Data      Start        End
#> 1 Device-Event 2017-01-01 2017-12-01
#> 2     Exposure 2017-01-01 2017-12-01
#> 3         Both 2017-01-01 2017-12-01
head(define_analyses_dataframe(da), 3)
id device_level_source device_level device_1up_source device_1up event_level_source event_level covariate covariate_level invivo date_range_de_start date_range_de_end exp_device_level exp_covariate_level date_range_exposure_start date_range_exposure_end date_range_de_exp_start date_range_de_exp_end
1 device_name Bone Cement device_class 2 event_type All region Central FALSE 2017-01-01 2017-12-01 Bone Cement Central 2017-01-01 2017-12-01 2017-01-01 2017-12-01
2 device_name Bone Cement device_class 2 event_type All region West FALSE 2017-01-01 2017-12-01 Bone Cement West 2017-01-01 2017-12-01 2017-01-01 2017-12-01
3 device_name Bone Cement device_class 2 event_type All region East FALSE 2017-01-01 2017-12-01 Bone Cement East 2017-01-01 2017-12-01 2017-01-01 2017-12-01

Required Arguments

deviceevents
is a standardized device-event data frame. (class() should contain "mds_de")
device_level
is the source device variable to analyze by. Access these variable names by submitting attributes(de)$device_hierarchy.

Optional Arguments

event_level
is the source event variable to analyze by. Access these variable names by submitting attributes(de)$event_hierarchy.
exposure
is a standardized exposure data frame. (class() should contain "mds_e")
date_level and date_level_n
are arguments used together to specify the time interval to analyze by. The defaults of "months" and 1 analyze by month. Other examples include "months" and 12 for yearly, or "days" and 7 for weekly.
covariates
are variables to analyze within device. For example, c("region") analyzes by each level of region within device.
times_to_calc
specifies how many time periods (counting back in time) to analyze for. Time period is defined using date_level and date_level_n.
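The interval implied by a date_level and date_level_n pair can be pictured with base R date sequences. This is an analogy only and makes no claim about how mds computes periods internally. The helper period_starts() is hypothetical.

```r
# Hypothetical helper: the first few period start dates for a given
# combination of date_level and date_level_n.
period_starts <- function(start, date_level, date_level_n, n = 3) {
  seq(as.Date(start), by = paste(date_level_n, date_level), length.out = n)
}

monthly <- period_starts("2017-01-01", "months", 1)   # the defaults
yearly  <- period_starts("2017-01-01", "months", 12)
weekly  <- period_starts("2017-01-01", "days", 7)

print(monthly)
print(yearly)
print(weekly)
```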

What About Analyses Across All Events, All Devices, etc?

It is always assumed that analyses at aggregated levels are desired, such as analysis of all events for a given device, or analysis of all events across all devices.

Aggregated level analysis is easily recognized by the "All" and "Data" values in device_level, event_level, covariate, and covariate_level.

id device_level_source device_level device_1up_source device_1up event_level_source event_level covariate covariate_level invivo date_range_de_start date_range_de_end exp_device_level exp_covariate_level date_range_exposure_start date_range_exposure_end date_range_de_exp_start date_range_de_exp_end
11 11 device_name Cement, Bone, Vertebroplasty device_class 2 event_type All region East FALSE 2017-01-01 2017-08-01 Cement, Bone, Vertebroplasty East 2017-01-01 2017-12-01 2017-01-01 2017-08-01
12 12 device_name Cement, Bone, Vertebroplasty device_class 2 event_type All region Central FALSE 2017-02-01 2017-12-01 Cement, Bone, Vertebroplasty Central 2017-01-01 2017-12-01 2017-02-01 2017-12-01
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
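Aggregated analyses can be picked out of a data frame of analyses by filtering on these marker values. The sketch below uses a small hand-built data frame whose column names follow the output above. The rows are hypothetical and stand in for the result of define_analyses_dataframe().

```r
# Hypothetical stand-in for a few rows of define_analyses_dataframe(da).
dadf_small <- data.frame(
  id              = 1:3,
  device_level    = c("Bone Cement", "Bone Cement", "All"),
  event_level     = c("All", "All", "All"),
  covariate       = c("region", "Data", "Data"),
  covariate_level = c("Central", "All", "All"),
  stringsAsFactors = FALSE
)

# Analyses aggregated over the covariate dimension.
aggregated <- subset(dadf_small, covariate == "Data" & covariate_level == "All")
print(aggregated)
```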

How to Customize Analyses

There are several options:

time_series() to Generate Counts, Rates, and More

Once an analysis has been defined using define_analyses(), the analysis instructions can be executed using time_series(), returning by defined time periods:

Basic Usage

ts <- time_series(da, de)

Note that time_series() returns, in a list, one time series data frame for every analysis. You can select a time series by submitting ts[[1]], ts[[2]], etc.

head(ts[[1]], 3)
time nA ids
17167 13 0002249697-2017-00023
17198 7 0002249697-2017-00488
17226 5 0002249697-2017-00755
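The time values in the output above are printed as numbers. Read as days since 1970-01-01, which is how R stores Date values, they correspond to the first three months of 2017. A short base R check:

```r
# Time values as printed in the output above.
time_numeric <- c(17167, 17198, 17226)

# Interpret them as days since the R Date origin.
time_dates <- as.Date(time_numeric, origin = "1970-01-01")
print(time_dates)
# [1] "2017-01-01" "2017-02-01" "2017-03-01"
```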

Advanced Usage

ts <- time_series(
  da,
  deviceevents=de,
  exposure=ex)
head(ts[[1]], 3)
time nA ids exposure ids_exposure
17167 13 0002249697-2017-00023 8597 37
17198 7 0002249697-2017-00488 5115 38
17226 5 0002249697-2017-00755 10191 39

Required Arguments

analysis
is a single defined analysis (class() should contain "mds_da") or a list of defined analyses.
deviceevents
is a standardized device-event data frame (class() contains "mds_de"). It is typically the same data frame used to generate analysis, but can be another "mds_de" data frame, such as a cut of the data at a different time. Note that if a different dataset is used, such as an older cut, the date ranges in the analysis must correspond to that dataset.

Optional Arguments

exposure
is a standardized exposure data frame (class() contains "mds_e"). It is typically the same data frame used to generate analysis. Like deviceevents, another data frame may be used, but the analysis instructions must correspond.
use_hierarchy
is a logical value for whether device and event hierarchies should be used for the calculation of disproportionality analysis (DPA) counts. Submit ?time_series.mds_da for more details.

How to Modify Counts & Exposures

It is not uncommon to adjust event and exposure counts, such as with applications of rolling or moving averages. These adjustments should be applied after generating time series data frames from time_series().
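As one example of such an adjustment, the sketch below applies a trailing 3-period moving average to event counts using stats::filter() from base R. The data frame is a hypothetical stand-in for a single time series: the first three nA values come from the output shown earlier, and the rest are made up. Whether a new column can be added to a real time series object without side effects is an assumption to verify.

```r
# Hypothetical stand-in for one time series data frame.
ts_small <- data.frame(
  time = seq(as.Date("2017-01-01"), by = "month", length.out = 6),
  nA   = c(13, 7, 5, 9, 11, 4)
)

# Trailing moving average: each value averages the current period and the
# two periods before it. The first two periods have no full window.
window <- 3
ts_small$nA_ma <- as.numeric(
  stats::filter(ts_small$nA, rep(1 / window, window), sides = 1)
)

print(ts_small)
```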

plot()ing a Time Series

To plot an individual time series generated by time_series(), call plot() on the time series object:

plot(ts[[1]])

There are a few custom parameters, including:

mode
with common values of "nA" (representing the device-event of interest), "exposure", and "rate" (simply "nA"/"exposure"). Less common are "nB", "nC", and "nD" representing the cell counts of the disproportionality analysis (DPA) contingency table.
xlab, ylab, main
representing default plot() behavior. By default, axes and title labels are inferred directly from the time series.

All other parameters are from plot.default().
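The "rate" mode described above is the ratio of nA to exposure. As a sketch, the same quantity can be computed by hand in base R. The values below are the first three rows of the Advanced Usage output of time_series() shown earlier.

```r
# First three periods from the time series with exposure shown earlier.
ts_rate <- data.frame(
  time     = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01")),
  nA       = c(13, 7, 5),
  exposure = c(8597, 5115, 10191)
)

# Rate: events of interest per unit of exposure in each period.
ts_rate$rate <- ts_rate$nA / ts_rate$exposure

print(ts_rate)
```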